My immediate goal was improving performance without radically changing the system architecture (I'm well aware of better technologies to use for such an application).
While profiling the application I noticed that when a batch of new threads starts processing data, they enter a blocked state one after the other and remain that way for 30-60 seconds.
Looking at their stack traces I immediately saw that they were all blocked in Hibernate's ReadWriteCache put/get methods.
Apparently most of the entities were cached with a read/write strategy.
A read/write cache should prevent two threads from updating the same cache element concurrently, or from updating and reading it concurrently - so it makes sense to see locks. But it turns out Hibernate uses method-level synchronization, which also prevents two threads from reading the same cache element concurrently.
Now, when using a local cache this issue is probably less noticeable, but when using a distributed caching solution such as Memcached, cache access time is longer, so more threads end up waiting for each other.
Cache access takes even longer when you ask for an entity that is not in the cache: you have to wait for the cache to report that the entity is missing, fetch it from the database, and put it into the cache. For this entire time the thread holds the monitor, preventing other threads from working with the cache.
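The problem can be illustrated with a minimal sketch (this is not Hibernate's actual code; the class and method names are hypothetical). Because the methods are synchronized on the instance, even two pure readers block each other, and on a cache miss the monitor stays held for the whole database round trip:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the coarse-locking pattern described above.
public class CoarseLockCache {
    private final Map<String, Object> cache = new HashMap<>();

    // synchronized on 'this': only one thread at a time, reads included
    public synchronized Object get(String key) {
        Object value = cache.get(key);
        if (value == null) {
            // On a miss the monitor is still held while we go to the
            // database (simulated by a sleep), so every other thread
            // touching the cache waits behind this one.
            value = loadFromDatabase(key);
            cache.put(key, value);
        }
        return value;
    }

    public synchronized void put(String key, Object value) {
        cache.put(key, value);
    }

    private Object loadFromDatabase(String key) {
        try {
            Thread.sleep(50); // stand-in for a slow database query
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }
}
```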
A better way to handle this would have been to use java.util.concurrent.locks.ReentrantReadWriteLock, which enables more fine-grained locking (a read lock for the get method and a write lock for the put method).
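As a sketch of that alternative (again, not Hibernate's code - a hypothetical cache wrapper), the read lock lets any number of readers proceed in parallel, while the write lock is exclusive and held only for the actual update:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache using a ReentrantReadWriteLock instead of
// method-level synchronization.
public class ReadWriteLockCache {
    private final Map<String, Object> cache = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public Object get(String key) {
        lock.readLock().lock();   // shared: concurrent gets don't block each other
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, Object value) {
        lock.writeLock().lock();  // exclusive: blocks both readers and writers
        try {
            cache.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```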
Another issue is cache regions. Hibernate creates a ReadWriteCache instance per region; if no regions are defined then only a single ReadWriteCache instance is used, which makes the synchronization an even bigger problem.
The solution was to switch to a nonstrict read/write strategy wherever possible and to create a cache region per entity. This reduced the lock contention dramatically.
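In a Hibernate mapping file this combination looks roughly like the fragment below (the entity and region names are made up for illustration); each entity gets its own region, so its ReadWriteCache instance - and its lock - is no longer shared with every other cached entity:

```xml
<!-- Illustrative hbm.xml fragment: hypothetical Order entity -->
<class name="com.example.Order" table="orders">
    <!-- nonstrict-read-write avoids the strict locking of read-write,
         and the explicit region gives this entity its own cache instance -->
    <cache usage="nonstrict-read-write" region="com.example.Order"/>
    <id name="id" column="order_id"/>
    <property name="status"/>
</class>
```

Nonstrict read/write is only appropriate for entities that are rarely updated and can tolerate a brief window of stale reads, which is why the switch was made "wherever possible" rather than everywhere.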