Friday, February 5, 2010

Beware of Hibernate's read/write cache strategy with memcached

Recently I've been working on improving the performance of an application that processes massive amounts of data. It is a Java EE application that uses Hibernate for persistence and Memcached as its second-level cache. Almost all of the entities are cached to reduce the load on the database.
My immediate goal was improving performance without radically changing the system architecture (I'm well aware of better technologies to use for such an application).

While profiling the application, I noticed that as a batch of new threads started processing data, they entered the blocked state one after the other and stayed there for 30-60 seconds.
Looking at their stack traces, I immediately saw that they were all blocked in the put/get methods of Hibernate's ReadWriteCache.


Apparently most of the entities were cached with a read/write strategy.
A read/write cache should prevent two threads from updating the same cache element concurrently, or from updating and reading it concurrently, so it makes sense to see locks. But it turns out Hibernate uses method-level synchronization, which also prevents two threads from reading the same cache element concurrently (in fact, from reading any elements of the same ReadWriteCache instance concurrently, since that instance is the monitor).
With a local cache this issue probably goes largely unnoticed, but with a distributed caching solution such as Memcached, each cache access takes longer, so more threads end up waiting for each other.
Access takes even longer when you ask for an entity that is not in the cache: you have to wait for the cache to report that the entity is not there, fetch it from the database and put it into the cache. Every cache round trip along this path is made while the thread holds the monitor, preventing other threads from working with the cache.
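To make the contention concrete, here is a minimal, self-contained sketch of the pattern described above. It is not Hibernate's ReadWriteCache: `SlowRemoteCache` is a hypothetical stand-in for a Memcached client that simply sleeps to simulate a network round trip, and `SynchronizedRegionCache` is a hypothetical class that mimics method-level synchronization by declaring `get` and `put` as `synchronized`. Because every call locks the same instance, concurrent readers of the same key run one after another.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class SynchronizedCacheDemo {

    /** Hypothetical stand-in for a remote cache client: every call pays a fixed simulated network delay. */
    public static class SlowRemoteCache {
        private final Map<Object, Object> store = new ConcurrentHashMap<>();
        private final long latencyMillis;

        public SlowRemoteCache(long latencyMillis) { this.latencyMillis = latencyMillis; }

        public Object get(Object key) { pause(latencyMillis); return store.get(key); }

        public void put(Object key, Object value) { pause(latencyMillis); store.put(key, value); }
    }

    /** Method-level synchronization: one monitor per instance guards every get and put, reads included. */
    public static class SynchronizedRegionCache {
        private final SlowRemoteCache remote;

        public SynchronizedRegionCache(SlowRemoteCache remote) { this.remote = remote; }

        public synchronized Object get(Object key) { return remote.get(key); }

        public synchronized void put(Object key, Object value) { remote.put(key, value); }
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts `threads` readers of the same key at the same moment and returns the wall-clock time in ms. */
    public static long timeConcurrentReads(SynchronizedRegionCache cache, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();
                    cache.get("customer#42");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - begin) / 1_000_000;
    }

    public static void main(String[] args) {
        SynchronizedRegionCache cache = new SynchronizedRegionCache(new SlowRemoteCache(100));
        cache.put("customer#42", "Alice");
        // Four readers of the same key queue on one monitor, so this takes roughly 4 x 100 ms.
        System.out.println("elapsed ms: " + timeConcurrentReads(cache, 4));
    }
}
```

With a 100 ms simulated round trip, four concurrent readers need about 400 ms in total even though none of them writes; the exact figure depends on the machine.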
A better approach would have been to use java.util.concurrent.locks.ReentrantReadWriteLock, which enables finer-grained locking (a read lock for the get method and a write lock for the put method).
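The alternative suggested above can be sketched with `ReentrantReadWriteLock`. This is only an illustration under assumptions: the class name `ReadWriteLockedCache`, the map standing in for Memcached and the sleep that simulates latency are all hypothetical, and the class does not implement Hibernate's cache interfaces. Readers share the read lock, while `put` takes the exclusive write lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockedCache {
    // Hypothetical stand-in for the memcached client; pause() plays the network round trip.
    private final Map<Object, Object> remote = new ConcurrentHashMap<>();
    private final long latencyMillis;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public ReadWriteLockedCache(long latencyMillis) { this.latencyMillis = latencyMillis; }

    /** Shared read lock: any number of threads may read at the same time. */
    public Object get(Object key) {
        lock.readLock().lock();
        try {
            pause(latencyMillis);
            return remote.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Exclusive write lock: a put waits for in-flight reads, and reads wait while the put holds the lock. */
    public void put(Object key, Object value) {
        lock.writeLock().lock();
        try {
            pause(latencyMillis);
            remote.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts `threads` readers of the same key at the same moment and returns the wall-clock time in ms. */
    public static long timeConcurrentReads(ReadWriteLockedCache cache, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();
                    cache.get("customer#42");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - begin) / 1_000_000;
    }

    public static void main(String[] args) {
        ReadWriteLockedCache cache = new ReadWriteLockedCache(100);
        cache.put("customer#42", "Alice");
        // The four readers share the read lock, so this takes roughly 100 ms rather than 400 ms.
        System.out.println("elapsed ms: " + timeConcurrentReads(cache, 4));
    }
}
```

With four readers and a 100 ms simulated round trip, the reads overlap and finish in roughly 100 ms. A put still excludes readers and other writers for the length of its round trip, so write-heavy regions gain less.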

Another issue is cache regions. Hibernate creates a ReadWriteCache instance per region; if no regions are defined, only a single ReadWriteCache instance is used, which makes the synchronization an even bigger problem.
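The effect of regions can be illustrated the same way. In this hypothetical sketch, `RegionCacheRegistry` keeps one synchronized `RegionCache` per region name, created on first use with `ConcurrentHashMap.computeIfAbsent`; the class names, region names and simulated latency are assumptions for illustration, not Hibernate's API. Readers in the same region queue on one monitor, while readers in different regions hold different monitors and proceed in parallel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class RegionCacheRegistry {

    /** One instance per region; its synchronized methods lock only this region's monitor. */
    public static class RegionCache {
        private final Map<Object, Object> store = new ConcurrentHashMap<>();
        private final long latencyMillis;

        public RegionCache(long latencyMillis) { this.latencyMillis = latencyMillis; }

        public synchronized Object get(Object key) {
            pause(latencyMillis); // simulated remote round trip
            return store.get(key);
        }

        public synchronized void put(Object key, Object value) {
            pause(latencyMillis);
            store.put(key, value);
        }
    }

    private final Map<String, RegionCache> regions = new ConcurrentHashMap<>();
    private final long latencyMillis;

    public RegionCacheRegistry(long latencyMillis) { this.latencyMillis = latencyMillis; }

    /** Returns the cache for a region, creating it on first use. */
    public RegionCache region(String name) {
        return regions.computeIfAbsent(name, n -> new RegionCache(latencyMillis));
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts one reader per region name at the same moment and returns the wall-clock time in ms. */
    public long timeReads(String... regionNames) {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(regionNames.length);
        for (String name : regionNames) {
            RegionCache cache = region(name);
            new Thread(() -> {
                try {
                    start.await();
                    cache.get("id#1");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - begin) / 1_000_000;
    }

    public static void main(String[] args) {
        RegionCacheRegistry registry = new RegionCacheRegistry(100);
        // Every entity in one region: four readers queue on a single monitor (roughly 400 ms).
        System.out.println("one region:   " + registry.timeReads("default", "default", "default", "default") + " ms");
        // One region per entity: four readers, four monitors (roughly 100 ms).
        System.out.println("four regions: " + registry.timeReads("Customer", "Order", "Product", "Invoice") + " ms");
    }
}
```

Splitting regions does not remove the method-level lock; it only spreads the contention across more monitors, which is why it helps most when the hot entities live in different regions.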

The solution for this issue was switching to a nonstrict read/write strategy wherever possible and creating a cache region per entity. This reduced the locking effect dramatically.


Alex Miller said...

We ran into this with the Terracotta Hibernate cache provider (which is distributed) too. In the first rev of the Terracotta product we used bytecode manipulation to rewrite the concurrency strategies for greater concurrency. In the new "Darwin" version, Terracotta is providing custom cache concurrency strategy classes for use with 3.2.x (since it's a pluggable part of Hibernate) as well as a 3.3.x cache provider that can take care of it in the provider.

andrew said...

So I have been reading about hibernate-memcached, spymemcached, etc., all of these solutions for making Hibernate work with memcached. The catch is that you need to define all of your queries and entities as cacheable, and yada yada yada. In your opinion, isn't there a simpler way? Can't we ditch Hibernate altogether and use a different ORM framework that lends itself better to memcached? The PHP memcached examples are simple. Can't we build on that simplicity in Java?

Anirban said...

A nice explanation, thanks!