Friday, February 19, 2010

Migrating a Spring/Hibernate application to MongoDB - Part 1

Background

For the past few years, ORMs have been the de facto solution for bridging the gap between object-oriented programming languages and relational databases. Most developers using an ORM care more about writing less persistence code and SQL than about the object-relational impedance mismatch. As time passed and more experience was gained, some people started to claim that maybe ORM is not the best solution available.
Another option for storing your objects, which has been around for quite some time, is a non-SQL database. With the recent explosion of non-relational databases and the NoSQL movement ("Not Only SQL"), this option is becoming more and more viable.

There are plenty of examples showing how to develop a new application based on a non-relational database. But what if you already have an application using a relational database + ORM and you want to migrate it to a non-relational database?

In the next few posts I will try to suggest a migration path for a Spring/Hibernate (JPA) application to MongoDB.
MongoDB is a scalable, high-performance, open source, schema-free, document-oriented database, and one of the more interesting non-relational databases available today (together with Cassandra, HBase, Redis and others).

The application

The example application is a very simple blogging engine implemented using the Spring/Hibernate (JPA) stack.
The two main entities are Blogger and BlogPost. There are data access objects (DAO) with matching interfaces for both entities.
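As a rough sketch (the fields and accessors shown here are partial; only displayName actually appears in the snippets below), the Blogger entity might look like this:

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Blogger {
    @Id
    private String id; // identifier handling is discussed in a later part

    private String displayName;

    public String getDisplayName() {
        return displayName;
    }

    public void setDisplayName(String displayName) {
        this.displayName = displayName;
    }
}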

Setting up and connecting to MongoDB

Setting up MongoDB is a pretty simple procedure. The MongoDB quickstart and getting started pages contain all the required details.
In order to connect to MongoDB we will need the Mongo Java driver:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>1.2.1</version>
</dependency>


The next step is adding a new Spring service that will provide MongoDB connections. This is a basic implementation which uses the default configuration.

import com.mongodb.DB;
import com.mongodb.Mongo;
import java.net.UnknownHostException;

public class MongoService {
    private final Mongo mongo;
    private final DB db;

    public MongoService(final String dbName) throws UnknownHostException {
        mongo = new Mongo(); // MongoDB server (localhost:27017)
        db = mongo.getDB(dbName); // Connect to database
    }

    public Mongo getMongo() {
        return mongo;
    }

    public DB getDb() {
        return db;
    }
}


The Mongo class is responsible for the database connection and contains a connection pool. The default pool size is 10 connections per host. You can configure the pool size by setting the MONGO.POOLSIZE system property or by passing a MongoOptions parameter to the Mongo constructor.
The DB class represents a logical database on the MongoDB server. We will use a database named "blog" for the blogging application.
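For example, here is a minimal sketch of raising the pool size. Constructor signatures vary between driver versions, so treat this as illustrative and check the javadoc of the version you use:

import com.mongodb.Mongo;
import com.mongodb.MongoOptions;
import com.mongodb.ServerAddress;
import java.net.UnknownHostException;

public class MongoPoolExample {
    public static Mongo createMongo() throws UnknownHostException {
        // Option 1: the MONGO.POOLSIZE system property, read by the driver at startup
        System.setProperty("MONGO.POOLSIZE", "50");

        // Option 2: an explicit MongoOptions instance
        MongoOptions options = new MongoOptions();
        options.connectionsPerHost = 50; // default is 10
        return new Mongo(new ServerAddress("localhost", 27017), options);
    }
}

The MongoService itself is wired in Spring with the database name as a constructor argument: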

<bean id="mongo" class="my.demo.blog.services.MongoService">
        <constructor-arg index="0" value="blog"/>
    </bean>

Entities and Documents

MongoDB stores data in collections of BSON documents. Documents may contain any number of fields of any length and type. Usually you should store documents of the same structure within a collection. MongoDB collections are essentially named groupings of documents.
The Mongo Java driver provides a DBObject interface for saving custom objects to the database.
A DBObject is very similar to a Map with String keys: you can put/get document elements by their String keys and get a set of all available keys.
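For instance, the driver's built-in BasicDBObject implementation shows the Map-like API. This is a minimal sketch; the field value is illustrative:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

public class DbObjectExample {
    public static void saveExample(MongoService mongoService) {
        DBObject doc = new BasicDBObject();
        doc.put("displayName", "John Doe"); // just like Map.put()

        DBCollection collection = mongoService.getDb().getCollection("Blogger");
        collection.save(doc); // stored as a BSON document
    }
}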
In order to save our entities in MongoDB we will create an adapter which implements the DBObject interface.

import com.mongodb.DBObject;
import org.apache.commons.beanutils.BeanUtilsBean;

import javax.persistence.Entity;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

public class DbObjectAdapter implements DBObject {
    private final BeanUtilsBean beanUtils;
    private final Object entity;
    private final Set<String> keySet;

    public DbObjectAdapter(Object entity) {
        if (entity == null) {
            throw new IllegalArgumentException("Entity must not be null");
        }
        if (!entity.getClass().isAnnotationPresent(Entity.class)) {
            throw new IllegalArgumentException("Entity class must have annotation javax.persistence.Entity present");
        }
        this.entity = entity;
        this.beanUtils = new BeanUtilsBean();
        this.keySet = new HashSet<String>();
        initKeySet();
    }

    @Override
    public Object put(String name, Object value) {
        try {
            beanUtils.setProperty(entity, name, value);
        } catch (Exception e) {
            return null;
        }
        return value;
    }
    
    @Override
    public Object get(String name) {
        try {
            return beanUtils.getProperty(entity, name);
        } catch (Exception e) {
            return null;
        }
    }

In order to decide which members of the entity we would like to store in MongoDB, we create a new annotation - @MongoElement - and annotate the selected getter methods.

    @MongoElement
    public String getDisplayName() {
        return displayName;
    }
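The annotation itself can be as simple as this minimal sketch:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marks a getter whose value should be stored in MongoDB.
@Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
@Target(ElementType.METHOD)
public @interface MongoElement {
}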

The DbObjectAdapter creates the document key set by looking for the annotated methods.

    @Override
    public Set<String> keySet() {
        return keySet;
    }

    private void initKeySet() {
        final PropertyDescriptor[] descriptors = beanUtils.getPropertyUtils().getPropertyDescriptors(entity);
        for (PropertyDescriptor desc : descriptors) {
            final Method readMethod = desc.getReadMethod();
            // skip write-only properties and getters without the @MongoElement annotation
            if (readMethod != null && readMethod.isAnnotationPresent(MongoElement.class)) {
                keySet.add(desc.getName());
            }
        }
    }

With the DbObjectAdapter in place, we can create a base DAO class for storing entities in MongoDB.

public abstract class BaseMongoDao<S> implements BaseDao<S> {
    private MongoService mongo;
    private DBCollection collection;

    @Override
    public S find(Object id) {
        final DBObject dbObject = collection.findOne(new ObjectId((String) id));
        if (dbObject == null) {
            return null;
        }
        try {
            // wrap a fresh entity instance and copy the document fields onto it
            final S entity = getEntityClass().newInstance();
            final DbObjectAdapter adapter = new DbObjectAdapter(entity);
            adapter.putAll(dbObject);
            return entity;
        } catch (Exception e) {
            throw new IllegalStateException("Could not instantiate entity", e);
        }
    }

    @Override
    public void save(S entity) {
        collection.save(new DbObjectAdapter(entity));
    }

    @Autowired
    public void setMongo(MongoService mongo) {
        this.mongo = mongo;
        // use the entity class name as the collection name
        this.collection = mongo.getDb().getCollection(getEntityClass().getSimpleName());
    }

    /**
     * @return the entity class this DAO handles
     */
    public abstract Class<S> getEntityClass();
}
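A concrete DAO then only has to name its entity class. For example, for the Blogger entity (assuming its DAO interface is called BloggerDao and extends BaseDao<Blogger>; the wiring here is a sketch):

public class BloggerMongoDao extends BaseMongoDao<Blogger> implements BloggerDao {
    @Override
    public Class<Blogger> getEntityClass() {
        return Blogger.class; // documents are stored in the "Blogger" collection
    }
}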

Notice that there is no need to create the collections; the database creates them automatically on the first insert.

Next part

So far we've seen how to set up MongoDB and how to store our entities in it.
In the next parts of this series we will discuss the following migration topics:
  • Identifiers
  • Relations
  • Queries
  • Data migration

Wednesday, February 17, 2010

Why you should look at the exceptions tab when profiling

When profiling an application I always like to take a look at the Exceptions tab (I use the YourKit Java profiler). Frequent exceptions may show that something is going wrong without you knowing about it, because someone preferred to swallow the exception and hope for the best.
Today, while trying to figure out a performance issue related to classloader synchronization in WebLogic, I noticed that IndexOutOfBoundsException is frequently thrown by the business layer of the application.


The code clearly speaks for itself:

    public Object getObject1() {
        try {
            return (Object) getObjectList().get(0);
        } catch (IndexOutOfBoundsException e) {
            return null;
        }
    }

    public Object getObject2() {
        try {
            return (Object) getObjectList().get(1);
        } catch (IndexOutOfBoundsException e) {
            return null;
        }
    }

* Method and class names were altered
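Filling in an exception stack trace on every miss is far more expensive than a simple bounds check. A sketch of the same accessor without the exception-driven control flow (getObjectList() is assumed to return a List):

    public Object getObject1() {
        final List<?> list = getObjectList();
        // check the size instead of catching IndexOutOfBoundsException
        return list.isEmpty() ? null : list.get(0);
    }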

Sunday, February 7, 2010

JBoss, Java6, InstanceNotFoundException and Yourkit profiler

Today I spent a few hours trying to figure out why, all of a sudden, JBoss starts up with annoying InstanceNotFoundException messages.
There were two changes from the previous working configuration:
  • Switched to Java 6.0 (found out I was using 5.0 by mistake)
  • A new MBean was added
The problem did not reoccur when switching back to Java 5.0, but that did not help me much.
Cursing the entire world and blaming the guy who wrote the new MBean did not solve the issue, so I started checking other things. It turned out that removing the YourKit profiler agent from the JBoss start script prevented the issue.
Digging into the YourKit startup options, I found out that the profiler's lightweight telemetry may clash with the MBean implementations of some Java EE application servers. The YourKit J2EE integration wizard (which I did not use) adds a startup option which starts the telemetry with a delay - "delay=10000".
Adding the delay option solved my problem!
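For reference, with the agent loaded via -agentpath, the JVM option in the JBoss run script would look something like this (the library path is illustrative):

-agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=delay=10000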

Friday, February 5, 2010

Beware of Hibernate's read/write cache strategy with memcached

Recently I've been working on improving the performance of an application which involves massive data processing. The application is a Java EE application using Hibernate for persistence and memcached as its second-level cache. Almost all of the entities are cached to reduce the load on the database.
My immediate goal was improving performance without radically changing the system architecture (I'm well aware of better technologies for such an application).

While profiling the application I noticed that when a bunch of new threads start processing data, they enter a blocked state one after the other and remain like this for 30-60 seconds.
Looking at their stacks I immediately saw that they all block on Hibernate's ReadWriteCache put/get methods.



Apparently most of the entities were cached with a read/write strategy.
A read/write cache should prevent two threads from updating the same cache element concurrently, or from updating and reading it concurrently - so it makes sense to see locks. But it turns out Hibernate uses method-level synchronization, which also prevents two threads from reading the same cache element concurrently.
Now, when using a local cache this issue is probably less noticeable, but when using a distributed caching solution such as memcached, cache access time is longer, so more threads end up waiting for each other.
Cache access takes even longer when you ask for an entity which is not in the cache: you have to wait for the cache to say the entity is not there, get it from the database, and put it into the cache. For all this time the thread holds the monitor, preventing other threads from working with the cache.
A better way to handle this would have been using java.util.concurrent.locks.ReentrantReadWriteLock, which enables more fine-grained locking (a read lock for the get method and a write lock for the put method).
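To illustrate the idea (this is a sketch, not Hibernate's actual code), a cache guarded by a read/write lock lets readers proceed concurrently while writers get exclusive access:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockedCache {
    private final Map<Object, Object> cache = new HashMap<Object, Object>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public Object get(Object key) {
        lock.readLock().lock(); // any number of readers may hold this concurrently
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(Object key, Object value) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            cache.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}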

Another issue is cache regions. Hibernate creates a ReadWriteCache instance per region; if no regions are defined, only a single ReadWriteCache instance is used, which makes the synchronization problem even bigger.

The solution for this issue was switching to a nonstrict read/write strategy wherever possible and creating a cache region per entity. This reduced the locking effect dramatically.
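Assuming annotation-based mappings, the per-entity configuration might look something like this (the entity and region names are illustrative):

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@javax.persistence.Entity
@Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE, region = "my.demo.Item")
public class Item {
    // entity fields...
}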