Thursday, February 7, 2013

Hibernate fetching optimizations – Batch prefetching


Hibernate offers many optimizations for data fetching.
One of the less commonly used ones is batch prefetching.
By default, Hibernate uses proxy objects as placeholders for associations and lazy wrappers for collections.
For collections, Hibernate uses its own collection wrapper implementations, which act as smart collections: a lazy collection loads its elements from the database only when it is first accessed.
A proxy object initially has only its id set; it loads the rest of its properties by sending a query to the database the first time a property other than the id is accessed.
Let’s say we have a Department entity and an Employee entity in a one-to-many relationship: each department has many employees and, for the sake of simplicity, each employee belongs to exactly one department.
We now load all departments like this:

List<Department> allDeps = session.createQuery("from Department").list();

Suppose there are 10 departments, so this query returns a list of size 10. Assume further that each department has 10 employees and that we need to access the salary of every employee in the list.

for (Department dep : allDeps) {
    printSalaryForEachEmployees(dep);
}

// Prints the salary of every employee in the given department
void printSalaryForEachEmployees(Department dep) {
    for (Employee emp : dep.getEmployees()) {
        System.out.println(emp.getSalary());
    }
}

The print method iterates over the employees of each department and calls getSalary() on each of them. The first access to dep.getEmployees() initializes that department's lazy collection, and each such initialization sends a separate SQL select to the database; the employees of the department are loaded by that select.
All in all, this one use case sends 1 + 10 = 11 selects to the database: one for the departments and one more for the employees of each department.

This is the infamous n+1 selects problem, here with n = 10: the number of selects grows with the number of departments in the list.
There are many ways to optimize this.
Today we look at one of the methods Hibernate provides, called batch prefetching.

It works like this.

Hibernate can prefetch employees by initializing their proxies ahead of time. This is specified in the mapping configuration:


...

This tells Hibernate that if it uses proxies for Employee (which it does by default), then on the first initialization of a single proxy it should automatically initialize up to 10 proxies, even before their properties are accessed. If there are more than 10 proxies, accessing the 11th one preloads another 10, and so on until no uninitialized proxies are left.
Understandably, Hibernate refers to this kind of optimization as a blind-guess optimization, since you don’t know beforehand how many proxies there are.

This optimization can also be applied for collections:

...
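
As a rough illustration of the two settings described above, here is how they might look with annotation-based mappings. This is only a sketch under assumptions: the field names, the DEPARTMENT_ID join column and the use of annotations are hypothetical choices made for this example (XML mapping files have a corresponding batch-size attribute), and the batch size of 10 simply mirrors the number used in the text.

```java
// Mapping fragment, not a complete program. Each class would live in its own file.
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import javax.persistence.*;
import org.hibernate.annotations.BatchSize;

@Entity
@BatchSize(size = 10) // class level: initialize up to 10 Employee proxies per select
class Employee {
    @Id
    private Long id;

    private BigDecimal salary;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "DEPARTMENT_ID") // assumed column name
    private Department department;

    // getters and setters omitted
}

@Entity
class Department {
    @Id
    private Long id;

    @OneToMany(mappedBy = "department")
    @BatchSize(size = 10) // collection level: initialize up to 10 employees collections per select
    private Set<Employee> employees = new HashSet<>();

    // getters and setters omitted
}
```

The class-level setting affects how Employee proxies are initialized, while the collection-level setting affects how the employees collections of several departments are initialized together.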






So in our use case, since there are 10 departments in the list, the moment we initialize the employees collection of one Department object, Hibernate initializes the employees collections of up to 10 departments, all with a single select, something like this:
select e.* from Employee e
where e.DEPARTMENT_ID in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
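
To make the effect on the number of collection selects concrete, here is a small self-contained sketch. It is a simplified model and not Hibernate's actual loader: the class BatchFetchSketch and its methods are hypothetical names invented for this illustration, and it assumes that uninitialized collections are simply loaded in groups of at most the batch size (Hibernate may split batches differently).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of collection batch fetching (hypothetical, for illustration only).
class BatchFetchSketch {

    // Builds one select that loads the employees collections of `batch` departments.
    static String buildSelect(int batch) {
        String placeholders = String.join(", ", Collections.nCopies(batch, "?"));
        return "select e.* from Employee e where e.DEPARTMENT_ID in (" + placeholders + ")";
    }

    // Returns the statements needed to initialize the employees collections of
    // `departments` departments when at most `batchSize` are loaded per select.
    static List<String> selectsFor(int departments, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> statements = new ArrayList<>();
        for (int loaded = 0; loaded < departments; loaded += batchSize) {
            int batch = Math.min(batchSize, departments - loaded);
            statements.add(buildSelect(batch));
        }
        return statements;
    }

    public static void main(String[] args) {
        // Batch size 1 models the default behaviour: one select per collection.
        System.out.println(selectsFor(10, 1).size() + " collection selects without batching");
        System.out.println(selectsFor(10, 10).size() + " collection select with batch size 10");
        System.out.println(selectsFor(10, 10).get(0));
    }
}
```

In this model, the 10 departments of the example need 10 collection selects without batching and a single one with a batch size of 10; with 25 departments and the same batch size, three selects would be needed.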