There are many optimizations for data fetching in Hibernate.
One of the less commonly used methods is batch prefetching.
By default, Hibernate uses proxy objects as placeholders
for associations and collections.
For collections, Hibernate uses its own collection wrapper
implementations, which act as smart collections, but the entities loaded in the
collections are by default also proxies.
These proxy objects have only their id set. They load
their remaining properties by sending a query to the database the first time
a property other than the id is accessed.
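The load-on-first-access behavior can be illustrated with a small plain-Java sketch. This is not Hibernate's real proxy implementation: the class LazyEmployee and its salaryLoader parameter are hypothetical names, and the Supplier merely stands in for the SQL select.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily loaded entity; not Hibernate's proxy class.
class LazyEmployee {
    private final long id;
    private final Supplier<Double> salaryLoader; // simulates the SQL select
    private Double salary;                       // stays null until first access

    LazyEmployee(long id, Supplier<Double> salaryLoader) {
        this.id = id;
        this.salaryLoader = salaryLoader;
    }

    // Reading the id never touches the (simulated) database.
    long getId() {
        return id;
    }

    // The first access to another property triggers the load; later calls reuse it.
    double getSalary() {
        if (salary == null) {
            salary = salaryLoader.get();
        }
        return salary;
    }
}

class LazyProxyDemo {
    public static void main(String[] args) {
        AtomicInteger selects = new AtomicInteger();
        LazyEmployee emp = new LazyEmployee(42L, () -> {
            selects.incrementAndGet(); // count each simulated select
            return 5000.0;
        });

        emp.getId();     // id access only, loader not called
        emp.getSalary(); // first property access calls the loader
        emp.getSalary(); // already loaded, loader not called again
        System.out.println("simulated selects: " + selects.get());
    }
}
```

The point of the sketch is only that the id is available for free, while any other property costs one round trip the first time it is read.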
Let’s say we have a Department entity and an Employee entity
in a one-to-many relationship. Each Department has many employees and, for the sake of
simplicity, each employee belongs to only one department.
We now load all departments like this:
List<Department> allDeps = session.createQuery("from Department").list();
Imagine we have 10 departments, so this results in
a list of size 10. Further assume that each department has 10
employees and that we need to access the salary of every employee in this list.
for (Department dep : allDeps) {
    printSalaryForEachEmployees(dep);
}

void printSalaryForEachEmployees(Department dep) {
    for (Employee emp : dep.getEmployees()) {
        System.out.println(emp.getSalary());
    }
}
The print method iterates over every employee
in each department and calls getSalary(). If each Employee is still an
uninitialized proxy, as described above, the first getSalary() call on it
initializes the proxy and sends one SQL select to the database.
All in all, this one use case sends 1 + (10 x 10) = 101 selects to the database, which is a horrible number.
This is a worst case of the infamous n+1 selects problem: with n departments
of m employees each, it becomes 1 + (n x m) selects.
There are many ways to optimize this.
Today we look at one of the methods Hibernate provides,
called batch prefetching.
It works like this.
Hibernate can prefetch the employees by initializing their
proxies ahead of time. This is how it is specified in the configuration.
...
This tells Hibernate that if it is using proxies for
Employees (which Hibernate does by default), then on the first
initialization of a single proxy it should automatically initialize up to 10 proxies, even before their properties are accessed. If there are more
than 10 proxies, then accessing the 11th proxy preloads another 10,
and so on until no uninitialized proxies are left.
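To make the effect of the batch size concrete, here is a rough plain-Java simulation of the behavior just described. It is only a sketch: the Proxy class and the accessAll method are hypothetical, the proxies are simply batched in list order, and Hibernate's real choice of which proxies to include in a batch depends on the state of its session.

```java
import java.util.ArrayList;
import java.util.List;

class BatchFetchSimulation {
    // Hypothetical proxy that only records whether its state has been loaded.
    static class Proxy {
        boolean initialized = false;
    }

    static List<Proxy> newProxies(int count) {
        List<Proxy> proxies = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            proxies.add(new Proxy());
        }
        return proxies;
    }

    // Accesses every proxy in order and returns the number of simulated selects.
    static int accessAll(List<Proxy> proxies, int batchSize) {
        int selects = 0;
        for (int i = 0; i < proxies.size(); i++) {
            if (proxies.get(i).initialized) {
                continue; // already preloaded by an earlier batch
            }
            selects++; // one select covers the accessed proxy ...
            int loaded = 0;
            // ... plus the following uninitialized proxies, up to batchSize in total
            for (int j = i; j < proxies.size() && loaded < batchSize; j++) {
                if (!proxies.get(j).initialized) {
                    proxies.get(j).initialized = true;
                    loaded++;
                }
            }
        }
        return selects;
    }

    public static void main(String[] args) {
        // 100 employee proxies, as in the example with 10 departments of 10 employees
        System.out.println("batch size 1:  " + accessAll(newProxies(100), 1) + " selects");
        System.out.println("batch size 10: " + accessAll(newProxies(100), 10) + " selects");
    }
}
```

With a batch size of 1 the simulation degenerates to one select per proxy, while a batch size of 10 needs one select for every 10 proxies.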
Understandably, Hibernate refers to this kind of optimization as a
blind-guess optimization, because you don’t know beforehand how many
proxies there are.
...
So for our use case, since we have 10 departments in the
list, the moment we initialize the employees collection of one Department
object, Hibernate initializes the employees collections of all 10 departments,
using a single select, something like this:
select e.* from Employee e where e.DEPARTMENT_ID in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
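The shape of that statement follows directly from the batch size: one placeholder per department whose collection is loaded. As a sketch only, the helper below rebuilds the same SQL text for a given batch size. The class InClauseSketch and the method employeesByDepartmentSql are hypothetical names for illustration, and Hibernate generates its SQL internally rather than through anything like this.

```java
import java.util.Collections;

class InClauseSketch {
    // Builds the select for a batch of department ids, one "?" per id.
    static String employeesByDepartmentSql(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(batchSize, "?"));
        return "select e.* from Employee e where e.DEPARTMENT_ID in (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Batch size 10 gives the statement shown in the article.
        System.out.println(employeesByDepartmentSql(10));
    }
}
```

Each placeholder is bound to the id of one department, so a single round trip returns the employees of every department in the batch.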