data, hibernate, java, programming

Achieving good performance when updating collections attached to a Hibernate object

Have you ever found that you have a Hibernate @OneToMany or @ElementCollection performs poorly when you’re modifying collection obejcts?
In this case, the intuitive way to implement in Java gives poor performance, but is easily fixed.

You have a database of an number of items holding collections of other items, for example, a database of airline schedules holding a list of flights.
Your database model would have a 2 tables, where a child row references a parent to build a one-to-many relationship, like so


You could represent these with two Java Beans in Hibernate like this

classs AirlineSchedule {

private Integer airlineID;

@OneToMany(fetch = FetchType.LAZY, mappedBy = "schedule", cascade = CascadeType.ALL, orphanRemoval = true)
private Collection<Flight> flights;

// Airline Schedule details...

and an item which extends an embeddable ID, like so

@ Embeddable
class FlightID{

private AirlineSchedule schedule;

private Integer flightID;


class Flight{

private FlightID flightID;

//Flight details and an implementation of equals based on the ID only!...


Your schedule might have 5000 flights, but when you modify the schedule on any given day, only change 10 or 20 flights might change. But, following best practices, you use a RESTful API, and PUT a new schedule each time, something like this

public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule){
    return jpaRepository.saveAndFlush(newSchedule);

When you do this, Hibernate takes minutes to respond. If you look at the SQL it diligently updates every row in the database in thousands of individual SQL statements. Why does it do that?

The answer is in Hibernate’s PersistentCollection. Even though they implement the Collection interface, Hibernate Collections aren’t the same as Java Collections. They are not backed by a storage array, but by a database. When you replaced the persisted airline object with a new one, or if you set the whole collection of flights in an existing airline object, Hibernate can’t figure out what changed. So it blindly replaces all of the flight child objects of the parent, even though the values are the same.

It can, however, track changes that you make to the Persistent Collection. So if you tell Hibernate what you’re changing by adding and remove from the existing collection, it’s smart enough to write only the objects you changed back to the database.

If we update the schedule like this (using Apache’s CollectionUtils)

public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule){
 AirlineSchedule schedule = jpaRepository.getByID(airlineID);

 //Change other properties of the schedule

 List<Flight> toRemove = CollectionUtils.subtract(schedule.getFlights(), newSchedule.getFlights());

 List<Flight> toAdd = CollectionUtils.subtract(newSchedule.getFlights(), schedule.getFlights());

 return jpaRepository.saveAndFlush(schedule);

suddenly, 5000 updates will be replaced with just 10 or 20, and your minutes of updates will become seconds.


Full details on StackOverflow