data, hibernate, java, programming

Achieving good performance when updating collections attached to a Hibernate object

Have you ever found that a Hibernate @OneToMany or @ElementCollection performs poorly when you’re modifying the objects in the collection?
In this case, the intuitive Java implementation gives poor performance, but it is easily fixed.

You have a database of a number of items holding collections of other items, for example, a database of airline schedules, each holding a list of flights.
Your database model would have two tables, where a child row references a parent to build a one-to-many relationship.


You could represent these with two Java Beans in Hibernate like this

class AirlineSchedule {

    private Integer airlineID;

    // One schedule owns many flights; a flight removed from the collection is deleted (orphanRemoval)
    @OneToMany(fetch = FetchType.LAZY, mappedBy = "schedule", cascade = CascadeType.ALL, orphanRemoval = true)
    private Collection<Flight> flights;

    // Airline Schedule details...
}

and an item identified by an embeddable ID, like so

@Embeddable
class FlightID {

    private AirlineSchedule schedule;

    private Integer flightID;
}

class Flight {

    private FlightID flightID;

    // Flight details and an implementation of equals based on the ID only!...
}


Your schedule might have 5000 flights, but when you modify the schedule on any given day, only 10 or 20 flights might change. But, following best practices, you use a RESTful API and PUT a new schedule each time, something like this

public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule) {
    // Saves the incoming object graph wholesale, replacing the persisted collection
    return jpaRepository.saveAndFlush(newSchedule);
}

When you do this, Hibernate takes minutes to respond. If you look at the SQL, you will see that it diligently updates every row in the database in thousands of individual SQL statements. Why does it do that?

The answer is in Hibernate’s PersistentCollection. Even though they implement the Collection interface, Hibernate collections aren’t the same as Java collections. They are not backed by a storage array, but by a database. When you replace the persisted airline object with a new one, or set the whole collection of flights in an existing airline object, Hibernate can’t figure out what changed. So it blindly replaces all of the flight child objects of the parent, even though the values are the same.

It can, however, track changes that you make to the PersistentCollection. So if you tell Hibernate what you’re changing by adding to and removing from the existing collection, it’s smart enough to write only the objects you changed back to the database.
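
To make the idea concrete, here is a toy sketch in plain Java. It is not Hibernate's PersistentCollection and makes no claim about how Hibernate works internally; TrackedList and its added/removed fields are names I made up for illustration. It only shows why a collection that sees each add and remove can know exactly what to write, whereas a collection that is swapped out wholesale has nothing to compare against.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: a list that remembers what was added and removed.
// Hypothetical class, not part of Hibernate.
class TrackedList<E> extends AbstractList<E> {
    private final List<E> items = new ArrayList<>();
    final List<E> added = new ArrayList<>();   // elements that would need an INSERT
    final List<E> removed = new ArrayList<>(); // elements that would need a DELETE

    TrackedList(List<E> initial) {
        // Initial contents are loaded directly, so they are not recorded as changes
        items.addAll(initial);
    }

    @Override public E get(int index) { return items.get(index); }

    @Override public int size() { return items.size(); }

    @Override public void add(int index, E element) {
        items.add(index, element);
        added.add(element);
    }

    @Override public E remove(int index) {
        E old = items.remove(index);
        removed.add(old);
        return old;
    }
}

class TrackedListDemo {
    public static void main(String[] args) {
        TrackedList<String> flights = new TrackedList<>(Arrays.asList("BA1", "BA2", "BA3"));
        flights.remove("BA2");
        flights.add("BA4");
        System.out.println("to delete: " + flights.removed); // to delete: [BA2]
        System.out.println("to insert: " + flights.added);   // to insert: [BA4]
    }
}
```

Two small changes produce two recorded operations, however many untouched elements the list holds.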

If we update the schedule like this (using CollectionUtils from Apache Commons Collections)

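public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule) {
    // Load the managed entity so Hibernate can track changes to its collection
    AirlineSchedule schedule = jpaRepository.getByID(airlineID);

    //Change other properties of the schedule

    // Flights that are persisted but missing from the new schedule
    Collection<Flight> toRemove = CollectionUtils.subtract(schedule.getFlights(), newSchedule.getFlights());

    // Flights that are in the new schedule but not yet persisted
    Collection<Flight> toAdd = CollectionUtils.subtract(newSchedule.getFlights(), schedule.getFlights());

    // Apply the differences to the existing persistent collection in place
    schedule.getFlights().removeAll(toRemove);
    schedule.getFlights().addAll(toAdd);

    return jpaRepository.saveAndFlush(schedule);
}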

suddenly, 5000 updates will be replaced with just 10 or 20, and your minutes of updates will become seconds.
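
If you would rather not pull in a library for this, the same diff-and-apply step can be sketched with java.util alone. CollectionSync and syncInPlace are hypothetical names of mine, and the sketch treats both collections as sets, relying on equals and hashCode just as the approach above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: mutates an existing collection in place instead of replacing it.
class CollectionSync {

    /**
     * Makes `managed` contain the same elements as `incoming` by removing and
     * adding only the differences. Returns the number of elements added or removed.
     */
    static <T> int syncInPlace(Collection<T> managed, Collection<T> incoming) {
        Set<T> incomingSet = new HashSet<>(incoming);
        Set<T> managedSet = new HashSet<>(managed);

        // Elements currently held that the incoming data no longer contains
        List<T> toRemove = new ArrayList<>();
        for (T item : managed) {
            if (!incomingSet.contains(item)) {
                toRemove.add(item);
            }
        }

        // Elements in the incoming data that are not held yet (deduplicated, order kept)
        Set<T> toAdd = new LinkedHashSet<>();
        for (T item : incoming) {
            if (!managedSet.contains(item)) {
                toAdd.add(item);
            }
        }

        managed.removeAll(toRemove);
        managed.addAll(toAdd);
        return toRemove.size() + toAdd.size();
    }
}

class CollectionSyncDemo {
    public static void main(String[] args) {
        List<String> managed = new ArrayList<>(Arrays.asList("BA1", "BA2", "BA3"));
        int changes = CollectionSync.syncInPlace(managed, Arrays.asList("BA2", "BA3", "BA4"));
        System.out.println(changes + " changes, now " + managed); // 2 changes, now [BA2, BA3, BA4]
    }
}
```

One thing to watch: because the comparison uses equals, and Flight's equals looks at the ID only, a flight whose details changed but whose ID did not shows up in neither difference. Those flights need their fields copied onto the existing managed instance.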


Full details on StackOverflow

data, general, programming

Who wants to write a 200 page report that no-one will ever read?

I was at the Data Without Borders DC Data Dive a couple of weeks back, definitely a trip out of my usual worlds of Java and travel IT and a visit into something a bit different.

A couple of quotes stuck in my head from that weekend. The first was from one of the participating charities, DC Action for Kids, whose representative said in their initial presentation that one of their major motivators for taking part in the event was (and I paraphrase)

we wanted to produce something more interesting than a two-hundred page report that no-one will ever read.

One of Saturday’s presenters picked this up and ran with it, arguing that events like the DC Data Dive are part of a process of finding new methods of deep data presentation. While this may be somewhat of an exaggeration, DC Action for Kids achieved their aim: check out the beautiful app that came from their data dive team.

It made me realise what a powerful difference you can make with such an app. Reading a 200 page report is usually a pretty tiresome process, but exploring the data set in a visual form is not only a joy, it also allows the app writer to guide end users to the conclusion they would like them to draw from the data, while at the same time allowing the users to feel that they are verifying this conclusion themselves. In today’s media landscape, where discerning readers place full trust in few, if any, publications, this is a pretty powerful tool! While I believe that app was coded by hand in the usual mix of CSS and HTML held together by JavaScript, there’s surely a gap in the market for a PowerPoint-style application to generate visual data exploration mini-apps.

The second observation that struck me over the weekend was again a quote that went something along the lines of

In the coming two decades, statistician will be the trendy job to have, in the same way that software engineer has been the trendy job for the previous two decades

Again, while I feel this is somewhat of an exaggeration (and as a software engineer, I’ll confess to bias), the event certainly outlined to me the power of inference from data… and the importance of keeping your data and keeping your data clean. Single examples from data are rarely sufficient to indicate an extrapolation and a zero value field is not the same as an empty one… they may sound like simple observations but as programmers it’s not unusual for us to treat these single examples as general trends and zero and null as being equivalent. But they’re not, and it’s really easy to mess up your conclusions as a result of these and many other simple but potentially incorrect assumptions. The weekend served as a reminder of these simple values to me – remembering these at coding time can make the difference between a piece of code that lasts 5 years instead of suffering death by maintenance in the first year of its life.
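
As a small illustration of the zero-versus-empty point, here is a sketch with made-up numbers (the class and method names are mine, purely for illustration). A null stands for a missing reading and 0 for a real reading of zero; quietly turning the null into 0 changes the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical example: null means "no data", 0 means "measured and found to be zero".
class MissingVersusZero {

    // The tempting shortcut: pretend missing values are zero
    static double meanTreatingNullAsZero(List<Integer> values) {
        return values.stream()
                .mapToInt(v -> v == null ? 0 : v)
                .average()
                .orElse(Double.NaN);
    }

    // Leave missing values out of the calculation altogether
    static double meanSkippingNulls(List<Integer> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToInt(Integer::intValue)
                .average()
                .orElse(Double.NaN);
    }
}

class MissingVersusZeroDemo {
    public static void main(String[] args) {
        // Invented sample data: one missing reading and one genuine zero
        List<Integer> readings = Arrays.asList(10, null, 20, 0);
        System.out.println(MissingVersusZero.meanTreatingNullAsZero(readings)); // 7.5
        System.out.println(MissingVersusZero.meanSkippingNulls(readings));      // 10.0
    }
}
```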

Finally, in addition to my languages-of-the-future predictions of JavaScript and Erlang, I think I’ll have to add R. It’s such a powerful and easy language, I was really blown away by it.