data, hibernate, java, programming

Achieving good performance when updating collections attached to a Hibernate object

Have you ever found that you have a Hibernate @OneToMany or @ElementCollection performs poorly when you’re modifying collection obejcts?
In this case, the intuitive way to implement in Java gives poor performance, but is easily fixed.

You have a database of an number of items holding collections of other items, for example, a database of airline schedules holding a list of flights.
Your database model would have a 2 tables, where a child row references a parent to build a one-to-many relationship, like so

AIRLINE_SCHEDULE:   AIRLINE_ID, AIRLINE_NAME....
FLIGHT_SCHEDULE:  FLIGHT_ID, AIRLINE_ID, FLIGHT_DETAILS...

You could represent these with two Java Beans in Hibernate like this

classs AirlineSchedule {

@Id
@Column(name="airline_id:")
private Integer airlineID;

@OneToMany(fetch = FetchType.LAZY, mappedBy = "schedule", cascade = CascadeType.ALL, orphanRemoval = true)
private Collection<Flight> flights;

// Airline Schedule details...
}

and an item which extends an embeddable ID, like so

@ Embeddable
class FlightID{

@ManyToOne
@JoinColumn("airline_id")
private AirlineSchedule schedule;

@Column(name="flight_id")
private Integer flightID;

...
}

class Flight{

@EmbeddedID
private FlightID flightID;

//Flight details and an implementation of equals based on the ID only!...
}

 

Your schedule might have 5000 flights, but when you modify the schedule on any given day, only change 10 or 20 flights might change. But, following best practices, you use a RESTful API, and PUT a new schedule each time, something like this

@PUT
@Path(airline/{airlineID})
public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule){
    validateSchedule(newSchedule);
    newSchedule.setAirlineID(airlineID);
    return jpaRepository.saveAndFlush(newSchedule);
}

When you do this, Hibernate takes minutes to respond. If you look at the SQL it diligently updates every row in the database in thousands of individual SQL statements. Why does it do that?

The answer is in Hibernate’s PersistentCollection. Even though they implement the Collection interface, Hibernate Collections aren’t the same as Java Collections. They are not backed by a storage array, but by a database. When you replaced the persisted airline object with a new one, or if you set the whole collection of flights in an existing airline object, Hibernate can’t figure out what changed. So it blindly replaces all of the flight child objects of the parent, even though the values are the same.

It can, however, track changes that you make to the Persistent Collection. So if you tell Hibernate what you’re changing by adding and remove from the existing collection, it’s smart enough to write only the objects you changed back to the database.

If we update the schedule like this (using Apache’s CollectionUtils)

@PUT
@Path(airline/{airlineID})
public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule){
 validateSchedule(newSchedule);
 AirlineSchedule schedule = jpaRepository.getByID(airlineID);

 //Change other properties of the schedule

 List<Flight> toRemove = CollectionUtils.subtract(schedule.getFlights(), newSchedule.getFlights());
 schedule.getFlights().removeAll(toRemove);

 List<Flight> toAdd = CollectionUtils.subtract(newSchedule.getFlights(), schedule.getFlights());
 schedule.getFlights().addAll(toAdd);

 return jpaRepository.saveAndFlush(schedule);
}

suddenly, 5000 updates will be replaced with just 10 or 20, and your minutes of updates will become seconds.

Enjoy!

Full details on StackOverflow

Advertisements
java, programming, Uncategorized

Exception Antipatterns

Java has pretty good exception handling, but it’s something that a surprising number of programmers get wrong.
The point of an exception is to provide detailed information to software maintainers about the reason for a program’s failure.
The three key points here are:

  • You should only throw an exception in cases where the program fails. If you’re frequently seeing the same exception in the same place – you should question whether your program should be allowed to fail under those conditions
  • Exceptions are for software maintainers only. The exception should be logged at the the appropriate point in the code, and a friendlier message should be displayed to the user
  • Detailed information should be provided (within the limits of data security, of course). The minimum should be a stack trace locating the exact point where things went wrong, ideally also with an informative message.

Its worthwhile spending a little time when you code to produce useful exceptions, it saves time when something goes wrong with your code in production because the source of the problem can easily be found and traced.

Bearing that in mind, here’s some anti-patterns I’ve recently seen around exceptions. Lets look at why they’re bad practice:

 try{
 stuff that contacts an external system;
 }
 catch(APIException e){
 throw new BusinessException("Failure in external system");
 }

The good stuff here is that the developer is catching only the appropriate exception, he’s describing what’s wrong in his new exception, and that he’s allowing the program to fail.
The problem is that, in the process he’s just destroyed the whole stack trace that would inform us where the external system failed during the reply. Without a debugger or a detailed functional knowledge of how the external system works, there’s nothing more you can do with this exception.
Some better solutions here would be (in order of my preference):

  • Don’t catch and rethrow – maybe it’s fine to just let the APIException continue up the stack
  • Aggregate the APIException to the BusinessException… and make sure to log the whole trace when you catch the business exception
  • Log the result of the APIException now before rethrowing

Here’s another one

 BusinessResponse processAPIResponse() throws TimeoutException, ConcurrentModificationException, TechnicalException, FunctionalException{
 ...
 Response r = result of call to external system;
 int code = r.getErrorCode();
 if(code == 1234){
 throw new ConcurrentModificationException();
 }
 if(code >= 8000 && code < 9000){
 throw new TechnicalException();
 }
 if(code > 0 && code < 8000){
 throw new FunctionalException();
 }
 ...
 }

Here, the problem is that we’re throwing a bunch of checked exceptions, based on the API response, that the client may not be able to handle.
A good rule of thumb for exception handling (and it’s open to debate), as mentioned by O’Reilly
is that, Unless there’s something the client code can do to recover from this failure, it’s better to throw a runtime exception.

Finally, when’s an exception not an exception?

There’s two common approaches to avoiding throwing exceptions when you should have thrown one… and they are:

  •  if without else — Of course you code if statements all the time. But do you know what should happen when your if statement is not true?  Should the default behaviour really be nothing? (Hint: The default behavior is frequently not nothing!)

Like, for example, a UI event handler… if you just coded

if(condition){
 fireEvent();
 }
 ...

then without the condition, you risk the possibility that UI may just stay in the same place, un-updated forever.

  • Returning null — This can basically be a null pointer exception that hasn’t happened yet, in someone else’s code.

Two great ways to avoid this problem are:

  • If you’re returning null, ask does null have a meaning here (like getFirstSomething() can return null if the list of something has 0 elements), or am I just returning null because I don’t know what to do otherwise?
  • Put a javadoc return tag on every method that returns an object. Document under what conditions this method can return null.

Putting this stuff in when you code can sometimes feel like a hassle – it’s code not targeted at solving the issue you’re coding to solve… but it can pay great dividends later when your code, or some system it’s connected to, starts behaving unexpectedly!

java, programming

Never check null

OK, that’s maybe a bit of an exaggeration. I mean never check null for mandatory input.

I am thoroughly sick of code that doesn’t know what to do returning null. If you don’ t know what to do you should throw an exception. There are only 2 exceptions to this rule:

  • Converters
  • Wrappers

Here, you can return null and let the calling class figure out to do. But as soon as you have logic in you code, you should stop checking.

If you need something, it can’t be null. Simple. So it’s an exception. Get over it, get a stack trace ASAP, and we can find out what’s wrong.

Don’t cover it with some if null then null crap and pass the buck to the caller. The bug is still there, you’re only making it harder to find.

And if anyone tells you NullPointerException is wrong, don’t worry about them. They haven’t yet realised the truth that is

NullPointerException tells you what’s wrong, where and if you throw it early enough, why.

It’s infinitely prefereable to have a NullPointerException to a mysterious oh where’s my output scenario.

Oh, and I nearly forgot to mention the benefits:

  • Your code becomes shorter and more readable, because you stop checking what’s null (no more checking before your call to Collection.iterator())
  • You become forced to document that your input shouldn’t be null in your Javadoc, which helps your users and covers your ass
  • You only have to handle one type of exceptions: real exceptions, and not exceptions dressed as nulls.
java, programming

You can, but should you?

I have read on previous occasions that there are three stages to learning a new language, whether it is related to computers or human language.

The first is comprehending. You might need a book beside you, but basically during this stage you are getting to grips with the functionality of the language, learning how to translate any existing language you know into these new constructs. You can make the language work, but you are still mapping to some previous concepts you already have.

The second is understanding. You can make the language function and now you start to understand how to express yourself in the natural way of that language. You form constructs naturally and learn to appreciate good constructs, and criticise bad ones .

Finally you master a language. You begin to see the limitations of the language and start conceiving new constructs. Not only do you know many possible constructs, but you can generally pick the best one for each situation, manipulating how your language will be understood by others.

I don’t think I could reasonably claim mastery of Java, but I realised I was well on the road to it today, as I saw two complex constructs. One was an elegant and original solution that lacked finishing touches; and the second was a stupid exhibition of code machoism, certain to confuse functional users, and generally be a pain in the ass. The good example deserves a post of its own, but I can plant the bad one here 🙂

Just because you can declare a new, anonymous inner class in a method call does not mean you should. I mean, maybe there is a practical use for this, but I’ve just never found it. As far as I can see:

  • It makes your code a complete bitch to read. Classes in classes, sure. Classes in methods, maybe, if you really have to. Classes in methods calls though…
  • It’s impossible for version control. OK, maybe Mercurial or Git could cope, but Clearcase and friends haven’t a hope
  • Eclipse chokes. The correction works OK, but the search doesn’t find these classes.
  • You end up writing it again and again. Write-once code occurs less often than we imagine, maybe aside from request/response type classes. Many candidates for small classes get reused frequently, so writing them over and over again isn’t efficient.

Anyway, I spent an hour today, writing some guy’s method-call-classes because someone updated an interface they implement, eclipse didn’t find, and version control couldn’t cope.

It’s ten times nicer to put them in a lovely little class file of their own, because frankly, little class files are the best ones. (Unless you create 50,000 of them.)

java, programming

Removing Collection items using an iterator

So, I was getting a rather mysterious Java coding bug this week, which turned out to be the result of some Virtual Machine optimisation I hadn’t previously been aware of.

The code looked a bit like this:

public void deleter(GroupSet groups){
    GroupSet removalQueue = new GroupsSet();

    for(Iterator groupsIt = groups.iterator(); 
                          groupsIt.hasNext(); ){
        Group g = (Group) groupsIt.next()
        ItemSet itemsNeedingRemoval = g.doCheck();
        removalQueue.addAll(itemsNeedingRemoval);

    }
    for(Iterator removalIt = removalQueue.iterator(); 
                              removalIt.hasNext(); ){
        Group g = (Group) removalQueue.next();
        schrodingerRemove(g); //Shrodinger remove might 
                   //or might not remove the group from 
                   //the original group set
    }
}

This seems like a quite simple it of code. We take the original set, we iterate through it to find the objects within that we might need to remove, and queue them. Then, afterwards we iterate through the queue, and call the removal method that might do the removal. (We couldn’t have called iterator.remove() in the original method because we’re not sure whether the removal will occur or not, and anyway it’s not good practice.)

So, this code looks clean and straight-forward, but guess what happens when you run it?

Error: ConcurrentModificationException

Not so good. It took a while to find out why this happened, turns out the offending line is

removalQueue.addAll(itemsNeedingRemoval);

The addAll method, in Java 1.4’s implementation of HashSet at least, rather than adding the items one-at-a-time to your new collection (like you might imagine), in fact simply adds a pointer that causes the collection you’re adding to become part of the original set. So if you remove an item from the set your added, you also remove it from the set you added too. The reverse also applies. If you modify it iterate around all the objects and call add each time, the exception goes away.

In my case, I figured I didn’t really need the set of items anyway, that 1 was enough, and changed doCheck() to return just 1 item instead of a set. Solution by design is better than solution by force 😉