data, hibernate, java, programming

Achieving good performance when updating collections attached to a Hibernate object

Have you ever found that a Hibernate @OneToMany or @ElementCollection performs poorly when you’re modifying the collection’s objects?
The intuitive way to implement the update in Java gives poor performance, but it’s easily fixed.

You have a database of items holding collections of other items – for example, a database of airline schedules, each holding a list of flights.
Your database model would have two tables, where a child row references a parent to build a one-to-many relationship: a schedule table, plus a flight table whose rows carry the parent schedule’s ID as a foreign key.

You could represent these with two Java Beans in Hibernate like this

@Entity
class AirlineSchedule {

    @Id
    private Integer airlineID;

    @OneToMany(fetch = FetchType.LAZY, mappedBy = "schedule", cascade = CascadeType.ALL, orphanRemoval = true)
    private Collection<Flight> flights;

    // Airline Schedule details...
}

and a child item which uses an embeddable ID, like so

@Embeddable
class FlightID implements Serializable {

    private AirlineSchedule schedule;

    private Integer flightID;
}

@Entity
class Flight {

    @EmbeddedId
    private FlightID flightID;

    // Flight details and an implementation of equals based on the ID only!...
}
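
For the subtraction trick later in this post to work, that ID-only equals really matters. Here’s a minimal sketch of what it might look like, assuming FlightID itself implements value equality over its two fields (java.util.Objects does the null-safe comparisons):

 // Inside Flight
 @Override
 public boolean equals(Object o) {
     if (this == o) return true;
     if (!(o instanceof Flight)) return false;
     return Objects.equals(flightID, ((Flight) o).flightID);
 }

 @Override
 public int hashCode() {
     return Objects.hashCode(flightID);
 }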


Your schedule might have 5,000 flights, but when you modify the schedule on any given day, only 10 or 20 flights might change. But, following best practices, you use a RESTful API and PUT a new schedule each time, something like this

public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule) {
    return jpaRepository.saveAndFlush(newSchedule);
}
When you do this, Hibernate takes minutes to respond. If you look at the SQL, you’ll see it diligently updating every row in the database in thousands of individual SQL statements. Why does it do that?

The answer is in Hibernate’s PersistentCollection. Even though they implement the Collection interface, Hibernate collections aren’t the same as Java collections: they are backed not by a storage array, but by the database. When you replace the persisted airline object with a new one, or set the whole collection of flights on an existing airline object, Hibernate can’t figure out what changed. So it blindly rewrites all of the flight children of the parent, even though the values are the same.

It can, however, track changes that you make to the PersistentCollection. So if you tell Hibernate what you’re changing by adding to and removing from the existing collection, it’s smart enough to write only the objects you changed back to the database.

If we update the schedule like this (using Apache’s CollectionUtils)

public AirlineSchedule updateSchedule(String airlineID, AirlineSchedule newSchedule) {
    AirlineSchedule schedule = jpaRepository.getByID(airlineID);

    // Change other properties of the schedule...

    Collection<Flight> toRemove = CollectionUtils.subtract(schedule.getFlights(), newSchedule.getFlights());
    Collection<Flight> toAdd = CollectionUtils.subtract(newSchedule.getFlights(), schedule.getFlights());

    // Tell Hibernate exactly what changed by mutating the persistent collection
    schedule.getFlights().removeAll(toRemove);
    schedule.getFlights().addAll(toAdd);

    return jpaRepository.saveAndFlush(schedule);
}
suddenly, the 5,000 updates are replaced with just 10 or 20, and your minutes of updates become seconds.


Full details on StackOverflow

apis, programming

JSON API is a poor standard for RESTful APIs

Recently, I was asked for my opinion on JSON-API as a potential standard for RESTful APIs. While I like the idea of some standardization for RESTful JSON responses, I feel that JSON API woefully misses the mark, and here’s why.

Let’s take the simple example from JSON API’s site

  "articles" : [{
      ¨id¨: 1,
      "title": "JSON API paints my bikeshed!",
      "body": "The shortest article. Ever.",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z",
      ¨author¨ : {
          ¨id¨ : 42,
          "name": "John",
          "age": 80,
          "gender": "male"

and let’s take a look at how that could be briefly rewritten as regular JSON to achieve the same functionality

  "articles" : [{
      ¨id¨: 1,
      "title": "JSON API paints my bikeshed!",
      "body": "The shortest article. Ever.",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z",
      ¨author¨ : {
          ¨id¨ : 42,
          "name": "John",
          "age": 80,
          "gender": "male"

Now, let’s give a usage example of the JSON API response above, to print all the article titles and author names, which I would imagine is a typical use case for data that looks like this.

for(var data : response.data){
  if(data.type == "articles"){
    print data.attributes.title;
    var author_id = data.relationships.author.data.id;
    var author_type = data.relationships.author.data.type;
    for(var inc : response.included){
      if(inc.id == author_id && inc.type == author_type){
        print inc.attributes.name;
      }
    }
  }
}

compared to the regular JSON, where you can print the variables simply by writing

for(var article : response.articles){
  print article.title;
  print article.author.name;
}

JSON API is the obviously poor choice here. We can see:

  • 3x the implementation cost of regular JSON (12 lines vs 4 lines)
  • O(n) scans of the included array to find each author vs O(1) direct access, with no temporary variables required in the plain version
  • Increased code complexity, poor readability and higher maintenance cost vs regular JSON

Ultimately, I get the impression that JSON-API’s approach treats JSON as a message-passing format, but in doing so it misses the biggest advantage of JSON: that it’s an Object Notation.

The structures in JSON – numbers, booleans, strings, keys and objects – can be read natively by any modern, object-oriented programming language. A well-designed JSON response can be parsed directly into an object model by the receiving application at low cost, and then used directly by that program’s business logic. This has been well understood by the creators of JSON-schema and Open API (formerly Swagger), which effectively add type-safe behaviour to JSON’s dynamic-by-default constructs.
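
As an illustration of that low cost, here’s a rough sketch using Jackson – the Article, Author and Response classes are invented for this example:

 import com.fasterxml.jackson.databind.ObjectMapper;
 import java.util.List;

 class Author { public int id; public String name; public int age; public String gender; }
 class Article { public int id; public String title; public String body;
                 public String created; public String updated; public Author author; }
 class Response { public List<Article> articles; }

 class Demo {
     public static void main(String[] args) throws Exception {
         String json = "{\"articles\":[{\"id\":1,\"title\":\"JSON API paints my bikeshed!\","
                     + "\"author\":{\"id\":42,\"name\":\"John\"}}]}";
         // One call turns the response into a typed object model
         Response response = new ObjectMapper().readValue(json, Response.class);
         for (Article a : response.articles) {
             System.out.println(a.title + " by " + a.author.name);
         }
     }
 }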

Thus, if you want to describe hypermedia in JSON, you should do so in context, the same way we do on the web.
Hyperlinks on the web are identified by convention – either by being underlined or by text colouring. A similar approach is valid for JSON responses. Hypermedia is already quite identifiable in a JSON response because it’s a string starting with http:// 😉  Machines can find it via a url type in a corresponding Open API specification. The benefit of reinforcing it with conventions like keywords (such as href, used by the HAL specification) or a prefix (like starting the field name with an underscore, as in _author) is that it adds clarity for those reading code that handles the response model.

I think this is the kind of clarity we should be aiming for when designing standards for RESTful JSON APIs.

apis, programming, travel

Innovation starts with good APIs

(Written on September 21st, but only got around to publishing now)

I’ve spent much of the last two weeks, along with several of my colleagues, developing an entry for tHACK SFO 2012. I love these hack competitions – they can sometimes seem like a waste of time, but they’re an important builder of solid technical skills, an opportunity to play with some new APIs or tools, and a driver of that overused buzzword, innovation.

The travel industry has an enormous variety of APIs – some of them great, but most of them stinkers – and the APIs made available to us for tHACK really came from all ends of that spectrum. (Well, actually, to be fair, there were no real stinkers!)

One remarkable point that struck me while coding was how much easier it is to include a great API in an application than a poor one. We wrote our prototype as a Facebook app – I’d already used Facebook’s API before and knew it was good, but writing the prototype was a pointed reminder of how much better it is than any in the travel industry. Accessing the features of Facebook within our app was simple, the requests and responses were easy to understand, and the results were fast and reliable.

This made the contrast between many of the competition-offered APIs brutally stark. Many travel industry APIs are intensely frustrating to use – and it was evident during the presentation of the hacks which ones were easy to use, and which ones were trouble. There were 9 APIs provided, and 18 hacks. Of the 9 APIs, only 4 were used by more than one of the hacks. Of the remaining 5, 2 were used by independent competitors, 2 were used only in hacks published by the same company that published the API, and one was not used at all (1).

Here are some of the issues I encountered with those APIs – take it as a list of red flags to watch for when releasing a new API

  • Hard to connect. SOAP wrappers are hard to use and rarely necessary these days – if your API is not RESTful, there had better be a good reason for that.
  • Login to the API can be a tricky problem, but it should not require a persistent data store on my side. In particular, many APIs make a meal of OAuth. I know that OAuth’s a difficult paradigm, but with a bit of thought and understanding, a sensible implementation is possible.
  • If your API doesn’t send or receive either XML or JSON, forget it. Simple tools are available to serialize/deserialize both XML and JSON as constructs in virtually any programming language. Almost every other data transfer mechanism will require serialization by hand, a massive time sink.
  • Your API should send and receive in the same data format. Sounds obvious, but it’s often ignored. If replies are in JSON, requests should also be in JSON.
  • Sensible data models with few surprises. A hallmark of an excellent API is rarely having to refer to the documentation. Interface objects should be sensibly named and constrained. Encoded data should use clearly named enumerations, ideally with a full-text translation, not integer codes.
  • Less chit-chat. Saving data, for example, should be a one-call operation – no excuses. For a simple flow, I should be able to send or receive all the data I need in no more than 3 API calls. More calls mean you’re passing your orchestration and performance problems on to me.
  • If a given API call is not fast, it should be asynchronous. Give me an answer, or give me a token (see the sketch after this list)
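
To make that last point concrete, here’s a rough sketch of the answer-or-token pattern using java.net.http – the endpoint, the 202-plus-Location convention and the polling interval are all assumptions for illustration, not any particular API:

 import java.net.URI;
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;

 class AsyncApiClient {
     static final HttpClient CLIENT = HttpClient.newHttpClient();

     static String search(String query) throws Exception {
         HttpRequest request = HttpRequest.newBuilder()
                 .uri(URI.create("https://api.example.com/search?q=" + query))
                 .build();
         HttpResponse<String> response =
                 CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
         if (response.statusCode() == 200) {
             return response.body();                       // fast: an answer
         }
         if (response.statusCode() == 202) {               // slow: a token
             String location = response.headers().firstValue("Location").orElseThrow();
             return pollForResult(location);
         }
         throw new IllegalStateException("Unexpected status " + response.statusCode());
     }

     static String pollForResult(String location) throws Exception {
         while (true) {
             HttpResponse<String> poll = CLIENT.send(
                     HttpRequest.newBuilder(URI.create(location)).build(),
                     HttpResponse.BodyHandlers.ofString());
             if (poll.statusCode() == 200) {
                 return poll.body();                       // result is ready
             }
             Thread.sleep(1000);                           // not yet – poll again
         }
     }
 }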

Providing language SDKs is not enough! Not everyone wants to program in your programming language of choice. In any case, pushing the hard work into the API means that clients of that API don’t have to implement those features individually, or across a variety of SDKs.

The proof was in the results. With only a couple of exceptions, the quality of the entries in tHACK SFO was uniformly high. (Check out iPlanTrip or Concierge for starters!) However, the judges gave no rewards to the users of difficult APIs: all the winning and highly commended hacks were based on the 5 easy APIs, and all of them used either Geckgo or Vayant, which struck me as being the two APIs that best combined content and ease of use. Those who took the time to figure out and use the more complex APIs had less time to produce high-quality integrations.

Finally, a shout-out to Vayant, the Amadeus Hotel Team and Rezgo, whose APIs helped us produce a slick and innovative app – hopefully the people at Amadeus will let us publish tVote for you sooner rather than later.

(1) These figures come from my recollection of the event the following day. If you have official figures to confirm them, that would be nice.

java, programming, Uncategorized

Exception Antipatterns

Java has pretty good exception handling, but it’s something that a surprising number of programmers get wrong.
The point of an exception is to provide detailed information to software maintainers about the reason for a program’s failure.
The three key points here are:

  • You should only throw an exception in cases where the program fails. If you’re frequently seeing the same exception in the same place – you should question whether your program should be allowed to fail under those conditions
  • Exceptions are for software maintainers only. The exception should be logged at the appropriate point in the code, and a friendlier message should be displayed to the user
  • Detailed information should be provided (within the limits of data security, of course). The minimum should be a stack trace locating the exact point where things went wrong, ideally also with an informative message.

It’s worthwhile spending a little time when you code to produce useful exceptions; it saves time when something goes wrong with your code in production, because the source of the problem can easily be found and traced.

Bearing that in mind, here are some anti-patterns I’ve recently seen around exceptions. Let’s look at why they’re bad practice:

 try {
     // stuff that contacts an external system
 } catch (APIException e) {
     throw new BusinessException("Failure in external system");
 }

The good points here are that the developer is catching only the appropriate exception, describing what’s wrong in the new exception, and allowing the program to fail.
The problem is that, in the process, he’s just destroyed the whole stack trace that would tell us where the external system failed during the reply. Without a debugger or a detailed functional knowledge of how the external system works, there’s nothing more you can do with this exception.
Some better solutions here would be (in order of my preference):

  • Don’t catch and rethrow – maybe it’s fine to just let the APIException continue up the stack
  • Chain the APIException into the BusinessException as its cause (see the sketch below)… and make sure to log the whole trace when you catch the BusinessException
  • Log the APIException now, before rethrowing
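
For the second option, here’s a minimal sketch of the chained rethrow. It assumes BusinessException has a constructor that accepts a cause, as well-behaved exception classes do:

 try {
     // stuff that contacts an external system
 } catch (APIException e) {
     // Chain the original exception so the full stack trace survives
     throw new BusinessException("Failure in external system", e);
 }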

Here’s another one

 BusinessResponse processAPIResponse() throws TimeoutException,
         ConcurrentModificationException, TechnicalException, FunctionalException {
     Response r = ...; // result of call to external system
     int code = r.getErrorCode();
     if (code == 1234) {
         throw new ConcurrentModificationException();
     }
     if (code >= 8000 && code < 9000) {
         throw new TechnicalException();
     }
     if (code > 0 && code < 8000) {
         throw new FunctionalException();
     }
     ...
 }

Here, the problem is that we’re throwing a bunch of checked exceptions, based on the API response, that the client may not be able to handle.
A good rule of thumb for exception handling (and it’s open to debate), as mentioned by O’Reilly, is that unless there’s something the client code can do to recover from the failure, it’s better to throw a runtime exception.
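
Applied to the example above, a sketch of the unchecked alternative (the constructor is an assumption):

 // Unchecked: callers that can recover may still catch it, but every
 // intermediate method signature no longer has to declare it.
 class TechnicalException extends RuntimeException {
     TechnicalException(String message) {
         super(message);
     }
 }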

Finally, when’s an exception not an exception?

There are two common ways of avoiding an exception when you should have thrown one… and they are:

  •  if without else — Of course you code if statements all the time. But do you know what should happen when your if condition is false? Should the default behaviour really be nothing? (Hint: the default behaviour is frequently not nothing!)

Take a UI event handler, for example. If you just coded something like
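
(a hypothetical sketch – the handler and method names are invented)

 if (selectionChanged(event)) {
     refreshView(event);
 }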


then without an else branch, you risk the possibility that the UI may just stay in the same place, un-updated forever.

  • Returning null — This is basically a NullPointerException that hasn’t happened yet, in someone else’s code.

Two great ways to avoid this problem are:

  • If you’re returning null, ask whether null has a meaning here (getFirstSomething() returning null when the list of somethings has 0 elements, say), or whether you’re just returning null because you don’t know what else to do.
  • Put a Javadoc @return tag on every method that returns an object, and document under what conditions the method can return null (see the sketch below).
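
A sketch of that second point – the class and method are invented for illustration:

 import java.util.List;

 class Lists {
     /**
      * @return the first element of the list, or null if the list has 0 elements
      */
     static <T> T getFirst(List<T> list) {
         return list.isEmpty() ? null : list.get(0);
     }
 }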

Putting this stuff in when you code can sometimes feel like a hassle – it’s code not aimed at the problem you’re actually solving – but it pays great dividends later, when your code, or some system it’s connected to, starts behaving unexpectedly!

data, general, programming

Who wants to write a 200 page report that no-one will ever read?

I was at the Data Without Borders DC Data Dive a couple of weeks back – definitely a trip out of my usual worlds of Java and travel IT, and a visit into something a bit different.

A couple of quotes stuck in my head from that weekend. The first was from one of the participating charities, DC Action for Kids, whose representative said in their initial presentation that one of their major motivations for taking part in the event was (and I paraphrase)

we wanted to produce something more interesting than a two-hundred page report that no-one will ever read.

One of Saturday’s presenters picked this up and ran with it, suggesting that events like the DC Datadive are part of a search for new methods of deep data presentation. While this may be somewhat of an exaggeration, DC Action for Kids certainly achieved their aim – check out the beautiful app that came from their datadive team.

It made me realise what a powerful difference you can make with such an app. Reading a 200-page report is usually a pretty tiresome process, but exploring the data set in visual form is a joy. It also allows the app writer to guide end users to the conclusion they would like them to draw from the data, while letting users feel that they are verifying that conclusion themselves. In today’s media landscape, where discerning readers place full trust in few, if any, publications, that is a pretty powerful tool! While I believe that app was coded by hand in the usual mix of CSS and HTML held together by Javascript, there’s surely a gap in the market for a PowerPoint-style application to generate visual data-exploration mini-apps.

The second observation that struck me over the weekend was again a quote that went something along the lines of

In the coming 2 decades, statistician will be the trendy job to have, in the same way that software engineer has been the trendy job for the previous two decades

Again, while I feel this is somewhat of an exaggeration (and as a software engineer, I’ll confess to bias), the event certainly demonstrated to me the power of inference from data – and the importance of keeping your data, and keeping it clean. A single example from a data set is rarely sufficient to support an extrapolation, and a zero-valued field is not the same as an empty one. These may sound like simple observations, but as programmers it’s not unusual for us to treat single examples as general trends, or zero and null as equivalent. They’re not, and it’s really easy to mess up your conclusions as a result of these and many other simple but potentially incorrect assumptions. The weekend served as a reminder of these simple values – remembering them at coding time can make the difference between a piece of code that lasts 5 years and one that suffers death by maintenance in the first year of its life.
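
A toy illustration of the zero-versus-empty point (the class and field are invented):

 class DataPoint {
     // null means "no measurement was taken"; zero means "measured, and it was zero"
     private final Integer value;

     DataPoint(Integer value) { this.value = value; }

     boolean isMissing() { return value == null; }
     boolean isZero()    { return value != null && value == 0; }
 }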

Finally, in addition to my languages-of-the-future predictions of Javascript and Erlang, I think I’ll have to add R. It’s such a powerful and easy language – I was really blown away by it.

java, programming

Never check null

OK, that’s maybe a bit of an exaggeration. I mean never check null for mandatory input.

I am thoroughly sick of code that returns null because it doesn’t know what to do. If you don’t know what to do, you should throw an exception. There are only 2 exceptions to this rule:

  • Converters
  • Wrappers

Here, you can return null and let the calling class figure out what to do. But as soon as you have logic in your code, you should stop checking.

If you need something, it can’t be null. Simple. So it’s an exception. Get over it, get a stack trace ASAP, and we can find out what’s wrong.
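
A minimal sketch of what that looks like in practice – the class and fields are invented, and java.util.Objects.requireNonNull throws a NullPointerException with your message the moment a null sneaks in:

 import java.util.Objects;

 class Booking {
     private final String flightNumber;
     private final String passengerName;

     // Mandatory input: fail fast, right here, instead of passing nulls along
     Booking(String flightNumber, String passengerName) {
         this.flightNumber = Objects.requireNonNull(flightNumber, "flightNumber is mandatory");
         this.passengerName = Objects.requireNonNull(passengerName, "passengerName is mandatory");
     }
 }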

Don’t cover it with some if null then null crap and pass the buck to the caller. The bug is still there, you’re only making it harder to find.

And if anyone tells you NullPointerException is wrong, don’t worry about them. They haven’t yet realised the truth, which is:

NullPointerException tells you what’s wrong, where, and – if you throw it early enough – why.

It’s infinitely preferable to have a NullPointerException than a mysterious oh-where’s-my-output scenario.

Oh, and I nearly forgot to mention the benefits:

  • Your code becomes shorter and more readable, because you stop checking for nulls (no more checking before your call to Collection.iterator())
  • You’re forced to document in your Javadoc that your input must not be null, which helps your users and covers your ass
  • You only have to handle one type of exception: real exceptions, not exceptions dressed up as nulls.

java, programming

You can, but should you?

I have read on previous occasions that there are three stages to learning a new language, whether it’s a computer language or a human one.

The first is comprehending. You might need a book beside you, but basically during this stage you are getting to grips with the functionality of the language, learning how to translate the languages you already know into its new constructs. You can make the language work, but you are still mapping onto concepts you already have.

The second is understanding. You can make the language function, and now you start to understand how to express yourself in the natural way of that language. You form constructs naturally, learn to appreciate good constructs, and criticise bad ones.

Finally, you master a language. You begin to see its limitations and start conceiving new constructs. Not only do you know many possible constructs, but you can generally pick the best one for each situation, manipulating how your language will be understood by others.

I don’t think I could reasonably claim mastery of Java, but I realised today that I’m well on the road to it, as I saw two complex constructs. One was an elegant and original solution that lacked finishing touches; the second was a stupid exhibition of code machismo, certain to confuse its users and generally be a pain in the ass. The good example deserves a post of its own, but I can plant the bad one here 🙂

Just because you can declare a new, anonymous inner class in a method call does not mean you should (see the sketch after this list). I mean, maybe there is a practical use for this, but I’ve just never found it. As far as I can see:

  • It makes your code a complete bitch to read. Classes in classes, sure. Classes in methods, maybe, if you really have to. Classes in methods calls though…
  • It’s a nightmare for version control. OK, maybe Mercurial or Git could cope, but ClearCase and friends haven’t a hope
  • Eclipse chokes. The correction works OK, but the search doesn’t find these classes.
  • You end up writing it again and again. Write-once code occurs less often than we imagine, aside perhaps from request/response-type classes. Many candidates for small classes get reused frequently, so writing them over and over again isn’t efficient.
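
For concreteness, here’s a minimal sketch of the construct in question – Swing is used purely for illustration:

 import java.awt.event.ActionEvent;
 import java.awt.event.ActionListener;
 import javax.swing.JButton;

 class Wiring {
     void wire(JButton button) {
         // A whole class, declared inside the method call itself
         button.addActionListener(new ActionListener() {
             @Override
             public void actionPerformed(ActionEvent e) {
                 System.out.println("clicked");
             }
         });
     }
 }

The same listener, pulled out into a small named class file, is trivially searchable, diffable and reusable.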

Anyway, I spent an hour today rewriting some guy’s method-call classes, because someone updated an interface they implement, Eclipse didn’t find them, and version control couldn’t cope.

It’s ten times nicer to put them in a lovely little class file of their own, because frankly, little class files are the best ones. (Unless you create 50,000 of them.)