apis, programming, travel

Innovation starts with good APIs

(Written on September 21st, but I only got around to publishing it now)

I’ve spent much of the last two weeks, along with several of my colleagues, developing an entry for tHACK SFO 2012. I love these hack competitions – they can sometimes seem like a waste of time, but they’re an important driver of solid technical skills, an opportunity to play with new APIs and tools, and a source of that overused buzzword, innovation.

The travel industry has an enormous variety of APIs – some are great, but most of them stink – and the APIs made available to us for tHACK came from all ends of that spectrum. (Well, to be fair, there were no real stinkers!)

One thing that struck me while coding was how much easier it is to include a great API in an application than a poor one. We wrote our prototype as a Facebook app – I’d used Facebook’s API before and I knew it was good, but writing the prototype was a pointed reminder of how much better it is than any in the travel industry. Accessing the features of Facebook within our app was simple, the requests and responses were easy to understand, and the results were fast and reliable.

This made the contrast with many of the competition-offered APIs brutally stark. Many travel industry APIs are intensely frustrating to use – and it was evident during the presentation of the hacks which ones were easy to use and which ones were trouble. There were 9 APIs provided, and 18 hacks. Of the 9 APIs, only 4 were used by more than one of the hacks. Of the remaining 5, 2 were used by independent competitors, 2 were used only in hacks published by the same company that published the API, and one was not used at all (1).

Here are some of the issues I encountered with those APIs – take it as a list of red flags to watch for when releasing a new API:

  • Hard to connect. SOAP wrappers are hard to use and rarely necessary these days – if your API is not RESTful, there had better be a good reason for it.
  • Login to the API can be a tricky problem, but it should not require a persistent data store on my behalf. In particular, many APIs make a meal of OAuth. I know that OAuth’s a difficult paradigm, but with a bit of thought and understanding, a sensible implementation is possible.
  • If your API doesn’t send or receive in either XML or JSON, forget it. Simple tools are available to serialize/deserialize both XML and JSON as native constructs in virtually any programming language (see the sketch after this list). Almost any other data transfer mechanism will require serialization by hand, which is a massive time sink.
  • Your API should send and receive in the same data format. Sounds obvious, but it’s often ignored. If replies are in JSON, requests should be in JSON too.
  • Sensible data models with few surprises. A hallmark of an excellent API is rarely having to refer to the documentation. Interface objects should be sensibly named and constrained. Encoded data should use clearly named enumerations, ideally with a full-text translation, not integer codes.
  • Less chit-chat. Saving data, for example, should be a one-call operation – no excuses. For a simple flow, I should be able to send or receive all the data I need in no more than 3 API calls. More calls mean that you’re passing your orchestration and performance problems to me.
  • If a given API call is not fast, it should be asynchronous. Give me an answer, or give me a token.
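To make the XML/JSON point concrete, here’s roughly what consuming a well-behaved JSON API looks like in Java with a mapping library like Jackson – just a sketch, with a hypothetical Flight class and endpoint:

 import com.fasterxml.jackson.databind.ObjectMapper;
 import java.net.URL;

 public class FlightClient {
     // Hypothetical response object – field names mirror the JSON keys
     public static class Flight {
         public String origin;
         public String destination;
         public double price;
     }

     public static void main(String[] args) throws Exception {
         ObjectMapper mapper = new ObjectMapper();
         // One call turns the response into a typed object – no hand-rolled parsing
         Flight flight = mapper.readValue(new URL("https://api.example.com/flights/123"), Flight.class);
         System.out.println(flight.origin + " -> " + flight.destination + " @ " + flight.price);
     }
 }

When the API is clean, the client code stays this boring – which is exactly what you want.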

Providing language SDKs is not enough! Not everyone wants to program in your language of choice. In any case, pushing the hard work into the API itself means that clients don’t have to implement those features individually, or in a variety of SDKs.

The proof was in the results. With only a couple of exceptions, the quality of the entries in tHACK SFO was uniformly high. (Check out iPlanTrip or Concierge for starters!) However, the judges gave no prizes to the users of the difficult APIs: all the winning and highly commended hacks were based on the 5 easy APIs, and all of them used either Geckgo or Vayant, which struck me as the two APIs that best combined content and ease of use. Those who took the time to figure out and use the more complex APIs had less time left to produce high-quality integrations.

Finally, a shout out to Vayant, the Amadeus Hotel Team and Rezgo, whose APIs helped us produce a slick and innovative app – hopefully the people at Amadeus will let us publish tVote for you sooner rather than later.

(1) These figures come from my recollection of the event the following day. If you have official figures to confirm them, that would be nice.

java, programming, Uncategorized

Exception Antipatterns

Java has pretty good exception handling, but it’s something that a surprising number of programmers get wrong.
The point of an exception is to provide detailed information to software maintainers about the reason for a program’s failure.
The three key points here are:

  • You should only throw an exception in cases where the program fails. If you’re frequently seeing the same exception in the same place – you should question whether your program should be allowed to fail under those conditions
  • Exceptions are for software maintainers only. The exception should be logged at the appropriate point in the code, and a friendlier message should be displayed to the user (see the sketch after this list).
  • Detailed information should be provided (within the limits of data security, of course). The minimum should be a stack trace locating the exact point where things went wrong, ideally also with an informative message.
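For that second point, the usual shape is something like this – a sketch, with a hypothetical service, logger and view:

 try {
     itinerary = bookingService.price(request);   // hypothetical call that can fail
 } catch (PricingException e) {
     // Full detail (message + stack trace) for the maintainers...
     log.error("Pricing failed for request " + request.getId(), e);
     // ...and a friendlier message for the user
     view.showError("Sorry, we couldn't price your trip. Please try again.");
 }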

It’s worthwhile spending a little time when you code to produce useful exceptions: it saves time when something goes wrong in production, because the source of the problem can be found and traced easily.

Bearing that in mind, here are some anti-patterns I’ve seen recently around exceptions. Let’s look at why they’re bad practice:

 try {
     // ... stuff that contacts an external system ...
 } catch (APIException e) {
     throw new BusinessException("Failure in external system");
 }

The good stuff here: the developer is catching only the appropriate exception, he’s describing what’s wrong in his new exception, and he’s allowing the program to fail.
The problem is that, in the process, he has just destroyed the whole stack trace that would tell us where the call to the external system failed. Without a debugger, or detailed functional knowledge of how the external system works, there’s nothing more you can do with this exception.
Some better solutions here would be (in order of my preference):

  • Don’t catch and rethrow – maybe it’s fine to just let the APIException continue up the stack
  • Chain the APIException into the BusinessException as its cause (see the sketch after this list)… and make sure to log the whole trace when you eventually catch the BusinessException
  • Log the result of the APIException now before rethrowing
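The second option is a one-line change, assuming BusinessException exposes the standard (String, Throwable) cause-chaining constructor:

 try {
     // ... stuff that contacts the external system ...
 } catch (APIException e) {
     // Passing the original exception as the cause keeps the full stack trace
     throw new BusinessException("Failure in external system", e);
 }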

Here’s another one

 BusinessResponse processAPIResponse() throws TimeoutException, ConcurrentModificationException, TechnicalException, FunctionalException {
     Response r = callExternalSystem();   // the result of a call to the external system
     int code = r.getErrorCode();
     if (code == 1234) {
         throw new ConcurrentModificationException();
     }
     if (code >= 8000 && code < 9000) {
         throw new TechnicalException();
     }
     if (code > 0 && code < 8000) {
         throw new FunctionalException();
     }
     // ... otherwise build and return the BusinessResponse ...
 }

Here, the problem is that we’re throwing a bunch of checked exceptions, based on the API response, that the client may not be able to handle.
A good rule of thumb for exception handling (and it’s open to debate), as mentioned by O’Reilly, is that unless there’s something the client code can do to recover from the failure, it’s better to throw a runtime exception.
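In the example above, that would mean making the exceptions the client can’t act on unchecked – a sketch:

 // Unchecked: callers aren't forced to catch something they can't recover from
 public class TechnicalException extends RuntimeException {
     public TechnicalException(String message) {
         super(message);
     }
 }

The throws clause of processAPIResponse() then only needs to declare the exceptions a caller can genuinely do something about.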

Finally, when’s an exception not an exception?

There are two common ways of not throwing an exception when you really should have thrown one… and they are:

  • if without else — Of course you code if statements all the time. But do you know what should happen when your if condition is not true? Should the default behaviour really be nothing? (Hint: the default behaviour is frequently not nothing!)

Take a UI event handler, for example… if you just coded something like this (a sketch, with a hypothetical updateView() call standing in for whatever the handler should do):
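 button.addActionListener(new ActionListener() {
     @Override
     public void actionPerformed(ActionEvent event) {
         if (model.isValid()) {
             updateView(model);   // hypothetical call – whatever refreshes the screen
         }
         // no else: when the model is invalid, nothing happens at all
     }
 });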


then whenever that condition is false, you risk the UI just staying in the same place, un-updated forever.

  • Returning null — This is basically a NullPointerException that hasn’t happened yet, in someone else’s code.

Two great ways to avoid this problem are:

  • If you’re returning null, ask whether null has a meaning here (for example, getFirstSomething() can return null if the list of somethings has 0 elements), or whether you’re just returning null because you don’t know what else to do.
  • Put a Javadoc @return tag on every method that returns an object, and document under what conditions the method can return null (see the sketch below).
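Something like this, with a hypothetical getFirstBooking() method:

 /**
  * Returns the first booking in the itinerary.
  *
  * @return the first booking, or null if the itinerary contains no bookings
  */
 public Booking getFirstBooking() {
     return bookings.isEmpty() ? null : bookings.get(0);
 }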

Putting this stuff in when you code can sometimes feel like a hassle – it’s code that isn’t targeted at the problem you’re actually trying to solve… but it can pay great dividends later when your code, or some system it’s connected to, starts behaving unexpectedly!

data, general, programming

Who wants to write a 200 page report that no-one will ever read?

I was at the Data Without Borders DC Data Dive a couple of weeks back – definitely a trip out of my usual worlds of Java and travel IT, and a visit to something a bit different.

A couple of quotes stuck in my head from that weekend – the first was from one of the participating charities, DC Action for Kids, whose representative, in their initial presentation, said that one of their major motivations for taking part in the event was (and I paraphrase)

we wanted to produce something more interesting than a two-hundred page report that no-one will ever read.

One of Saturday’s presenters picked this up and ran with it, arguing that at events like the DC Datadive we are in the process of finding new methods of deep data presentation. While this may be somewhat of an exaggeration, DC Action for Kids achieved their aim – check out the beautiful app that came from their datadive team.

It made me realise what a powerful difference you can make with such an app – reading a 200 page report is usually a pretty tiresome process, but exploring the data set in a visual form is not only a joy, it also allows the app’s author to guide end users to the conclusion they would like them to draw from the data, while letting the users feel that they are verifying that conclusion for themselves. In today’s media landscape, where discerning readers place full trust in few, if any, publications, that is a pretty powerful tool! While I believe the app was coded by hand in the usual mix of CSS and HTML held together by JavaScript, there’s surely a gap in the market for a PowerPoint-style application that generates visual data exploration mini-apps.

The second observation that struck me over the weekend was again a quote that went something along the lines of

In the coming two decades, statistician will be the trendy job to have, in the same way that software engineer has been the trendy job for the previous two decades

Again, while I feel this is somewhat of an exaggeration (and as a software engineer, I’ll confess to bias), the event certainly demonstrated to me the power of inference from data… and the importance of keeping your data, and keeping your data clean. Single examples are rarely sufficient to justify an extrapolation, and a zero-valued field is not the same as an empty one… these may sound like simple observations, but as programmers it’s not unusual for us to treat single examples as general trends, and zero and null as equivalent. They’re not, and it’s really easy to mess up your conclusions as a result of these and many other simple but potentially incorrect assumptions. The weekend served as a reminder of these simple values – remembering them at coding time can make the difference between a piece of code that lasts 5 years and one that suffers death by maintenance in the first year of its life.
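The zero-versus-empty point, in trivial Java terms (a sketch, with a hypothetical record accessor):

 Integer reportedIncome = record.getInteger("income");   // hypothetical accessor

 if (reportedIncome == null) {
     // no answer was recorded – we know nothing about this person's income
 } else if (reportedIncome == 0) {
     // an answer was recorded, and it was zero – a very different fact
 }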

Finally, in addition to my languages-of-the-future predictions of JavaScript and Erlang, I think I’ll have to add R. It’s such a powerful and easy language – I was really blown away by it.


Are we there yet?

After years of dual-booting Windows XP / Fedora, with a new machine I’ve hauled myself up a notch in the OS stakes, and now I’m dual-booting Ubuntu / Windows 7. Putting Linux first in that list is no accident – I hadn’t planned it that way, but I’m finally pleased to be defaulting to Ubuntu. It’s taken some time, but at last Linux has got beyond being my developer and server OS, and into the realm of something that is actually as easy as (maybe easier than) Windows for everyday use.

Don’t get me wrong, I’ve been a Linux advocate for years, and have been running Mandrake / Debian / Fedora / Ubuntu for 10 years now. But it’s never really been my everyday OS, just a dev platform or a server environment… until now.

Both Windows 7 and Ubuntu are substantial improvements over Windows XP. But since Windows 7 is closed and Ubuntu is free, there’s no reason to buy Windows unless you need it – which to me is a tipping point. What’s Ubuntu got over Windows 7? I’d say:

  • Flexibility – Traditionally I’d start with stability here, but amazingly I’ve never managed to crash Windows 7. But if something goes wrong with Ubuntu, I have the tools to fix it. If something is broken, I can have a go at repairing it. Windows 7’s signature system brings stability at the cost of flexibility; Ubuntu retains both.
  • Performance – Now here’s one I thought I’d never say… Ubuntu starts faster than Windows 7! After five-minute Debian boot times as recently as 2006, this one blows my mind: it takes less than 5 seconds to start Ubuntu. Equally surprisingly, both sound and power management worked out of the box. And on top of all that, Gnome is actually responsive in a way that’s comparable to Windows.
  • Software install – Aptitude has been way better than any other software install platform for years, and it just keeps giving. No wondering where and why at install time, ask for it and it is done… and done without viruses!
  • Media – This one’s been brewing for some time. Amarok is still the best music manager out there, and VLC the best playback software, but until recently poor rendering under Linux had been selling them short. Not any more. Two clicks after install take you to a great media platform.

I’d be lying if I said it was all good though. There are some good reasons to keep Windows 7:

  • Fonts – Yeah yeah, I know you can steal the actual fonts, but Windows somehow makes the text more readable. It’s so important and so obvious, and clearly so hard (or expensive) to do right. But it’s in your face as soon as you reboot into Windows: text in Ubuntu is still a bit square.
  • Graphics and Games – DirectX is a locked platform you can’t avoid if you want to do 3D-rendering or play games on your PC. And DirectX 10 is really very very shiny.
  • Aero – There once was a vast gulf between the usability of Gnome and that of Windows. Then sometime around 2007, Gnome overtook Windows XP. And now Windows 7 has surpassed Gnome… mostly by stealing a bunch of stuff from Apple’s OS. Windows 7 makes better use of the screen, and has a marginally better user experience, if you ask me.
  • Compatibility – Let’s face it, if you have some obscure program you need, or some piece of hardware that’s not supported, you still have no choice. You’ll need Windows.

But now, they’ll have to make me reboot to use it!


Keyboard navigation

For the last six months or so, I’ve been working on a team designing a pretty nifty web-app for travel agents. I’ve always thought that keyboard navigation was important, since most of us don’t use both hands for the mouse, and I’m struck by the efforts that my UI coder and user experience colleagues make to ensure the app works smoothly for keyboard navigators. (This is particularly important in the travel agency business, where some agents have been typing on a green screen since the eighties.)

And now I find myself stuck, with no mouse and two hours to kill, navigating the internet on a television with a keyboard, and I’m quickly finding out which of my favourite sites have thought about keyboard navigation!

In heavily JS-based web apps, this is not something that comes for free. Watching good UI coders suffer for an hour or so to set up a good tabbing order on a screen does not scream fun work… and that goes double for fixing keyboard navigation bugs that can sometimes leave the tabbing order off the screen! But if you want to see who’s made the effort and who hasn’t, you don’t have to look far. Unplug your mouse and fire up facebook.com, and you’ll see the lengths the developers have gone to to ensure that their site browses easily using only the keyboard. Not only have they set up a very easy and logical tabbing order (including custom button highlights), they’ve also started using keyboard shortcuts as part of the design. Nice. On the other hand, some sites have made no effort at all, which can lead to dead-end tabs, and to peering at the link target just to work out where your current tab position will take you… try logging into the wordpress.com administrator panel if you want to see what I mean!

The good news is that regular websites, and webapps that reload the page, get away scot-free, as long as they’re well designed. Looking at some old sites I wrote, they proved to be dead easy to navigate using just the keyboard, since the divs are well laid out. However, try some older table-hell websites, or ones with weirdly positioned divs (delicious.com, surprisingly!), and the hacks quickly become apparent.

What’s the point? Well, since most PC users always have one hand on the keyboard, it seems a waste not to take advantage. I’m hoping to see more sites take up keypress navigation in the future.

PS. WordPress.com, my keyboard is stuck in this box. You suck!


BigDecimal: When is zero not equal to zero?

While unit testing during the week, I came across a pretty nasty code artefact that had come about through a refactoring, where someone had decided to replace Double with BigDecimal in a class for handling monetary amounts. (Because it’s better for rounding, apparently. I like floating-point numbers, but apparently I’m in the minority!)

There was a unit test checking that an arithmetic computation which should result in zero did, in fact, result in zero. With floating-point numbers this of course worked – thanks to the sensible implementation of FP numbers, and the fact that the relevant Java classes are pretty well designed.

In comparison, BigDecimal feels a bit like a management-pleasing hack. A simplified way of imagining it is a class containing a long holding the digits (the unscaled value) and an int, with the int (called the scale) telling you where in those digits the decimal point should go. BigDecimal.equals then compares both of these numbers, and only if both match are the two values considered equal. This means that

3.99 - 3.99 != 0

because the scale of the left-hand side will be 2, and the scale of the right-hand side will be 0. The method I should use to see whether these are equal is BigDecimal.compareTo(), which will return 0 if they have the same numerical value. Obviously this makes my unit tests look ugly, but there are much bigger problems than that.
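Here’s that behaviour in a nutshell:

BigDecimal diff = new BigDecimal("3.99").subtract(new BigDecimal("3.99"));   // 0.00, scale 2
BigDecimal zero = BigDecimal.ZERO;                                           // 0, scale 0

System.out.println(diff.equals(zero));          // false – the scales differ
System.out.println(diff.compareTo(zero) == 0);  // true  – same numerical value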

What about this?

// the String constructor preserves the scale exactly as written
BigDecimal x = new BigDecimal("1");              // scale 0
BigDecimal y = new BigDecimal("1.00");           // scale 2
BigDecimal z = new BigDecimal("1.0000000000");   // scale 10

Set<BigDecimal> a = new HashSet<>();
Set<BigDecimal> b = new TreeSet<>();

Can you see where this is going? Set a will accept three different values of 1, despite the fact that sets are designed to contain unique elements. Whereas Set b will keep only one of them, because BigDecimal’s definition of compareTo is not consistent with its definition of equals.
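Concretely, with the variables above:

a.add(x); a.add(y); a.add(z);   // a.size() == 3 – equals() treats them as distinct
b.add(x); b.add(y); b.add(z);   // b.size() == 1 – compareTo() treats them as equal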

Honestly… give me floating-point maths any day!


SSDs to change the face of DBs forever

A few months ago, I stumbled upon the existence of a MySQL project called RethinkDB that rewrites the MySQL storage engine to be optimized for SSDs.

Really cool, really really cool. I’m surprised it’s not causing more of a fuss.

If you’ve ever used an SSD, you’ll understand the difference they make to your PC. Somehow, it’s like a weight has been lifted – there is no more crunching sound when you try to load stuff; it just appears, quickly and silently. Removing the hard drive has made my day-to-day much nicer: when I start Eclipse, it just loads, and when I do a search, within about 5 silent seconds it has found what I’m looking for. (With no noisy background indexing.) It basically improves your PC’s performance for file-intensive applications by an order of magnitude, and databases are no exception.

But Rethink’s argument is that, aside from the order-of-magnitude gain from the raw performance of SSDs, there is a second order of magnitude to be had by rewriting the database algorithms. And the more I read into it, the more I’m convinced they are right.

Think about it. Think of all the hacks we do to make DBs faster for disks that just aren’t relevant any more.

For example: SSDs can append almost instantly, so there’s no need for a separate database transaction log. Just append everything, and the log and the data become one and the same. You don’t need to worry so much about consistency, because you’re only ever appending – if you want to go back, just go back. And once you remove that constraint, you also relieve a huge concurrency bottleneck, meaning you can actually use all four of those cores, all of the time. Not only does the SSD improve the raw performance of the disk, but by removing the rotational element it also improves the consistency of that performance.
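A minimal sketch of the append-only idea – not how RethinkDB actually does it, just the shape of it – where every write goes to the end of the file and the offset doubles as the version you can go back to:

 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.nio.channels.FileChannel;
 import java.nio.file.Paths;
 import java.nio.file.StandardOpenOption;

 public class AppendOnlyStore {
     private final FileChannel channel;

     public AppendOnlyStore(String path) throws IOException {
         channel = FileChannel.open(Paths.get(path),
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
     }

     // Append a record and return the offset it starts at – there is no in-place
     // update and no separate transaction log: the data file is the log.
     public synchronized long append(byte[] record) throws IOException {
         long offset = channel.size();
         channel.write(ByteBuffer.wrap(record));
         return offset;
     }
 }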

But really, Rethink is just the beginning. Rethink builds on the existing paradigms of databases to make things faster, but the real benefit will come from embracing new paradigms. I think there’s a third order of magnitude in this that will really change the way we use databases. Systems like MapReduce give us a great way to perform massively parallel data processing, but they are sequential in terms of data access. At some point the data has to be put in an index (an HBase or BigTable), which lives on a disk cluster. But with SSDs we have incredibly fast random access, which adds an interesting element: now we have truly non-sequential access to the input data. Research like that of Logothetis and Yocum gives us a hint at the future – SSDs bring the power of indexing to the MapReduce party for free, just as a side-effect of their fast random reads. Of course, then you have to store all your data on SSDs, which for now is a bit pricey…

Anyway, the fun is only just beginning. Personally, I’m looking forward to document DBs like Apache CouchDB suddenly becoming a lot more viable as replacements for those tedious relational mappings we were all forced to go through in the past.