We need tools to help programmers name things!

Phil Karlton’s statement

There are only two hard things in Computer Science: cache invalidation and naming things.

has been immortalized in programming literature from Martin Fowler to The Codeless Code, largely because it’s true.

Cache invalidation is a well-understood problem. Students learn about it in college. Tools exist to help you figure out when to invalidate the data in your cache. Naming things, on the other hand, is rarely touched by programming courses (with the occasional exception), and help is hard to find beyond basic guidance in blog posts or powerpoint presentations.

Even when naming is discussed, advice is usually limited to the basic conventions of a language – methods with certain names carry certain expectations (getX, setX, equals, etc.) – or the use of nouns and verbs. Discussion of the real issue – what are the best words to describe this new functionality to my audience? – is shockingly rare.

It would be easy to give a hand-waving response to this question, like

Consider your audience. Writing code for a wide audience of programmers will limit you to using generally understood concepts and vocabulary. Writing specialist code for domain experts will allow you to use more precise and specialized words. What is the vocabulary that those interacting with your code are likely to understand?

Sure – this is great general advice, but it doesn’t help me with today’s problem: what do I call this variable that can contain either a single date or a date range of up to 4 days?

This is a solvable general problem. Probably easier than cache invalidation.

I’m imagining a tool that’s something between Stack Overflow and Urban Dictionary. Programmers can submit words that they’ve used in their own applications, along with their meanings, and some examples of their usage in publicly available code or APIs.

Words could be ranked between general and specific, vague and precise, knowing that different words will occupy different parts of the spectrum. For example value is both general and vague, currency code is general and precise, Fare Basis is specific and precise. Developers should be able to vote words up or down depending on their experience with using them. The ideal result? A reusable corpus of defined names that programmers find useful, and a common vocabulary that would permeate a variety of programs across domains, languages and applications.

(Now I just need to find a few days to get time to code it!)


Migrating from Play Framework v2.2 to Activator v2.3

I really like the Play Framework, but they’re not shy about making changes when moving to new major versions. The migration of an existing app from v2.2.x to v2.3.x can be a painful process – I’ve done it twice now – and there are many pitfalls on the way. Looking through articles as I go suggests I’m not the only one suffering.

It’s a good idea to read the documentation at https://www.playframework.com/documentation/2.4.x/Migration23

Here’s a summary of the changes you’ll need to move a Java project.

Before you begin…

Make sure you commit your current version, and do a `play eclipse` in the old version before you start to migrate if you want IDE support. Once you start the migration, you won’t be able to do this.





build.sbt

Play projects are now enabled as sbt plugins, so you’ll need

lazy val root = (project in file(".")).enablePlugins(PlayJava).enablePlugins(SbtWeb)

in your build.sbt. You can leave out the SbtWeb plugin if you don’t use Play’s templating language.

If you use external APIs, you’ll also need to add javaWs to your library dependencies:

libraryDependencies ++= Seq(
  javaWs
  // ...plus your existing dependencies
)

If your project uses LESS, you now have to explicitly indicate that you want LESS files to be compiled. This line in build.sbt will cause all .less assets to be included in the compilation

includeFilter in (Assets, LessKeys.less) := "*.less"

.java files

The javaWs package name has also changed from upper-case to lower-case, which is easy to miss in your imports. Do a search and replace on all your .java files, changing

import play.libs.WS.

to

import play.libs.ws.

Note the trailing . at the end of the line!


plugins.sbt

Coffeescript and LESS are no longer included by default. If you use them, they need to be included in your plugins.sbt file. Annoyingly, you won’t get an error if you fail to include the LESS plugin – your site will just look bad.

Add these lines to your plugins.sbt

addSbtPlugin("com.typesafe.sbt" % "sbt-coffeescript" % "1.0.0")

addSbtPlugin("com.typesafe.sbt" % "sbt-less" % "1.0.1")

Note the blank line between the two plug-ins. That’s not an accident – sbt requires a blank line between expressions in .sbt files.

While you’re here, you should also bump your Play sbt plugin version:

addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.3.3")


Finally, update the sbt version to match the new Play release. Set

sbt.version=0.13.5

in your build.properties.

Check it out

That’s it. Run

activator clean update start

to compile and run your new version!



Gah! Joda DateTimeBuilder is not always symmetric either

In a previous post I complained that Java Date Format was not symmetric.

Well, it turns out that Joda-Time – the de facto date library for Java before Java 8 – is also not necessarily symmetric… even when the formatter is not lossy!

Try it and see (for this to work, your default time zone must not be UTC!):

DateTime now = new DateTime(DateTimeZone.UTC).withMillisOfSecond(0);
String nowText = ISODateTimeFormat.dateTimeNoMillis().print(now);
DateTime then = ISODateTimeFormat.dateTimeNoMillis().parseDateTime(nowText);
assertEquals(now, then);

and your test will fail! It turns out that reparsing the date loses the time zone information: even though the printed nowText string is correct, the reparsing doesn’t initialise then with the time zone in the string. According to Joda, this is considered buggy but won’t be changed for historical reasons (sounds like any other Java Date/Calendar package?), but it would be useful if this were clearly stated, up front, in the documentation. Calling

DateTime then = DateTime.parse(nowText);

will give the correct result… which looks obvious when written here, but is not necessarily so obvious when you’re deep in your debugger.

So, another new DateTimeCalendar package for Java 9 anyone? 🙂
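For what it’s worth, the replacement that did eventually arrive – java.time, in Java 8 – round-trips this case symmetrically, because OffsetDateTime keeps the parsed offset. A quick sketch for comparison (not Joda code):

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class SymmetricRoundTrip {
    public static void main(String[] args) {
        // Truncate to whole seconds so the format is not lossy
        OffsetDateTime now = OffsetDateTime.now(ZoneOffset.UTC).withNano(0);
        String nowText = now.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        // parse() keeps the offset from the string, so the round trip is exact
        OffsetDateTime then = OffsetDateTime.parse(nowText);
        System.out.println(now.equals(then)); // true, whatever your default zone
    }
}
```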


The Importance of Using Real Data when Developing a Proof-of-Concept

When developing data-driven software, there’s a constant tension between anonymity and usefulness.

On one hand, some level of anonymity is required when using any data set to protect sensitive customer and commercial information. On the other, increasingly obfuscating data reduces its usefulness for information discovery.
When demonstrating a proof-of-concept of a data-driven application, it is typical to use generated, fake data. This has the obvious advantage of completely protecting customer anonymity, satisfying the commercial folks. It also placates project managers, by replacing the time required to obfuscate the data (an unknown quantity) with the time required to generate fake data (a predictable quantity). But as part of the software development team, you should avoid letting this happen.


When replacing the real with a fake, it’s important to remember that the best case for this generated, fake data is that it looks and smells entirely like real data – and that this best case is unachievable. To generate completely accurate fake data, you would need a complete understanding of the domain being analysed, which is obviously an impossible demand. So generally the benchmark is set at generating “believable data.”

Believable to whom?

Well, to anyone you want to sell it to, I imagine. But this is a dangerous demand. You can never know that your data is not believable until it’s too late – in the same way that you can never know whether your network security is good enough until it’s too late – and even then, only if someone raises an alarm.

Your team probably has good domain knowledge, and you put that knowledge into your generated data, creating what you believe to be realistic scenarios and believable correlations. You show it to the executives in your own company, who are unlikely to have domain knowledge as strong as your team’s.
Their reaction is enthusiastic – they are surprised by correlations in your data, and this helps generate a positive impression of your product.

So you go and show it to potential customers, who gather executives and experts for your presentation, often putting hundreds of years of domain expertise in a single room. Of course, many of the correlations they expect to see in your fake data don’t exist, because you didn’t think to create them, generating immediate suspicion of your product in your potential customer. If you’re lucky, they’ll understand the issue is due to your generated data. If you’re less lucky, they will question why they’re not seeing what they expect. If you’re unlucky, they will simply use it to dismiss your product, using the missing information to reinforce the reasons why they don’t need software to support their jobs, or to argue that such tools should really be developed in-house. In any of these cases, you’ve just devalued your potential new product.

But why couldn’t this happen with obfuscated data?

In real data, patterns such as locations, times or names in the data could be used by competitors (to whom you’re going to show this proof-of-concept) to reasonably guess to which of your customers this data belongs (argues the commercial team, who don’t want to risk your customer’s data). So your obfuscations must ensure that these elements of the data are hidden sufficiently to ensure that this doesn’t happen – knowing, of course, that these domain experts might spot some pattern you missed and use it to infer the underlying customer data in any case.

During the obfuscation process it’s important to remember that while many patterns will be diligently preserved, many will be necessarily destroyed, reducing the selection of available correlations in the data. But won’t this obfuscation process – of unknown duration – reduce the quality of the data to such a point that it would be less convincing to clients than generated data (argues the product manager who wants this software to be finished ‘yesterday’)? That’s unlikely.

Data generation is an additive process, but obfuscation is a subtractive one.

When the elements to be subtracted are sensitive commercial data, it’s likely that more time and due care will be allocated to creating the data set – which is the key component of any data-driven software – than if the data was generated, because commercially sensitive data would otherwise be at risk. And because those people doing the subtraction are data-driven domain experts themselves, they will make sure to preserve the correlations they would have otherwise placed in their generated data, as best they can.

Meanwhile, because the process is subtractive, rather than additive, unseen correlations in the original real dataset may live on, providing the ability to surprise both the development team and potential customers alike. And finally, using obfuscated data, you can easily explain in advance that some correlations were removed during the obfuscation process. This means that if your potential client doesn’t see a correlation they’re expecting, they’re more likely to blame it on the necessary obfuscation process, or simply question it, than to silently blame it on your incompetence.

The data must be surprising

Remember that it’s the undiscovered surprises in the data that push the development of data-driven software in the first place. It’s important for development teams to remember that understanding what would otherwise be surprises in the data is what makes someone a domain expert. The only way to gain this expertise is to work on real data, to the maximum extent possible.

So argue as hard as you can to never generate fake data… and if you do have to fake it, work with real data until the last possible moment.


Be careful – Java SimpleDateFormat is not always symmetric

In my job, we often have to work with free-text data storage, where our customers – and the customers of our customers – look directly at the only copy of the same data that we depend on in our business logic, encoded as a string. We make pretty heavy use of configurable grammars, and our structured data often comes with reversible encoders/decoders to render it as a human-readable string (in 11 different languages).

For Java date conversion we make heavy use of Java’s SimpleDateFormat – as implemented in the Sun JDK 1.6 – which has proved pretty robust in the past. Thus I was surprised when I started seeing ParseExceptions in my unit tests, especially since the dates being parsed had originally been produced by SimpleDateFormat itself.

With SimpleDateFormat, you initialise the class with a string pattern, and then use dateFormat.format(Date) to produce the encoded string, or dateFormat.parse(String) to parse a String back to a Date object. The pattern you choose may be as simple as yyyy – in which case dateFormat.format(Date) would produce something like 2013 from today’s date. When you call dateFormat.parse(String) on 2013, obviously the rest of the date data has been lost. By default, dateFormat will instantiate the unknown fields of the resulting date with the default values, so midnight 1970-1-1 in your timezone. If you formatted a date as 2013 with this pattern, calling parse with the same formatter would correctly give you back 2013-1-1 00:00 – any day, time or timezone information in the original date would be lost.

In this case, I wanted to print the time of departure of Amtrak trains, using the standard travel agent date format, hmma. [1] So for a train departing at 9:30, travelAgentDate.format(..) will produce 930am.  For 21:30, format will produce 930pm. And for 22:59, format will produce 1059pm. And when I call travelAgentDate.parse(1059pm), I will get a ParseException. Inspecting the source code, it’s easy to see why.

The parse(..) method here parses from left to right, generally the most efficient but least robust way to parse text. When it tries to parse the hour using h, if there’s a delimiter in the pattern, it will peek ahead to see if the delimiter is the next character; but if the next character is numeric, it will simply assume it’s part of the next variable, in this case the mm minute variable. So if I had used a format like h.mma or even hhmma I would have been safe, but the variable-length h string without a trailing delimiter confuses the parse method, even though the format method can produce such a string. Too bad.
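To see the asymmetry in isolation, here’s a minimal sketch using the hmma pattern from the post (the locale and date are just for illustration – a stock Locale.US formatter prints upper-case AM/PM, unlike the travel-agent samples above):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class HmmaAsymmetry {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("hmma", Locale.US);
        // 22:59 formats happily...
        String s = fmt.format(
                new GregorianCalendar(2014, Calendar.JANUARY, 1, 22, 59).getTime());
        System.out.println(s); // "1059PM"
        try {
            // ...but the same formatter cannot parse its own output:
            fmt.parse(s);
            System.out.println("parsed OK");
        } catch (ParseException e) {
            System.out.println("ParseException at offset " + e.getErrorOffset());
        }
    }
}
```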

Not having the option of changing the format (travel agencies are pretty set in their ways), and not feeling too enthused about writing my own RTL parser for this one edge case, I began to look for workarounds. After a coffee break and a chat with  a colleague, we decided that we should set the parser to a format that it could use symmetrically, and then massage the resulting input-output into that format. The two options we came up with were

  • Use hhmma format in the parser, trim the leading 0’s after encoding it to a string, and when decoding, pad it back with 0’s to get to the right length before calling parse
  • Use hh:mma format in the parser, remove the : after encoding it to a string, and when decoding, re-inject a : 4 characters from the end.

I plumped for the first option, mostly because I already had the toolkit there to pad and trim the numbers – the efficiency gain of remaining LTR with no read-ahead is not relevant in this scenario, but it does sound nice – although we agreed that the second option would actually be slightly more robust.
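A minimal sketch of the first option – not our production code, and the names are mine: format with the fixed-width hhmma pattern, trim the leading zero on the way out, and pad it back before parsing.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class AgentTimeCodec {
    // Fixed-width pattern: hh and mm each consume exactly two digits, so
    // parsing is symmetric. (SimpleDateFormat isn't thread-safe -- fine
    // for a sketch, but guard or clone it per-thread in real code.)
    private static final SimpleDateFormat FMT = new SimpleDateFormat("hhmma", Locale.US);

    // e.g. "0930AM" -> "930AM"
    static String encode(Date time) {
        String s = FMT.format(time);
        return s.charAt(0) == '0' ? s.substring(1) : s;
    }

    // e.g. "930AM" -> "0930AM", then parse with the fixed-width pattern
    static Date decode(String text) throws ParseException {
        if (text.length() == 5) {
            text = "0" + text;
        }
        return FMT.parse(text);
    }
}
```

Round-tripping decode(encode(date)) keeps the time of day; as with any time-only SimpleDateFormat pattern, the date part falls back to the epoch defaults.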

Anyway, the code is in production now with no complaints yet, and another lesson learned about the oddities of the Sun JDK!

[1] I lied about the travel agency date format above in order to keep things simple. If the time is on the hour, travel agents will skip the minute part, and simply say 22:00 -> 10pm. This means that the parser itself is actually given by a factory depending on the minute in the time (or, when decoding, the length of the input string). To further complicate matters, they also forgo the m in am and pm, instead just using 10a or 10p. Neither of those details is relevant to the case at hand, except that they meant we were already wrapping SimpleDateFormat extensively to get the desired results.


Exception Antipatterns

Java has pretty good exception handling, but it’s something that a surprising number of programmers get wrong.
The point of an exception is to provide detailed information to software maintainers about the reason for a program’s failure.
The three key points here are:

  • You should only throw an exception in cases where the program fails. If you’re frequently seeing the same exception in the same place, you should question whether your program should really be allowed to fail under those conditions.
  • Exceptions are for software maintainers only. The exception should be logged at the appropriate point in the code, and a friendlier message should be displayed to the user.
  • Detailed information should be provided (within the limits of data security, of course). The minimum is a stack trace locating the exact point where things went wrong, ideally with an informative message too.

It’s worthwhile spending a little time when you code to produce useful exceptions – it saves time when something goes wrong with your code in production, because the source of the problem can easily be found and traced.

Bearing that in mind, here are some anti-patterns I’ve recently seen around exceptions. Let’s look at why they’re bad practice:

 try {
     stuff that contacts an external system;
 } catch (APIException e) {
     throw new BusinessException("Failure in external system");
 }

The good stuff here is that the developer is catching only the appropriate exception, describing what’s wrong in his new exception, and allowing the program to fail.
The problem is that, in the process, he’s just destroyed the whole stack trace that would tell us where the external system failed. Without a debugger or detailed functional knowledge of how the external system works, there’s nothing more you can do with this exception.
Some better solutions here would be (in order of my preference):

  • Don’t catch and rethrow – maybe it’s fine to just let the APIException continue up the stack
  • Chain the APIException into the BusinessException as its cause… and make sure to log the whole trace when you catch the business exception
  • Log the APIException now, before rethrowing
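The second option is a one-line change: pass the original exception as the cause, so the full stack trace survives. A sketch – the exception classes here are stand-ins for whatever your codebase defines, and the point is the (message, cause) constructor:

```java
public class ChainingDemo {
    static class APIException extends Exception {
        APIException(String msg) { super(msg); }
    }

    static class BusinessException extends Exception {
        // The (message, cause) constructor is what makes chaining possible
        BusinessException(String msg, Throwable cause) { super(msg, cause); }
    }

    static void callExternalSystem() throws APIException {
        throw new APIException("error 500 from external system");
    }

    static void doBusiness() throws BusinessException {
        try {
            callExternalSystem();
        } catch (APIException e) {
            // Wrap, don't swallow: the original trace rides along as the cause
            throw new BusinessException("Failure in external system", e);
        }
    }

    public static void main(String[] args) {
        try {
            doBusiness();
        } catch (BusinessException e) {
            e.printStackTrace(); // prints both traces, joined by "Caused by:"
        }
    }
}
```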

Here’s another one

 BusinessResponse processAPIResponse() throws TimeoutException, ConcurrentModificationException, TechnicalException, FunctionalException {
     Response r = result of call to external system;
     int code = r.getErrorCode();
     if (code == 1234) {
         throw new ConcurrentModificationException();
     }
     if (code >= 8000 && code < 9000) {
         throw new TechnicalException();
     }
     if (code > 0 && code < 8000) {
         throw new FunctionalException();
     }
     ...
 }

Here, the problem is that we’re throwing a bunch of checked exceptions, based on the API response, that the client may not be able to handle.
A good rule of thumb for exception handling (and it’s open to debate), as mentioned by O’Reilly, is that unless there’s something the client code can do to recover from the failure, it’s better to throw a runtime exception.
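Applied to the error-code example, a sketch might look like this. The exception names are illustrative; the point is that they extend RuntimeException, so callers who can recover may still catch them, but nobody else is forced to declare or swallow them:

```java
public class ErrorCodes {
    // Unchecked versions of the earlier checked exceptions
    static class TechnicalException extends RuntimeException {
        TechnicalException(String msg) { super(msg); }
    }

    static class FunctionalException extends RuntimeException {
        FunctionalException(String msg) { super(msg); }
    }

    static void checkErrorCode(int code) {
        // Same ranges as before, but no throws clause forced on every caller
        if (code >= 8000 && code < 9000) {
            throw new TechnicalException("external system returned code " + code);
        }
        if (code > 0 && code < 8000) {
            throw new FunctionalException("external system returned code " + code);
        }
    }
}
```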

Finally, when’s an exception not an exception?

There are two common approaches to avoiding throwing an exception when you should have thrown one… and they are:

  •  if without else — Of course you code if statements all the time. But do you know what should happen when your if condition is false? Should the default behaviour really be nothing? (Hint: the default behaviour is frequently not nothing!)

Take a UI event handler, for example: if you code only the success branch of the if, then when the condition fails, you risk the possibility that the UI may just stay in the same place, un-updated forever.
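Here’s a hypothetical handler (all names invented for illustration) showing the trap and the fix:

```java
public class SearchHandler {
    interface Response { boolean isValid(); }

    String screen = "old results"; // stands in for the visible UI state

    // The trap: when the response is invalid, nothing happens at all --
    // the screen silently keeps showing stale results forever.
    void onSearchResultNoElse(Response r) {
        if (r.isValid()) {
            screen = "new results";
        }
    }

    // Better: the else branch makes the failure visible to the user.
    void onSearchResultWithElse(Response r) {
        if (r.isValid()) {
            screen = "new results";
        } else {
            screen = "Search failed, please retry";
        }
    }
}
```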

  • Returning null — This is basically a null pointer exception that hasn’t happened yet, in someone else’s code.

Two great ways to avoid this problem are:

  • If you’re returning null, ask whether null has a meaning here (like getFirstSomething() returning null when the list of somethings has 0 elements), or whether you’re just returning null because you don’t know what to do otherwise
  • Put a javadoc @return tag on every method that returns an object, documenting under what conditions the method can return null.
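Putting both points together, here’s a sketch of a documented, meaningful null (getFirstSomething and its class are hypothetical):

```java
import java.util.List;

public class Somethings {
    private final List<String> somethings;

    public Somethings(List<String> somethings) {
        this.somethings = somethings;
    }

    /**
     * @return the first something in the list, or {@code null} if the list
     *         has 0 elements -- here null has a documented meaning
     */
    public String getFirstSomething() {
        return somethings.isEmpty() ? null : somethings.get(0);
    }
}
```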

Putting this stuff in when you code can sometimes feel like a hassle – it’s code not targeted at solving the issue you’re coding to solve – but it can pay great dividends later when your code, or some system it’s connected to, starts behaving unexpectedly!


Keyboard navigation

For the last six months or so, I’ve been working on a team designing a pretty nifty web-app for travel agents. I’ve always thought that keyboard navigation was important, since most of us don’t use both hands for the mouse, and I’m struck by the efforts that my UI coder and User Experience colleagues make to ensure the app works smoothly for keyboard navigators. (This is particularly important in the travel agency business, where some agents have been typing on a green screen since the eighties.)

And now I find myself stuck, with no mouse and two hours to kill, navigating the internet on a television with a keyboard, and I’m quickly finding out which of my favourite sites have thought about keyboard navigation!

In heavily JS-based web apps, this is not something that comes for free. Watching good UI coders suffer for an hour or so to set up a good tabbing order on a screen does not scream fun work… and that goes double for fixing keyboard navigation bugs that can sometimes leave the tabbing order off the screen! But if you want to see who’s made the effort and who hasn’t, you don’t have to look far. Unplug your mouse and fire up facebook.com, and you’ll see the lengths the developers have gone to to ensure that their site browses easily using only the keyboard. Not only have they set up a very easy and logical tabbing order (including custom button highlights), they’ve also started using keyboard shortcuts as part of the design. Nice. On the other hand, some sites have just made no effort, which can lead to dead-end tabs, and peering at the link target to work out where your current tab position will take you… try logging into the wordpress.com administrator panel if you want to see what I mean!

The good news is that regular websites, and webapps that reload the page, get away scot-free, as long as they’re well designed. Some old sites I wrote proved to be dead easy to navigate using just the keyboard, since the divs are well laid out. However, try an older table-hell website, or one with weirdly positioned divs (delicious.com, surprisingly!), and the hacks quickly become apparent.

What’s the point? Well, since most PC users always have one hand on the keyboard, it seems a waste not to take advantage. I’m hoping to see more sites take up keypress navigation in the future.

PS. WordPress.com, my keyboard is stuck in this box. You suck!