Quoth the runtime, "Segmentation Fault": On Coe’s First Law of Software Development

Last week, I discussed my Second Law of Software Development (self-documenting code isn’t), in reference to why proper, discrete documentation is a Good Thing. I didn’t get into what I think is the best time to write documentation (before you write code), because that’s a whole other rant in and of itself, but I did briefly mention my First Law of Software Development:

When it happens, you’ll know.

I’m not ashamed to admit that I’ve borrowed the phrasing from The Simpsons, but it’s a really good line. The First Law originally applied to baking in security when you’re developing SaaS, but other situations keep coming up where, had I kept in mind that “when it happens, you’ll know”, I could have avoided a whole lot of hassle.

Simply put, the First Law is all about trying to see things coming, and being prepared for them. I could have borrowed from Scouts and gone with “be prepared”, but that doesn’t quite appeal to my sense of humour. The fact is, something will eventually go wrong, and when it finally happens, you’ll know. And when you look at the block of code that’s to blame, you’ll ask yourself why you didn’t code defensively for it in the first place.

It originally came to me when I was writing a CRM tool for the company I worked for in 2007. Inspired by a software engineering professor at my university, I wanted to code against bad input. At first, it was about malicious input, but as time has gone on, it has become about bad input in general. The original motivation was accepting the fact that at some point, someone, somewhere, is going to discover and exploit a weakness in your software. You don’t want to assume that all of your users will be nefarious little pissants, but in the interest of your good users, you ought to assume that your average long-term number of less-than-trustworthy users is nonzero.

So, eventually, someone will try to misuse your software. But does it stop there? The correct answer is no, no it doesn’t. While you’re validating your input against inappropriate behaviour, you can just as easily, if not more easily, validate for correct behaviour: that your users haven’t accidentally done something wrong. Type checking falls under this umbrella, and it’s useful both in the functions that retrieve user input and in the functions that process it. This is particularly important in weakly typed languages, because you can’t reliably just cast your input into a variable of type x (particularly in JavaScript, where string concatenation and arithmetic addition use the same operator). When well-meaning users provide input that isn’t what it should be, you have a problem (maybe it’s in your documentation… but that’s another post). Maybe the input is well-formed, but has unexpected side-effects. When it happens, you’ll know.
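As a rough sketch of that kind of check, here it is in Python rather than JavaScript, because the same trap exists there: "+" on two strings concatenates them. The function name parse_quantity and its 1 to 1000 limits are made up for illustration; a real tool would pick limits that suit its own data.

```python
def parse_quantity(raw, minimum=1, maximum=1000):
    """Return raw as an int in [minimum, maximum], or raise ValueError.

    parse_quantity and its default limits are hypothetical examples.
    """
    # bool is a subclass of int in Python, so rule it out before the int check.
    if isinstance(raw, bool):
        raise ValueError("quantity must be a whole number, not a boolean")
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str):
        try:
            value = int(raw.strip())
        except ValueError:
            raise ValueError(f"quantity is not a whole number: {raw!r}") from None
    else:
        raise ValueError(f"unsupported input type: {type(raw).__name__}")
    # Well-formed input can still be unreasonable, so check the range too.
    if not minimum <= value <= maximum:
        raise ValueError(f"quantity {value} is outside {minimum}..{maximum}")
    return value


# Form fields usually arrive as text, and "+" on text concatenates.
print("2" + "3")                                  # prints 23
print(parse_quantity("2") + parse_quantity("3"))  # prints 5
```

The same function rejects both the malicious and the merely mistaken: "abc", an empty field, "2.5", or a quantity of 0 all raise ValueError before the value reaches any code that processes it.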

Now that you’re checking your user input for both validity and intent, are you done? Probably not. In this day and age, software doesn’t exist in a vacuum (apart from the little noddy programs you write to prove that you can handle the concept you were just taught). There are external subsystems that you rely on. In an ideal world, you’ll get perfect data from them, but this isn’t an ideal world. Databases get overloaded and refuse connections. Servers get restarted and services don’t always come up correctly, if at all. Connections time out, or you forget whether or not this request has to go through a proxy. When it happens, you’ll know, because all of a sudden, your software breaks. Hard. You need to figure out which subsystem failed, and more importantly, why, so that you can prevent it from happening that way in the future.
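One way to make the "which subsystem, and why" question cheaper to answer is to label each external call at the point where it is made. The sketch below is only an illustration: SubsystemError, guarded and connect_to_database are hypothetical names, and it assumes the failures of interest surface as OSError, which in Python covers refused connections and timeouts.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("subsystems")


class SubsystemError(RuntimeError):
    """Records which external subsystem failed and the original cause."""

    def __init__(self, subsystem, cause):
        super().__init__(f"{subsystem} failed: {cause!r}")
        self.subsystem = subsystem
        self.cause = cause


@contextmanager
def guarded(subsystem):
    """Log and re-raise I/O failures, tagged with the subsystem's name."""
    try:
        yield
    except OSError as exc:  # includes ConnectionError and TimeoutError
        logger.error("subsystem %s failed: %r", subsystem, exc)
        raise SubsystemError(subsystem, exc) from exc


def connect_to_database():
    # Stand-in for a real connection attempt against an overloaded database.
    raise ConnectionRefusedError("connection refused")


try:
    with guarded("database"):
        connect_to_database()
except SubsystemError as err:
    print(err.subsystem, "->", type(err.cause).__name__)
```

The original exception is kept as the cause rather than swallowed, so the log line answers "what" and the chained traceback still answers "why".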

However, that isn’t enough. You should have been ready for that failure. You can’t assume that all the other subsystems will be there 100% of the time. Assume that your caching layer will disappear at an inconvenient moment. Know that your database won’t always give you a result set. If you have to call out to a separately managed web service, do not rely on it being there, or on it keeping the same API forever. Code defensively for the fact that eventually, something will go wrong, and you won’t be watching when it happens.
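A minimal sketch of what that can look like for the cache and database cases, in Python. FlakyCache, CustomerTable and fetch_customer are invented stand-ins, not any real library’s API; the point is only the shape of the lookup, which treats a missing cache and an empty result set as expected outcomes.

```python
class FlakyCache:
    """Stand-in cache whose connection can drop at any time (hypothetical)."""

    def __init__(self):
        self.available = True
        self._items = {}

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache unreachable")
        return self._items.get(key)

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache unreachable")
        self._items[key] = value


class CustomerTable:
    """Stand-in database that returns a list of matching rows (hypothetical)."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, customer_id):
        return [row for row in self._rows if row["id"] == customer_id]


def fetch_customer(customer_id, cache, database):
    """Return the customer record, or None if there isn't one."""
    try:
        cached = cache.get(customer_id)
    except ConnectionError:
        cached = None  # the cache is gone; carry on without it
    if cached is not None:
        return cached

    rows = database.query(customer_id)
    if not rows:
        return None  # an empty result set is a normal outcome, not a crash

    record = rows[0]
    try:
        cache.set(customer_id, record)
    except ConnectionError:
        pass  # failing to repopulate the cache shouldn't fail the request
    return record


cache = FlakyCache()
table = CustomerTable([{"id": 7, "name": "Ada"}])
cache.available = False                 # the cache disappears
print(fetch_customer(7, cache, table))  # still answered from the table
print(fetch_customer(8, cache, table))  # no such row, so None
```

The caller still has to decide what None means for the user, but that decision is now made deliberately instead of by an unhandled exception.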

So there’s a very good reason why the First Law of Software Development is “when it happens, you’ll know”. Eventually, “it” will happen, and when you figure out what “it” was, and where it caused you problems, it’ll seem so obvious that there was a point of failure, or a weakness, that you’ll ask yourself why this problem wasn’t coded against in the first place.

Monday 27 June 2011
