Wednesday 28 November 2012

Your Encapsulation Is Bad, And You Should Feel Bad

Passing objects around by reference is a fantastically powerful tool in object-oriented languages. Strictly speaking, Java passes every argument by value, but for objects the value being passed is a reference: a method receives copies of primitives and the heap locations of objects, so no argument on a stack frame is ever bigger than a primitive or a single reference. It reduces your memory footprint fantastically, because it’s always there, unlike in C, where you had to specifically indicate that this argument is actually a pointer. Java does the same with return values as well: anything that you return from a method that isn’t a primitive is handed back as a reference to its location on the heap.
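As a rough illustration of what that means in practice, here is a minimal sketch. The class and method names are made up for this example. A method that receives an object can change that object for everyone, but rebinding its parameter only changes its own local copy of the reference.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceCopyDemo {
    // Mutates the object that the caller's reference points to.
    static void mutate(List<String> names) {
        names.add("added");
    }

    // Rebinds only the local copy of the reference; the caller's list is untouched.
    static void reassign(List<String> names) {
        names = new ArrayList<>();
        names.add("lost");
    }

    // Primitives are copied outright, so the caller never sees this increment.
    static void bump(int n) {
        n++;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        mutate(names);
        reassign(names);

        int count = 1;
        bump(count);

        System.out.println(names); // [added]
        System.out.println(count); // 1
    }
}
```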

And thus are a whole host of encapsulation and coupling issues born—particularly when you work with Collections.

Let’s say, for the sake of a specious example, I run a rental car agency. My agency is represented by an object, Location, which keeps a private list of vehicles and exposes it through getVehicles().

According to the Rules and Standards of JavaBeans, I’ve done encapsulation right… until I decide to process a series of updates to my stock in this boneheaded way: I call getVehicles() and, in a for loop, remove entries from the list it returns.

If, at any point after that for loop, I want to work with the list of vehicles, I’ll only have access to those vehicles that haven’t been washed in a week, because I removed the rest from the same List that my Location object refers to.
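To make the scenario concrete, here is a minimal sketch of how it could play out. Everything in it is an assumption made for illustration: the Vehicle fields, the addVehicle() method, the dueForWash() helper and the seven-day threshold are hypothetical, and only Location, its vehicles list and getVehicles() come from the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LeakyGetterDemo {
    // Returns the vehicles due for a wash, but does so by filtering
    // the list it was handed, which is Location's own internal list.
    static List<Vehicle> dueForWash(Location location) {
        List<Vehicle> due = location.getVehicles();
        for (Iterator<Vehicle> it = due.iterator(); it.hasNext();) {
            if (it.next().getDaysSinceWash() < 7) {
                it.remove(); // also removes the vehicle from Location's list
            }
        }
        return due;
    }

    public static void main(String[] args) {
        Location location = new Location();
        location.addVehicle(new Vehicle("AAA-111", 2));
        location.addVehicle(new Vehicle("BBB-222", 9));
        location.addVehicle(new Vehicle("CCC-333", 12));

        List<Vehicle> due = dueForWash(location);
        System.out.println(due.size());                    // 2
        System.out.println(location.getVehicles().size()); // 2, not 3
    }
}

// Hypothetical domain class; the fields are assumptions for this example.
class Vehicle {
    private final String plate;
    private final int daysSinceWash;

    Vehicle(String plate, int daysSinceWash) {
        this.plate = plate;
        this.daysSinceWash = daysSinceWash;
    }

    String getPlate() { return plate; }
    int getDaysSinceWash() { return daysSinceWash; }
}

class Location {
    private final List<Vehicle> vehicles = new ArrayList<>();

    void addVehicle(Vehicle vehicle) { vehicles.add(vehicle); }

    // Textbook JavaBeans getter: hands out the internal list itself.
    List<Vehicle> getVehicles() { return vehicles; }
}
```

The field is private and the getter looks innocent, yet the freshly washed vehicle has vanished from the agency’s stock.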

Like I say, it’s a pretty specious example, but it shows what kind of unintended consequences can crop up when you hand out references to mutable objects. Fortunately this doesn’t happen with Strings and the boxed Number types such as Integer, because they are immutable: anything that looks like a change produces a new object on the heap instead. As soon as you start doing the same thing with mutable objects, though, you risk loss of data integrity. To prevent problems, my Updater needs to know how my Location stores and returns the list of Vehicles, when it probably shouldn’t. The encapsulation here is bad, because it permits side-effects.

So what’s the fix? Replace the body of Location.getVehicles() with this: return new ArrayList<Vehicle>(vehicles); and keep on going as-is. Each individual Vehicle in the two lists will still point to the same place on the heap (and this may, itself, have attendant problems, depending on what you’re doing), but at least you know, for a fact, that whatever changes you make to the list you got back will be self-contained.
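A minimal sketch of that fix follows, again with a hypothetical Vehicle class (the plate and clean fields and the addVehicle() method are assumptions for this example). It shows both halves of the trade-off: the list you get back is yours to change, but the copy is shallow, so the Vehicle objects inside it are still shared.

```java
import java.util.ArrayList;
import java.util.List;

class DefensiveCopyDemo {
    public static void main(String[] args) {
        Location location = new Location();
        location.addVehicle(new Vehicle("AAA-111"));
        location.addVehicle(new Vehicle("BBB-222"));

        List<Vehicle> copy = location.getVehicles();
        copy.remove(0); // only the copy shrinks

        System.out.println(copy.size());                   // 1
        System.out.println(location.getVehicles().size()); // 2

        // The copy is shallow: both lists refer to the same Vehicle objects.
        copy.get(0).setClean(true);
        System.out.println(location.getVehicles().get(1).isClean()); // true
    }
}

// Hypothetical domain class; the fields are assumptions for this example.
class Vehicle {
    private final String plate;
    private boolean clean;

    Vehicle(String plate) { this.plate = plate; }

    String getPlate() { return plate; }
    boolean isClean() { return clean; }
    void setClean(boolean clean) { this.clean = clean; }
}

class Location {
    private final List<Vehicle> vehicles = new ArrayList<>();

    void addVehicle(Vehicle vehicle) { vehicles.add(vehicle); }

    // Defensive copy: callers get their own list, never the internal one.
    List<Vehicle> getVehicles() { return new ArrayList<>(vehicles); }
}
```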

This gets even worse when you start throwing around DTOs for different serialisation methods. Because the annotations used by different persistence architectures may not be compatible with one another, you often need to create a separate DTO for each of your database, XML, and JSON representations. These DTOs should only exist long enough to prepare your problem domain object for serialisation, or to deserialise something into your problem domain.
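One way to keep a DTO that short-lived is to confine it to a small mapper at the serialisation boundary. The sketch below assumes a hypothetical Vehicle domain class and a hypothetical VehicleJsonDto; a real DTO would carry whatever annotations your JSON library needs, which are left out here so the example has no dependencies.

```java
class DtoMappingDemo {
    public static void main(String[] args) {
        Vehicle vehicle = new Vehicle("AAA-111", 42000);

        // The DTO exists only between the domain object and the serialiser.
        VehicleJsonDto dto = VehicleJsonMapper.toDto(vehicle);
        Vehicle roundTripped = VehicleJsonMapper.fromDto(dto);

        System.out.println(roundTripped.getPlate() + " " + roundTripped.getMileage());
    }
}

// Problem-domain object: immutable and free of serialisation concerns.
// Hypothetical; the fields are assumptions for this example.
final class Vehicle {
    private final String plate;
    private final int mileage;

    Vehicle(String plate, int mileage) {
        this.plate = plate;
        this.mileage = mileage;
    }

    String getPlate() { return plate; }
    int getMileage() { return mileage; }
}

// Hypothetical JSON-shaped DTO: a dumb bag of public fields.
// Library-specific annotations would go here in real code.
class VehicleJsonDto {
    public String plate;
    public int mileage;
}

// The only place that knows about both representations.
final class VehicleJsonMapper {
    private VehicleJsonMapper() { }

    static VehicleJsonDto toDto(Vehicle vehicle) {
        VehicleJsonDto dto = new VehicleJsonDto();
        dto.plate = vehicle.getPlate();
        dto.mileage = vehicle.getMileage();
        return dto;
    }

    static Vehicle fromDto(VehicleJsonDto dto) {
        return new Vehicle(dto.plate, dto.mileage);
    }
}
```

The rest of the application only ever sees Vehicle, so swapping or adding a serialisation format means writing another DTO and mapper pair, not touching the domain.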

Say you created an amazing Web service that’s backed on SOAP (I know, I know, JSON geeks. It’s just an example). When I call your Java API’s method to getThing(), I shouldn’t be aware of the SOAP Body. I shouldn’t have to call, say, thing.getAttr("thing") to get something that’s an XML attribute, and then thing.getOtherThing().getValue() to get the String value of something that’s stored as an XML element. As a consumer of your API, I shouldn’t be aware of any of this; if I am, it means two things:

  1. You can’t easily move your service away from SOAP, without either...
    1. Forcing all your customers to update their code to use a new API, or
    2. Internally converting your new serialisation into something that can be expressed as a SOAP call,
  2. and you’ve told the world that your service is backed on SOAP.

Whether or not you’re proud of the fact that you’re using SOAP is irrelevant; it’s an implementation detail that I, as a consumer of your API, don’t care about. For all I know or care, you could be using a proprietary binary format, or even passing messages around by carrier pigeon. From my application’s perspective, this is all irrelevant. The encapsulation here is bad, because it exposes implementation details.
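As a sketch of what the alternative might look like, the consumer below only ever touches a problem-domain Thing behind a ThingService interface. All the names here are hypothetical, and FakeSoapBody is a stand-in written for this example, not a type from any real SOAP library.

```java
import java.util.HashMap;
import java.util.Map;

class HiddenTransportDemo {
    public static void main(String[] args) {
        // The consumer depends on the interface; the wiring could live elsewhere.
        ThingService service = new SoapThingService(FakeSoapBody.sample());

        Thing thing = service.getThing();
        System.out.println(thing.getName() + " / " + thing.getOtherThing());
    }
}

// What the consumer sees: plain domain types, no attributes or elements.
final class Thing {
    private final String name;
    private final String otherThing;

    Thing(String name, String otherThing) {
        this.name = name;
        this.otherThing = otherThing;
    }

    String getName() { return name; }
    String getOtherThing() { return otherThing; }
}

interface ThingService {
    Thing getThing();
}

// Hypothetical stand-in for a parsed SOAP body.
class FakeSoapBody {
    private final Map<String, String> attributes = new HashMap<>();
    private final Map<String, String> elements = new HashMap<>();

    String getAttr(String name) { return attributes.get(name); }
    String getElementValue(String name) { return elements.get(name); }

    static FakeSoapBody sample() {
        FakeSoapBody body = new FakeSoapBody();
        body.attributes.put("thing", "widget");
        body.elements.put("otherThing", "42");
        return body;
    }
}

// The only class that knows which values were attributes and which were elements.
class SoapThingService implements ThingService {
    private final FakeSoapBody body;

    SoapThingService(FakeSoapBody body) { this.body = body; }

    @Override
    public Thing getThing() {
        return new Thing(body.getAttr("thing"), body.getElementValue("otherThing"));
    }
}
```

Moving off SOAP then means writing another ThingService implementation; nothing that calls getThing() has to change.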

So, what’s the take-away from all this? Two things:

  1. Don’t return references to your member collections and arrays. It’s bad for coupling. Return copies, instead.
  2. When designing your API, give the consumers of it a paradigm that makes sense from the problem domain, instead of just blindly representing your storage format.
