Thursday, January 31, 2013

How not to write a subclass

As a programmer coming to JavaScript from the Java world (arguably a bad influence) and the ActionScript world (undoubtedly an even worse one), learning how JavaScript works has been a gradual and error-prone process. I've recently been working on a Coding conventions page on the Orion wiki, and a big chunk of that page is devoted to avoiding the kind of common JavaScript traps that I'm always falling into.

Today I wrote a section about creating classes, taking care to point out a mistake that I'd made dozens of times in the past. This post is adapted from there, so go read the original writeup if you're not interested in the extra verbiage.

First, here's how we tend to create classes in Orion's JavaScript code. I won't claim this is the One True Way, just an easy and straightforward convention that we've settled on.

function Duck(name) {
    this.name = name;
}
Duck.prototype.greet = function() {
    console.log("Quack quack, I'm a duck named " + this.name);
};

What new really does

Let's look at this piece of code, which is familiar to any JavaScript programmer:
new Duck("Robert");

It obviously creates a new instance of Duck. But what is the new operator actually doing here? Well, it performs an algorithm that we can break into 4 distinct steps:

  1. Create a brand-new object (call it O).
  2. Set O's prototype equal to Duck.prototype.
  3. Invoke the Duck function with this equal to O, and the name parameter equal to "Robert".
  4. Return O.
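
The effects of these steps can be observed from ordinary code. Here's a minimal sketch that reuses the Duck class from above and relies on the ES5 function Object.getPrototypeOf. Keep in mind that the four steps are a simplification: if a constructor explicitly returns an object of its own, new hands back that object instead of O.

```javascript
function Duck(name) {
    this.name = name;
}
Duck.prototype.greet = function() {
    console.log("Quack quack, I'm a duck named " + this.name);
};

var robert = new Duck("Robert");

// Step 2: the new object's prototype is Duck.prototype.
console.log(Object.getPrototypeOf(robert) === Duck.prototype); // true

// Step 3: the constructor ran with `this` bound to the new object,
// so `name` is stored directly on the instance...
console.log(robert.hasOwnProperty("name"));  // true

// ...while `greet` is not copied; it is found through the prototype.
console.log(robert.hasOwnProperty("greet")); // false
robert.greet();
```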

Subclasses

A problem arises when we want to extend an existing class with new behavior. How do we create a prototype for the subclass such that it extends the superclass's prototype?

Here's the wrong way:

function SeaDuck(name, diveDepth) {
    Duck.call(this, name);      // call the super constructor
    this.diveDepth = diveDepth;
}
SeaDuck.prototype = new Duck(); // XXX wrong
SeaDuck.prototype.dive = function() {
    console.log(this.name + " dived to a depth of " + this.diveDepth);
};

Everything here is reasonable except the XXX'd line, SeaDuck.prototype = new Duck(). This part is wrong. (Unfortunately, many occurrences of this pattern remain in the Orion source code, lots of them written by me, which still need to be cleaned up.) So what's wrong with it?

  • It's inefficient, since SeaDuck.prototype ends up with fields created by Duck that are never used (name, for example). Any work done in the Duck constructor is useless to us here.
  • It's fragile, since the proper operation of this code relies on Duck not validating its input parameters. If anyone ever changes Duck to assert that it receives a valid name, our SeaDuck code will blow up. (And sure: we could pass in a fake name to satisfy it, but that clutters up our code with even more useless data).
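
Both problems are easy to reproduce. The sketch below is only an illustration: StrictDuck is a hypothetical variant of Duck that validates its name argument. It is not part of Orion, and the check it performs is made up for the example.

```javascript
function Duck(name) {
    this.name = name;
}

function SeaDuck(name, diveDepth) {
    Duck.call(this, name);
    this.diveDepth = diveDepth;
}
SeaDuck.prototype = new Duck(); // the wrong way

// Inefficient: the prototype carries a useless `name` field.
console.log(SeaDuck.prototype.hasOwnProperty("name")); // true
console.log(SeaDuck.prototype.name);                   // undefined

// Fragile: a hypothetical Duck variant that validates its input.
function StrictDuck(name) {
    if (typeof name !== "string") {
        throw new TypeError("StrictDuck requires a name");
    }
    this.name = name;
}

var setupFailed = false;
try {
    // The same subclass setup line now throws before any SeaDuck exists.
    SeaDuck.prototype = new StrictDuck();
} catch (e) {
    setupFailed = true;
}
console.log(setupFailed); // true
```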

For our SeaDuck.prototype, what we really want is not to call the Duck constructor, but just to create a new object whose prototype is set to Duck.prototype. In other words, we only want steps #1, #2, and #4 of the new algorithm, not #3.

Object.create

For a long time JavaScript provided no way to decouple object creation and prototype-setting from initialization. ECMAScript 5 finally fixed this problem by introducing Object.create. Among other things, Object.create allows you to build a new object and tell the JS engine exactly what its prototype should be. Just pass the desired prototype object as the first argument. Easy!
SeaDuck.prototype = Object.create(Duck.prototype);
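
Put together, the corrected subclass looks something like the sketch below. It is the earlier SeaDuck code with only the prototype line changed; the instance name "Daffy" and the depth 10 are sample values chosen for illustration.

```javascript
function Duck(name) {
    this.name = name;
}
Duck.prototype.greet = function() {
    console.log("Quack quack, I'm a duck named " + this.name);
};

function SeaDuck(name, diveDepth) {
    Duck.call(this, name);      // call the super constructor
    this.diveDepth = diveDepth;
}
// Steps #1, #2 and #4 only: Duck itself is never invoked here.
SeaDuck.prototype = Object.create(Duck.prototype);
SeaDuck.prototype.dive = function() {
    console.log(this.name + " dived to a depth of " + this.diveDepth);
};

var daffy = new SeaDuck("Daffy", 10);
daffy.greet(); // inherited from Duck.prototype
daffy.dive();  // defined on SeaDuck.prototype

console.log(daffy instanceof Duck);                    // true
console.log(SeaDuck.prototype.hasOwnProperty("name")); // false
```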

And that's exactly what we wanted. In fact, we can use Object.create to replace even legitimate uses of the new operator. Instead of this:

var robert = new Duck("Robert");

…We can write this, effectively re-implementing the new algorithm by hand:

var robert = Object.create(Duck.prototype);
Duck.call(robert, "Robert");

While this might be conceptually clearer, it's wildly verbose, so I'd recommend sticking with new in these cases.

But what if you're coding for a crap browser like Internet Explorer 8, which doesn't support Object.create? Then you need to write your own utility, typically called "beget":

function beget(obj) {
  function BogusConstructor() {}
  BogusConstructor.prototype = obj;
  return new BogusConstructor();
}

SeaDuck.prototype = beget(Duck.prototype);

beget avoids the problem I pointed out in the previous section by creating a new BogusConstructor every time it's called. BogusConstructor does no initialization work and exists only to achieve step #2 of the new algorithm: setting the new object's prototype.

You could also turn beget into a partial shim for Object.create. I say partial because Object.create also accepts a second argument of property descriptors, which can't be fully emulated in engines that predate ES5.
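
Here's a rough sketch of what such a partial shim might look like. The name partialCreate and its error messages are made up for this example, and refusing the unsupported cases outright is just one possible design; a real shim might choose to ignore them silently instead.

```javascript
function beget(obj) {
    function BogusConstructor() {}
    BogusConstructor.prototype = obj;
    return new BogusConstructor();
}

// Hypothetical helper: covers only the one-argument form of Object.create.
function partialCreate(proto, properties) {
    if (properties !== undefined) {
        throw new Error("partialCreate: property descriptors are not supported");
    }
    if (proto === null || (typeof proto !== "object" && typeof proto !== "function")) {
        // Unlike the real Object.create, beget cannot produce an object with
        // a null prototype: `new` would fall back to Object.prototype.
        throw new TypeError("partialCreate: prototype must be a non-null object");
    }
    return beget(proto);
}

// Install it only where the native version is missing.
if (typeof Object.create !== "function") {
    Object.create = partialCreate;
}

var base = { hello: function() { return "hi"; } };
var child = partialCreate(base);
console.log(child.hello()); // "hi"
```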

The prototype, "prototype", [[Prototype]], and __proto__ mess

Looking back, a big source of my confusion as a learner was in understanding how the prototype property of functions relates to an "object's prototype" and how that in turn affects property lookup. You can easily see that regular objects don't have a prototype property: try evaluating ({ }).prototype in your debugger. So what's this "prototype" thing everyone keeps talking about on objects? Well, here's my attempt to clarify things, in point form:
  1. When we say "an object's prototype", we're referring to an internal property of the object, which the ECMAScript spec calls [[Prototype]]. Being an internal property, [[Prototype]] is not observable from regular ECMAScript code.
  2. An object's [[Prototype]] is consulted to resolve property names when the dot . and array index [] operators are used on the object and the object has no property of that name itself. If the desired property name is not found in [[Prototype]] either, then the [[Prototype]]'s [[Prototype]] is consulted, and so on. When people talk about the prototype chain, this is what they mean.
  3. The prototype property can be set on a function. When some function F is invoked as a constructor through the new operator, the value of F.prototype becomes the [[Prototype]] property of the newly-constructed object.
  4. To create a new object whose [[Prototype]] is some existing object P, use Object.create(P).
  5. I lied a bit in point 1. Most JavaScript engines provide a non-standard alias for the internal [[Prototype]] property. It's called __proto__. (ES5 has defined a standardized Object.getPrototypeOf, so use that instead of __proto__!)
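
The points above can be checked in a few lines. This is only a sketch: F and P follow the names used in the list, while the legs property and the variable names are made up for illustration.

```javascript
var P = { legs: 2 };
function F() {}
F.prototype = P;

// Plain objects have no `prototype` property; only functions do.
console.log(({}).prototype); // undefined

var viaNew = new F();             // point 3: F.prototype becomes [[Prototype]]
var viaCreate = Object.create(P); // point 4: [[Prototype]] chosen directly

// Point 5: the standard way to read [[Prototype]].
console.log(Object.getPrototypeOf(viaNew) === P);    // true
console.log(Object.getPrototypeOf(viaCreate) === P); // true

// Point 2: `legs` is not stored on the object itself,
// so both operators find it through [[Prototype]].
console.log(viaNew.hasOwnProperty("legs")); // false
console.log(viaNew.legs);                   // 2
console.log(viaNew["legs"]);                // 2

// One level further up the chain: toString lives on Object.prototype.
console.log(typeof viaNew.toString);        // "function"
```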

While you can't rely on __proto__ in production code, it's great for debugging, and for fixing your mental model of how the JS engine works. Here's what a SeaDuck's __proto__ looks like in the JS console:

Note how you can expand the __proto__ chain all the way up to Object.prototype.
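
The same chain can be walked in code with the standard Object.getPrototypeOf. A small sketch, assuming the corrected SeaDuck from earlier; prototypeChain is a hypothetical helper written for this example, and "Daffy" and 10 are sample values.

```javascript
function Duck(name) {
    this.name = name;
}
function SeaDuck(name, diveDepth) {
    Duck.call(this, name);
    this.diveDepth = diveDepth;
}
SeaDuck.prototype = Object.create(Duck.prototype);

// Hypothetical helper: collects every object on obj's prototype chain.
function prototypeChain(obj) {
    var chain = [];
    var proto = Object.getPrototypeOf(obj);
    while (proto !== null) {
        chain.push(proto);
        proto = Object.getPrototypeOf(proto);
    }
    return chain;
}

var chain = prototypeChain(new SeaDuck("Daffy", 10));
console.log(chain.length);                   // 3
console.log(chain[0] === SeaDuck.prototype); // true
console.log(chain[1] === Duck.prototype);    // true
console.log(chain[2] === Object.prototype);  // true
```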