Sunday, November 1, 2009

Why hashCode() matters

If you have ever overridden the equals() method of Object in your class code (and who hasn't?) you will obviously also re-implemented hashcode() too, following the (ominously scary) warning in the Javadoc of equals():

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Or perhaps not.

The reality is that we developers can be lazy, and sometimes laziness leads to making one assumption too many... such as that, having a sensible implementation of equals() somehow alleviates from the need of re-implementing hashCode() too.

Well, if you are like me, and you haven't heeded Sun's advice, then it's worth noting that this will soon or later bite you, and possibly pretty hard (and, I'd guess rather sooner than later...).

The reason is that the Collections framework make pretty heavy use of hashCode() to speed up such mundane matters as objects lookup and matching; unsurprisingly, especially so in the heavily-used implementation classes such as HashMap and HashSet.

In particular, the contains() method of either Collections will not do an asinine sequential sweep of the entire collection (if the concrete implementation is based directly or indirectly on either of the above two classes) using an equals() test on each element, returning true the first time an equal (in the sense defined by your implementation of equals()) element is encountered, or false otherwise.

Sun's implementation will instead use hashCode() to 'hit' the first possible 'bucket' where the element would have been stored by the HashSet, and, upon finding it (possibly) empty, it will simply assume that the Set does not contain the given element, even though an element that would have been equal (in the sense as defined by your implementation of equals()) is indeed someplace else inside that Set's internal implementation storage.

So, if you do plan to use the Collection framework with your newly-minted class, which you have implemented equals() for, then do yourself a favour and implement a suitable hashCode() too.
In fact, I suggest you do so even if you don't plan to use Collections of those objects: you know Sod's Law, now, don't you?

(and, no, there is no prize in guessing how I found out...)

1 comment:

  1. When people have trouble understanding why equal objects have to have equal hash codes, I like to point out the simple fact that that's the entire reason hashCode() even exists! Its whole purpose in life is precisely to provide a number that equal objects always share, and unequal objects rarely share.

    ReplyDelete