Week4

Equals and Hashcode

Overriding the equals() and hashCode() methods

As we have seen last week, the cosmic superclass Object has several general methods, of which we have already seen the toString() method. Now, we will see the other two methods that we promised to cover.

equals()

When overriding the boolean equals() methods, there is a number of rules you need to take into account if you want to let other classes use your objects. These rules that are related to overriding methods are called a contract. It is the responsibility of the programmer (which is you!) to adhere to these! Failure to do so may lead to weird complicated bugs that can take a lot of time to find as they may lead to hard-to-detect logical errors in your programs.

Contracts can typically be found in the Javadoc documentation of the class that introduces the method you are implementing or overriding. For the equals method itself we can find in the documentation that there are five rules in the contract that an overriden implementation should adhere to:

  • It is reflexive: x.equals(x) should always return true.
  • It is symmetric: x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive: if x.equals(y) returns true and y.equals(z) returns true too, then x.equals(z) must also return true.
  • It is consistent: calling x.equals(y) multiple times should each time result in true or each time result in false, provided no information used in these equal comparisons of the object is modified in between the calls.
  • And x.equals(null) should return false, provided x is not null (which would result in a NullPointerException anyways).

Aside from these five rules for equals(), the documentation also mentions: "Note that it is generally necessary to override the hashCode() method whenever equals() is overridden, to maintain the general contract for the hashCode() method, which states that equal objects must have equal hash codes."

hashCode()

So, what does int hashCode() do?

By default, it returns something that is strongly related to the the memory address of the object as an int. In general, a hash function maps data of arbitrary size to a fixed size, that is in our case an int. In Java, the main purpose is rapid data lookup, typically in a HashSet or HashMap, about which we talk more next week. Possible analogies of how a hash code can be used to speed up identification of an object are a fingerprint of a person or the length of a movie. To understand this well, realize that a movie has only one length, so if we try to find a movie in a large collection of movies, we can start by looking for movies that have the correct length. Even though multiple movies could have the same length, we can eliminate many movies already by first looking at this property. As you can see from this example, a hashCode might not be, and does not have to be, unique.

The documentation of the Object class states the following rules for the contract of hashCode:

  • hashCode must return the same integer when called multiple times, if no information used in equals is modified between the calls.
  • If x.equals(y) returns true, then it must hold that x.hashCode() == y.hashCode().
  • If x.equals(y) returns false, then x.hashCode does not necessarily have to differ from y.hashCode(). However, distinct integer results for unequal objects may improve the performance of hash tables.

Equal hashcodes thus are a necessary condition for equality of two objects, but are not a sufficient condition.

For deeper understanding, elaborate for yourself on the fingerprint analogy: if the same human leaves fingerprints in two locations, they will be equal. However, if two sets of fingerprints are equal, they still could be from different humans as there is an extremely small (but non-zero) probability that two different humans have matching fingerprints. Still, a fingerprint match will eliminate many candidates you do not need to consider and investigate in more detail.

Code examples overriding equals() and hashCode()

Adhering to the contracts of equals() and hashCode() can be a tricky business. In practice, you should try to avoid doing this by hand. Actually, IntelliJ or any other good IDE can generate the code for you!

Consider the following example:

public class Pair {
  private final int x;
  private final int y;

  public Pair(int x, int y) {
    this.x = x;
    this.y = y;
  }

  public int getFirst() {
    return x;
  }

  public int getSecond() {
    return y;
  }
}

When using IntelliJ with the IntellJ default template, we see the following code:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    Pair pair = (Pair) o;

    if (x != pair.x) return false;
    return y == pair.y;
}

@Override
public int hashCode() {
    int result = x;
    result = 31 * result + y;
    return result;
}

As you can see, the equals method contains a couple of checks for edge cases, before it starts comparing the instance variables. The hashCode method uses a prime number 31 factor to combine different instance variables, to avoid that the pair 1, 2 will end up with the same hashcode as the pair 2, 1.

If we use IntelliJ with the java.util.Objects template, we get slightly different code:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Pair pair = (Pair) o;
    return x == pair.x && y == pair.y;
}

@Override
public int hashCode() {
    return Objects.hash(x, y);
}

The equals() method looks very similar, but for hashCode we now see that a helper method from the utility class java.util.Objects (not to be confused with the cosmic superclass java.lang.Object!!) is used to conveniently compute the hashCode.

When using Eclipse, it could generate the following code that is slightly longer, but very similar in functionality to the IntelliJ default template:

@Override
public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + x;
  result = prime * result + y;
  return result;
}

@Override
public boolean equals(Object obj) {
  if (this == obj)
    return true;
  if (obj == null)
    return false;
  if (getClass() != obj.getClass())
    return false;
  Pair other = (Pair) obj;
  if (x != other.x)
    return false;
  if (y != other.y)
    return false;
  return true;
}

When to override hashCode and equals

Whether you need to override equals() and hashCode() depends on your intended use. If you are likely to have two separate objects with the same values but want to regard them as separate objects under all circumstances, the default implementation is what you want. On the other hand, if you want to be able to compare separate objects of your class based on their contents, you should override hashCode and equals. For example, if you read and store pairs of numbers from a file and create objects that way and then want to check whether they also exist in a second file. In most cases where you want to override both methods, it is preferable to let your IDE generate the code, but in some circumstances (such as unordered pairs, i.e. if we do want the pair [1, 2] to be considered equal to [2, 1]), you may want to do it yourself, or generate the code automatically and adjust it as needed.

Exercise

Test your knowledge

In this quiz, you can test your knowledge on the subjects covered in this chapter.

If you would want to compare two String objects and see if they hold the same word(s), would you use == or equals, and why? You should assume that these String objects can come from any source (e.g. from user input, from a file, etcetera).


What should you keep in mind when overriding the equals() method?


Assuming the contract between hashCode() and equals() is respected, what does it mean when two objects have the same hash code? And what if they have different ones?

You have reached the end of this section! Continue to the next section: