Equals and Hashcode
Overriding the equals() and hashCode() methods
As we have seen last week, the cosmic superclass Object
has several general methods, of which we have already seen the toString()
method. Now, we will see the other two methods that we promised to cover.
equals()
When overriding the boolean equals()
methods, there is a number of rules you need to take into account if you want to let other classes use your objects.
These rules that are related to overriding methods are called a contract. It is the responsibility of the programmer (which is you!) to adhere to these!
Failure to do so may lead to weird complicated bugs that can take a lot of time to find as they may lead to hard-to-detect logical errors in your programs.
Contracts can typically be found in the Javadoc documentation of the class that introduces the method you are implementing or overriding.
For the equals
method itself we can find in the documentation
that there are five rules in the contract that an overriden implementation should adhere to:
- It is reflexive:
x.equals(x)
should always returntrue
. - It is symmetric:
x.equals(y)
should returntrue
if and only ify.equals(x)
returns true. - It is transitive: if
x.equals(y)
returnstrue
andy.equals(z)
returnstrue
too, thenx.equals(z)
must also returntrue
. - It is consistent: calling
x.equals(y)
multiple times should each time result intrue
or each time result infalse
, provided no information used in these equal comparisons of the object is modified in between the calls. - And
x.equals(null)
should returnfalse
, providedx
is notnull
(which would result in aNullPointerException
anyways).
Aside from these five rules for equals()
, the documentation also mentions: "Note that it is generally necessary to override the hashCode()
method whenever equals()
is overridden, to maintain the general contract for the hashCode()
method, which states that equal objects must have equal hash codes."
hashCode()
So, what does int hashCode()
do?
By default, it returns something that is strongly related to the the memory address of the object as an int
. In general, a hash function maps data of arbitrary size to a fixed size, that is in our case an int
.
In Java, the main purpose is rapid data lookup, typically in a HashSet
or HashMap
, about which we talk more next week. Possible analogies of how a hash code can be used to speed up identification of an object
are a fingerprint of a person or the length of a movie.
To understand this well, realize that a movie has only one length, so if we try to find a movie in a large collection of movies, we can start by looking for movies that have the correct length.
Even though multiple movies could have the same length, we can eliminate many movies already by first looking at this property. As you can see from this example, a hashCode might not be, and does not have to be, unique.
The documentation of the Object class states the following rules for the contract of hashCode
:
hashCode
must return the same integer when called multiple times, if no information used inequals
is modified between the calls.- If
x.equals(y)
returnstrue
, then it must hold thatx.hashCode() == y.hashCode()
. - If
x.equals(y)
returnsfalse
, thenx.hashCode
does not necessarily have to differ fromy.hashCode()
. However, distinct integer results for unequal objects may improve the performance of hash tables.
Equal hashcodes thus are a necessary condition for equality of two objects, but are not a sufficient condition.
For deeper understanding, elaborate for yourself on the fingerprint analogy: if the same human leaves fingerprints in two locations, they will be equal. However, if two sets of fingerprints are equal, they still could be from different humans as there is an extremely small (but non-zero) probability that two different humans have matching fingerprints. Still, a fingerprint match will eliminate many candidates you do not need to consider and investigate in more detail.
Code examples overriding equals() and hashCode()
Adhering to the contracts of equals()
and hashCode()
can be a tricky business. In practice, you should try to avoid doing this by hand. Actually, IntelliJ
or any other good IDE can generate the code for you!
Consider the following example:
public class Pair {
private final int x;
private final int y;
public Pair(int x, int y) {
this.x = x;
this.y = y;
}
public int getFirst() {
return x;
}
public int getSecond() {
return y;
}
}
When using IntelliJ with the IntellJ default template, we see the following code:
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Pair pair = (Pair) o;
if (x != pair.x) return false;
return y == pair.y;
}
@Override
public int hashCode() {
int result = x;
result = 31 * result + y;
return result;
}
As you can see, the equals
method contains a couple of checks for
edge cases, before it starts comparing the instance variables.
The hashCode
method uses a prime number 31
factor to combine
different instance variables, to avoid that the pair 1, 2
will
end up with the same hashcode as the pair 2, 1
.
If we use IntelliJ with the java.util.Objects template, we get slightly different code:
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Pair pair = (Pair) o;
return x == pair.x && y == pair.y;
}
@Override
public int hashCode() {
return Objects.hash(x, y);
}
The equals()
method looks very similar, but for hashCode
we
now see that a helper method from the utility class java.util.Objects
(not to be confused with the cosmic superclass java.lang.Object
!!)
is used to conveniently compute the hashCode.
When using Eclipse, it could generate the following code that is slightly longer, but very similar in functionality to the IntelliJ default template:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + x;
result = prime * result + y;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Pair other = (Pair) obj;
if (x != other.x)
return false;
if (y != other.y)
return false;
return true;
}
When to override hashCode and equals
Whether you need to override equals()
and hashCode()
depends on your intended use. If you are likely to have two separate objects with the same values but want to regard them as separate objects under all circumstances, the default implementation is what you want.
On the other hand, if you want to be able to compare separate objects of your class based on their contents, you should override hashCode
and equals
.
For example, if you read and store pairs of numbers from a file and create objects that way and then want to check whether they also exist in a second file.
In most cases where you want to override both methods, it is preferable to let your IDE generate the code, but in some circumstances (such as unordered pairs, i.e. if we do want the pair [1, 2]
to be considered equal to [2, 1]
), you may want to do it yourself, or generate the code automatically and adjust it as needed.
Test your knowledge
In this quiz, you can test your knowledge on the subjects covered in this chapter.
If you would want to compare two String
objects and see if they hold the same word(s), would you use ==
or equals
, and why? You should assume that these String
objects can come from any source (e.g. from user input, from a file, etcetera).
What should you keep in mind when overriding the equals()
method?
Assuming the contract between hashCode()
and equals()
is respected, what does it mean when two objects have the same hash code?
And what if they have different ones?