Saturday, October 10, 2020

Why does removing an element from a HashMap while iterating cause ConcurrentModificationException exception?

I'm reading about Java Concurrency from the book "OCP Oracle Certified Associate Java SE 8 Programmer II Study Guide  Exam 1Z0-809" by Boyarsky Jeanne, Selikoff Scott. It says that the code snippet below throws ConcurrentModificationException exception because it removed an element from the foodData hashmap while looping over its iterator. 

Map<String, Object> foodData = new HashMap<String, Object>();
foodData.put("penguin", 1);
foodData.put("flamingo", 2);

for(String key: foodData.keySet())
   foodData.remove(key);

It doesn't explain in details how this works, but it tells that the solution to this problem is using ConcurrentHashMap instead of HashMap. I found it interesting so I dug deeper.

At first, I wrote a program to run the code snippet above and I got the exception's stack trace below.

Exception in thread "main" java.util.ConcurrentModificationException
        at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1494)
        at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1517)
        at ConcurrentCollectionTest.main(ConcurrentCollectionTest.java:9)

It shows that the exception was thrown from within the HashIterator.nextNode() method, not the HashMap.remove() method as I thought. And below is the source code.

abstract class HashIterator {

Node<K,V> next;        // next entry to return
Node<K,V> current;     // current entry
int expectedModCount;  // for fast-fail
int index;             // current slot

HashIterator() {
expectedModCount = modCount;
Node<K,V>[] t = table;
current = next = null;
index = 0;
if (t != null && size > 0) { // advance to first entry
do {} while (index < t.length && (next = t[index++]) == null);
}
}

public final boolean hasNext() {
return next != null;
}

final Node<K,V> nextNode() {
Node<K,V>[] t;
Node<K,V> e = next;
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
if (e == null)
throw new NoSuchElementException();
if ((next = (current = e).next) == null && (t = table) != null) {
do {} while (index < t.length && (next = t[index++]) == null);
}
return e;
}
...
}

The exception is thrown if the number of hash map's elements before iterating is different from while iterating (modCount != expectedModCount). This technique is called fail-fast. Java does this to prevent problems such as memory consistency errors. If the element is modified by the current thread while iterating, it should be ok. But, it can be updated by other threads so it can produce unexpected result. 

The collection classes with "Concurrent" prefix, for example ConcurrentHashMap, does not throw the exception when an update (adding or removing element) happens during iteration. This has weaker consistency than fail-fast behavior, but it's thread-safe and provides concurrent read from and write to the collection object (the blocks of code to do update operations are synchronized). The concurrent collection classes should be used when multiple threads modify the collection object outside of synchronized block or method. The concurrent collections is an alternative way to synchronized collections, which require locking on the monitor of the entire collection object, resulting one thread must wait for another even if they're calling different (synchronized) methods. 

The synchronized collections can be created using the methods of Collections class. For example, the synchronized hashmap can be instantiated by invoking the Collections.synchronizedMap() method.

The concurrent collections are thread-safe because all of their methods ensure happens-before relationship between threads as mentioned in this documentation
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:
  • Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
  • Actions in a thread prior to the submission of a Runnable to an Executor happen-before its execution begins. Similarly for Callables submitted to an ExecutorService.
  • Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.
  • Actions prior to "releasing" synchronizer methods such as Lock.unlockSemaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lockSemaphore.acquireCondition.await, and CountDownLatch.await on the same synchronizer object in another thread.
  • For each pair of threads that successfully exchange objects via an Exchanger, actions prior to the exchange() in each thread happen-before those subsequent to the corresponding exchange() in another thread.
  • Actions prior to calling CyclicBarrier.await and Phaser.awaitAdvance (as well as its variants) happen-before actions performed by the barrier action, and actions performed by the barrier action happen-before actions subsequent to a successful return from the corresponding await in other threads.

Happens-before relationship means one thread finishes its execution before another thread starts its execution so there is no memory consistency errors (or race condition).

No comments:

Post a Comment