Datanode dying and rejoining* can cause infinite loop in HeartbeatManager.heartbeatCheck()

Description

*Investigation has not yet shown whether the error appeared after rejoining or right after dying. However, we could also see the datanode retrying its registration indefinitely.

The namenode gets stuck in a loop trying to remove the dead datanode. HeartbeatManager.heartbeatCheck() calls dm.removeDeadDatanode() when a datanode has not sent a heartbeat within its interval. It seems the datanode remains in the list after being "removed" by a call to HeartbeatManager.removeDatanode().
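
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the Hadoop source: the class HeartbeatLoopSketch, the Node type, the removalWorks flag and the maxPasses cap are all hypothetical names introduced for illustration. It assumes the check rescans the node list after every removal and only stops once no dead node is found, so a removal that leaves the node in the list would make the scan find the same node on every pass.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. None of these names come from Hadoop; they are assumptions.
public class HeartbeatLoopSketch {

    /** Hypothetical stand-in for a datanode entry tracked by the namenode. */
    static final class Node {
        final String id;
        final long lastHeartbeatMs;

        Node(String id, long lastHeartbeatMs) {
            this.id = id;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    /** One live node (recent heartbeat) and one dead node (stale heartbeat). */
    static List<Node> sampleNodes() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("dn1", 9_000L));
        nodes.add(new Node("dn2", 0L));
        return nodes;
    }

    /**
     * Rescans the list until no dead node is found.
     * Returns the number of passes needed, or -1 if maxPasses is reached,
     * which stands in for the infinite loop described in this issue.
     */
    static int heartbeatCheck(List<Node> nodes, long nowMs, long expiryMs,
                              boolean removalWorks, int maxPasses) {
        for (int pass = 1; pass <= maxPasses; pass++) {
            Node dead = null;
            for (Node n : nodes) {
                if (nowMs - n.lastHeartbeatMs > expiryMs) {
                    dead = n; // first node whose heartbeat has expired
                    break;
                }
            }
            if (dead == null) {
                return pass; // every remaining node is alive
            }
            if (removalWorks) {
                nodes.remove(dead);
            }
            // When removalWorks is false the node stays in the list,
            // so the next pass finds the same dead node again.
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(heartbeatCheck(sampleNodes(), 10_000L, 5_000L, true, 100));
        System.out.println(heartbeatCheck(sampleNodes(), 10_000L, 5_000L, false, 100));
    }
}
```

With a working removal the sketch finishes on the second pass. With the removal turned into a no-op it only stops because of the artificial maxPasses cap, which an unbounded loop would not have.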

Tests need to be written to try to reproduce this error.

Status

Assignee

Unassigned

Reporter

August Bonds

Labels

None

Affects versions

2.8.2.1

Priority

High