hops / HOPS-146

Datanode dying and rejoining* can cause infinite loop in HeartbeatManager.heartbeatCheck()

    Details

    • Type: Bug
    • Status: To Do (View workflow)
    • Priority: High
    • Resolution: Unresolved
    • Affects versions: 2.8.2.1
    • Fix versions: None
    • Labels: None
    • Sprint:

      Description

      *Investigation has not yet shown whether the error appears after the datanode rejoins or immediately after it dies. However, we also observed the datanode retrying registration indefinitely.

      The namenode gets stuck in a loop trying to remove the dead datanode. HeartbeatManager.heartbeatCheck() calls dm.removeDeadDatanode() when a datanode has not sent a heartbeat within its interval. The datanode appears to remain in HeartbeatManager's datanode list even after it has been "removed" by a call to HeartbeatManager.removeDatanode(), so the next scan finds the same dead datanode again.
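      To make the suspected failure mode concrete, here is a minimal, self-contained Java model of the loop structure described above. It is a hypothetical sketch, not HOPS code: the classes HeartbeatLoopModel and Node, the fields heartbeatList and datanodeMap, and the maxPasses guard are invented for illustration. The re-registration scenario it models (a live descriptor in the ID map while a stale, dead one stays in the heartbeat list) is an assumption, not a confirmed cause. It only shows that if the removal step can leave the scanned dead node in the list, a "find first dead node, remove it, rescan" loop never terminates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of the loop described in HOPS-146.
 * Names echo the report, but this is NOT the HOPS implementation.
 */
public class HeartbeatLoopModel {

    /** Stand-in for a datanode descriptor; equals() is identity-based. */
    static final class Node {
        final String id;     // datanode ID, assumed stable across re-registration
        final boolean dead;  // whether this descriptor's heartbeat has expired
        Node(String id, boolean dead) { this.id = id; this.dead = dead; }
    }

    // Hypothetical heartbeat manager list that the check scans.
    final List<Node> heartbeatList = new ArrayList<>();
    // Hypothetical datanode manager map: ID -> current descriptor.
    final Map<String, Node> datanodeMap = new HashMap<>();

    /** Looks the node up by ID and removes it only if that descriptor is dead. */
    void removeDeadDatanode(Node dead) {
        Node current = datanodeMap.get(dead.id);
        if (current != null && current.dead) {
            datanodeMap.remove(dead.id);
            heartbeatList.remove(current);
        }
        // Otherwise nothing is removed and the stale descriptor stays in heartbeatList.
    }

    /**
     * Scans for the first dead node, removes it, and rescans until all are alive.
     * Returns the number of removal passes, or -1 if maxPasses was reached,
     * meaning the unbounded loop would spin forever.
     */
    int heartbeatCheck(int maxPasses) {
        for (int pass = 0; pass < maxPasses; pass++) {
            Node dead = null;
            for (Node n : heartbeatList) {
                if (n.dead) { dead = n; break; }
            }
            if (dead == null) {
                return pass; // all alive: the loop terminates
            }
            removeDeadDatanode(dead);
        }
        return -1;
    }

    /** Consistent state: the dead node is removed and the loop ends. */
    static int consistentCase() {
        HeartbeatLoopModel m = new HeartbeatLoopModel();
        Node a = new Node("dn-1", true);
        m.heartbeatList.add(a);
        m.datanodeMap.put(a.id, a);
        return m.heartbeatCheck(1000);
    }

    /** Assumed inconsistent state after re-registration: stale dead node in list, live one in map. */
    static int inconsistentCase() {
        HeartbeatLoopModel m = new HeartbeatLoopModel();
        Node stale = new Node("dn-1", true);
        Node fresh = new Node("dn-1", false);
        m.heartbeatList.add(stale);
        m.datanodeMap.put(fresh.id, fresh);
        return m.heartbeatCheck(1000);
    }

    public static void main(String[] args) {
        System.out.println("consistent: " + consistentCase());     // prints "consistent: 1"
        System.out.println("inconsistent: " + inconsistentCase()); // prints "inconsistent: -1"
    }
}
```

      The maxPasses bound exists only so the demo terminates; the report describes the real check as looping without end, which corresponds to the -1 case here. A reproduction test would need to drive the real namenode into a comparable state, for example by killing a datanode and letting it re-register before the dead-node check runs.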

      Tests need to be written to try to reproduce this error.


    People

    • Assignee: Unassigned
    • Reporter: August Bonds (aganom)
    • Votes: 0
    • Watchers: 1
