WebHDFS connection hangs when RM is colocated with DN

Description

There is a bug in ServiceJWTManager which leads to a deadlock if a ResourceManager is colocated with a DataNode.

When ServiceJWTManager is stopped, it waits for the renewer thread to finish gracefully. If the RM is already running, the DN detects it and tries to change the permissions of the file lock. This operation is not permitted because the file is owned by the user running the RM, so it throws an exception, which triggers the shutdown of ServiceJWTManager on the DN side.

The shutdown waits for the renewer thread to finish, but the thread was never created because of the aforementioned exception. The WebHDFS connection hangs until it times out.
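
To illustrate the failure mode, here is a minimal sketch of a stop routine that only waits when a renewer thread actually exists, and bounds the wait with a timeout. This is not the actual ServiceJWTManager code: the class name RenewerLifecycleSketch, the startupFails flag, and the use of a CountDownLatch are all assumptions made for illustration, and the real fix may look different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the manager described above; not the real ServiceJWTManager.
class RenewerLifecycleSketch {
    // Counted down by the renewer thread when it exits.
    private final CountDownLatch renewerDone = new CountDownLatch(1);
    // Stays null if start() fails before the thread is created.
    private volatile Thread renewer;

    // startupFails simulates the permission error hit on the DN side (assumption).
    void start(boolean startupFails) {
        if (startupFails) {
            throw new IllegalStateException("simulated: cannot change permissions");
        }
        renewer = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // placeholder for the renewal loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                renewerDone.countDown();
            }
        }, "renewer-sketch");
        renewer.setDaemon(true);
        renewer.start();
    }

    // Returns true if there was nothing to wait for, or the renewer
    // exited within the timeout; false otherwise.
    boolean stop(long timeoutMillis) {
        Thread t = renewer;
        if (t == null) {
            // The renewer was never created. An unconditional wait here is
            // the kind of blocking this report describes.
            return true;
        }
        t.interrupt();
        try {
            return renewerDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, calling stop() after a failed start() returns immediately instead of waiting for a thread that does not exist, and a renewer that refuses to exit cannot block the caller longer than the timeout.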

Assignee

Antonios Kouzoupis

Reporter

Antonios Kouzoupis

Labels

None

Fix versions

Affects versions

Priority

High