RPC server gets stuck in TLS handshake protocol and becomes unresponsive


The listener thread of the RPC server performs the TLS handshake upon accepting a connection. Under high load, the loop performing the TLS handshake might never exit.

Normally the handshake continues until it completes or until the server reads EOF from the underlying TCP socket. Under high load, however, a connection can be dropped while the socket still appears connected. Calling read on that socket then returns 0 (zero) rather than EOF, so the handshake loop spins without ever exiting. As a consequence, all incoming requests are blocked behind this problematic connection.
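To illustrate the failure mode, here is a minimal sketch of a read helper that treats a 0-byte read as "no progress" and gives up after a deadline instead of looping forever. It is not the actual fix applied to the RPC server. The class `BoundedHandshakeRead`, the method `readWithDeadline`, the exception `HandshakeTimeoutException`, and the timeout value are all hypothetical names chosen for this example.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of org.apache.hadoop.ipc.
final class BoundedHandshakeRead {

    /** Raised when the peer delivers no bytes before the deadline (hypothetical type). */
    public static class HandshakeTimeoutException extends IOException {
        public HandshakeTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Reads at least one byte into dst, or fails.
     * read() > 0  : progress, return the byte count.
     * read() == -1: end of stream, the peer closed the connection.
     * read() == 0 : nothing available; retry until the deadline passes.
     */
    public static int readWithDeadline(ReadableByteChannel channel, ByteBuffer dst,
                                       long timeoutMillis) throws IOException {
        if (!dst.hasRemaining()) {
            // A full buffer would also make read() return 0, which is a caller bug.
            throw new IllegalArgumentException("destination buffer has no space left");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            int n = channel.read(dst);
            if (n > 0) {
                return n;
            }
            if (n < 0) {
                throw new EOFException("peer closed the connection during the handshake");
            }
            // n == 0: without this check the loop would spin indefinitely.
            if (System.nanoTime() - deadline >= 0) {
                throw new HandshakeTimeoutException(
                        "no handshake data received within " + timeoutMillis + " ms");
            }
            Thread.yield();
        }
    }

    public static void main(String[] args) {
        // Stub channel that mimics the stalled socket: always open, never any data.
        ReadableByteChannel stalled = new ReadableByteChannel() {
            @Override public int read(ByteBuffer dst) { return 0; }
            @Override public boolean isOpen() { return true; }
            @Override public void close() { }
        };
        try {
            readWithDeadline(stalled, ByteBuffer.allocate(16), 100);
        } catch (IOException e) {
            System.out.println("handshake aborted: " + e.getMessage());
        }
    }
}
```

In a selector-based server, a likely better design is to return from the handshake step on a 0-byte read and wait for the next readiness event rather than polling. The deadline above only shows the minimum needed to keep the listener thread from being held by a single connection.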

A thread dump shows that the thread in question (IPC Server Listener) is stuck at:

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
- locked <0x00000001b2fb4e98> (a java.lang.Object)
at org.apache.hadoop.ipc.RpcSSLEngineAbstr.doHandshake(RpcSSLEngineAbstr.java:95)
at org.apache.hadoop.ipc.Server$Connection.doHandshake(Server.java:1660)
at org.apache.hadoop.ipc.Server$Listener.doAccept(Server.java:1137)
at org.apache.hadoop.ipc.Server$Listener.run(Server.java:1049)


Antonios Kouzoupis




