ChenFolin created HDFS-10472:
--------------------------------
Summary: NameNode Rpc Reader Thread crash, and cluster hang.
Key: HDFS-10472
URL: https://issues.apache.org/jira/browse/HDFS-10472
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs, namenode
Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
Reporter: ChenFolin
My cluster hung yesterday.
The RPC server Reader threads crashed, so all RPC requests timed out,
including DataNode heartbeats, etc.
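The sketch below shows why one dead Reader stalls every caller it serves. It is a hypothetical model, not Hadoop code. A single "reader" thread answers requests from a queue, and callers wait for each reply with a timeout. The class `ReaderDeathDemo`, the `"poison"` payload and the 500 ms timeout are all illustrative assumptions. The reader's loop catches only `InterruptedException`, so an unchecked exception ends the thread. Every later request then times out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model, not Hadoop code: one "reader" thread answers requests
// taken from a queue, and callers wait for replies with a timeout.
public class ReaderDeathDemo {

    static final class Request {
        final String payload;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Request(String payload) { this.payload = payload; }
    }

    /** Sends each payload in turn and returns how many got a reply in time. */
    static int countAnswered(String... payloads) {
        BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
        Thread reader = new Thread(() -> {
            while (true) {
                try {
                    Request r = queue.take();
                    if (r.payload.equals("poison")) {
                        // Unchecked exception: no catch clause below matches it,
                        // so it escapes run() and the thread terminates.
                        throw new IllegalStateException("unexpected bug in reader");
                    }
                    r.reply.complete("ok:" + r.payload);
                } catch (InterruptedException e) {
                    return; // the only failure this loop expects
                }
            }
        }, "Reader-0");
        reader.setDaemon(true);
        reader.setUncaughtExceptionHandler((t, e) -> { }); // hide the stack trace; the thread still dies
        reader.start();

        int answered = 0;
        for (String p : payloads) {
            Request r = new Request(p);
            queue.add(r);
            try {
                r.reply.get(500, TimeUnit.MILLISECONDS);
                answered++;
            } catch (TimeoutException e) {
                System.out.println(p + ": timed out"); // no live reader is left to answer
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return answered;
    }

    public static void main(String[] args) {
        int n = countAnswered("heartbeat-1", "poison", "heartbeat-2", "heartbeat-3");
        System.out.println("answered: " + n); // answered: 1
    }
}
```

Only the request sent before the crash is answered. The process stays up, but nothing on the dead reader's queue is ever served, and that matches the "cluster hang" symptom described above.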
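As the code below shows, the method doRunLoop only catches InterruptedException and
IOException. Any other exception (for example, a RuntimeException thrown from
doRead) therefore escapes the loop and terminates the Reader thread: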
while (running) {
  SelectionKey key = null;
  try {
    // consume as many connections as currently queued to avoid
    // unbridled acceptance of connections that starves the select
    int size = pendingConnections.size();
    for (int i=size; i>0; i--) {
      Connection conn = pendingConnections.take();
      conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
    }
    readSelector.select();
    Iterator<SelectionKey> iter =
        readSelector.selectedKeys().iterator();
    while (iter.hasNext()) {
      key = iter.next();
      iter.remove();
      if (key.isValid()) {
        if (key.isReadable()) {
          doRead(key);
        }
      }
      key = null;
    }
  } catch (InterruptedException e) {
    if (running) {                      // unexpected -- log it
      LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
    }
  } catch (IOException ex) {
    LOG.error("Error in Reader", ex);
  }
}
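One way to keep a reader alive is to add a catch clause for unchecked exceptions inside the loop, so that a single bad iteration is logged and the loop continues. The sketch below is a simplified, single-threaded model of that idea, not Hadoop's Server.Reader. The class `HardenedReaderLoop`, the stand-in `doRead(String, List)` and the `"bug"`/`"io-error"` keys are hypothetical. Each inner list stands for the keys returned by one `select()` round. As in the loop above, an exception abandons the rest of its round.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, single-threaded model of a reader loop; not Hadoop's Server.Reader.
public class HardenedReaderLoop {

    /** Hypothetical stand-in for doRead(key): may throw checked or unchecked exceptions. */
    static void doRead(String key, List<String> served) throws IOException {
        switch (key) {
            case "io-error":
                throw new IOException("connection reset by peer");
            case "bug":
                throw new NullPointerException("unexpected null in doRead");
            default:
                served.add(key);
        }
    }

    /** Each inner list stands for the keys returned by one select() round. */
    static List<String> runLoop(List<List<String>> rounds) {
        List<String> served = new ArrayList<>();
        for (List<String> selectedKeys : rounds) { // stands in for while (running)
            try {
                for (String key : selectedKeys) {
                    doRead(key, served);
                }
            } catch (IOException ex) {
                System.err.println("Error in Reader: " + ex);
            } catch (RuntimeException ex) {
                // Extra clause: without it, this exception would leave the loop
                // and end the thread, which is the failure described in the report.
                System.err.println("Unexpected error in Reader, continuing: " + ex);
            }
        }
        return served;
    }

    public static void main(String[] args) {
        List<List<String>> rounds = Arrays.asList(
                Arrays.asList("heartbeat-1"),
                Arrays.asList("bug"),
                Arrays.asList("io-error"),
                Arrays.asList("heartbeat-2"));
        System.out.println(runLoop(rounds)); // [heartbeat-1, heartbeat-2]
    }
}
```

This sketch catches only RuntimeException. An Error such as OutOfMemoryError is a different case. There, logging and exiting the process, so that a standby NameNode can take over, may be preferable to continuing with a reader in an unknown state. Which policy fits is a judgment call for the maintainers.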
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)