ChenFolin created HDFS-10472:
--------------------------------

             Summary: NameNode Rpc Reader Thread crash, and cluster hang.
                 Key: HDFS-10472
                 URL: https://issues.apache.org/jira/browse/HDFS-10472
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, namenode
    Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
            Reporter: ChenFolin

My cluster hung yesterday because the RPC server Reader threads crashed. As a result, all RPC requests timed out, including DataNode heartbeats.

As we can see, the method doRunLoop catches only InterruptedException and IOException, so any other exception thrown inside the loop (a RuntimeException or an Error) escapes it and terminates the Reader thread:

    while (running) {
      SelectionKey key = null;
      try {
        // consume as many connections as currently queued to avoid
        // unbridled acceptance of connections that starves the select
        int size = pendingConnections.size();
        for (int i=size; i>0; i--) {
          Connection conn = pendingConnections.take();
          conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
        }
        readSelector.select();

        Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
        while (iter.hasNext()) {
          key = iter.next();
          iter.remove();
          if (key.isValid()) {
            if (key.isReadable()) {
              doRead(key);
            }
          }
          key = null;
        }
      } catch (InterruptedException e) {
        if (running) {                      // unexpected -- log it
          LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
        }
      } catch (IOException ex) {
        LOG.error("Error in Reader", ex);
      }
    }
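
The failure mode described above can be reproduced outside Hadoop. What follows is a minimal, self-contained sketch, not the Hadoop Server code and not necessarily the fix adopted for this issue. The class ReaderLoopSketch, the Work interface, and the methods runNarrow and runDefensive are hypothetical names introduced only for illustration. runNarrow has the same catch structure as the loop in the report, while runDefensive adds a catch-all that logs and keeps going. Whether a real server should continue, or instead log and exit, is a design decision this sketch does not settle.

```java
import java.io.IOException;

public class ReaderLoopSketch {

    /** Hypothetical stand-in for doRead(key); may fail with a checked or unchecked exception. */
    public interface Work {
        void handle(int item) throws IOException;
    }

    /** Same shape as the reported loop: only IOException is caught. */
    public static int runNarrow(int items, Work work) {
        int processed = 0;
        for (int i = 0; i < items; i++) {
            try {
                work.handle(i);
                processed++;
            } catch (IOException ex) {
                System.err.println("Error in Reader: " + ex);
            }
            // A RuntimeException or Error is not caught here, so it leaves
            // the loop and ends whichever thread is running it.
        }
        return processed;
    }

    /** Defensive variant (an assumption, not the Hadoop fix): log anything else and continue. */
    public static int runDefensive(int items, Work work) {
        int processed = 0;
        for (int i = 0; i < items; i++) {
            try {
                work.handle(i);
                processed++;
            } catch (IOException ex) {
                System.err.println("Error in Reader: " + ex);
            } catch (Throwable t) {
                System.err.println("Unexpected error in Reader, continuing: " + t);
            }
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated bug: the second item (index 1) triggers an unchecked exception.
        Work flaky = item -> {
            if (item == 1) {
                throw new IllegalStateException("simulated bug on item " + item);
            }
        };

        // The narrow loop runs in its own thread, like an RPC Reader.
        Thread reader = new Thread(() -> runNarrow(3, flaky), "narrow-reader");
        reader.setUncaughtExceptionHandler(
            (t, e) -> System.out.println(t.getName() + " died: " + e));
        reader.start();
        reader.join();

        System.out.println("defensive loop processed "
            + runDefensive(3, flaky) + " of 3 items");
    }
}
```

Running main shows the narrow-reader thread terminating on the simulated IllegalStateException, while the defensive loop skips the failing item and processes the other two. In a server, a Reader thread that dies this way stops servicing its connections, which is consistent with the timeouts reported here.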