Amithsha created HDFS-13828:
-------------------------------

             Summary: DataNode breaching Xceiver Count
                 Key: HDFS-13828
                 URL: https://issues.apache.org/jira/browse/HDFS-13828
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.7.1
            Reporter: Amithsha

We are seeing the xceiver count limit of 4096 being breached on a particular set of nodes, 5 to 8 nodes at a time in a 900-node cluster. We stopped the DataNode services on those nodes so that their blocks would be re-replicated across the cluster. Even after that, we observed the same issue on a new set of nodes.

Q1: Why does this happen only on a particular set of nodes? And once those nodes are decommissioned and their data has been replicated across the cluster, why does the problem reappear on a different set of nodes?

Assumptions: Reads of a particular block or piece of data on those nodes might be the cause. If so, the issue should have been mitigated after the decommission, but it was not, and we do not know why. We suspect that the MR jobs involved are triggered from Hive, so the query might be referring to the same block multiple times in different stages and creating this issue.

From the thread dump: The DataNode thread dump shows that, out of the 4090+ xceiver threads created on that node, 4000+ belonged to multiple mappers of the same AppId, in the "no operation" state.

Any suggestions on this?