Amithsha created HDFS-13828:
-------------------------------

             Summary: DataNode breaching Xceiver Count
                 Key: HDFS-13828
                 URL: https://issues.apache.org/jira/browse/HDFS-13828
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.7.1
            Reporter: Amithsha


We observed the DataNode xceiver count breaching its limit of 4096 on a
particular set of 5 to 8 nodes in a 900-node cluster.
We stopped the DataNode services on those nodes so that their blocks would be
re-replicated across the cluster. Even after that, we observed the same issue
on a new set of nodes.

Q1: Why does this happen on a particular set of nodes? After decommissioning,
the data should have been re-replicated across the cluster, so why does the
issue reappear on a different set of nodes?

Assumptions:
Reading a particular block or piece of data on those nodes might be the cause,
but if so, decommissioning should have mitigated it, and it did not. Since
these MR jobs are triggered from Hive, we suspect the query might be reading
the same block multiple times in different stages and causing this issue.
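One way to test the repeated-block hypothesis is to count reads per block in
the DataNode client-trace log. The following is a minimal Python sketch, not a
Hadoop tool. It assumes clienttrace lines of the form "src: ..., dest: ...,
bytes: N, op: HDFS_READ, cliID: ..., offset: N, srvID: ..., blockid:
BP-...:blk_..., duration: N"; check that against the actual DataNode logs. The
function name reads_per_block and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed DataNode clienttrace format (verify against real logs):
#   src: /IP:PORT, dest: /IP:PORT, bytes: N, op: HDFS_READ, cliID: NAME,
#   offset: N, srvID: ID, blockid: BP-...:blk_ID_GENSTAMP, duration: N
READ_RE = re.compile(r"op: HDFS_READ,.*?blockid: ([^,\s]+)")


def reads_per_block(lines):
    """Count HDFS_READ clienttrace entries per block ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = READ_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    def trace(op, block):
        # Builds a hypothetical clienttrace line in the assumed format.
        return ("clienttrace: src: /10.0.0.5:50010, dest: /10.0.0.9:41234, "
                f"bytes: 1024, op: {op}, cliID: DFSClient_x, offset: 0, "
                f"srvID: s1, blockid: {block}, duration: 10")

    sample = [trace("HDFS_READ", "BP-1:blk_1_1001"),
              trace("HDFS_READ", "BP-1:blk_1_1001"),
              trace("HDFS_WRITE", "BP-1:blk_2_1002")]
    for block, n in reads_per_block(sample).most_common(5):
        print(block, n)  # prints: BP-1:blk_1_1001 2
```

A block that dominates the counts on the affected nodes would support the
hypothesis; an even spread would point elsewhere.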

From the thread dump:

The DataNode thread dump shows that, of the 4090+ xceiver threads created on
that node, 4000+ belonged to multiple mappers of the same AppId, and their
state was "no operation".
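The per-application breakdown above can be reproduced from a thread dump with
a short script. This is a minimal Python sketch under several assumptions to
verify against the actual dump: jstack-style output where each thread header
starts with the quoted thread name; DataXceiver thread names of the form
"DataXceiver for client <DFSClient name> at /<ip>:<port> [<status>]";
MapReduce task clients carrying the task attempt ID in their DFSClient name;
and "no operation" corresponding to a status of "Waiting for operation #N".
The function name xceivers_by_app and the sample dump are hypothetical.

```python
import re
from collections import Counter

# Assumption: DataXceiver thread names look like
#   DataXceiver for client DFSClient_attempt_<ts>_<job>_m_<task>_<n>_... at /IP:PORT [STATUS]
# and an idle thread's STATUS starts with "Waiting for operation".
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_[mr]_\d+_\d+")
STATUS_RE = re.compile(r"\[([^\]]*)\]\s*$")


def xceivers_by_app(dump_text):
    """Return (total, idle) Counters of DataXceiver threads keyed by app ID."""
    total, idle = Counter(), Counter()
    for line in dump_text.splitlines():
        # jstack prints each thread header as: "thread name" daemon prio=...
        if not line.startswith('"DataXceiver for client'):
            continue
        name = line.split('"')[1]
        attempt = ATTEMPT_RE.search(name)
        # attempt_<ts>_<job>_... belongs to YARN application_<ts>_<job>
        app = (f"application_{attempt.group(1)}_{attempt.group(2)}"
               if attempt else "unknown")
        total[app] += 1
        status = STATUS_RE.search(name)
        if status and status.group(1).startswith("Waiting for operation"):
            idle[app] += 1
    return total, idle


if __name__ == "__main__":
    # Hypothetical excerpt; in practice use the output of: jstack <datanode-pid>
    dump = "\n".join([
        '"DataXceiver for client DFSClient_attempt_1533000000000_0042_m_000003_0_11_1 '
        'at /10.0.0.9:41234 [Waiting for operation #1]" daemon prio=5',
        '"DataXceiver for client DFSClient_attempt_1533000000000_0042_m_000007_0_12_1 '
        'at /10.0.0.8:41000 [Sending block BP-1:blk_1_1001]" daemon prio=5',
        '"main" prio=5',
    ])
    total, idle = xceivers_by_app(dump)
    for app, n in total.most_common():
        print(app, n, "idle:", idle[app])  # application_1533000000000_0042 2 idle: 1
```

The sum of the totals can be compared against the configured limit
(dfs.datanode.max.transfer.threads, default 4096 in Hadoop 2.x). A single
application holding most threads, mostly idle, would match the observation
above.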

Any suggestions on this?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)