Daryn Sharp created HDFS-12645:
----------------------------------

             Summary: FSDatasetImpl lock will stall BP service actors and may 
cause missing blocks
                 Key: HDFS-12645
                 URL: https://issues.apache.org/jira/browse/HDFS-12645
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.8.0
            Reporter: Daryn Sharp


The DN is extremely susceptible to a slow volume due to bad locking practices.  DN 
operations require a fs dataset lock.  IO in the dataset lock should not be 
permissible as it leads to severe performance degradation and possibly 
(temporarily) missing blocks.
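
To illustrate the difference between doing IO inside and outside a dataset-wide lock, here is a minimal sketch. It is not the FsDatasetImpl code: the class `DatasetLockSketch`, its `replicaMap`, and both method names are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a dataset guarded by one lock; not the real DN API. */
public class DatasetLockSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, Path> replicaMap = new HashMap<>();

    /** Anti-pattern: the rename (disk IO) runs while the dataset lock is held. */
    public void finalizeWithIoUnderLock(long blockId, Path tmp, Path dst) throws IOException {
        datasetLock.lock();
        try {
            Files.move(tmp, dst);          // a slow volume stalls every other lock waiter here
            replicaMap.put(blockId, dst);
        } finally {
            datasetLock.unlock();
        }
    }

    /** Alternative: do the IO first, then take the lock only for the in-memory update. */
    public void finalizeWithIoOutsideLock(long blockId, Path tmp, Path dst) throws IOException {
        Files.move(tmp, dst);              // slow IO delays only this caller
        datasetLock.lock();
        try {
            replicaMap.put(blockId, dst);  // short, memory-only critical section
        } finally {
            datasetLock.unlock();
        }
    }

    /** Looks up a replica path under the lock. */
    public Path lookup(long blockId) {
        datasetLock.lock();
        try {
            return replicaMap.get(blockId);
        } finally {
            datasetLock.unlock();
        }
    }
}
```

Moving IO out of the lock is not free. A real implementation would also have to handle races that the lock previously hid, for example a block being invalidated between the rename and the map update.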

A slow disk will cause pipelines to experience significant latency and 
timeouts, increasing lock/io contention while cleaning up, leading to more 
timeouts, etc.  Meanwhile, the actor service thread is interleaving multiple 
lock acquire/releases with xceivers.  If many commands are issued, the node may 
be incorrectly declared as dead.

HDFS-12639 documents that both actors synchronize on the offer service lock 
while processing commands.  A backlogged active actor will block the standby 
actor and cause it to go dead too.
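
One possible way to keep heartbeats flowing while commands are backlogged is to hand commands to a separate worker through a queue. The sketch below assumes that approach purely for illustration. It is not the fix adopted in either JIRA, and `ActorQueueSketch`, `heartbeat`, and `pendingCommands` are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: the heartbeat path enqueues commands instead of running them. */
public class ActorQueueSketch {
    private final BlockingQueue<Runnable> commands = new LinkedBlockingQueue<>();
    private final AtomicInteger heartbeatsSent = new AtomicInteger();
    private final Thread worker;

    public ActorQueueSketch() {
        // A single worker drains the queue; slow commands block only this thread.
        worker = new Thread(() -> {
            try {
                while (true) {
                    commands.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit on shutdown
            }
        }, "command-processor");
        worker.setDaemon(true);
        worker.start();
    }

    /** Heartbeat path: queue any received commands and return immediately. */
    public int heartbeat(Runnable... received) {
        for (Runnable command : received) {
            commands.add(command);
        }
        return heartbeatsSent.incrementAndGet();
    }

    /** Number of commands still waiting for the worker. */
    public int pendingCommands() {
        return commands.size();
    }

    public void shutdown() {
        worker.interrupt();
    }
}
```

With this shape, a command stuck behind a slow disk delays other queued commands but not the next heartbeat, so the node is less likely to be declared dead while it works through the backlog.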


