Daryn Sharp created HDFS-12645:
----------------------------------

Summary: FSDatasetImpl lock will stall BP service actors and may cause missing blocks
Key: HDFS-12645
URL: https://issues.apache.org/jira/browse/HDFS-12645
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
The DN is extremely susceptible to a slow volume due to bad locking practices. DN operations require the fs dataset lock, so IO performed while holding the dataset lock should not be permissible: it leads to severe performance degradation and possibly (temporarily) missing blocks.

A slow disk will cause pipelines to experience significant latency and timeouts. Cleaning up those pipelines increases lock/IO contention, which leads to more timeouts, and so on. Meanwhile, the BP service actor thread interleaves multiple lock acquires/releases with the xceivers. If many commands are issued, the node may be incorrectly declared dead.

HDFS-12639 documents that both actors synchronize on the offer service lock while processing commands. A backlogged active actor will therefore block the standby actor and cause it to go dead as well.
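To illustrate why IO under the dataset lock stalls unrelated threads, here is a minimal, self-contained sketch. It does not use the real FsDatasetImpl API: the class, the `replicaMap`, `slowDiskIo`, and the "actor" critical section are all hypothetical stand-ins. A simulated xceiver finalizes a block on a slow volume, either while holding the lock or before taking it, and the sketch measures how long an actor-like thread waits for the same lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified stand-in for a DataNode dataset (NOT the real FsDatasetImpl).
public class DatasetLockSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, String> replicaMap = new HashMap<>();

    // Simulates a disk operation on a slow volume by sleeping.
    static void slowDiskIo(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Anti-pattern: the slow IO runs while the dataset lock is held,
    // so every other thread that needs the lock waits on the slow volume.
    void finalizeBlockHoldingLock(long blockId, long ioMillis) {
        datasetLock.lock();
        try {
            slowDiskIo(ioMillis);
            replicaMap.put(blockId, "FINALIZED");
        } finally {
            datasetLock.unlock();
        }
    }

    // Alternative: do the IO first, then hold the lock only for the in-memory update.
    void finalizeBlockOutsideLock(long blockId, long ioMillis) {
        slowDiskIo(ioMillis);
        datasetLock.lock();
        try {
            replicaMap.put(blockId, "FINALIZED");
        } finally {
            datasetLock.unlock();
        }
    }

    // Returns how many ms a short, actor-like critical section waited for the lock
    // while a simulated xceiver was finalizing a block on a slow volume.
    static long actorWaitMillis(boolean ioUnderLock) throws InterruptedException {
        DatasetLockSketch ds = new DatasetLockSketch();
        Thread xceiver = new Thread(() -> {
            if (ioUnderLock) {
                ds.finalizeBlockHoldingLock(1L, 600);
            } else {
                ds.finalizeBlockOutsideLock(1L, 600);
            }
        });
        xceiver.start();
        Thread.sleep(100); // let the xceiver begin its slow IO first
        long start = System.nanoTime();
        ds.datasetLock.lock();
        try {
            int replicas = ds.replicaMap.size(); // stands in for quick in-memory work
        } finally {
            ds.datasetLock.unlock();
        }
        long waited = (System.nanoTime() - start) / 1_000_000;
        xceiver.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("IO under lock, actor waited ms: " + actorWaitMillis(true));
        System.out.println("IO outside lock, actor waited ms: " + actorWaitMillis(false));
    }
}
```

With IO under the lock, the actor waits for roughly the remaining duration of the slow IO. With IO outside the lock, it acquires the lock almost immediately. Moving IO out of the lock in real code is harder than this sketch suggests, because the in-memory state must be revalidated after the lock is reacquired.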