Nathan Roberts created HDFS-11755:
-------------------------------------

Summary: Underconstruction blocks can be considered missing
Key: HDFS-11755
URL: https://issues.apache.org/jira/browse/HDFS-11755
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0-alpha2, 2.8.1
Reporter: Nathan Roberts
Assignee: Nathan Roberts

The following sequence of events can lead to an under-construction block being considered missing.

- Pipeline of 3 DNs: DN1->DN2->DN3.
- DN3 has a failing disk, so some updates take a long time.
- The client writes the entire block and is waiting for the final ack.
- DN1, DN2 and DN3 have all received the block.
- DN1 is waiting for an ACK from DN2, which is waiting for an ACK from DN3.
- DN3 is having trouble finalizing the block due to the failing drive. It does eventually succeed, but it is VERY slow at doing so.
- DN2 times out waiting for DN3 and tears down its pieces of the pipeline, so DN1 notices and does the same. Neither DN1 nor DN2 finalized the block.
- DN3 finally sends an IBR to the NN indicating the block has been received.
- The drive containing the block on DN3 fails enough that the DN takes it offline and notifies the NN of the failed volume.
- The NN removes DN3's replica from the triplets and then declares the block missing because there are no other replicas.

Seems like we shouldn't consider uncompleted blocks for replication.
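
The sequence above can be replayed against a toy model to see where the "missing" verdict comes from. This is only a sketch under stated assumptions, not HDFS code: `ToyNameNode`, `Block`, `BlockState`, the `skip_uncompleted` flag and the block ID `blk_1` are all hypothetical names invented for illustration, and the real NameNode bookkeeping is considerably more involved. The flag models the suggestion that uncompleted blocks should not be considered for replication.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockState(Enum):
    UNDER_CONSTRUCTION = "under_construction"
    COMPLETE = "complete"


@dataclass
class Block:
    # Hypothetical stand-in for the NN's per-block record.
    block_id: str
    state: BlockState = BlockState.UNDER_CONSTRUCTION
    replicas: set = field(default_factory=set)  # DNs that reported the block


class ToyNameNode:
    """Toy model of the NN bookkeeping described in the report."""

    def __init__(self, skip_uncompleted: bool):
        # Assumption: True models "do not consider uncompleted blocks".
        self.skip_uncompleted = skip_uncompleted
        self.blocks = {}
        self.missing = set()

    def add_block(self, block_id: str) -> None:
        self.blocks[block_id] = Block(block_id)

    def incremental_block_report(self, dn: str, block_id: str) -> None:
        # A DN tells the NN it has received the block.
        self.blocks[block_id].replicas.add(dn)

    def volume_failed(self, dn: str, block_id: str) -> None:
        # The DN took the volume offline: drop its replica, then re-check.
        block = self.blocks[block_id]
        block.replicas.discard(dn)
        self._check_replication(block)

    def _check_replication(self, block: Block) -> None:
        if self.skip_uncompleted and block.state is not BlockState.COMPLETE:
            return  # still under construction: leave it to pipeline recovery
        if not block.replicas:
            self.missing.add(block.block_id)


def replay(skip_uncompleted: bool) -> set:
    nn = ToyNameNode(skip_uncompleted)
    nn.add_block("blk_1")
    # DN1 and DN2 tore down the pipeline without finalizing, so they never
    # report the block. Only DN3 eventually sends an IBR.
    nn.incremental_block_report("DN3", "blk_1")
    # DN3's drive then fails and the NN is notified of the failed volume.
    nn.volume_failed("DN3", "blk_1")
    return nn.missing


if __name__ == "__main__":
    print("considering uncompleted blocks:", replay(skip_uncompleted=False))
    print("skipping uncompleted blocks:   ", replay(skip_uncompleted=True))
```

In this model the block is never completed, so with the check skipped it is not flagged as missing; whether that is the right fix for the real NameNode is a design question the sketch does not answer.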