[ https://issues.apache.org/jira/browse/HDFS-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo Nicholas Sze resolved HDFS-108.
--------------------------------------

    Resolution: Not a Problem

I guess that this is not a problem anymore. Please feel free to reopen this if I am wrong. Resolving ...

> File write fails after data node goes down
> -------------------------------------------
>
>                 Key: HDFS-108
>                 URL: https://issues.apache.org/jira/browse/HDFS-108
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Alban Chevignard
>         Attachments: failed_write.patch
>
>
> If a data node goes down while a file is being written to HDFS, the write fails with the following errors:
> {noformat}
> 09/04/20 17:15:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:39 INFO dfs.DFSClient: Abandoning block blk_-6792221430152215651_1003
> 09/04/20 17:15:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:45 INFO dfs.DFSClient: Abandoning block blk_-1056044503329698571_1003
> 09/04/20 17:15:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:51 INFO dfs.DFSClient: Abandoning block blk_-1144491637577072681_1003
> 09/04/20 17:15:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:57 INFO dfs.DFSClient: Abandoning block blk_6574618270268421892_1003
> 09/04/20 17:16:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2387)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1746)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1924)
> 09/04/20 17:16:03 WARN dfs.DFSClient: Error Recovery for block blk_6574618270268421892_1003 bad datanode[1]
> {noformat}
> The tests were done with the following configuration:
> * Hadoop version 0.18.3
> * 3 data nodes with a replication count of 2
> * 1 GB file write
> * 1 data node taken down during the write
>
> This issue seems to be caused by the delay between the time a data node goes down and the time the name node marks it as dead. This delay is unavoidable, but the name node should not keep allocating new blocks to data nodes that the client already knows to be down. Even with a shorter {{heartbeat.recheck.interval}}, there is still a window during which this issue can occur.
> One possible fix would be to allow clients to exclude known bad data nodes when allocating new blocks. See {{failed_write.patch}} for an example, and the illustrative sketch below.
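To make the proposed fix concrete, here is a minimal sketch of the client-side exclusion idea. It is an illustration only, not the patch itself: {{BlockAllocator}}, {{allocateBlock}}, and {{LocatedBlock}} are hypothetical stand-ins for the real {{DFSClient}}/name node protocol, and the actual change is in {{failed_write.patch}}.

{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class ExcludingBlockWriter {

    /** Hypothetical name node interface: allocates a new block, skipping excluded nodes. */
    interface BlockAllocator {
        LocatedBlock allocateBlock(String path, Set<String> excludedNodes) throws IOException;
    }

    /** Hypothetical result: the data node chosen to host the new block. */
    static class LocatedBlock {
        final String targetNode;
        LocatedBlock(String targetNode) { this.targetNode = targetNode; }
    }

    private final BlockAllocator allocator;
    private final Set<String> excludedNodes = new HashSet<String>();

    ExcludingBlockWriter(BlockAllocator allocator) {
        this.allocator = allocator;
    }

    /**
     * Ask for a new block until a write pipeline can be set up, excluding
     * every data node that has already failed, instead of letting the name
     * node hand back the same dead node on each retry.
     */
    LocatedBlock nextBlock(String path, int maxRetries) throws IOException {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            LocatedBlock block = allocator.allocateBlock(path, excludedNodes);
            try {
                connect(block.targetNode); // pipeline setup ("connect ack")
                return block;              // success: write to this block
            } catch (IOException e) {
                // Remember the bad node so it is skipped on the next
                // allocation, even before the name node marks it dead.
                excludedNodes.add(block.targetNode);
            }
        }
        throw new IOException("Unable to create new block after " + maxRetries + " attempts");
    }

    /** Placeholder for the real data transfer handshake. */
    private void connect(String node) throws IOException {
        // Would open a socket to the data node and wait for the connect ack.
    }
}
{code}

With this approach, a data node that fails the connect ack is dropped from the client's subsequent allocation requests immediately, without waiting for the heartbeat recheck to mark it dead cluster-wide.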