[ 
https://issues.apache.org/jira/browse/HDFS-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-108.
--------------------------------------

    Resolution: Not a Problem

I guess that this is not a problem anymore. Please feel free to reopen this if 
I am wrong. Resolving ...

> File write fails after data node goes down
> ------------------------------------------
>
>                 Key: HDFS-108
>                 URL: https://issues.apache.org/jira/browse/HDFS-108
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Alban Chevignard
>         Attachments: failed_write.patch
>
>
> If a data node goes down while a file is being written to HDFS, the write 
> fails with the following errors:
> {noformat} 
> 09/04/20 17:15:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:39 INFO dfs.DFSClient: Abandoning block blk_-6792221430152215651_1003
> 09/04/20 17:15:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:45 INFO dfs.DFSClient: Abandoning block blk_-1056044503329698571_1003
> 09/04/20 17:15:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:51 INFO dfs.DFSClient: Abandoning block blk_-1144491637577072681_1003
> 09/04/20 17:15:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:57 INFO dfs.DFSClient: Abandoning block blk_6574618270268421892_1003
> 09/04/20 17:16:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2387)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1746)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1924)
> 09/04/20 17:16:03 WARN dfs.DFSClient: Error Recovery for block blk_6574618270268421892_1003 bad datanode[1]
> {noformat} 
> The tests were done with the following configuration:
> * Hadoop version 0.18.3
> * 3 data nodes with replication count of 2
> * 1 GB file write
> * 1 data node taken down during write
> This issue seems to be caused by the delay between the time a data node goes 
> down and the time the name node marks it as dead. This delay is unavoidable, 
> but the name node should not keep allocating new blocks to data nodes that 
> the client already knows to be down. Even after adjusting 
> {{heartbeat.recheck.interval}}, there is still a window during which this 
> issue can occur.
> One possible fix would be to allow clients to exclude known bad data nodes 
> when allocating new blocks. See {{failed_write.patch}} for an example.
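
For readers following the proposed fix, here is a rough, self-contained sketch of the idea of a client-side exclude list. It is not taken from failed_write.patch and does not use any Hadoop API: the class ExcludeNodesSketch, its methods, and the two extra node addresses are all hypothetical stand-ins, and the allocation logic is a toy model of the name node, assuming it simply picks the first eligible nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from Hadoop or failed_write.patch. */
public class ExcludeNodesSketch {

    /** Toy name node: picks targets from nodes it still believes alive, skipping excluded ones. */
    static List<String> chooseTargets(List<String> believedAlive, Set<String> excluded, int replication) {
        List<String> targets = new ArrayList<>();
        for (String node : believedAlive) {
            if (targets.size() == replication) break;
            if (!excluded.contains(node)) targets.add(node);
        }
        if (targets.size() < replication) {
            throw new IllegalStateException("not enough data nodes");
        }
        return targets;
    }

    /** Toy pipeline setup: returns the first unreachable node, or null if all nodes respond. */
    static String firstBadLink(List<String> pipeline, Set<String> actuallyDown) {
        for (String node : pipeline) {
            if (actuallyDown.contains(node)) return node;
        }
        return null;
    }

    /** Client retry loop that remembers bad nodes and asks for them to be excluded next time. */
    static List<String> allocateBlock(List<String> believedAlive, Set<String> actuallyDown,
                                      int replication, int maxRetries) {
        Set<String> excluded = new HashSet<>();
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            List<String> pipeline = chooseTargets(believedAlive, excluded, replication);
            String bad = firstBadLink(pipeline, actuallyDown);
            if (bad == null) return pipeline;
            excluded.add(bad); // abandon this block; exclude the bad node on the next request
        }
        throw new IllegalStateException("Unable to create new block.");
    }

    public static void main(String[] args) {
        // Only 192.168.0.66 appears in the report; the other two addresses are made up.
        List<String> nodes = List.of("192.168.0.65:50010", "192.168.0.66:50010", "192.168.0.67:50010");
        Set<String> down = Set.of("192.168.0.66:50010");
        System.out.println(allocateBlock(nodes, down, 2, 3));
    }
}
```

Without the excluded set, every retry in this toy model would receive the same pipeline containing the dead node, which mirrors the repeated "Bad connect ack" entries in the log above.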



--
This message was sent by Atlassian JIRA
(v6.2#6252)
