xuzq created HDFS-12687:
---------------------------
Summary: A recovered DN is never removed from the client's "failed" list
Key: HDFS-12687
URL: https://issues.apache.org/jira/browse/HDFS-12687
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 2.8.1
Reporter: xuzq
When a client is writing through a pipeline such as Client=>DN1=>DN2=>DN3 and
DN2 crashes, the client runs the pipeline recovery process. The failed node DN2
is added to "failed". The client then requests a new DN from the NN, passing
"failed", and replaces DN2 in the pipeline, e.g. Client=>DN1=>DN4=>DN3.
The client keeps running.
After a long time, the client is still writing data to the file, and by then it
has gone through many pipelines, e.g. Client => DN-1 => DN-2 => DN-3.
When DN-2 crashes, DN-2 is added to "failed" and the client runs the recovery
process as before. It requests a new DN from the NN with "failed", and
{color:red}the NN selects a DN from all DNs except those in "failed", even if
DN-2 has restarted{color}.
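
The behaviour described above can be illustrated with a small, self-contained model. This is only a sketch, not the HDFS client code: the class FailedListModel, its methods, and the string node names are hypothetical, and the NN's choice is reduced to "first live node that is neither in the pipeline nor in the exclude list". It shows only that a node added to "failed" stays excluded after it restarts.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the client-side "failed" list; not HDFS code.
class FailedListModel {
    // Mirrors the shape of the client's list: entries are added, never removed.
    private final List<String> failed = new ArrayList<>();
    // Nodes that are currently up (stand-in for the NN's view of the cluster).
    private final Set<String> liveNodes = new LinkedHashSet<>();

    FailedListModel(List<String> nodes) {
        liveNodes.addAll(nodes);
    }

    void crash(String dn) {
        liveNodes.remove(dn);
    }

    void restart(String dn) {
        liveNodes.add(dn);
    }

    // Client side: remember a bad pipeline node. Nothing ever clears it.
    void markFailed(String dn) {
        if (!failed.contains(dn)) {
            failed.add(dn);
        }
    }

    // Stand-in for the NN: first live node that is not already in the
    // pipeline and not in "failed"; null if no such node exists.
    String pickReplacement(List<String> pipeline) {
        for (String dn : liveNodes) {
            if (!failed.contains(dn) && !pipeline.contains(dn)) {
                return dn;
            }
        }
        return null;
    }
}
```

In this model, after crash("DN2"), markFailed("DN2") and a later restart("DN2"), pickReplacement still never returns DN2, because only "failed" is consulted and it is never pruned.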
Questions:
Why is DN2 not removed from "failed" once it has restarted?
Why is the collection of error nodes used in the recovery process not shared
with the one used when requesting the next block? The two are separate fields:
private final List<DatanodeInfo> failed = new ArrayList<>();
private final LoadingCache<DatanodeInfo, DatanodeInfo> excludedNodes;
As before, when DN2 crashes, the client recovers the pipeline after a
timeout (490s in the worst case by default). When the client finishes writing
this block and requests the next block, the NN may return a block whose
locations include the failed datanode DN2. When the client then creates a new
pipeline for the new block, {color:red}it has to wait for a connection
timeout{color} (60s by default).
If "failed" and "excludedNodes" were a single collection, the connection
timeout would be avoided. Because entries in "excludedNodes" are removed
dynamically, this would also avoid the first problem.
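
A minimal sketch of the proposed idea, under stated assumptions: one exclusion collection whose entries expire, used both for pipeline recovery and for next-block allocation. The class ExpiringExcludeSet and its methods are hypothetical, time-based expiry is assumed as the removal policy, and plain java.util is used instead of the LoadingCache so that the example is self-contained. The current time is passed in explicitly to keep the behaviour deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single exclusion collection with expiring entries; not HDFS code.
class ExpiringExcludeSet {
    // Node name -> time (ms) at which the exclusion ends.
    private final Map<String, Long> expiryByNode = new HashMap<>();
    private final long ttlMillis;

    ExpiringExcludeSet(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Called when a node fails during recovery or during pipeline setup.
    void exclude(String dn, long nowMillis) {
        expiryByNode.put(dn, nowMillis + ttlMillis);
    }

    boolean isExcluded(String dn, long nowMillis) {
        Long expiry = expiryByNode.get(dn);
        return expiry != null && expiry > nowMillis;
    }

    // Nodes to send to the NN as the exclude list; expired entries are dropped,
    // so a node that failed long ago becomes eligible again.
    List<String> current(long nowMillis) {
        expiryByNode.values().removeIf(expiry -> expiry <= nowMillis);
        List<String> nodes = new ArrayList<>(expiryByNode.keySet());
        Collections.sort(nodes);
        return nodes;
    }
}
```

With a single collection like this, a node that just failed during recovery is also excluded when the next block is requested, and a node that failed long ago drops out of the exclude list on its own.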
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)