Re: DFS replication and Error Recovery on failure

Konstantin Shvachko Mon, 29 Dec 2008 11:37:22 -0800

1) If I set the value of dfs.replication to 3 only in hadoop-site.xml on the
namenode (master) and then restart the cluster, will this take effect, or do
I have to change hadoop-site.xml on all slaves?


dfs.replication is a name-node parameter, so you only need to restart
the name-node for the new value to take effect.
Note that setting a new value will not change the replication of
existing blocks, because replication is set per file; to change it for
existing files you need to use setReplication.
New files, however, will get the new value automatically.
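The per-file behaviour described above can be illustrated with a toy Python model. This is not Hadoop code: `ToyNameNode`, `create` and `set_replication` are hypothetical names, and the model only captures the idea that each file records its replication factor at creation time. In real HDFS the explicit per-file change is what setReplication does (in Java, `FileSystem.setReplication(Path, short)`).

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical, highly simplified model of per-file replication (not Hadoop code)."""

    default_replication: int  # stands in for dfs.replication
    files: dict = field(default_factory=dict)  # path -> replication factor

    def create(self, path: str) -> None:
        # A new file takes whatever the default is at the moment it is created.
        self.files[path] = self.default_replication

    def set_replication(self, path: str, replication: int) -> None:
        # Existing files change only when asked explicitly, analogous to setReplication.
        self.files[path] = replication


nn = ToyNameNode(default_replication=2)
nn.create("/old/data.txt")

nn.default_replication = 3        # "restart" with a new dfs.replication value
nn.create("/new/data.txt")

print(nn.files["/old/data.txt"])  # 2: the old file keeps its factor
print(nn.files["/new/data.txt"])  # 3: new files get the new default

nn.set_replication("/old/data.txt", 3)
print(nn.files["/old/data.txt"])  # 3: changed only by the explicit call
```

The design point the sketch mirrors is that the default is consulted once, when the file is created, and never re-read for files that already exist.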

2) What could cause the following error on a datanode?
ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/dfs/data: namenode namespaceID = 1396640905; datanode namespaceID = 820259954


The namespaceID guards cluster integrity: the name-node and all of its
data-nodes share the same value.
A mismatch means either that you ran the data-node with another
name-node, or that you reformatted the name-node recently (formatting
assigns a new namespaceID).
It is better to use a dedicated directory for data-node storage
(dfs.data.dir) rather than "tmp".
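A mismatch like this can be checked before starting the data-node by comparing the namespaceID recorded on disk on each side. The sketch below assumes the storage layout where each name-node and data-node storage directory holds a Java-properties-style `current/VERSION` file with a `namespaceID=...` line; the function names are hypothetical, and the directories you pass in are whatever dfs.name.dir and dfs.data.dir point to on your machines.

```python
from pathlib import Path


def read_namespace_id(storage_dir: str) -> int:
    """Return the namespaceID stored in <storage_dir>/current/VERSION.

    Assumes key=value lines with '#' comment lines (Java properties style).
    """
    version_file = Path(storage_dir) / "current" / "VERSION"
    for raw in version_file.read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "namespaceID":
                return int(value.strip())
    raise ValueError(f"no namespaceID in {version_file}")


def namespace_ids_match(name_dir: str, data_dir: str) -> bool:
    """True if the data-node storage belongs to the name-node's namespace."""
    return read_namespace_id(name_dir) == read_namespace_id(data_dir)
```

For the error above, calling `namespace_ids_match` with the name-node's storage directory and `/mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/dfs/data` would return False, since the two sides report 1396640905 and 820259954.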

If my data node goes down due to the above error, what should I do in the
following scenarios?
1) I have some data on the corrupted data node that I need to recover.
How can I recover that data?


First make sure you know which cluster it belongs to.
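The error message itself already carries the data-node's namespaceID, which identifies the namespace its blocks were written under. A minimal sketch, assuming you keep a (hypothetical) inventory mapping each of your name-nodes to its namespaceID, could extract the IDs from the log line quoted above and look up the owner; the helper names and cluster labels are illustrative only.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the tail of the "Incompatible namespaceIDs" message quoted above.
_IDS = re.compile(r"namenode namespaceID = (\d+); datanode namespaceID = (\d+)")


def parse_namespace_ids(log_line: str) -> Tuple[int, int]:
    """Return (namenode_id, datanode_id) from an 'Incompatible namespaceIDs' error."""
    m = _IDS.search(log_line)
    if m is None:
        raise ValueError("not an 'Incompatible namespaceIDs' message")
    return int(m.group(1)), int(m.group(2))


def owning_cluster(datanode_id: int, clusters: Dict[str, int]) -> Optional[str]:
    """Return the label of the name-node whose namespaceID matches, or None."""
    for label, ns_id in clusters.items():
        if ns_id == datanode_id:
            return label
    return None


line = ("ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible "
        "namespaceIDs in /mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/dfs/data: "
        "namenode namespaceID = 1396640905; datanode namespaceID = 820259954")
nn_id, dn_id = parse_namespace_ids(line)

# Hypothetical inventory: fill in from each name-node's own storage metadata.
known = {"current-cluster": 1396640905, "old-cluster": 820259954}
print(owning_cluster(dn_id, known))  # old-cluster
```

A None result would suggest the blocks belong to a name-node you no longer have, for example one that has since been reformatted.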

2) If I don't care about the data but want the node back in the
cluster, can I just delete /mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp
and add the node back to the cluster?


Yes, you can remove the directory if you don't need the data.

Thanks,
--Konstantin
