I was going through the tutorial here.

http://hadoop.apache.org/core/docs/current/cluster_setup.html

Certain things are not clear, so I am asking them point-wise. I have a
setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves
(each is both a data node and a task tracker).
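For concreteness, my conf/slaves currently lists the two slaves, one
hostname per line (hostnames below are placeholders, not my real ones):

```
# conf/slaves -- one worker hostname (or IP) per line
slave1
slave2
```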

1. Should I edit conf/slaves on all nodes or only on the name node? Do
I have to edit it on the job tracker too?

2. What does 'bin/hadoop namenode -format' actually do? I want to know
what happens at the OS level. Does it create some temporary folders on
all the slave data nodes which will be collectively interpreted as
HDFS by the Hadoop framework?
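To show what I mean: my current guess from reading the docs (which I
would like confirmed or corrected) is that formatting only writes a
fresh storage directory under dfs.name.dir on the name node itself,
with a layout roughly like this (file names are my assumption):

```
# expected contents of ${dfs.name.dir} after formatting -- my guess
current/VERSION   # namespaceID, storage type, layout version
current/fsimage   # empty filesystem metadata image
current/edits     # edit log
```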

3. Does the 'bin/hadoop namenode -format' command affect the name
node, the job tracker, and the task tracker nodes (assuming there is a
slave which is only a task tracker and not a data node)?

4. If I add one more slave (data node + task tracker) to the cluster
later, what changes do I need to make apart from adding the IP address
of the slave node to conf/slaves? Do I need to restart any service?
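In other words, would something like the following on the new slave be
enough, or does the whole cluster need a restart? (These commands are
my guess from the scripts I see under bin/; paths are assumed.)

```
# on the new slave, after copying the Hadoop install and conf/ directory:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker
```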

5. When I add a new slave to the cluster later, do I need to run the
'namenode -format' command again? If I have to, how do I ensure that
existing data is not lost? If I don't have to, how will the folders
necessary for HDFS be created on the new slave machine?
