I was going through the tutorial here: http://hadoop.apache.org/core/docs/current/cluster_setup.html
Certain things are not clear, so I am asking them point by point. I have a setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves (each is a data node as well as a task tracker).

1. Should I edit conf/slaves on all nodes or only on the name node? Do I have to edit it on the job tracker too?

2. What does 'bin/hadoop namenode -format' actually do? I want to know at the OS level. Does it create some temporary folders on all the slave data nodes which are then collectively interpreted as HDFS by the Hadoop framework?

3. Does the 'bin/hadoop namenode -format' command affect the name node, job tracker, and task tracker nodes (assuming there is a slave which is only a task tracker and not a data node)?

4. If I later add one more slave (data node + task tracker) to the cluster, what changes do I need to make apart from adding the slave's IP address to conf/slaves? Do I need to restart any service?

5. When I add a new slave to the cluster later, do I need to run the 'namenode -format' command again? If I do, how do I ensure that existing data is not lost? If I don't, how will the folders necessary for HDFS be created on the new slave machine?
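For context, here is a sketch of my current configuration and the commands I have been running, so the questions above are concrete. The hostnames (namenode, jobtracker, slave1, slave2) are placeholders for my actual machines, not real addresses:

```shell
# conf/slaves on the name node lists one slave hostname per line.
# I am unsure whether this file must also exist on the job tracker
# and the slaves themselves (question 1).
cat conf/slaves
# slave1
# slave2

# Commands I run from the name node, in order.
# Question 2/3: what does the format step do at the OS level,
# and which machines does it touch?
bin/hadoop namenode -format

# Starts the name node plus a data node on every host in conf/slaves.
bin/start-dfs.sh

# Run from the job tracker machine: starts the job tracker plus a
# task tracker on every host in its conf/slaves.
bin/start-mapred.sh
```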
