Nagaraju,

On Wed, Aug 8, 2012 at 10:52 PM, Nagaraju Bingi
<nagaraju_bi...@persistent.co.in> wrote:
> Hi,
>
> I'm a beginner in Hadoop concepts. I have a few basic questions:
> 1) Looking for APIs to retrieve the capacity of the cluster, so that I can
> write a script to decide when to add a new slave node to the cluster.
>
> a) No. of task trackers and the capacity of each task tracker to
> spawn a max no. of mappers
For this, see:
http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/ClusterStatus.html

> b) CPU, RAM and disk capacity of each tracker

Rely on other tools to provide this one. Tools such as Ganglia and Nagios
can report it, for instance.

> c) how to decide to add a new slave node to the cluster

This is highly dependent on the workload required of your clusters.

> 2) what is the API to retrieve metrics like current usage of resources and
> currently running/spawned Mappers/Reducers

See 1.a. for some, and 1.b. for some more.

> 3) what is the purpose of Hadoop-common? Is it an API to interact with Hadoop?

Hadoop Common encapsulates the utilities shared by the other two
sub-projects, MapReduce and HDFS. Among other things, it provides a
general interaction API for all things 'Hadoop'.

--
Harsh J
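On question 1, a minimal sketch of the kind of capacity check such a
script might make. The `shouldAddNode` helper and its 90% threshold are
hypothetical, not part of any Hadoop API; in a real script the slot
counts would come from the stable mapred API, e.g.
`new JobClient(new JobConf()).getClusterStatus()` and its
`getMaxMapTasks()` / `getMapTasks()` accessors:

```java
// Hypothetical helper for deciding when to grow the cluster.
// In practice the inputs would be read from ClusterStatus:
//   ClusterStatus s = new JobClient(new JobConf()).getClusterStatus();
//   int maxSlots = s.getMaxMapTasks(); // total map slot capacity
//   int running  = s.getMapTasks();    // currently running map tasks
public class ClusterCapacityCheck {

    /** True when map-slot utilization meets or exceeds the threshold. */
    public static boolean shouldAddNode(int maxSlots, int runningTasks,
                                        double threshold) {
        if (maxSlots == 0) {
            return true; // no capacity at all: definitely add a node
        }
        return (double) runningTasks / maxSlots >= threshold;
    }

    public static void main(String[] args) {
        // 95 of 100 slots busy at a 0.9 threshold -> time to grow
        System.out.println(shouldAddNode(100, 95, 0.9));
        // 40 of 100 slots busy -> plenty of headroom
        System.out.println(shouldAddNode(100, 40, 0.9));
    }
}
```

You would typically sample this over time (a single busy snapshot is not
a reason to add hardware) and combine it with the host-level numbers
from Ganglia/Nagios mentioned under 1.b.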