Re: data locality on HDFS

2010-05-07 Thread Amogh Vasekar
Hi, The (o.a.h.fs) FileSystem API has getFileBlockLocations, which is used to determine which hosts hold the replicas of a file's blocks. In the general case, (o.a.h.mapreduce.lib.input) FileInputFormat's getSplits() calls this method, and the resulting host list is passed on for job scheduling along with the split info. Hope this is what you were looking for. Am
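
To make the split-to-host step concrete, here is a minimal sketch in plain Python of the idea described in the message above. It is not Hadoop code: Block, Split, block_index and get_splits are hypothetical names that only model the logic. In Hadoop itself the block list would come from FileSystem.getFileBlockLocations and the hosts from BlockLocation.getHosts(). The sketch assumes a fixed split size and leaves out details of the real getSplits(), such as the slop factor applied to the last split.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for one block of a file and its replica hosts."""
    offset: int
    length: int
    hosts: List[str]


@dataclass
class Split:
    """Hypothetical stand-in for an input split plus its preferred hosts."""
    start: int
    length: int
    hosts: List[str]


def block_index(blocks: List[Block], offset: int) -> int:
    """Return the index of the block that contains the given byte offset."""
    for i, block in enumerate(blocks):
        if block.offset <= offset < block.offset + block.length:
            return i
    raise ValueError(f"offset {offset} is outside the file")


def get_splits(file_length: int, split_size: int, blocks: List[Block]) -> List[Split]:
    """Cut the file into fixed-size splits and attach the hosts of the
    block in which each split starts, as a locality hint for scheduling."""
    splits = []
    start = 0
    while start < file_length:
        length = min(split_size, file_length - start)
        hosts = blocks[block_index(blocks, start)].hosts
        splits.append(Split(start, length, list(hosts)))
        start += length
    return splits


if __name__ == "__main__":
    # A 150-byte file stored as three blocks, each replicated on two hosts.
    blocks = [
        Block(0, 64, ["host-a", "host-b"]),
        Block(64, 64, ["host-b", "host-c"]),
        Block(128, 22, ["host-c", "host-a"]),
    ]
    for split in get_splits(150, 64, blocks):
        print(split.start, split.length, split.hosts)
```

A scheduler can then prefer running each map task on one of the hosts listed for its split, which is the data-locality behaviour the original question asked about.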

Re: Checking out Hadoop source code in Eclipse

2009-11-03 Thread Amogh Vasekar
Hi, This was very helpful to me: http://www.cloudera.com/blog/2009/04/20/configuring-eclipse-for-hadoop-development-a-screencast/ I have a similar setup to yours and got through without any issues. Amogh On 11/3/09 11:58 PM, "smarthr...@yahoo.co.in" wrote: Hi. I am trying to work on building

RE: last map task taking too long

2009-09-29 Thread Amogh Vasekar
Hi, Can you provide info on the input, such as whether it is compressed? Also, are you using cached files in your map tasks? It might be helpful if you paste the logs here after blanking out your system-specific info, as then one can find out how far the reduce got, or whether the copy phase started at all. T

RE: why reduce task can be scheduled before map tasks are 100% completed?

2009-08-02 Thread Amogh Vasekar
And the combiner runs while fetching the outputs, right? -----Original Message----- From: Arun C Murthy [mailto:a...@yahoo-inc.com] Sent: Monday, August 03, 2009 9:27 AM To: mapreduce-...@hadoop.apache.org Cc: common-dev@hadoop.apache.org Subject: Re: why reduce task can be scheduled before map ta