Re: data locality on HDFS

2010-05-08 Thread Eli Collins
Hey Momina, here's the path on 20: DistributedFileSystem#getFileBlockLocations -> DFSClient#getFileBlockLocations -> callGetBlockLocations -> ClientProtocol#getBlockLocations -> (via proxy) NameNode#getBlockLocations. See createNamenode and createRPCNamenode in the DFSClient
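The "(via proxy)" hop is probably why the calls look circular: the client holds an object that implements the same interface as the NameNode, so a method with the same name shows up on both sides. Below is a minimal sketch of that pattern using only java.lang.reflect.Proxy. It is not Hadoop code. ClientProtocolSketch, NameNodeSketch, DfsClientSketch, createProxy, and the datanode host names are all hypothetical stand-ins, and the handler forwards in-process where the real proxy would go over RPC.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

public class BlockLocationChain {
    // Hypothetical stand-in for ClientProtocol: the interface both sides share.
    public interface ClientProtocolSketch {
        List<String> getBlockLocations(String path, long offset, long length);
    }

    // Hypothetical stand-in for the NameNode: the one place that knows the block map.
    static class NameNodeSketch implements ClientProtocolSketch {
        @Override
        public List<String> getBlockLocations(String path, long offset, long length) {
            // Made-up host names, for illustration only.
            return Arrays.asList("datanode1", "datanode2", "datanode3");
        }
    }

    // Hypothetical stand-in for DFSClient: it only delegates to whatever
    // ClientProtocolSketch it was given, which here is a proxy.
    static class DfsClientSketch {
        private final ClientProtocolSketch namenode;

        DfsClientSketch(ClientProtocolSketch namenode) {
            this.namenode = namenode;
        }

        List<String> getFileBlockLocations(String path, long start, long len) {
            return namenode.getBlockLocations(path, start, len);
        }
    }

    // Builds a dynamic proxy that forwards every interface call to the target.
    // Assumption: a real RPC proxy would serialize the call and send it over
    // the network instead of invoking the target directly.
    static ClientProtocolSketch createProxy(final ClientProtocolSketch target) {
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return (ClientProtocolSketch) Proxy.newProxyInstance(
                ClientProtocolSketch.class.getClassLoader(),
                new Class<?>[] { ClientProtocolSketch.class },
                handler);
    }

    // Walks the whole chain: client -> proxy -> NameNode stand-in.
    static List<String> locate(String path) {
        DfsClientSketch client = new DfsClientSketch(createProxy(new NameNodeSketch()));
        return client.getFileBlockLocations(path, 0L, 1024L);
    }

    public static void main(String[] args) {
        System.out.println(locate("/user/example/input.txt"));
    }
}
```

The client never names NameNodeSketch when it makes the call. It only sees the interface, which matches the way the path above ends in a call on ClientProtocol rather than on the NameNode class directly.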

Re: data locality on HDFS

2010-05-07 Thread momina khan
Hi, I am still going in circles. I still can't pinpoint a single function call that interacts with HDFS for block locations... It is as if files are making circular calls to getBlockLocations(), which is implemented such that it calls the same function in a different class... I mean it is n

Re: data locality on HDFS

2010-05-07 Thread Amogh Vasekar
Hi, The (o.a.h.fs) FileSystem API has getFileBlockLocations, which is used to determine which hosts hold the replicas of a file's blocks. In the general case, (o.a.h.mapreduce.lib.input) FileInputFormat's getSplits() calls this method, and the result is passed on for job scheduling along with the split info. Hope this is what you were looking for. Am
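To illustrate how host lists attached to splits can drive placement, here is a simplified model in plain Java. It is a sketch under stated assumptions and not Hadoop's scheduler: the Split class, the assign method, and the node names are hypothetical, and the only rule modeled is "prefer a split whose hosts include the requesting node, otherwise hand out any pending split".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LocalitySketch {
    // Hypothetical split: a name plus the hosts that store its data locally.
    static class Split {
        final String name;
        final List<String> hosts;

        Split(String name, String... hosts) {
            this.name = name;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Removes and returns a pending split for the requesting host.
    // A split stored on that host is preferred; otherwise the first
    // pending split is returned. Returns null when nothing is pending.
    static Split assign(List<Split> pending, String requestingHost) {
        for (Iterator<Split> it = pending.iterator(); it.hasNext();) {
            Split s = it.next();
            if (s.hosts.contains(requestingHost)) {
                it.remove();
                return s;
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>();
        pending.add(new Split("split-0", "nodeA", "nodeB"));
        pending.add(new Split("split-1", "nodeC", "nodeD"));

        // nodeC stores split-1, so it gets that split even though split-0 is first.
        System.out.println(assign(pending, "nodeC").name);
        // nodeZ stores nothing, so it falls back to the remaining split.
        System.out.println(assign(pending, "nodeZ").name);
    }
}
```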

data locality on HDFS

2010-05-07 Thread momina khan
Hi, I am trying to figure out how Hadoop uses data locality to schedule maps on nodes which locally store the map input... Going through the code, I am going in circles between a couple of files but not really getting anywhere... That is to say that I can't locate the HDFS API or func that can commu