Re: data locality on HDFS

2010-05-08 Thread Eli Collins
Hey Momina, here's the call path on 0.20: DistributedFileSystem#getFileBlockLocations -> DFSClient#getFileBlockLocations -> callGetBlockLocations -> ClientProtocol#getBlockLocations -> (via proxy) NameNode#getBlockLocations. See createNamenode and createRPCNamenode in DFSClient.
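
The chain above looks circular because several layers expose a method with nearly the same name, but each one only delegates downward until the RPC proxy hands the call to the NameNode. A minimal sketch of that shape follows. It is a simplified model, not Hadoop source: every class and interface here (ClientProtocolModel, NameNodeModel, DfsClientModel, DistributedFileSystemModel, traceCall) is a hypothetical stand-in, the host names are made up, and java.lang.reflect.Proxy stands in for the real RPC proxy.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Simplified model of the 0.20 call path. These are NOT the real Hadoop classes;
// every type below is a hypothetical stand-in named after the layer it imitates.
class BlockLocationCallPath {

    // Stand-in for ClientProtocol, the RPC interface shared by client and NameNode.
    interface ClientProtocolModel {
        String[] getBlockLocations(String src, long offset, long length);
    }

    // Stand-in for the NameNode: the only layer that actually owns block metadata.
    static class NameNodeModel implements ClientProtocolModel {
        private final List<String> trace;
        NameNodeModel(List<String> trace) { this.trace = trace; }

        @Override
        public String[] getBlockLocations(String src, long offset, long length) {
            trace.add("NameNode#getBlockLocations");
            return new String[] {"datanode-a", "datanode-b", "datanode-c"}; // made-up hosts
        }
    }

    // Stand-in for DFSClient: holds a proxy typed as the protocol interface.
    static class DfsClientModel {
        private final List<String> trace;
        private final ClientProtocolModel namenode;

        DfsClientModel(ClientProtocolModel target, List<String> trace) {
            this.trace = trace;
            // A dynamic proxy plays the role of the RPC proxy; a real one would
            // serialize the call and send it over the network.
            InvocationHandler handler = (proxy, method, args) -> {
                trace.add("ClientProtocol#" + method.getName() + " (via proxy)");
                return method.invoke(target, args);
            };
            this.namenode = (ClientProtocolModel) Proxy.newProxyInstance(
                    ClientProtocolModel.class.getClassLoader(),
                    new Class<?>[] {ClientProtocolModel.class},
                    handler);
        }

        String[] getFileBlockLocations(String src, long start, long length) {
            trace.add("DFSClient#getFileBlockLocations");
            return callGetBlockLocations(namenode, src, start, length, trace);
        }

        static String[] callGetBlockLocations(ClientProtocolModel namenode, String src,
                                              long start, long length, List<String> trace) {
            trace.add("DFSClient#callGetBlockLocations");
            return namenode.getBlockLocations(src, start, length);
        }
    }

    // Stand-in for DistributedFileSystem: the entry point user code sees.
    static class DistributedFileSystemModel {
        private final List<String> trace;
        private final DfsClientModel dfs;

        DistributedFileSystemModel(DfsClientModel dfs, List<String> trace) {
            this.dfs = dfs;
            this.trace = trace;
        }

        String[] getFileBlockLocations(String src, long start, long length) {
            trace.add("DistributedFileSystem#getFileBlockLocations");
            return dfs.getFileBlockLocations(src, start, length);
        }
    }

    // Wires the layers together, makes one call, and returns the layers visited in order.
    static List<String> traceCall(String src) {
        List<String> trace = new ArrayList<>();
        DistributedFileSystemModel fs = new DistributedFileSystemModel(
                new DfsClientModel(new NameNodeModel(trace), trace), trace);
        fs.getFileBlockLocations(src, 0L, 1024L);
        return trace;
    }

    public static void main(String[] args) {
        for (String step : traceCall("/user/demo/part-00000")) {
            System.out.println(step);
        }
    }
}
```

Running main prints the five layers in the order they are entered, ending at the NameNode stand-in. No layer ever calls back up the chain.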

Re: data locality on HDFS

2010-05-07 Thread momina khan
Hi, I am still going in circles. I still can't pinpoint a single function call that interacts with HDFS for block locations... It is as if the files are making circular calls to getBlockLocations(), which is implemented such that it calls the same function in a different class ... I mean it is n

Re: data locality on HDFS

2010-05-07 Thread Amogh Vasekar
Hi, The (o.a.h.fs) FileSystem API has getFileBlockLocations(), which is used to determine where a file's replicas are located. In the general case, (o.a.h.mapreduce.lib.input) FileInputFormat's getSplits() calls this method, and the resulting locations are passed on for job scheduling along with the split info. Hope this is what you were looking for. Am
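
To illustrate how block locations end up attached to split info, here is a rough sketch of the idea: cut the file into splits and tag each split with the hosts of the block that contains its starting offset. It is a simplified model under stated assumptions, not the FileInputFormat source. BlockLoc, Split, blockIndex and this getSplits are hypothetical names, the host names are made up, and details of the real implementation (such as how it sizes the final split) are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of attaching replica hosts to input splits.
// All types here are hypothetical stand-ins, not Hadoop classes.
class SplitLocalitySketch {

    // One block of a file: its byte range and the hosts holding a replica.
    static final class BlockLoc {
        final long offset;
        final long length;
        final String[] hosts;

        BlockLoc(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // One input split: its byte range plus the hosts a scheduler should prefer.
    static final class Split {
        final long start;
        final long length;
        final String[] hosts;

        Split(long start, long length, String[] hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Finds the block whose byte range contains the given file offset.
    static int blockIndex(List<BlockLoc> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            BlockLoc b = blocks.get(i);
            if (offset >= b.offset && offset < b.offset + b.length) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }

    // Cuts the file into fixed-size splits and copies each starting block's hosts.
    static List<Split> getSplits(List<BlockLoc> blocks, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            String[] hosts = blocks.get(blockIndex(blocks, start)).hosts;
            splits.add(new Split(start, length, hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<BlockLoc> blocks = new ArrayList<>();
        blocks.add(new BlockLoc(0, 64, "host-a", "host-b"));
        blocks.add(new BlockLoc(64, 64, "host-b", "host-c"));
        blocks.add(new BlockLoc(128, 22, "host-c", "host-a"));
        for (Split s : getSplits(blocks, 150, 64)) {
            System.out.println(s.start + "+" + s.length + " -> " + String.join(",", s.hosts));
        }
    }
}
```

In this model the scheduler never talks to HDFS itself: it only sees the host list that was copied into each split when the splits were computed.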