Re: Question about how Hadoop stores intermediate results

2011-09-25 Thread Arun C Murthy
On Sep 25, 2011, at 2:01 PM, He Chen wrote:
> Hi Arun and Harsh J
>
> Thank you for your replies.
>
> Yes, there will be two finally. But during the map running, there are more than two.
>
> The scenario I mentioned before will not occur with the Hadoop default partitioner. If there is a p

Re: Question about how Hadoop stores intermediate results

2011-09-25 Thread He Chen
Hi Arun and Harsh J,

Thank you for your replies.

Yes, there will be two in the end. But while the map is running, there are more than two.

The scenario I mentioned before will not occur with the default Hadoop partitioner. If a partitioner does lead to the above problem, is there any security policy p

Re: Question about how Hadoop stores intermediate results

2011-09-25 Thread Arun C Murthy
There is only one file per-map. Actually two: an output file and an index file to quickly get the offset/length for a given reducer. The index file is also cached in memory for performance.

Arun

On Sep 25, 2011, at 10:00 AM, He Chen wrote:
> Hi everyone
>
> According to my understanding of Ha
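To make the "output file plus index file" layout concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the function names `write_map_output` and `read_partition` are hypothetical, the tab-separated record encoding is invented for the example, and Hadoop's real on-disk format is different. The point it shows is that one data blob plus one (offset, length) entry per reducer is enough for each reducer to fetch its own slice, so the file count does not grow with the number of keys.

```python
def write_map_output(partitions, num_reducers):
    """Concatenate all partitions into one blob and build a per-reducer index.

    `partitions` maps a reducer number to a list of (key, value) string pairs
    that are already sorted by key. The record encoding is hypothetical.
    """
    data = bytearray()
    index = []  # index[r] == (offset, length) of reducer r's slice in `data`
    for reducer in range(num_reducers):
        start = len(data)
        for key, value in partitions.get(reducer, []):
            data += f"{key}\t{value}\n".encode("utf-8")
        index.append((start, len(data) - start))
    return bytes(data), index


def read_partition(data, index, reducer):
    """Use the index to read only the slice that belongs to one reducer."""
    offset, length = index[reducer]
    chunk = data[offset:offset + length].decode("utf-8")
    return [tuple(line.split("\t", 1)) for line in chunk.splitlines()]


if __name__ == "__main__":
    parts = {0: [("apple", "1"), ("cherry", "2")], 1: [("banana", "3")]}
    blob, idx = write_map_output(parts, num_reducers=2)
    print(idx)                           # one (offset, length) pair per reducer
    print(read_partition(blob, idx, 1))  # only reducer 1's records
```

In this sketch the index is small (one entry per reducer), which is consistent with the remark above that it can be cached in memory.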

Re: Question about how Hadoop stores intermediate results

2011-09-25 Thread Harsh J
Chen,

Files are stored based on the reducer partitions, not exactly per-key. The result is that there are far fewer files than you imagine there ought to be. The keys are kept sorted inside the partitioned files and thus you do not lose out on your key groups either. See Partitioner, which is re
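As a rough illustration of partition-based grouping, the Python sketch below mimics the arithmetic of Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The helper names are hypothetical, and `java_string_hash` assumes keys that hash like `java.lang.String`, which is not necessarily how Hadoop's own key types hash. It is a sketch of the idea, not Hadoop code.

```python
from collections import defaultdict


def java_string_hash(s):
    """Signed 32-bit hash in the style of java.lang.String.hashCode.

    Assumption: every character is in the Basic Multilingual Plane.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(key, num_reducers):
    # Mask off the sign bit, then take the remainder, as HashPartitioner does.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def partition_map_output(records, num_reducers):
    """Group (key, value) pairs by partition and sort each group by key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return {p: sorted(pairs) for p, pairs in partitions.items()}


if __name__ == "__main__":
    records = [(f"key{i}", i) for i in range(1000)]
    parts = partition_map_output(records, num_reducers=2)
    # 1000 distinct keys still land in at most 2 partitions.
    print(len(parts), sum(len(v) for v in parts.values()))
```

However many distinct keys the map emits, the number of groups is bounded by the number of reducers, and each group stays sorted by key.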

Question about how Hadoop stores intermediate results

2011-09-25 Thread He Chen
Hi everyone,

According to my understanding of Hadoop, it saves a MapReduce job's intermediate results to files on the mapper's hard drive. Each key will occupy a file. I am curious what will happen if the mapper's hard drive does not have enough inodes to save the generated keys. Because every file ne