
Re: Performance difference between tuning reducer num and partition table

2013-06-30 Thread Felix . 徐

Hi Dean, Thanks for your reply. If I don't set the number of reducers in the 1st run, the number of reducers will be much smaller and the performance will be worse. The total output file size is about 200MB, and I see that many reduce output files are empty; only 10 of them have data. Another questi

Re: Performance difference between tuning reducer num and partition table

2013-06-29 Thread Dean Wampler

What happens if you don't set the number of reducers in the 1st run? How many reducers are executed? If it's a much smaller number, the extra overhead could matter. Another clue is the size of the files the first run produced, i.e., do you have 30 small (much less than a block size) files? On Sat,

Re: Performance difference between tuning reducer num and partition table

2013-06-28 Thread Felix . 徐

Hi Stephen, My query is actually more complex; Hive will generate 2 MapReduce jobs. In the first solution, it runs 17 mappers / 30 reducers and 10 mappers / 30 reducers (the reducer number is set manually). In the second solution, it runs 6 mappers / 1 reducer and 4 mappers / 1 reducer for each partition

Re: Performance difference between tuning reducer num and partition table

2013-06-28 Thread Stephen Sprague

Great question. Your parallelization seems to trump Hadoop's. I guess I'd ask what the _total_ number of mappers and reducers that run on your cluster is for these two scenarios. I'd be curious whether they are the same. On Fri, Jun 28, 2013 at 8:40 AM, Felix.徐 wrote: > Hi all, > > Here is