Hi Dean,
Thanks for your reply. If I don't set the number of reducers in the 1st run,
the number of reducers will be much smaller and the performance will be
worse. The total output file size is about 200MB. I see that many reduce
output files are empty; only 10 of them have data.
Another questi
What happens if you don't set the number of reducers in the 1st run? How
many reducers are executed? If it's a much smaller number, the extra
overhead could matter. Another clue is the size of the files the first run
produced, i.e., do you have 30 small (much smaller than a block) files?
On Sat,
Hi Stephen,
My query is actually more complex; Hive generates 2 MapReduce jobs.
In the first solution, it runs 17 mappers / 30 reducers and 10 mappers /
30 reducers (the number of reducers is set manually).
In the second solution, it runs 6 mappers / 1 reducer and 4 mappers / 1
reducer for each partition.
Great question. Your parallelization seems to trump Hadoop's. I guess
I'd ask: what is the _total_ number of mappers and reducers that run on
your cluster in these two scenarios? I'd be curious whether they are the
same.
On Fri, Jun 28, 2013 at 8:40 AM, Felix.徐 wrote:
> Hi all,
>
> Here is