You are right: I have actually already set these two parameters, and they
usually work well (at least when the loader used is BigStorage).
Nevertheless, when my own loader is used, even though the data
to be reduced is several gigabytes, the job ends up using only 1
reducer, and this is what I don't understand (and of course
pig.exec.reducers.bytes.per.reducer is set to 1 GB).
Is there something special a loader must do for this automatic guess
of the number of reducers to work correctly?
On 21/05/13 19:23, Norbert Burger wrote:
As Jonathan mentioned, TOP should obviate this particular use case. But
for future examples, the parameters
pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max
might be useful:
https://issues.apache.org/jira/browse/PIG-1249
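For reference, a minimal sketch of setting both knobs from a Pig script (the values are illustrative, not recommendations):

    -- illustrative values; tune to your cluster
    SET pig.exec.reducers.bytes.per.reducer 1000000000; -- aim for ~1 GB of input per reducer
    SET pig.exec.reducers.max 100;                      -- cap the automatic estimate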
Norbert
On Tue, May 21, 2013 at 9:23 AM, Vincent Barat <[email protected]> wrote:
Thanks for your reply.
My goal is actually to AVOID using PARALLEL, to let Pig guess a good
number of reducers by itself.
Usually this works well for me, so I don't understand why it does not
in this case.
On 19/05/13 15:37, Norbert Burger wrote:
Take a look at the PARALLEL clause:
http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
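A minimal sketch of the clause (relation names and the reducer count are illustrative):

    -- force 10 reducers for this GROUP, overriding the automatic estimate
    grp = GROUP alias BY key PARALLEL 10;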
On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[email protected]> wrote:
Hi,
I use this query to remove duplicated entries from a set of input files
(I cannot use DISTINCT, since some fields can differ):
grp = GROUP alias BY key;
alias = FOREACH grp {
    record = LIMIT alias 1;
    GENERATE FLATTEN(record) AS ...;
};
It appears that this query always generates 1 reducer (I use 0 as the
default number of reducers to let Pig decide), whatever the size of my
input data.
Is this normal behavior? How can I speed up my query by using
several reducers?
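For what it's worth, a hedged rewrite of the query above using Pig's built-in TOP function, which keeps one tuple per group; the column index 0 used for comparison is an assumption about the schema:

    grp = GROUP alias BY key;
    -- TOP(1, 0, alias) keeps one tuple per group, compared on column 0
    deduped = FOREACH grp GENERATE FLATTEN(TOP(1, 0, alias));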
Thanks a lot for your help.