Re: Nb of reduce tasks when GROUPing

Vincent Barat Tue, 21 May 2013 09:17:26 -0700

That seems interesting: where can I find it? (I cannot see it in the documentation.)

On 20/05/13 00:38, Jonathan Coveney wrote:

Also, look into the TOP UDF instead of doing the LIMIT. It can potentially
be a lot faster and is cleaner, IMHO.
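
For reference, a minimal sketch of what the TOP suggestion could look like. TOP is a Pig built-in (available since Pig 0.8) that takes the number of tuples to keep, the 0-based index of the column to compare, and the bag to read from. The input path, the schema, and the choice of `ts` as the ordering column are assumptions made for illustration; they are not taken from the original script.

```pig
-- Hypothetical input: the path, schema, and field names are assumptions.
events = LOAD 'input/events' AS (key:chararray, ts:long, payload:chararray);

grp = GROUP events BY key;

-- TOP(n, column_index, bag): keep, for each key, the single tuple with the
-- largest value in column 1 (ts). Column indexes are 0-based.
deduped = FOREACH grp GENERATE FLATTEN(TOP(1, 1, events));
```

Unlike LIMIT 1, which keeps an arbitrary record from each group, this keeps a well-defined one (here, the record with the highest `ts`).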



2013/5/19 Norbert Burger <[email protected]>

Take a look at the PARALLEL clause:

http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
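
A minimal sketch of the two usual ways to set the reducer count, assuming a hypothetical input and arbitrary values (20 and 40); the right numbers depend on the cluster and the data size.

```pig
-- Script-wide default for all reduce-side operators (the value 20 is arbitrary).
SET default_parallel 20;

-- Hypothetical input: the path and schema are assumptions.
records = LOAD 'input/records' AS (key:chararray, value:chararray);

-- Per-operator override: this GROUP runs with 40 reducers.
grp = GROUP records BY key PARALLEL 40;
```

PARALLEL only affects operators that trigger a reduce phase (GROUP, COGROUP, JOIN, DISTINCT, ORDER, and so on); it has no effect on the number of map tasks.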

On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[email protected]> wrote:

Hi,

I use this query to remove duplicate entries from a set of input files
(I cannot use DISTINCT since some fields can be different):

grp = GROUP alias BY key;
alias = FOREACH grp {
  record = LIMIT alias 1;
  GENERATE FLATTEN(record) AS ... ;
}

It appears that this query always runs with a single reducer (I set the
default number of reducers to 0 to let Pig decide), whatever the size of my
input data.

Is this normal behavior? How can I reduce my query time by using
several reducers?

Thanks a lot for your help.

Vincent BARAT
CTO, Capptain

p. +33 299 656 913

m. +33 615 411 518

e. [email protected]

w. http://www.capptain.com/

a. 18 rue Tronchet, 75008 Paris, France

