Thanks for your reply.
My goal is actually to AVOID using PARALLEL, to let Pig guess a good
number of reducers by itself.
Usually it works well for me, so I don't understand why it does not
in this case.
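For context, here is a minimal sketch of the settings behind Pig's automatic estimate, assuming Pig 0.8 or later and no PARALLEL clause or default_parallel value in effect. In that situation Pig estimates the reducer count as roughly min(pig.exec.reducers.max, total input bytes / pig.exec.reducers.bytes.per.reducer), so with the default settings any input smaller than about 1 GB gets a single reducer. The values below are illustrative assumptions, not recommendations, and this may not be the cause of the behavior described here.

```pig
-- Assumption: the reducer count is left to Pig's estimate (no PARALLEL, no default_parallel).
-- Input bytes handled per reducer; the default is about 1 GB (1000000000).
-- A lower value makes Pig ask for more reducers on the same input.
SET pig.exec.reducers.bytes.per.reducer 268435456;
-- Upper bound on the estimated number of reducers; the default is 999.
SET pig.exec.reducers.max 200;
```

These statements go at the top of the script, before the GROUP statement whose reduce phase they should affect.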
On 19/05/13 15:37, Norbert Burger wrote:
Take a look at the PARALLEL clause:
http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[email protected]> wrote:
Hi,
I use this script to remove duplicate entries from a set of input files
(I cannot use DISTINCT since some fields can differ between duplicates):
grp = GROUP alias BY key;
alias = FOREACH grp {
    record = LIMIT alias 1;
    GENERATE FLATTEN(record) AS ... ;
}
It appears that this script always runs with 1 reducer (I set the default
number of reducers to 0 to let Pig decide), whatever the size of my input data.
Is this normal behavior? How can I reduce the run time by using
several reducers?
Thanks a lot for your help.