Hi,

I use the following script to remove duplicate entries from a set of input files (I cannot use DISTINCT, since some fields can differ between records with the same key):

grp = GROUP alias BY key;
alias = FOREACH grp {
  record = LIMIT alias 1;
  GENERATE FLATTEN(record) AS ... ;
}

It appears that this script always runs with a single reducer, whatever the size of my input data (I set the default number of reducers to 0 to let Pig decide).

Is this normal behavior? How can I reduce the run time by using several reducers?

Thanks a lot for your help.

