Ok, let me state what I think happens (from looking at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage),
and I'd be happy if someone could confirm or correct me.
It looks like, no matter how many bags there are, if the accumulator is used the
same number of tuples is transferred for each bag: the first
pig.accumulative.batchsize tuples of each bag, then the next batch, and so on
until all the bags are exhausted. Only then is getValue() called.
Is this right?
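To make the reading above concrete, here is a small Java model of it. It assumes exactly the behavior described: on each accumulate() call every bag contributes its next batch of up to pig.accumulative.batchsize tuples, and (an extra assumption) a bag that is already exhausted contributes an empty slice. The class BatchModel and its batches() method are hypothetical; this is a sketch of the described behavior, not Pig's POPackage code, and it does not settle whether the reading is correct.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the batching described above: on each call to
 * accumulate(), every bag contributes its next `batchSize` tuples (fewer
 * at the end, none once exhausted). This is NOT Pig's implementation.
 */
public class BatchModel {

    // Returns one entry per accumulate() call; each entry holds one
    // (possibly empty) slice per input bag.
    public static <T> List<List<List<T>>> batches(List<List<T>> bags, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int longest = 0;
        for (List<T> bag : bags) {
            longest = Math.max(longest, bag.size());
        }
        List<List<List<T>>> calls = new ArrayList<>();
        // Walk all bags in lockstep until the longest one is exhausted.
        for (int start = 0; start < longest; start += batchSize) {
            List<List<T>> call = new ArrayList<>();
            for (List<T> bag : bags) {
                int from = Math.min(start, bag.size());
                int to = Math.min(start + batchSize, bag.size());
                call.add(new ArrayList<>(bag.subList(from, to)));
            }
            calls.add(call);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<List<Integer>> bags = List.of(
            List.of(1, 2, 3, 4, 5),   // bag 0: 5 tuples
            List.of(10, 20));         // bag 1: 2 tuples
        List<List<List<Integer>>> calls = batches(bags, 2);
        for (int i = 0; i < calls.size(); i++) {
            System.out.println("accumulate() call " + i + ": " + calls.get(i));
        }
        System.out.println("then getValue()");
    }
}
```

With a batch size of 2 and bags of 5 and 2 tuples, the model predicts three accumulate() calls, [[1, 2], [10, 20]], [[3, 4], []] and [[5], []], followed by getValue().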
On Friday, February 26, 2016 10:47 PM, Eyal Allweil wrote:
I asked this question on Stack Overflow, but this is a better place to ask.
What happens when a tuple with more than one bag is sent to a UDF that
implements Accumulator (in a case where the accumulator should be used)? Is the
first bag sent in batches while the subsequent bags are sent in their entirety?
Are all the bags sent in batches? Or is the accumulator not used at all?
Here's a link to the question there:
http://stackoverflow.com/questions/35610426/how-does-pig-handle-tuples-with-more-than-one-bag-when-using-the-accumulator
Thanks,
Eyal