[ https://issues.apache.org/jira/browse/PIG-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036302#comment-14036302 ]
Cheolsoo Park commented on PIG-4002:
------------------------------------

Thank you for the patch.

I agree with the idea that the two optimizations should be configurable independently. But I am not sure that removing the combiner plan after CombinerOptimizer runs is the right way to fix it. Your CombinerPlanRemover doesn't seem to undo any of the changes that CombinerOptimizer makes to the reduce plan. For example, CombinerPackager isn't unset:
{code}
// Change the package operator in the reduce plan to
// be the POCombiner package, as it needs to act
// differently than the regular package operator.
pack.setPkgr(pkgr.clone());
{code}
Doesn't this have any side effects? I am no expert in this area of the code either, so please correct me if I am wrong.

Btw, your patch doesn't apply cleanly to trunk. Also, please add the Apache header to every new file.

> Disable combiner when map-side aggregation is used
> --------------------------------------------------
>
>                 Key: PIG-4002
>                 URL: https://issues.apache.org/jira/browse/PIG-4002
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Travis Woodruff
>            Assignee: Travis Woodruff
>            Priority: Minor
>         Attachments: PIG-4002-1.patch
>
>
> This may be controversial, so I'd like others' opinions on this.
> It is not currently possible to disable the combiner and use map-side
> aggregation at the same time. This is problematic since map-side
> aggregation effectively combines in the mapper, so running the combiner adds
> expensive combiner execution (combiner requires deserialization &
> reserialization) for little to no value.
> PIG-2829 had a patch to disable the combiner when map-side aggregation is
> used (along with some other changes). This was never integrated because the
> map-side aggregation code was redone while this was in progress.

--
This message was sent by Atlassian JIRA
(v6.2#6252)