[
https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799424#comment-13799424
]
Koji Noguchi commented on PIG-3492:
-----------------------------------
Daniel asked me to summarize the changes. Here you go.
(1) Move DuplicateForEachColumnRewrite and ImplicitSplitInserter from
LogicalPlanOptimizer to PigServer.compile().
Reason: The next visitor, TypeCheckingRelVisitor, was calling
resetSchema/getSchema, and fields with duplicate UIDs were getting incorrect
aliases as a result. (Execution was fine since LogicalPlanOptimizer.optimize()
was eventually called, but that was not the case for 'describe'.) Also, even
if only temporarily, it's not good to have incorrect aliases assigned to
LogicalOperators.
(2) Fix the test cases that started failing due to (1).
(2-1) LineageTrimmingVisitor (used in illustrate) was hitting a
NullPointerException because LOSplitOutput was not handled in the code.
(2-2) TestOptimizeLimit failed because the changed logical plan caused a
typecast error. Fixed.
(2-3) A bunch of TestOptimizeLimit tests failed because the new logical plan
includes LOSplit and LOSplitOutput.
(3) Roll back the changes to LogicalPlanBuilder from PIG-2710.
ProjectStarExpander/ProjStarInUdfExpander used to be called only for the
corresponding LogicalOperator, but that change started calling them for the
entire plan each time. This rollback itself fixed (5) and brought back the
0.10 behavior.
(4) Revert PIG-3020. Take out LogicalRelationalOperator.fixDuplicateUids from
LOJoin. Fixed in PIG-3310.
(5) Revert PIG-3144. Take out LogicalRelationalOperator.fixDuplicateUids from
LOGenerate. Fixed by (1) and (3).
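To illustrate the failure mode behind this issue (a UID rewritten in one place
but not propagated to the rest of the plan), here is a minimal toy sketch in
Java. It is not Pig code: Field, fixDuplicateUids and prune are hypothetical
stand-ins, and the real LogicalSchema, fixDuplicateUids and ColumnPruning
logic are considerably more involved. It only shows why a pruner that matches
columns by UID drops a used column when the consumer still holds the old UID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are Pig classes.
class UidPropagationSketch {

    // Hypothetical stand-in for a schema field: an alias plus a plan-wide UID.
    static final class Field {
        final String alias;
        final long uid;

        Field(String alias, long uid) {
            this.alias = alias;
            this.uid = uid;
        }
    }

    // Give a fresh UID to every field whose UID was already seen in this schema.
    static List<Field> fixDuplicateUids(List<Field> fields, long nextUid) {
        Set<Long> seen = new HashSet<>();
        List<Field> fixed = new ArrayList<>();
        for (Field f : fields) {
            if (seen.add(f.uid)) {
                fixed.add(f);                           // first use of this UID, keep it
            } else {
                fixed.add(new Field(f.alias, nextUid)); // duplicate, assign a new UID
                seen.add(nextUid);
                nextUid++;
            }
        }
        return fixed;
    }

    // Keep only the fields whose UID some downstream operator says it needs.
    static List<String> prune(List<Field> produced, Set<Long> requiredUids) {
        List<String> kept = new ArrayList<>();
        for (Field f : produced) {
            if (requiredUids.contains(f.uid)) {
                kept.add(f.alias);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "a" and "b" start out sharing UID 1 (assumed setup for the sketch).
        List<Field> fixed = fixDuplicateUids(
            Arrays.asList(new Field("a", 1L), new Field("b", 1L)), 100L);

        // Downstream still holds the old UID, so "b" (now UID 100) looks unused.
        Set<Long> stale = new HashSet<>(Arrays.asList(1L));
        System.out.println("stale downstream keeps:     " + prune(fixed, stale));

        // Downstream saw the new UID, so both columns survive.
        Set<Long> propagated = new HashSet<>(Arrays.asList(1L, 100L));
        System.out.println("refreshed downstream keeps: " + prune(fixed, propagated));
    }
}
```

In this toy model the fix-up itself is harmless. The column is lost only
because the consumer's view of the UIDs is stale, which is why (1) and (3)
aim to avoid creating that mismatch rather than patching it per operator.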
> ColumnPrune dropping used column due to
> LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-3492
> URL: https://issues.apache.org/jira/browse/PIG-3492
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1, 0.12.1, 0.13.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Attachments: pig-3492-trunk_04.patch, pig-3492-v0.12_01.patch
>
>
> I don't have a testcase I can upload at the moment, but here's my observation.
> SplitFilter -> schemaResetter -> LOGenerate.getSchema ->
> LogicalRelationalOperator.fixDuplicateUids() creates a new UID, but that UID
> is not propagated to the entire plan (since SplitFilter.reportChanges only
> returns the subplan).
> As a result, I am seeing ColumnPruning cut off those used columns.