[ https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799424#comment-13799424 ]

Koji Noguchi commented on PIG-3492:
-----------------------------------

Daniel asked me to summarize the changes.  Here you go.

(1) Move DuplicateForEachColumnRewrite and ImplicitSplitInserter from 
LogicalPlanOptimizer to PigServer.compile().
    Reason: The next visitor, TypeCheckingRelVisitor, was calling 
resetSchema/getSchema, and as a result fields with duplicate UIDs were 
getting incorrect aliases.  (Execution was fine since 
LogicalPlanOptimizer.optimize() was eventually called, but it is not called 
for 'describe'.)  Also, even temporarily, it's not good to have incorrect 
aliases assigned to LogicalOperators.

(2) Fix the test cases that started failing due to (1).
    (2-1) LineageTrimmingVisitor (used in illustrate) was hitting a 
NullPointerException because LOSplitOutput was not handled in the code.
    (2-2) TestOptimizeLimit failed because the changed logical plan caused a 
typecast error.  Fixed.
    (2-3) A bunch of TestOptimizeLimit tests failed because the new logical 
plan includes LOSplit and LOSplitOutput.

(3) Roll back the changes in LogicalPlanBuilder from PIG-2710.  
ProjectStarExpander/ProjStarInUdfExpander/ProjStarInUdfExpander used to be 
called only for the corresponding LogicalOperator, but that change started 
calling them for the entire plan each time.  This rollback by itself fixed (5) 
and brought back the 0.10 behavior.

(4) Revert PIG-3020. Take out LogicalRelationalOperator.fixDuplicateUids from 
LOJoin.  Fixed in PIG-3310.

(5) Revert PIG-3144. Take out LogicalRelationalOperator.fixDuplicateUids from 
LOGenerate.  Fixed by (1) and (3).

> ColumnPrune dropping used column due to 
> LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3492
>                 URL: https://issues.apache.org/jira/browse/PIG-3492
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1, 0.12.1, 0.13.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>         Attachments: pig-3492-trunk_04.patch, pig-3492-v0.12_01.patch
>
>
> I don't have a testcase I can upload at the moment, but here's my observation.
> SplitFilter -> schemaResetter -> LOGenerate.getSchema -> 
> LogicalRelationalOperator.fixDuplicateUids() creating a new UID but that UID 
> is not propagated to the entire plan (since SplitFilter.reportChanges only 
> returns subplan).
> As a result, I am seeing ColumnPruning cutting off those used columns.
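To make the failure mode above concrete, here is a toy model in Java.  It is 
NOT Pig's code: the class UidPruneDemo and the method prune() are hypothetical 
names invented for illustration, and real column pruning is far more involved.  
The assumption is simply that pruning keeps an upstream column only if some 
downstream operator asks for that column's UID.  If one side renumbers a UID 
and the change never reaches the other side, a column that is actually used 
looks unused and gets dropped.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model (not Pig's real classes): an operator's output is a
// list of column UIDs, and pruning keeps only the UIDs that downstream
// operators report as required.
public class UidPruneDemo {

    // Keep each produced UID that appears in the downstream "required" set,
    // preserving the producer's column order.
    static Set<Long> prune(List<Long> producedUids, Set<Long> requiredUids) {
        Set<Long> kept = new LinkedHashSet<>();
        for (Long uid : producedUids) {
            if (requiredUids.contains(uid)) {
                kept.add(uid);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Upstream operator produces two columns with UIDs 1 and 2.
        List<Long> produced = List.of(1L, 2L);

        // Stale case: a schema reset renumbered column 2 to UID 7 on the
        // consumer side only, so the consumer asks for 7, which nobody produces.
        Set<Long> requiredStale = Set.of(1L, 7L);

        // Consistent case: both sides still agree that the column is UID 2.
        Set<Long> requiredConsistent = Set.of(1L, 2L);

        System.out.println(prune(produced, requiredStale));      // [1] (used column dropped)
        System.out.println(prune(produced, requiredConsistent)); // [1, 2]
    }
}
```

The fix direction in the summary above is about keeping UIDs consistent across 
the whole plan, which in this toy corresponds to the "consistent" case.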



--
This message was sent by Atlassian JIRA
(v6.1#6144)
