[ https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786430#comment-13786430 ]
Koji Noguchi commented on PIG-3492:
-----------------------------------
We're only seeing this issue in complicated scripts with hundreds of lines.
This is the shortest reproduction I could get. It needs to be run with
'-t PushUpFilter' (i.e., with the PushUpFilter optimizer rule turned off).
{noformat}
pig> cat test.pig
A = load './test.txt' as (a:int, b:chararray);
B = FOREACH A generate a;
C = GROUP B by a;
D = filter C by group > 0 and group < 100;
E = FOREACH D {
F = LIMIT B 1 ;
GENERATE B.a as mya, FLATTEN(F.a) as setting;
}
G = FOREACH E GENERATE mya, setting as setting;
dump G;
{noformat}
Relation G should contain two columns, 'mya' and 'setting', but the result
contains only one column.
{noformat}
pig> cat test.txt
3 i
3 i
1 i
2 i
2 i
3 i
pig> pig -x local -t PushUpFilter ./test.pig
({(1)})
({(2),(2)})
({(3),(3),(3)})
{noformat}
Additionally disabling either ColumnMapKeyPrune or SplitFilter gives the correct result:
{noformat}
pig> pig -x local -t PushUpFilter -t ColumnMapKeyPrune ./test.pig
or
pig> pig -x local -t PushUpFilter -t SplitFilter ./test.pig
({(1)},1)
({(2),(2)},2)
({(3),(3),(3)},3)
{noformat}
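The explain output shows that the second column (setting#63) was cut off from
G's schema, even though ColumnPrune still lists UID 63 in its InputUids and OutputUids.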
{noformat}
Incorrect case (-t PushUpFilter)
G: (Name: LOStore Schema: mya#60:bag{#59:tuple(a#23:int)})ColumnPrune:InputUids=[63, 60]ColumnPrune:OutputUids=[63, 60]
Correct case (-t PushUpFilter -t SplitFilter)
G: (Name: LOStore Schema: mya#60:bag{#59:tuple(a#23:int)},setting#63:int)ColumnPrune:InputUids=[63, 60]ColumnPrune:OutputUids=[63, 60]
{noformat}
> ColumnPrune dropping used column due to LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-3492
> URL: https://issues.apache.org/jira/browse/PIG-3492
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1, 0.12.1, 0.13.0
> Reporter: Koji Noguchi
>
> I don't have a test case I can upload at the moment, but here's my observation.
> SplitFilter -> schemaResetter -> LOGenerate.getSchema ->
> LogicalRelationalOperator.fixDuplicateUids() creates a new UID, but that UID
> is not propagated to the entire plan (since SplitFilter.reportChanges only
> returns the subplan).
> As a result, I am seeing ColumnPruning cut off columns that are still in use.
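To make the failure mode in the description concrete, here is a toy Java model. It is hypothetical: `StaleUidPruneDemo`, `Field`, `prune` and `renumber` are not Pig classes, and the replacement UID 64 is made up. It assumes, as the description suggests, that pruning decides which columns to keep by UID. Under that assumption, a column whose UID was regenerated without updating its consumers looks unused and gets dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not Pig's actual classes): each output column of an
// operator carries a unique id (UID), and consumers refer to columns by UID.
public class StaleUidPruneDemo {

    record Field(String name, long uid) {}

    // Keep only the fields whose UIDs some downstream consumer requires,
    // mimicking the idea behind UID-based column pruning.
    static List<Field> prune(List<Field> schema, Set<Long> requiredUids) {
        List<Field> kept = new ArrayList<>();
        for (Field f : schema) {
            if (requiredUids.contains(f.uid())) {
                kept.add(f);
            }
        }
        return kept;
    }

    // Give one field a fresh UID, as a duplicate-UID fix might do when a
    // rewritten subplan's schema is regenerated.
    static List<Field> renumber(List<Field> schema, String name, long newUid) {
        List<Field> out = new ArrayList<>();
        for (Field f : schema) {
            out.add(f.name().equals(name) ? new Field(f.name(), newUid) : f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Upstream relation E: (mya#60, setting#63); G consumes both UIDs.
        List<Field> e = List.of(new Field("mya", 60), new Field("setting", 63));
        Set<Long> requiredByG = Set.of(60L, 63L);

        // A rewrite regenerates E's schema and gives 'setting' a new UID (64),
        // but G's required-UID set is not updated: the change is not propagated.
        List<Field> eAfterRewrite = renumber(e, "setting", 64);
        System.out.println("stale:      " + prune(eAfterRewrite, requiredByG));

        // If the new UID had been propagated to the consumer, nothing is dropped.
        Set<Long> propagated = Set.of(60L, 64L);
        System.out.println("propagated: " + prune(eAfterRewrite, propagated));
    }
}
```

In this model, the "stale" line keeps only `mya`, and `setting` survives only on the "propagated" line. This mirrors the one-column output of G above.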
--
This message was sent by Atlassian JIRA
(v6.1#6144)