[
https://issues.apache.org/jira/browse/PIG-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202021#comment-16202021
]
Will Oberman commented on PIG-5309:
-----------------------------------
I forgot to mention I did see PIG-3856, but am not sure if it is related.
> Problem with tez + union + replicated join
> ------------------------------------------
>
> Key: PIG-5309
> URL: https://issues.apache.org/jira/browse/PIG-5309
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.17.0
> Reporter: Will Oberman
> Priority: Minor
>
> I've been using Pig 0.12.1 for quite some time and am finally upgrading to
> 0.17. One of my existing scripts failed. I have a workaround (SET
> pig.tez.opt.union false), but I thought I'd pass on the problem I observed.
> In stdout:
> {noformat}
> ERROR 2017: Internal error creating job configuration.
> {noformat}
> In the Pig log:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Edge [scope-93 :
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ->
> [scope-83 :
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({
> BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >>
> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >>
> NullEdgeManager }) already defined!
> at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
> at
> org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:404)
> at
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:259)
> at
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
> at
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69)
> at
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120)
> ... 20 more
> {noformat}
> I played around with a minimum viable test script and can cause this to fail:
> {noformat}
> weblogs = LOAD '/tmp/in/weblogInfo' as (path:chararray,
> queryMap:map[chararray]);
> featureToExtraData = LOAD '/tmp/in/featureToExtraData' as (feature:chararray,
> extraData:chararray);
> oldA = FILTER weblogs BY path == '/A';
> newA = FILTER weblogs BY path == '/somethingElse';
> B = FILTER weblogs BY path == '/B';
> oldAFeatures = FOREACH oldA GENERATE queryMap#'feature1' as feature1,
> queryMap#'feature2' as feature2;
> newAFeatures = FOREACH newA GENERATE queryMap#'different1' as feature1,
> queryMap#'different2' as feature2;
> AFeatures = UNION oldAFeatures, newAFeatures;
> AFeaturesPlusMore = JOIN AFeatures BY feature1 LEFT, featureToExtraData BY
> feature USING 'replicated';
> BFeatures = FOREACH B GENERATE queryMap#'somethingElseEntirely1' as feature1,
> queryMap#'somethingElseEntirely2' as feature2;
> BFeaturesPlusMore = JOIN BFeatures BY feature1 LEFT, featureToExtraData BY
> feature USING 'replicated';
> STORE AFeaturesPlusMore INTO '/tmp/out/1/AFeaturesPlusMore';
> STORE BFeaturesPlusMore INTO '/tmp/out/1/BFeaturesPlusMore';
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)