Cheolsoo Park created PIG-3620:
----------------------------------
Summary: TezCompiler adds duplicate predecessors of blocking
operators to TezPlan
Key: PIG-3620
URL: https://issues.apache.org/jira/browse/PIG-3620
Project: Pig
Issue Type: Sub-task
Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Fix For: tez-branch
Here is the simplest example that reproduces the issue:
{code:title=test.pig}
a = LOAD 'foo' AS (x:int, y:chararray);
b = GROUP a BY x;
c = FOREACH b GENERATE a.x;
STORE c INTO 'c';
d = FOREACH b GENERATE a.y;
STORE d INTO 'd';
{code}
If you run {{pig \-x tez_local \-e 'explain \-script test.pig'}}, you will see
two vertices that contain the following sub-plan:
{code}
Tez vertex scope-27
# Plan on vertex
b: Local Rearrange[tuple]{int}(false) - scope-10
| |
| Project[int][0] - scope-11
|
|---a: New For Each(false,false)[bag] - scope-7
| |
| Cast[int] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[chararray] - scope-5
| |
| |---Project[bytearray][1] - scope-4
|
|---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
{code}
What's happening is that, since there are two stores (and thus two data flows,
i.e. a=>c and a=>d), Pig generates two physical plans. TezCompiler then compiles
them into a single Tez plan but adds the shared sub-plan twice.
This is an issue with any blocking operator (join, union, etc.) followed by a
split.
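To make the duplication concrete, here is a toy model in Python. It is only a sketch and not TezCompiler code: operators are plain strings, each physical plan is a flat list, and every name in it (split_into_vertices, compile_naive, compile_dedup, BOUNDARIES) is hypothetical. It models only the shared predecessor of the blocking operator; how the downstream side is shared or split is not modeled.

```python
# Toy model, NOT Pig/Tez code. All names below are made up for illustration.

# The two data flows from test.pig (a=>c and a=>d), one physical plan each.
PLAN_C = ["a: Load", "a: ForEach", "b: LocalRearrange",
          "b: Package", "c: ForEach", "c: Store"]
PLAN_D = ["a: Load", "a: ForEach", "b: LocalRearrange",
          "b: Package", "d: ForEach", "d: Store"]

# Assumption: a vertex ends right after the local rearrange of a blocking operator.
BOUNDARIES = {"b: LocalRearrange"}


def split_into_vertices(plan, boundaries):
    """Cut one plan into vertices, closing a vertex after each boundary operator."""
    vertices, current = [], []
    for op in plan:
        current.append(op)
        if op in boundaries:
            vertices.append(tuple(current))
            current = []
    if current:
        vertices.append(tuple(current))
    return vertices


def compile_naive(plans, boundaries):
    """Compile every plan independently, so a shared predecessor is emitted once per plan."""
    return [v for plan in plans for v in split_into_vertices(plan, boundaries)]


def compile_dedup(plans, boundaries):
    """Same as compile_naive, but keep only the first copy of each identical vertex."""
    return list(dict.fromkeys(compile_naive(plans, boundaries)))


if __name__ == "__main__":
    for vertex in compile_naive([PLAN_C, PLAN_D], BOUNDARIES):
        print("naive:", vertex)
    for vertex in compile_dedup([PLAN_C, PLAN_D], BOUNDARIES):
        print("dedup:", vertex)
```

In this model the naive pass emits the load / foreach / local rearrange vertex once per store, which mirrors the duplicated sub-plan in the explain output above. Keying vertices by their operators is just one possible way to collapse the copies, not necessarily how the fix should be implemented.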