[ https://issues.apache.org/jira/browse/PIG-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Praveen Rachabattuni updated PIG-3620:
--------------------------------------

    Description: 
Here is the simplest example that reproduces the issue:
{code:title=test.pig}
a = LOAD 'foo' AS (x:int, y:chararray);
b = GROUP a BY x;
c = FOREACH b GENERATE a.x;
STORE c INTO 'c';
d = FOREACH b GENERATE a.y;
STORE d INTO 'd';
{code}
If you run {{pig \-x tez_local \-e 'explain \-script test.pig'}}, you will see 
two vertices that contain the following sub-plan:
{code}
Tez vertex scope-27
# Plan on vertex
b: Local Rearrange[tuple]{int}(false) - scope-10
|   |
|   Project[int][0] - scope-11
|
|---a: New For Each(false,false)[bag] - scope-7
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |
    |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
{code}
What's happening is that, because there are two stores (and thus two data 
flows, i.e. a=>c and a=>d), Pig generates two physical plans. TezCompiler then 
compiles them into a single Tez plan but adds the shared sub-plan twice.

This is an issue with any blocking operator (join, union, etc.) followed by a 
split.
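
To make the duplication concrete, here is a minimal Java sketch of the 
behavior described above. It is not TezCompiler code and not the change in 
PIG-3620-1.patch; the class name, method names, and operator keys 
(load-a, foreach-a, rearrange-b, store-c, store-d) are all hypothetical. It 
only shows that merging two data flows operator by operator repeats their 
shared prefix, while keying operators and adding each key once does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Pig.
class PlanMergeSketch {

    // Each data flow is a list of operator keys, ordered from load to store.
    // Naive merge: every flow contributes all of its operators, so a prefix
    // shared by two flows appears in the merged plan twice.
    static List<String> mergeNaive(List<List<String>> flows) {
        List<String> merged = new ArrayList<>();
        for (List<String> flow : flows) {
            merged.addAll(flow);
        }
        return merged;
    }

    // Deduplicating merge: an operator key is added only the first time it
    // is seen; insertion order is preserved by LinkedHashSet.
    static List<String> mergeDeduplicated(List<List<String>> flows) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> flow : flows) {
            seen.addAll(flow);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Two flows (a=>c and a=>d) that share everything up to the group.
        List<List<String>> flows = Arrays.asList(
            Arrays.asList("load-a", "foreach-a", "rearrange-b", "store-c"),
            Arrays.asList("load-a", "foreach-a", "rearrange-b", "store-d"));

        System.out.println("naive:        " + mergeNaive(flows));
        System.out.println("deduplicated: " + mergeDeduplicated(flows));
    }
}
```

In this sketch the naive merge lists the load, foreach, and rearrange 
operators twice, which mirrors the two identical vertices in the explain 
output; the deduplicating merge lists them once, followed by both stores.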

  was:
Here is a simplest example that reproduces the issue-
{code:title=test.pig}
a = LOAD 'foo' AS (x:int, y:chararray);
b = GROUP a BY x;
c = FOREACH b GENERATE a.x;
STORE c INTO 'c';
d = FOREACH b GENERATE a.y;
STORE d INTO 'd';
{code}
If you run {{pig \-x tex_local \-e 'explain \-script test.pig'}}, you will see 
two vertices that contains the following sub-plan- 
{code}
Tez vertex scope-27
# Plan on vertex
b: Local Rearrange[tuple]{int}(false) - scope-10
|   |
|   Project[int][0] - scope-11
|
|---a: New For Each(false,false)[bag] - scope-7
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |
    |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
{code}
What's happening is that since there are 2 stores (and thus 2 data flows, i.e. 
a=>c and a=>d), Pig generates two physical plans. Now TezCompile compiles them 
into a single tez plan but adds the same sub-plan twice.

This is an issue with any blocking operators (join, union, etc) followed by 
split.


> TezCompiler adds duplicate predecessors of blocking operators to TezPlan
> ------------------------------------------------------------------------
>
>                 Key: PIG-3620
>                 URL: https://issues.apache.org/jira/browse/PIG-3620
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Rohini Palaniswamy
>             Fix For: tez-branch
>
>         Attachments: PIG-3620-1.patch
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
