[ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5452:
------------------------------
    Description: 
Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces the right number of nulls.
{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces the wrong number of nulls, and the output is shifted: the value of a3 lands in the third column instead of the fourth.
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has a schema of 
tuple() with no inner fields; the inner fields come solely from the user-defined 
schema "as (A1:chararray, A2:chararray)". 
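
The two outputs above can be reproduced with a toy model. This is only a sketch, not Pig's implementation: the function names are hypothetical, and it assumes that FLATTEN of a null tuple emits one null per inner field it knows about (falling back to a single null when the tuple schema is empty) and that a short row is then padded to the width of the declared output schema.

```python
from typing import List, Optional, Sequence


def flatten_tuple_field(value: Optional[Sequence], known_width: int) -> List:
    """Toy FLATTEN: known_width is the number of inner fields in the tuple schema."""
    if value is None:
        # A null tuple carries no elements, so the column count can only
        # come from the schema. Assumption: an empty schema yields one null.
        return [None] * max(known_width, 1)
    return list(value)


def project(a1: str, a2: Optional[Sequence], known_width: int, output_width: int) -> List:
    """Model of 'GENERATE a1, FLATTEN(a2), a3' where a3 is a copy of a1."""
    row = [a1] + flatten_tuple_field(a2, known_width) + [a1]
    # Assumption: a row shorter than the output schema is padded with nulls.
    row += [None] * (output_width - len(row))
    return row


def render(row: List) -> str:
    """Print a row the way 'dump' shows it, with nulls as empty strings."""
    return "(" + ",".join("" if v is None else str(v) for v in row) + ")"


# First script: a2 is declared as tuple(A1:chararray, A2:chararray).
print(render(project("a", None, known_width=2, output_width=4)))
# Second script: a2 is declared as tuple(); the two fields exist only in the as clause.
print(render(project("a", None, known_width=0, output_width=4)))
```

Under these assumptions the first call prints (a,,,a) and the second prints (a,,a,), matching the two dumps above.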

 

  was:
Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces the right number of nulls.


{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces the wrong number of nulls, and the output is shifted: the value of a3 lands in the third column instead of the fourth.
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has a schema of 
tuple() with no inner fields.

 


> Null handling of FLATTEN with user defined schema (as clause)
> -------------------------------------------------------------
>
>                 Key: PIG-5452
>                 URL: https://issues.apache.org/jira/browse/PIG-5452
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5452-v01.patch



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
