[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5452:
------------------------------
Description:
Follow-up from PIG-5201:
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;
{code}
This produces the right number of nulls:
{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f)
{code}

However,
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;
{code}
This produces the wrong number of nulls, and the output is shifted: the trailing a3 value appears in the third column instead of the fourth, and the last column is empty.
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,)
{code}
The difference is that in the latter, a2 in "FLATTEN(a2)" has only an empty tuple() schema with no inner fields; its fields are declared solely by the user-defined schema "as (A1:chararray, A2:chararray)".

was:
The same description, except that the last sentence ended at "...only has schema of tuple() with empty inner fields." and did not mention the user-defined schema "as (A1:chararray, A2:chararray)".
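To make the expected behavior concrete, here is a minimal Java sketch of one rule: when FLATTEN receives a null tuple, it should emit one null per output column, taking the column count from the "as (...)" clause when one is given, even if the inner schema is an empty tuple(). The class and method names (FlattenNullSketch, flattenWidth, flattenRow, render) are hypothetical; this is not Pig's implementation and not the attached pig-5452-v01.patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the null-padding rule discussed in PIG-5452.
// Not Pig's actual code; names are illustrative only.
public class FlattenNullSketch {

    // Output width of FLATTEN(a2). innerFieldCount is the number of fields in
    // a2's declared tuple schema (0 for tuple()); asClauseFieldCount is the
    // number of fields in the "as (...)" clause, or null if there is none.
    static int flattenWidth(int innerFieldCount, Integer asClauseFieldCount) {
        // Assumption of this sketch: a user-defined schema, when present,
        // fixes the width even if the inner schema is empty.
        return asClauseFieldCount != null ? asClauseFieldCount : innerFieldCount;
    }

    // Builds one output row for "GENERATE a1, FLATTEN(a2), a3".
    static List<Object> flattenRow(Object a1, List<?> a2, Object a3, int width) {
        List<Object> out = new ArrayList<>();
        out.add(a1);
        if (a2 == null) {
            // Pad with exactly `width` nulls so a3 keeps its column position.
            for (int i = 0; i < width; i++) {
                out.add(null);
            }
        } else {
            out.addAll(a2);
        }
        out.add(a3);
        return out;
    }

    // Renders a row the way dump prints it: null fields become empty strings.
    static String render(List<Object> row) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object value = row.get(i);
            if (value != null) {
                sb.append(value);
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Second script above: a2 is tuple(), but the as clause declares 2 fields.
        int width = flattenWidth(0, 2);
        System.out.println(render(flattenRow("a", null, "a", width))); // prints (a,,,a)
    }
}
```

In this sketch the padding width is decided from the declared output schema rather than from the null runtime value, so the second script yields the same (a,,,a) shape as the first.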
> Null handling of FLATTEN with user defined schema (as clause)
> -------------------------------------------------------------
>
>                 Key: PIG-5452
>                 URL: https://issues.apache.org/jira/browse/PIG-5452
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5452-v01.patch

--
This message was sent by Atlassian Jira
(v8.20.10#820010)