Koji Noguchi created PIG-5400:
---------------------------------

             Summary: OrcStorage dropping struct(tuple) when it only holds a single field
                 Key: PIG-5400
                 URL: https://issues.apache.org/jira/browse/PIG-5400
             Project: Pig
          Issue Type: Improvement
          Components: impl
            Reporter: Koji Noguchi
            Assignee: Koji Noguchi


A user reported seeing an inconsistent schema when storing data with 
OrcStorage. Sample code: 

{code} 
A = load 'input.txt' as (a0:long); 
B = GROUP A by a0; 
STORE B into 'filename' using OrcStorage(); 
{code} 

Pig's schema {{B: {group: long,A: bag: { tuple(a0: long)}}}}. 

Expected Orc schema {{struct<group:bigint,A:array<struct<bigint>>>}} 
Actual Orc schema {{struct<group:bigint,A:array<bigint>>}} 

_This only happens when a tuple contains a single field._ 
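
One way to observe the difference is to store a two-field variant next to the sample above and then read both outputs back. This is only a sketch: {{input2.txt}}, {{filename2}} and the aliases {{A2}}, {{B2}}, {{C}}, {{C2}} are hypothetical names introduced for illustration, and {{DESCRIBE}} prints the Pig schema derived from the ORC file, so it may not match the ORC type strings above character for character.

{code}
-- Script 1: same shape as the sample, but the tuple holds two fields,
-- so according to this report the struct(tuple) layer should be kept.
A2 = load 'input2.txt' as (a0:long, a1:long);
B2 = GROUP A2 by a0;
STORE B2 into 'filename2' using OrcStorage();

-- Script 2: run separately, after both stores have finished,
-- because the schema is read from the files that were written.
C  = load 'filename' using OrcStorage();
C2 = load 'filename2' using OrcStorage();
DESCRIBE C;
DESCRIBE C2;
{code}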

The current schema without the struct(tuple) layer saves space, but it would be 
nice to have an option to keep the extra struct(tuple) layer when the user expects 
the schema within that tuple to evolve by adding more fields in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
