[jira] [Updated] (PIG-5272) BagToTuple Output Schema

Joshua Juen (JIRA) Fri, 22 Sep 2017 10:52:59 -0700

     [ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joshua Juen updated PIG-5272:
-----------------------------
    Attachment: BagToTupleSchema.patch

> BagToTuple Output Schema
> ------------------------
>
>                 Key: PIG-5272
>                 URL: https://issues.apache.org/jira/browse/PIG-5272
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.17.0
>            Reporter: Joshua Juen
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: BagToTupleSchema.patch
>
>
> The output schema declared by BagToTuple does not match the tuple it actually
> returns, which causes problems when the tuple is used later in the same script.
> For example, given a bag with schema { data:chararray }, BagToTuple declares
> the output schema ( data:chararray ).
> This is wrong: if the bag contains the entries {data1, data2, data3}, the
> output tuple from BagToTuple is
> (data1:chararray, data2:chararray, data3:chararray), not (data:chararray),
> the output schema declared by the UDF.
> Unfortunately, the number of fields in the tuple depends on how many entries
> the bag holds, so the tuple's schema cannot be known during the initial
> validation phase. I therefore believe the output schema of the UDF should be
> changed to a plain tuple type, without the number of fields being fixed to
> the number of columns in the input bag.
> As things stand, the elements of the tuple cannot be accessed in the
> script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue. 
> Let me know if this sounds reasonable and I can generate the patch.
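
The mismatch described above can be illustrated with a small model in plain Python. This is only a sketch, not Pig code: `bag_to_tuple` and `declared_arity` are hypothetical helpers that mimic the flattening behaviour and the declared schema from the report, and a bag is assumed to be a list of tuples.

```python
from typing import Any, List, Tuple

# Assumption: a Pig bag is modelled as a list of tuples (one tuple per entry).
Bag = List[Tuple[Any, ...]]


def bag_to_tuple(bag: Bag) -> Tuple[Any, ...]:
    """Hypothetical stand-in for BagToTuple: flatten every field of every
    entry in the bag into a single tuple."""
    return tuple(field for row in bag for field in row)


def declared_arity(bag_schema: List[str]) -> int:
    """Number of fields the reported output schema declares: it simply
    mirrors the columns of the input bag."""
    return len(bag_schema)


# Bag with schema { data:chararray } holding three entries, as in the report.
bag_schema = ["data:chararray"]
bag = [("data1",), ("data2",), ("data3",)]

result = bag_to_tuple(bag)
print(result)                       # ('data1', 'data2', 'data3')
print(len(result))                  # 3 fields at run time
print(declared_arity(bag_schema))   # 1 field in the declared schema
```

The run-time arity grows with the number of entries in the bag, while the declared arity stays fixed at the number of bag columns, which is why a fixed-width output schema cannot describe the result.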



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
