[
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joshua Juen updated PIG-5272:
-----------------------------
Attachment: BagToTupleSchema.patch
> BagToTuple Output Schema
> ------------------------
>
> Key: PIG-5272
> URL: https://issues.apache.org/jira/browse/PIG-5272
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.17.0
> Reporter: Joshua Juen
> Priority: Minor
> Fix For: 0.18.0
>
> Attachments: BagToTupleSchema.patch
>
>
> The output schema from BagToTuple is nonsensical causing problems using the
> tuple later in the same script.
> For example: Given a bag: { data:chararray }, calling BagToTuple yields the
> schema: ( data:chararray )
> But, this makes no sense since if the above bag contains: {data1, data2,
> data3} entries, the output tuple from BagToTuple will be:
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> modified to be type tuple without the number of fields being fixed to the
> number of columns in the input bag.
> Under the current way, the elements in the tuple cannot be accessed in the
> script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)