[
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-5272:
------------------------------------
[~daijy],
It is a valid issue. The schema of the tuple output by BagToTuple cannot be
determined at compile time; it depends on how many entries there are in the bag.
[~juen1jp],
Can you upload the patch?
> BagToString Output Schema
> -------------------------
>
> Key: PIG-5272
> URL: https://issues.apache.org/jira/browse/PIG-5272
> Project: Pig
> Issue Type: Improvement
> Reporter: Joshua Juen
> Priority: Minor
>
> The output schema from BagToTuple is nonsensical, which causes problems when
> the tuple is used later in the same script.
> For example, given a bag { data:chararray }, calling BagToTuple yields the
> schema ( data:chararray ).
> But this makes no sense: if the above bag contains the entries {data1, data2,
> data3}, the output tuple from BagToTuple will be
> (data1:chararray, data2:chararray, data3:chararray), which does not match
> (data:chararray), the declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> changed to a plain tuple type, without the number of fields being fixed to
> the number of columns in the input bag.
> With the current behavior, the elements of the tuple cannot be accessed in
> the script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.
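
To illustrate the mismatch described in the issue, here is a minimal sketch in
plain Java. It does not use the Pig API: the class BagToTupleSketch and its
flatten method are hypothetical stand-ins, and a bag is modelled as a list of
tuples on the assumption that BagToTuple appends every field of every tuple to
one flat output tuple. The declared width comes from the bag's column count,
while the actual width depends on how many entries the bag holds at run time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagToTupleSketch {

    // Hypothetical stand-in for BagToTuple (not the Pig implementation):
    // every field of every tuple in the bag is appended to one flat tuple.
    public static List<Object> flatten(List<List<Object>> bag) {
        List<Object> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            out.addAll(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        // Bag with schema { data:chararray }: one column, three entries.
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList("data1"),
            Arrays.<Object>asList("data2"),
            Arrays.<Object>asList("data3"));

        // A schema derived from the bag's columns alone declares one field.
        int declaredFields = bag.get(0).size();

        // The flattened tuple has one field per bag entry.
        List<Object> tuple = flatten(bag);

        System.out.println("declared fields: " + declaredFields
            + ", actual fields: " + tuple.size());
        // prints "declared fields: 1, actual fields: 3"
    }
}
```

The declared field count stays at 1 however many entries the bag holds, which
is why a tuple schema with no fixed field list fits the output better.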