[ https://issues.apache.org/jira/browse/PIG-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Travis Woodruff updated PIG-4018:
---------------------------------
    Attachment: PIG-4018-2.patch

Looks like TypeCheckingRelVisitor adds type conversion, so my previous patch is no good. Need a more complicated patch, unfortunately.

This patch updates LOUnion.getSchema() to ensure that all output schema fields have a uid set. It also makes some small changes to minimize calls to getSchema() on inputs, since the patch adds several more iterations over the inputs and getSchema() calls can be quite costly (for example, PigStorage reloads from HDFS every time getSchema() is called).

Also moved the test for this issue into TestUnionOnSchema.

> Schema validation fails with UNION ONSCHEMA
> -------------------------------------------
>
>                 Key: PIG-4018
>                 URL: https://issues.apache.org/jira/browse/PIG-4018
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Travis Woodruff
>            Assignee: Travis Woodruff
>         Attachments: PIG-4018-2.patch, PIG-4018.patch
>
>
> When relations with differing schemas are unioned (using UNION ONSCHEMA),
> schema validation can fail with this exception:
> {{org.apache.pig.impl.plan.PlanValidationException: Logical plan invalid
> state: invalid uid -1 in schema}}
> This worked before the fix for PIG-3492.
> The merged schema (from {{LOUnion.getSchema()}}) does not contain uids for
> columns not in the schema of the first input (uids are set to -1). This is
> because only the first input's schema is used for looking up "cached" uids.
> Normally, this isn't a problem because {{UnionOnSchemaSetter}} comes along
> and fixes the missing fields.
> However, when {{ImplicitSplitInsertVisitor}} is active, it is called before
> {{UnionOnSchemaSetter}}. {{ImplicitSplitInsertVisitor}} calls
> {{schemaResetter.visit()}}, which throws the validation exception because
> {{UnionOnSchemaSetter}} has not had a chance to create the missing fields
> (and thus uids are still -1 for these fields).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
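The uid gap described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Pig's code: the class `UnionUidSketch`, its methods, and the use of a plain name-to-uid map in place of a logical schema are all hypothetical, and the way fresh uids are allocated in the second method is an assumption, since the message does not say how the patch assigns them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "schema" here is an ordered map from field name to uid.
// None of these names come from Pig; they are hypothetical stand-ins.
public class UnionUidSketch {
    static final long UNSET_UID = -1L;

    // Build an ordered name -> uid map from parallel arrays.
    static Map<String, Long> schema(String[] names, long[] uids) {
        Map<String, Long> s = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            s.put(names[i], uids[i]);
        }
        return s;
    }

    // Models the reported behaviour: uids are looked up in the first input only,
    // so any field that the first input lacks is left at -1.
    static Map<String, Long> mergeUsingFirstInputOnly(List<Map<String, Long>> inputs) {
        Map<String, Long> first = inputs.get(0);
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map<String, Long> input : inputs) {
            for (String name : input.keySet()) {
                merged.putIfAbsent(name, first.getOrDefault(name, UNSET_UID));
            }
        }
        return merged;
    }

    // Models the stated goal of the patch: every output field ends up with a uid.
    // Assumption: fields missing from the first input get a fresh uid that is
    // larger than any uid seen in the inputs.
    static Map<String, Long> mergeWithUidForEveryField(List<Map<String, Long>> inputs) {
        long nextUid = 0L;
        for (Map<String, Long> input : inputs) {
            for (long uid : input.values()) {
                nextUid = Math.max(nextUid, uid + 1);
            }
        }
        Map<String, Long> merged = mergeUsingFirstInputOnly(inputs);
        for (Map.Entry<String, Long> field : merged.entrySet()) {
            if (field.getValue() == UNSET_UID) {
                field.setValue(nextUid++);
            }
        }
        return merged;
    }

    // Stand-in for the validation step that rejects a schema containing uid -1.
    static boolean hasUnsetUid(Map<String, Long> schema) {
        return schema.containsValue(UNSET_UID);
    }

    public static void main(String[] args) {
        List<Map<String, Long>> inputs = Arrays.asList(
                schema(new String[] {"id", "name"}, new long[] {1L, 2L}),
                schema(new String[] {"id", "score"}, new long[] {3L, 4L}));

        System.out.println(mergeUsingFirstInputOnly(inputs));
        System.out.println(mergeWithUidForEveryField(inputs));
    }
}
```

In this model, a validator that runs between the two merge strategies sees `score` with uid -1, which corresponds to the ordering problem the issue describes between ImplicitSplitInsertVisitor and UnionOnSchemaSetter.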