Hi Steven,

The new AvroStorage will let you specify the input schema:
https://issues.apache.org/jira/browse/PIG-3015

In fact, somebody made the same request in a comment on the JIRA, which I am copying and pasting below:

> Furthermore, we occasionally have issues with pig jobs picking the old
> schema when we have a schema update. Manually specifying the schema would
> fix this and give us more flexibility in defining the data we want pig to
> pull from a file.

This JIRA is a work in progress, but hopefully it will be in the next major release.

Thanks,
Cheolsoo

On Sat, Apr 27, 2013 at 3:24 PM, Enns, Steven <[email protected]> wrote:

> Resending now that I am subscribed :)
>
> On 4/25/13 4:01 PM, "Enns, Steven" <[email protected]> wrote:
>
> >Hi everyone,
> >
> >I would like to override the input schema in AvroStorage to make a pig
> >script robust to schema evolution. For example, suppose a new field is
> >added to an avro schema with a default value of null. If the input to a
> >pig script using this field includes both old and new data, AvroStorage
> >will merge the input schemas from the old and new data. However, if the
> >input includes only old data, the new schema will not be available to
> >AvroStorage and pig will fail to interpret the script with an error such
> >as "projected field [newField] does not exist in schema". If AvroStorage
> >accepted an input schema, the script would be valid for both the new and
> >old data. Is there any plan to implement this?
> >
> >Thanks,
> >Steve
