Not really sure, but can you try adding 'no_schema_check' when using AvroStorage in the STORE statement?
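Something along these lines might work. It is an untested sketch and assumes you are on the piggybank AvroStorage (the one that takes 'schema_uri'), where 'no_schema_check' is passed as one more string argument. If the store still complains, the other thing to try is renaming each field explicitly with AS, since the .. range keeps the dirtydata:: prefix. The field names below are taken from your describe output.

```pig
-- Use ONE of the two options below, not both (they write to the same '$OUT').

-- Option 1 (assumption: piggybank AvroStorage): skip the Pig-vs-Avro schema check.
STORE finaldata INTO '$OUT'
    USING AvroStorage('no_schema_check', 'schema_uri', '$SCHEMA');

-- Option 2: drop the dirtydata:: prefix by aliasing every field explicitly,
-- so no field name contains '::' when it reaches AvroStorage.
finaldata = FOREACH cleandata GENERATE
    dirtydata::Version                AS Version,
    dirtydata::Dob                    AS Dob,
    dirtydata::StoreId                AS StoreId,
    dirtydata::TransactionBlockNumber AS TransactionBlockNumber,
    dirtydata::TransactionData        AS TransactionData,
    dirtydata::Created                AS Created;

STORE finaldata INTO '$OUT' USING AvroStorage('schema_uri', '$SCHEMA');
```

With option 2, running describe on finaldata should show the top-level fields without the dirtydata:: prefix.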
On Wed, Oct 15, 2014 at 1:59 PM, Jakub Stransky <[email protected]> wrote:

> Hello experienced users,
>
> I am working with avro data files using AvroStorage and I am facing
> following issue. I cannot store the data of my result back to avro data
> file.
>
> I have following script
> inputdata = load '$INP' using AvroStorage();
> dirtydata = DISTINCT inputdata;
> sodtr = FILTER dirtydata BY TransactionBlockNumber == 1;
> sto = FOREACH sodtr GENERATE Dob.Value AS Dob,StoreId,
> Created.UnixUtcTime;
> g = GROUP sto BY (Dob,StoreId);
> sodtime = FOREACH g GENERATE group.Dob AS Dob, group.StoreId AS StoreId,
> MAX(sto.UnixUtcTime) AS latestStartOfDayTime;
>
> joined = JOIN dirtydata BY (Dob.Value, StoreId) LEFT OUTER, sodtime BY
> (Dob, StoreId);
>
> cleandata = FILTER joined BY dirtydata::Created.UnixUtcTime >=
> sodtime.latestStartOfDayTime; --1412864846
> finaldata = FOREACH cleandata GENERATE dirtydata::Version ..
> dirtydata::Created;
>
> STORE finaldata INTO '$OUT' USING AvroStorage('schema_uri','$SCHEMA');
>
> Where $SCHEMA contains exactly the same schema as inputdata. By pig
> operations I got several nested relation, columns etc. Those should be
> removed by .. operator.
> Resulting schema using describe
>
>
> finaldata: {dirtydata::Version: int,dirtydata::Dob: (Value:
> int),dirtydata::StoreId: chararray,dirtydata::TransactionBlockNumber:
> int,dirtydata::TransactionData: {TransactionData: (TransactionHeader: (Dob:
> (Value: int),StoreId: chararray,TransactionId: int,TransactionTime:
> (UnixUtcTime: long,OffsetMinutes: int),TerminalId:
> chararray,ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),CustomData:
> {KeyValue: (Key: chararray,Value: chararray)},StoreInfo: (IsQuickService:
> boolean,CurrencyIsoCode: chararray),NewChecks: {NewCheckData: (CheckId:
> chararray,CheckHeader: (CarriedOver: boolean,TerminalId:
> chararray,Training: boolean,Period: (Id: chararray,Label:
> chararray),GroupInfo: (Id: chararray,Label: (Id: chararray,Label:
> chararray),IsTable: boolean),Events: {CheckEvent: (CustomEventLabel:
> chararray,Time: (UnixUtcTime: long,OffsetMinutes: int),CheckEventType:
> chararray)},CheckResponsibleEmployees: {CheckResponsibleEmployee:
> (Employee: (Id: chararray,Name: chararray),Time: (UnixUtcTime:
> long,OffsetMinutes: int))},GuestCounting: (Guests: (Value: chararray),Mode:
> chararray),PrintedCheckId: chararray,RevenueCenter: (Id: chararray,Label:
> chararray),Room: (Id: chararray,Label: chararray)))},Checks: {CheckData:
> (CheckId: chararray,CheckHeaderUpdate: (Period: (Id: chararray,Label:
> chararray),GroupInfo: (Id: chararray,Label: (Id: chararray,Label:
> chararray),IsTable: boolean),Events: {CheckEvent: (CustomEventLabel:
> chararray,Time: (UnixUtcTime: long,OffsetMinutes: int),CheckEventType:
> chararray)},CheckResponsibleEmployees: {CheckResponsibleEmployee:
> (Employee: (Id: chararray,Name: chararray),Time: (UnixUtcTime:
> long,OffsetMinutes: int))},GuestCounting: (Guests: (Value: chararray),Mode:
> chararray),PrintedCheckId: chararray,RevenueCenter: (Id: chararray,Label:
> chararray),Room: (Id: chararray,Label: chararray)),Summary: (NetAmount:
> (Value: chararray),Total: (Value: chararray)),CheckItems: {CheckItem:
> (AbstractCheckElement: (Amount: (Value: chararray),ElementId:
> chararray,ElementKind: (Id: chararray,Label: chararray),CreatedOn:
> (UnixUtcTime: long,OffsetMinutes: int),ResponsibleEmployees: (Employee:
> (Id: chararray,Name: chararray),Manager: (Id: chararray,Name:
> chararray))),Categories: {Category: (CategoryInfo: (Id: chararray,Label:
> chararray),Type: chararray)},ModifierInfo: (Label: (Id: chararray,Label:
> chararray),ItemModifierInfoType: chararray),NetAmount: (Value:
> chararray),OrderMode: (Id: chararray,Label: chararray),OriginalPrice:
> (Value: chararray),ParentItem: chararray,Quantity: (Value:
> chararray),Revenue: boolean,Seat: int,ProcessedInKitchen: boolean,GiftCard:
> boolean,SplitItemElementId: chararray)},Comps: {CheckComp:
> (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount: (Value:
> chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),CheckCompType:
> chararray,Note: chararray)},Payments: {CheckPayment: (AbstractCheckElement:
> (Amount: (Value: chararray),ElementId: chararray,ElementKind: (Id:
> chararray,Label: chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),ChangeBack: (Value:
> chararray),DocumentId: chararray,Rounding: (Value: chararray),Tip: (Value:
> chararray),CheckPaymentType: chararray,Card: chararray)},Promos:
> {CheckPromo: (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount:
> (Value: chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),Discount: (Value:
> chararray),CheckPromoType: chararray)},Surcharges: {CheckSurcharge:
> (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount: (Value:
> chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),Rate: (Value:
> chararray),CheckSurchargeType: chararray,Accounting: chararray)},Voids:
> {CheckVoid: (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount:
> (Value: chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),CheckVoidType:
> chararray,Note: chararray)},RemovedElements: {RemovedElement: (ElementId:
> chararray,RemovedElementType: chararray)})},LaborData: {LaborData: (Shifts:
> {Shift: (State: chararray,StartDate: (UnixUtcTime: long,OffsetMinutes:
> int),EndDate: (UnixUtcTime: long,OffsetMinutes: int),TotalPay: (Value:
> chararray),PayRates: {ShiftPayRate: (AfterHours: int,HourlyRate: (Value:
> chararray),IsOvertime: boolean)},ShiftNumber: int,Job: (Id:
> chararray,Label: chararray),Breaks: {Break: (Paid: boolean,StartDate:
> (UnixUtcTime: long,OffsetMinutes: int),EndDate: (UnixUtcTime:
> long,OffsetMinutes: int))},IsManager: boolean)},Employee: (Id:
> chararray,Name: chararray))})},dirtydata::Created: (UnixUtcTime:
> long,OffsetMinutes: int)}
>
> *I am getting error: Pig Schema contains a name that is not allowed in
> Avro. Which is probably because of :: remains for dirtydata.
> Is there a way how to strip this off (as now there is no point being
> there) otherwise schema should be identical to input schema.*
>
> *Thanks for helping me out*
> *Jakub*
