I'm not really sure, but can you try adding 'no_schema_check' when using
AvroStorage in the STORE statement?
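
As a rough, untested sketch of what I mean, here are two alternatives (use
one or the other, not both). The first assumes the piggybank AvroStorage,
where 'no_schema_check' is passed as a standalone flag ahead of the
key/value options; that placement is my assumption. The second renames each
field with AS so the dirtydata:: prefix never reaches the stored schema; the
field names are taken from the DESCRIBE output quoted below.

```pig
-- Alternative 1: ask AvroStorage to skip the schema compatibility check.
-- Assumption: 'no_schema_check' is a bare flag placed before 'schema_uri'.
STORE finaldata INTO '$OUT'
    USING AvroStorage('no_schema_check', 'schema_uri', '$SCHEMA');

-- Alternative 2: rename the fields explicitly so that no "dirtydata::"
-- prefix remains in the schema that is handed to AvroStorage.
renamed = FOREACH cleandata GENERATE
    dirtydata::Version                AS Version,
    dirtydata::Dob                    AS Dob,
    dirtydata::StoreId                AS StoreId,
    dirtydata::TransactionBlockNumber AS TransactionBlockNumber,
    dirtydata::TransactionData        AS TransactionData,
    dirtydata::Created                AS Created;

STORE renamed INTO '$OUT' USING AvroStorage('schema_uri', '$SCHEMA');
```

Running DESCRIBE renamed; before the STORE should show whether the top-level
names now match the input schema.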

On Wed, Oct 15, 2014 at 1:59 PM, Jakub Stransky <[email protected]>
wrote:

> Hello experienced users,
>
> I am working with Avro data files using AvroStorage and I am facing the
> following issue: I cannot store my result back to an Avro data file.
>
> I have the following script:
> inputdata = load '$INP' using AvroStorage();
> dirtydata = DISTINCT inputdata;
> sodtr = FILTER dirtydata BY TransactionBlockNumber == 1;
> sto   = FOREACH sodtr GENERATE Dob.Value AS Dob,StoreId,
> Created.UnixUtcTime;
> g     = GROUP sto BY  (Dob,StoreId);
> sodtime = FOREACH g GENERATE group.Dob AS Dob, group.StoreId AS StoreId,
> MAX(sto.UnixUtcTime) AS latestStartOfDayTime;
>
> joined = JOIN dirtydata BY (Dob.Value, StoreId) LEFT OUTER, sodtime BY
> (Dob, StoreId);
>
> cleandata = FILTER joined BY dirtydata::Created.UnixUtcTime >=
> sodtime::latestStartOfDayTime; --1412864846
> finaldata = FOREACH cleandata GENERATE dirtydata::Version ..
> dirtydata::Created;
>
> STORE finaldata INTO '$OUT' USING AvroStorage('schema_uri','$SCHEMA');
>
> Here $SCHEMA contains exactly the same schema as inputdata. The Pig
> operations add several nested relations, columns, etc.; those should be
> removed by the .. operator. The resulting schema from DESCRIBE is:
>
>
> finaldata: {dirtydata::Version: int,dirtydata::Dob: (Value:
> int),dirtydata::StoreId: chararray,dirtydata::TransactionBlockNumber:
> int,dirtydata::TransactionData: {TransactionData: (TransactionHeader: (Dob:
> (Value: int),StoreId: chararray,TransactionId: int,TransactionTime:
> (UnixUtcTime: long,OffsetMinutes: int),TerminalId:
> chararray,ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),CustomData:
> {KeyValue: (Key: chararray,Value: chararray)},StoreInfo: (IsQuickService:
> boolean,CurrencyIsoCode: chararray),NewChecks: {NewCheckData: (CheckId:
> chararray,CheckHeader: (CarriedOver: boolean,TerminalId:
> chararray,Training: boolean,Period: (Id: chararray,Label:
> chararray),GroupInfo: (Id: chararray,Label: (Id: chararray,Label:
> chararray),IsTable: boolean),Events: {CheckEvent: (CustomEventLabel:
> chararray,Time: (UnixUtcTime: long,OffsetMinutes: int),CheckEventType:
> chararray)},CheckResponsibleEmployees: {CheckResponsibleEmployee:
> (Employee: (Id: chararray,Name: chararray),Time: (UnixUtcTime:
> long,OffsetMinutes: int))},GuestCounting: (Guests: (Value: chararray),Mode:
> chararray),PrintedCheckId: chararray,RevenueCenter: (Id: chararray,Label:
> chararray),Room: (Id: chararray,Label: chararray)))},Checks: {CheckData:
> (CheckId: chararray,CheckHeaderUpdate: (Period: (Id: chararray,Label:
> chararray),GroupInfo: (Id: chararray,Label: (Id: chararray,Label:
> chararray),IsTable: boolean),Events: {CheckEvent: (CustomEventLabel:
> chararray,Time: (UnixUtcTime: long,OffsetMinutes: int),CheckEventType:
> chararray)},CheckResponsibleEmployees: {CheckResponsibleEmployee:
> (Employee: (Id: chararray,Name: chararray),Time: (UnixUtcTime:
> long,OffsetMinutes: int))},GuestCounting: (Guests: (Value: chararray),Mode:
> chararray),PrintedCheckId: chararray,RevenueCenter: (Id: chararray,Label:
> chararray),Room: (Id: chararray,Label: chararray)),Summary: (NetAmount:
> (Value: chararray),Total: (Value: chararray)),CheckItems: {CheckItem:
> (AbstractCheckElement: (Amount: (Value: chararray),ElementId:
> chararray,ElementKind: (Id: chararray,Label: chararray),CreatedOn:
> (UnixUtcTime: long,OffsetMinutes: int),ResponsibleEmployees: (Employee:
> (Id: chararray,Name: chararray),Manager: (Id: chararray,Name:
> chararray))),Categories: {Category: (CategoryInfo: (Id: chararray,Label:
> chararray),Type: chararray)},ModifierInfo: (Label: (Id: chararray,Label:
> chararray),ItemModifierInfoType: chararray),NetAmount: (Value:
> chararray),OrderMode: (Id: chararray,Label: chararray),OriginalPrice:
> (Value: chararray),ParentItem: chararray,Quantity: (Value:
> chararray),Revenue: boolean,Seat: int,ProcessedInKitchen: boolean,GiftCard:
> boolean,SplitItemElementId: chararray)},Comps: {CheckComp:
> (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount: (Value:
> chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),CheckCompType:
> chararray,Note: chararray)},Payments: {CheckPayment: (AbstractCheckElement:
> (Amount: (Value: chararray),ElementId: chararray,ElementKind: (Id:
> chararray,Label: chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),ChangeBack: (Value:
> chararray),DocumentId: chararray,Rounding: (Value: chararray),Tip: (Value:
> chararray),CheckPaymentType: chararray,Card: chararray)},Promos:
> {CheckPromo: (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount:
> (Value: chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),Discount: (Value:
> chararray),CheckPromoType: chararray)},Surcharges: {CheckSurcharge:
> (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount: (Value:
> chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),Rate: (Value:
> chararray),CheckSurchargeType: chararray,Accounting: chararray)},Voids:
> {CheckVoid: (AbstractCheckLinkedElement: (AbstractCheckElement: (Amount:
> (Value: chararray),ElementId: chararray,ElementKind: (Id: chararray,Label:
> chararray),CreatedOn: (UnixUtcTime: long,OffsetMinutes:
> int),ResponsibleEmployees: (Employee: (Id: chararray,Name:
> chararray),Manager: (Id: chararray,Name: chararray))),Items: {ItemAmount:
> (Amount: (Value: chararray),ElementId: chararray)}),CheckVoidType:
> chararray,Note: chararray)},RemovedElements: {RemovedElement: (ElementId:
> chararray,RemovedElementType: chararray)})},LaborData: {LaborData: (Shifts:
> {Shift: (State: chararray,StartDate: (UnixUtcTime: long,OffsetMinutes:
> int),EndDate: (UnixUtcTime: long,OffsetMinutes: int),TotalPay: (Value:
> chararray),PayRates: {ShiftPayRate: (AfterHours: int,HourlyRate: (Value:
> chararray),IsOvertime: boolean)},ShiftNumber: int,Job: (Id:
> chararray,Label: chararray),Breaks: {Break: (Paid: boolean,StartDate:
> (UnixUtcTime: long,OffsetMinutes: int),EndDate: (UnixUtcTime:
> long,OffsetMinutes: int))},IsManager: boolean)},Employee: (Id:
> chararray,Name: chararray))})},dirtydata::Created: (UnixUtcTime:
> long,OffsetMinutes: int)}
>
> *I am getting the error "Pig Schema contains a name that is not allowed in
> Avro", which is probably because the :: prefix remains for dirtydata. Is
> there a way to strip it off (there is no longer any point in it being
> there)? Otherwise the schema should be identical to the input schema.*
>
> *Thanks for helping me out*
> *Jakub*
>
