Hi,
I load data into Pig using LOAD without specifying data types. In the second step I call a UDF and set the proper data types with AS (). The typed set looks as below.
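For reference, the first two steps look roughly like this (the input path and the UDF name here are illustrative, not my real ones):

```pig
-- Step 1: load without specifying data types, so every field is a bytearray.
raw = LOAD 'input/customers.txt' USING PigStorage('\t');

-- Step 2: call the UDF and set the proper data types with AS (...).
sensitiveSet = FOREACH raw GENERATE FLATTEN(myudfs.Anonymize(*))
    AS (rank_ID:long, name:chararray, customerId:long, VIN:chararray,
        birth_date:chararray, fuel_mileage:int, fuel_consumption:float);
```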
grunt> describe sensitiveSet;
sensitiveSet: {rank_ID: long,name: chararray,customerId: long,VIN:
chararray,birth_date: chararray,fuel_mileage: int,fuel_consumption: float}
When I try to store the data typed as above using AvroStorage, I get a really strange error: Datum "Name" is not in union ["null","string"]. When I change the type inside the schema to bytes, everything works fine.
STORE sensitiveSet INTO 'OutputFileGen1aa'
USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check',
'schema',
'{"type":"record","name":"test","namespace":"","fields":[
{"name":"rank_ID","type":"long"},
{"name":"name","type":["null","*string*"],"store":"no","sensitive":"na"},
{"name":"cid","type":["null","bytes"],"store":"yes","sensitive":"yes"},
{"name":"VIN","type":["null","bytes"],"store":"yes","sensitive":"yes"},
{"name":"birth_date","type":["null","bytes"],"store":"yes","sensitive":"no"},
{"name":"fuel_mileage","type":["null","bytes"],"store":"yes","sensitive":"no"},
{"name":"fuel_consumption","type":["null","bytes"],"store":"yes","sensitive":"no"}
]}');
The error is below:
2016-01-11 15:16:15,644 [Thread-28] INFO  org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-01-11 15:16:15,647 [Thread-28] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local2100282506_0010
java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum "Name" is not in union ["null","string"]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum "Name" is not in union ["null","string"]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:808)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Datum "Name" is not in union ["null","string"]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnionSchema(PigAvroDatumWriter.java:128)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:365)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ... 20 more
2016-01-11 15:16:15,792 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local2100282506_0010
When I change string to bytes, as below, it works properly. What could be the problem?
STORE sensitiveSet INTO 'OutputFileGen1aa'
USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check',
'schema',
'{"type":"record","name":"test","namespace":"","fields":[
{"name":"rank_ID","type":"long"},
{"name":"name","type":["null","*bytes*"],"store":"no","sensitive":"na"},
{"name":"cid","type":["null","bytes"],"store":"yes","sensitive":"yes"},
{"name":"VIN","type":["null","bytes"],"store":"yes","sensitive":"yes"},
{"name":"birth_date","type":["null","bytes"],"store":"yes","sensitive":"no"},
{"name":"fuel_mileage","type":["null","bytes"],"store":"yes","sensitive":"no"},
{"name":"fuel_consumption","type":["null","bytes"],"store":"yes","sensitive":"no"}
]}');
Thanks