You need to register the json-simple jar, as Cheolsoo did in his example. Start from his script, not your own.
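
As a rough sketch, the top of your script would then look something like the lines below. The json-simple path and version are my assumption (I copied the directory your other jars live in and the 1.1 version from Cheolsoo's script), so point it at wherever the jar actually sits on your machine:

-----
-- Avro library used by AvroStorage.
REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
-- json-simple jar; this location and version are assumed, adjust as needed.
REGISTER /homes/immilind/HadoopLocal/Jars/json-simple-1.1.jar
-- Piggybank, which provides org.apache.pig.piggybank.storage.avro.AvroStorage.
REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
-----
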
Russell Jurney http://datasyndrome.com

On Jan 11, 2013, at 1:03 PM, Milind Vaidya <[email protected]> wrote:

> As you said I was able to fix the PigStorage() related problem by having
> separate input and output directories.
>
> Sorry, my mistake...! did not realise that the fully qualified name was
> missing. That along with separate output directory solved my AvroStorage()
> problem too.
>
> here are my 2 scripts:
>
> *AvroStorage() usage:*
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
>
> employee= LOAD '/user/immilind/AvroData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> STORE employee INTO '/user/immilind/AvroStoredData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> employee_new= LOAD '/user/immilind/AvroStoredData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> DESCRIBE employee_new;
> DUMP employee_new;
>
> *PigStorage() usage:*
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
>
> employee= load '/user/immilind/AvroData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> NewEmployee = foreach employee generate name as name, age as age,dept as
> dept,office as office,salary as salary,lastname as lastname;
> STORE NewEmployee INTO '/user/immilind/PlainData' USING PigStorage(',');
>
> employee_new = LOAD '/user/immilind/PlainData' USING PigStorage();
> DESCRIBE employee_new;
> DUMP employee_new;
>
>
> On Fri, Jan 11, 2013 at 11:33 AM, Cheolsoo Park <[email protected]> wrote:
>
>> Hi,
>>
>> Here is a working version of your example.
>>
>>
>> 1) AvroStorage Load -> AvroStorage Store -> AvroStorage Load
>>
>> -----
>> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
>> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
>> REGISTER contrib/piggybank/java/piggybank.jar
>>
>> DEFINE AVRO_LOAD_1
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>> DEFINE AVRO_LOAD_2 org.apache.pig.piggybank.storage.avro.AvroStorage();
>> DEFINE AVRO_STORE
>> org.apache.pig.piggybank.storage.avro.AvroStorage('same',
>> 'AvroData/employee.avro');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD_1;
>> DUMP employee;
>>
>> STORE employee INTO 'StoredAvro' USING AVRO_STORE;
>>
>> employee = LOAD 'StoredAvro' USING AVRO_LOAD_2;
>> DUMP employee;
>> -----
>>
>> Please note that:
>> * The Avro store command (AVRO_STORE) defines the schema with the 'same'
>> option. This means it will store the relation 'employee' using the same
>> schema as 'AvroData/employee.avro'. Alternatively, you can specify the
>> schema as a JSON string with the 'schema' option. For example,
>> AvroStorage('schema', '<JSON string>').
>> * I moved StoredAvro out of AvroData. This is because AvroStorage loads
>> directories recursively. If the output were inside AvroData and I ran this
>> script multiple times, I would load not only the files in AvroData but
>> also those in AvroData/StoredAvro from a previous run. Therefore, I am
>> using separate directories for input and output.
>>
>>
>> 2) AvroStorage Load -> PigStorage Store -> PigStorage Load
>>
>> -----
>> DEFINE AVRO_LOAD
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD;
>> DUMP employee;
>>
>> STORE employee INTO 'StoredText' USING PigStorage(',');
>>
>> employee = LOAD 'StoredText' USING PigStorage(',') as (name:chararray,
>> age:int, dept:chararray, office:chararray, salary:int, lastname:chararray);
>> DUMP employee;
>> -----
>>
>>
>> 3) Regarding your errors:
>>
>> * ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
>> AvroStorage using imports: [, org.apache.pig.builtin.,
>> org.apache.pig.impl.builtin.]
>> This is because you didn't use the fully qualified name of AvroStorage in
>> your script. Pig assumes the default qualifiers if no qualifier is given.
>>
>> * ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema
>> from loadFunc org.apache.pig.piggybank.storage.avro.AvroStorage
>> This can happen if you load non-Avro files (e.g. text files) using
>> AvroStorage. For example, if you store data using AvroStorage() without a
>> schema, they will be stored as a text file. Then, if you load them again
>> using AvroStorage, you will get this error. It's hard to tell exactly how
>> you ran into this situation, but given that you're writing files into a
>> sub-directory of the input directory, you probably loaded text files stored
>> by a previous run. This is why I recommend that you separate the input and
>> output directories.
>>
>> Thanks,
>> Cheolsoo
>>
