Hi, I'm trying to load a sequence file compressed with GzipCodec from HDFS
into Pig using org.apache.pig.piggybank.storage.SequenceFileLoader() from
the piggybank-0.12.jar file.
The file format is:
SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text'org.apache.hadoop.io.compress.GzipCodec
This file is in Hive, and there I'm able to see the data for all the columns
correctly.
A = LOAD '/user/a/test/part-r-00000' USING
org.apache.pig.piggybank.storage.SequenceFileLoader() AS
(user_id:chararray,flwd_id:chararray,intrst_id:chararray,vsblty_id:chararray);
STORE A INTO '/user/a/test/output' USING PigStorage(',');
After I load the file into a relation and DUMP or STORE it, I see that the
fields are all concatenated and some records are truncated.
Please let me know if this is the right way to read a Gzip-compressed
sequence file (created using Hive) into Pig.
Thanks!!