Ahh, not as custom as I expected... that makes sense now. Glad things are working for you.
-Phil

On Fri, Jan 28, 2011 at 5:34 PM, Christopher, Pat <patrick.christop...@hp.com> wrote:

> Not sure what I did wrong the first time, but I tried to create a table
> with a stored type of textfile using my custom SerDe, so it had a format
> line of:
>
> ROW FORMAT SERDE 'org.myorg.hadoop.hive.udf.MySerDe' STORED AS textfile
>
> Then I loaded a gzipped file using LOAD DATA LOCAL INPATH 'path.gz' INTO
> TABLE mytable and it worked as expected, i.e. the file was read and I'm
> able to query it using Hive.
>
> Sorry to bother you, and thanks a bunch for the help! Forcing me to go
> read more about InputFormats is a long-term help anyway.
>
> Pat
>
> *From:* phil young [mailto:phil.wills.yo...@gmail.com]
> *Sent:* Friday, January 28, 2011 1:54 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Custom SerDe Question
>
> To be clear, you would then create the table with the clause:
>
> STORED AS
> INPUTFORMAT 'your.custom.input.format'
>
> If you make an external table, you'll then be able to point it at a
> directory (or file) that contains gzipped files or uncompressed files.
>
> On Fri, Jan 28, 2011 at 4:52 PM, phil young <phil.wills.yo...@gmail.com> wrote:
>
> This can be accomplished with a custom InputFormat.
>
> Here's a snippet of the relevant code in the custom RecordReader:
>
>     compressionCodecs = new CompressionCodecFactory(jobConf);
>     Path file = split.getPath();
>     final CompressionCodec codec = compressionCodecs.getCodec(file);
>
>     // open the file and seek to the start of the split
>     start = split.getStart();
>     end = start + split.getLength();
>     pos = 0;
>
>     FileSystem fs = file.getFileSystem(jobConf);
>     fsdat = fs.open(split.getPath());
>     fsdat.seek(start);
>
>     if (codec != null) {
>         fsin = codec.createInputStream(fsdat);
>     } else {
>         fsin = fsdat;
>     }
>
> On Fri, Jan 28, 2011 at 1:57 PM, Christopher, Pat <patrick.christop...@hp.com> wrote:
>
> Hi,
>
> I've written a SerDe and I'd like it to be able to handle compressed data
> (gzip). Hadoop detects and decompresses on the fly, so if you have a
> compressed data set and you don't need to perform any custom
> interpretation of it as you go, Hadoop and Hive will handle it. Is there a
> way to get Hive to notice the data is compressed, decompress it, and then
> push it through the custom SerDe? Or will I have to either
>
> a. add some decompression logic to my SerDe (possibly impossible), or
> b. decompress the data before pushing it into a table with my SerDe?
>
> Thanks!
>
> Pat
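Here is a minimal, untested sketch of how a snippet like Phil's could sit inside a complete InputFormat against the old org.apache.hadoop.mapred API (the API Hive's STORED AS INPUTFORMAT clause expects). The package and class names, the isSplitable override, and the line-oriented record handling via LineReader are illustrative assumptions, not code from the thread:

    package org.myorg.hadoop.hive;  // hypothetical package/class names

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.LineReader;

    public class MyGzipAwareInputFormat extends FileInputFormat<LongWritable, Text> {

        // Gzip streams cannot be split, so any file that matches a codec
        // (by extension, e.g. ".gz") is handed out as a single split.
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return new CompressionCodecFactory(fs.getConf()).getCodec(file) == null;
        }

        @Override
        public RecordReader<LongWritable, Text> getRecordReader(
                InputSplit split, JobConf job, Reporter reporter) throws IOException {
            return new GzipAwareRecordReader((FileSplit) split, job);
        }

        static class GzipAwareRecordReader implements RecordReader<LongWritable, Text> {
            private final long start;
            private long end;
            private long pos;
            private final FSDataInputStream fsdat;
            private final LineReader in;

            GzipAwareRecordReader(FileSplit split, JobConf jobConf) throws IOException {
                Path file = split.getPath();
                CompressionCodec codec =
                        new CompressionCodecFactory(jobConf).getCodec(file);

                // Open the file and seek to the start of the split.
                start = split.getStart();
                end = start + split.getLength();
                pos = start;

                FileSystem fs = file.getFileSystem(jobConf);
                fsdat = fs.open(file);
                fsdat.seek(start);

                InputStream fsin;
                if (codec != null) {
                    // Decompress transparently. Compressed byte offsets no
                    // longer line up with record boundaries, so read until
                    // the stream itself is exhausted.
                    fsin = codec.createInputStream(fsdat);
                    end = Long.MAX_VALUE;
                } else {
                    fsin = fsdat;
                }
                in = new LineReader(fsin, jobConf);
            }

            public boolean next(LongWritable key, Text value) throws IOException {
                if (pos >= end) {
                    return false;
                }
                key.set(pos);
                if (in.readLine(value) == 0) {
                    return false;
                }
                // Track the underlying (compressed) offset for progress.
                pos = fsdat.getPos();
                return true;
            }

            public LongWritable createKey() { return new LongWritable(); }
            public Text createValue() { return new Text(); }
            public long getPos() throws IOException { return pos; }
            public void close() throws IOException { in.close(); }
            public float getProgress() throws IOException {
                // Approximate; stays near zero for compressed files.
                if (end == start) {
                    return 0.0f;
                }
                return Math.min(1.0f, (pos - start) / (float) (end - start));
            }
        }
    }

You would then wire it up along the lines of: CREATE EXTERNAL TABLE mytable (...) ROW FORMAT SERDE 'org.myorg.hadoop.hive.udf.MySerDe' STORED AS INPUTFORMAT 'org.myorg.hadoop.hive.MyGzipAwareInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '...'. Hive requires an OUTPUTFORMAT whenever an INPUTFORMAT is given; HiveIgnoreKeyTextOutputFormat is the one it uses for plain text tables.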