To be clear, you would then create the table with the clause: STORED AS INPUTFORMAT 'your.custom.input.format'. If you make it an external table, you can then point it at a directory (or file) that contains gzipped files or uncompressed files.
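For example, here is a minimal sketch of what that DDL might look like. The table name, SerDe class, and location are placeholders, not from this thread, and note that Hive requires an OUTPUTFORMAT whenever you spell out an INPUTFORMAT:

    CREATE EXTERNAL TABLE my_logs (line STRING)
    ROW FORMAT SERDE 'com.example.MyCustomSerDe'   -- hypothetical SerDe class
    STORED AS
      INPUTFORMAT 'your.custom.input.format'       -- the custom InputFormat
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/data/my_logs';                      -- may hold .gz or plain files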
On Fri, Jan 28, 2011 at 4:52 PM, phil young <phil.wills.yo...@gmail.com> wrote:

> This can be accomplished with a custom input format.
>
> Here's a snippet of the relevant code in the custom RecordReader:
>
>     compressionCodecs = new CompressionCodecFactory(jobConf);
>     Path file = split.getPath();
>     final CompressionCodec codec = compressionCodecs.getCodec(file);
>
>     // open the file and seek to the start of the split
>     start = split.getStart();
>     end = start + split.getLength();
>     pos = 0;
>
>     FileSystem fs = file.getFileSystem(jobConf);
>     fsdat = fs.open(split.getPath());
>     fsdat.seek(start);
>
>     // wrap the raw stream in a decompressing stream if a codec matched
>     if (codec != null)
>     {
>         fsin = codec.createInputStream(fsdat);
>     }
>     else
>     {
>         fsin = fsdat;
>     }
>
> On Fri, Jan 28, 2011 at 1:57 PM, Christopher, Pat <patrick.christop...@hp.com> wrote:
>
>> Hi,
>>
>> I've written a SerDe and I'd like it to be able to handle compressed data
>> (gzip). Hadoop detects and decompresses on the fly, so if you have a
>> compressed data set and you don't need to perform any custom interpretation
>> of it as you go, Hadoop and Hive will handle it. Is there a way to get Hive
>> to notice the data is compressed, decompress it, and then push it through
>> the custom SerDe? Or will I have to either
>>
>> a. add some decompression logic to my SerDe (possibly impossible), or
>>
>> b. decompress the data before pushing it into a table with my SerDe?
>>
>> Thanks!
>>
>> Pat
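For reference, here is a rough, self-contained sketch of how a RecordReader along the lines of Phil's snippet might be wired into a custom InputFormat that Hive can load. It targets the old org.apache.hadoop.mapred API (the one the STORED AS INPUTFORMAT clause expects); the class names are illustrative, and split-boundary line handling is omitted for brevity:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.LineReader;

    // Illustrative name -- this is the class you would name in
    // STORED AS INPUTFORMAT 'your.custom.input.format'.
    public class CodecAwareInputFormat extends FileInputFormat<LongWritable, Text> {

      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        // gzip streams can't be split, so compressed files go to one reader whole
        return new CompressionCodecFactory(fs.getConf()).getCodec(file) == null;
      }

      @Override
      public RecordReader<LongWritable, Text> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new CodecAwareRecordReader((FileSplit) split, job);
      }

      static class CodecAwareRecordReader implements RecordReader<LongWritable, Text> {
        private final long start;
        private long end;
        private long pos;
        private final FSDataInputStream fsdat; // raw stream, used for progress
        private final LineReader in;           // lines from the (possibly decoded) stream

        CodecAwareRecordReader(FileSplit split, JobConf job) throws IOException {
          Path file = split.getPath();
          CompressionCodec codec = new CompressionCodecFactory(job).getCodec(file);

          start = split.getStart();
          end = start + split.getLength();
          pos = start;

          FileSystem fs = file.getFileSystem(job);
          fsdat = fs.open(file);
          fsdat.seek(start);

          if (codec != null) {
            // file name matched a codec (e.g. .gz): decompress on the fly and
            // read to EOF, since the whole file is a single split
            in = new LineReader(codec.createInputStream(fsdat), job);
            end = Long.MAX_VALUE;
          } else {
            in = new LineReader(fsdat, job);
          }
        }

        public boolean next(LongWritable key, Text value) throws IOException {
          if (pos >= end) {
            return false; // past the split boundary
          }
          int bytesRead = in.readLine(value);
          if (bytesRead == 0) {
            return false; // EOF
          }
          key.set(pos);
          pos += bytesRead;
          return true;
        }

        public LongWritable createKey() { return new LongWritable(); }
        public Text createValue()       { return new Text(); }
        public long getPos() throws IOException { return fsdat.getPos(); }
        public void close() throws IOException { in.close(); }

        public float getProgress() throws IOException {
          if (start == end || end == Long.MAX_VALUE) {
            return 0.0f; // unknown for compressed input
          }
          return Math.min(1.0f, (fsdat.getPos() - start) / (float) (end - start));
        }
      }
    }

The key move, as in Phil's snippet, is deciding per file whether to wrap the raw FSDataInputStream in codec.createInputStream(); everything downstream (including a custom SerDe) then sees only decompressed bytes.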