This can be accomplished with a custom input format. Here's a snippet of the relevant code from the custom RecordReader (a fuller sketch of how it wires into an InputFormat follows the quoted message below):
compressionCodecs = new CompressionCodecFactory(jobConf);
Path file = split.getPath();
final CompressionCodec codec = compressionCodecs.getCodec(file);
// open the file and seek to the start of the split
start = split.getStart();
end = start + split.getLength();
pos = 0;
FileSystem fs = file.getFileSystem(jobConf);
fsdat = fs.open(split.getPath());
fsdat.seek(start);
if (codec != null) {
  fsin = codec.createInputStream(fsdat);
} else {
  fsin = fsdat;
}

On Fri, Jan 28, 2011 at 1:57 PM, Christopher, Pat
<patrick.christop...@hp.com> wrote:

> Hi,
>
> I’ve written a SerDe and I’d like it to be able to handle compressed data
> (gzip). Hadoop detects and decompresses on the fly, so if you have a
> compressed data set and you don’t need to perform any custom
> interpretation of it as you go, Hadoop and Hive will handle it. Is there a
> way to get Hive to notice the data is compressed, decompress it, and then
> push it through the custom SerDe? Or will I have to either:
>
> a. add some decompression logic to my SerDe (possibly impossible), or
>
> b. decompress the data before pushing it into a table with my SerDe?
>
> Thanks!
>
> Pat
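
For completeness, here is a minimal, self-contained sketch of how a snippet like that fits into a full input format, using the old org.apache.hadoop.mapred API that Hive expects for input formats. The class names (GzipAwareInputFormat, GzipAwareRecordReader) and the line-per-record parsing are my own illustration, not the actual code behind the snippet above; adapt next() to whatever records your SerDe consumes:

// A minimal sketch, assuming the old org.apache.hadoop.mapred API.
// Class names and the line-oriented record parsing are hypothetical.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class GzipAwareInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    // A gzip stream cannot be split, so hand the whole file to one reader.
    return new CompressionCodecFactory(fs.getConf()).getCodec(file) == null;
  }

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new GzipAwareRecordReader((FileSplit) split, job);
  }

  static class GzipAwareRecordReader implements RecordReader<LongWritable, Text> {
    private final FSDataInputStream fsdat;
    private final BufferedReader reader;
    private final long start;
    private final long end;

    GzipAwareRecordReader(FileSplit split, JobConf job) throws IOException {
      CompressionCodecFactory codecs = new CompressionCodecFactory(job);
      Path file = split.getPath();
      CompressionCodec codec = codecs.getCodec(file);
      start = split.getStart();
      end = start + split.getLength();
      FileSystem fs = file.getFileSystem(job);
      fsdat = fs.open(file);
      fsdat.seek(start);
      // Wrap the raw stream in the codec's decompressor when the file
      // extension matches a registered codec; otherwise read it as-is.
      // Everything downstream (and therefore the SerDe) sees plain text.
      InputStream in = (codec != null) ? codec.createInputStream(fsdat) : fsdat;
      reader = new BufferedReader(new InputStreamReader(in));
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      String line = reader.readLine();
      if (line == null) {
        return false;
      }
      // The key is the current offset in the underlying (compressed) stream;
      // it is approximate because of BufferedReader's read-ahead.
      key.set(fsdat.getPos());
      value.set(line);
      return true;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() throws IOException { return fsdat.getPos(); }
    public void close() throws IOException { reader.close(); }

    public float getProgress() throws IOException {
      return end == start ? 0.0f
          : Math.min(1.0f, (fsdat.getPos() - start) / (float) (end - start));
    }
  }
}

Once a class like this is on the classpath, the table definition just points at it (STORED AS INPUTFORMAT ... in the CREATE TABLE) and the SerDe itself never has to know the data was compressed, since it only ever sees the decompressed bytes.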