To be clear, you would then create the table with a clause along these
lines (Hive expects a matching OUTPUTFORMAT whenever you specify a custom
INPUTFORMAT):

STORED AS
  INPUTFORMAT 'your.custom.input.format'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'


If you create an external table, you can then point its LOCATION at a
directory (or file) containing gzipped or uncompressed files.
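For example, a minimal sketch of the full table definition (the table
name, column, SerDe class, and LOCATION path are placeholders;
'your.custom.input.format' stands in for your input format class):

    CREATE EXTERNAL TABLE my_logs (line STRING)
    ROW FORMAT SERDE 'your.custom.SerDe'
    STORED AS
      INPUTFORMAT 'your.custom.input.format'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/user/hive/data/my_logs';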



On Fri, Jan 28, 2011 at 4:52 PM, phil young <phil.wills.yo...@gmail.com> wrote:

> This can be accomplished with a custom input format.
>
> Here's a snippet of the relevant code in the custom RecordReader:
>
>
>
>     compressionCodecs = new CompressionCodecFactory(jobConf);
>     Path file = split.getPath();
>     final CompressionCodec codec = compressionCodecs.getCodec(file);
>
>     // Open the file and seek to the start of the split.
>     start = split.getStart();
>     end = start + split.getLength();
>     pos = 0;
>
>     FileSystem fs = file.getFileSystem(jobConf);
>     fsdat = fs.open(file);
>     fsdat.seek(start);
>
>     // Wrap the raw stream in a decompressor if a registered codec
>     // matches the file's extension (e.g. .gz).
>     if (codec != null) {
>         fsin = codec.createInputStream(fsdat);
>     } else {
>         fsin = fsdat;
>     }
>
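One related note: gzip streams aren't splittable, so a custom input format
along these lines should also tell Hadoop not to split compressed files.
A minimal sketch against the old mapred API (the class names
MyCustomInputFormat and MyCustomRecordReader are hypothetical stand-ins,
not taken from the snippet above):

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    public class MyCustomInputFormat extends FileInputFormat<LongWritable, Text> {

        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            // A .gz file must be read start-to-finish by one reader, so
            // disable splitting whenever a codec claims the file.
            CompressionCodec codec =
                    new CompressionCodecFactory(fs.getConf()).getCodec(file);
            return codec == null;
        }

        @Override
        public RecordReader<LongWritable, Text> getRecordReader(
                InputSplit split, JobConf job, Reporter reporter)
                throws IOException {
            // MyCustomRecordReader (hypothetical) would hold the
            // codec-detection logic quoted above.
            return new MyCustomRecordReader((FileSplit) split, job);
        }
    }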
> On Fri, Jan 28, 2011 at 1:57 PM, Christopher, Pat <
> patrick.christop...@hp.com> wrote:
>
>> Hi,
>>
>> I’ve written a SerDe and I’d like it to be able to handle compressed
>> data (gzip).  Hadoop detects and decompresses on the fly, so if you have
>> a compressed data set and you don’t need to perform any custom
>> interpretation of it as you go, Hadoop and Hive will handle it.  Is
>> there a way to get Hive to notice the data is compressed, decompress it,
>> and then push it through the custom SerDe?  Or will I have to either
>>
>>   a. add some decompression logic to my SerDe (possibly impossible)
>>
>>   b. decompress the data before pushing it into a table with my SerDe
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Pat
>>
>
>
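For the simple case in the original question (no custom interpretation of
the compressed data), nothing special is needed: with a plain text table,
Hadoop picks a codec from the file extension and decompresses on the fly.
A quick sketch (the table name, column, and path are placeholders):

    CREATE EXTERNAL TABLE raw_logs (line STRING)
    STORED AS TEXTFILE
    LOCATION '/user/hive/data/raw_logs';  -- may hold .gz and plain files

    SELECT COUNT(*) FROM raw_logs;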
