On Mon, Jan 24, 2011 at 4:42 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 4:14 PM, yongqiang he <heyongqiang...@gmail.com> wrote:
>> How did you upload the data to the new table?
>> You can get the data compressed by doing an insert overwrite to the
>> destination table with "hive.exec.compress.output" set to true.
>>
>> Thanks
>> Yongqiang
>>
>> On Mon, Jan 24, 2011 at 12:30 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> I am trying to explore some use cases that I believe are perfect for
>>> the ColumnarSerDe: tables with 100+ columns where only one or two are
>>> selected in a particular query.
>>>
>>> CREATE TABLE (....)
>>> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
>>> STORED AS RCFile;
>>>
>>> My issue is that the data in our source table, stored as gzip sequence files,
>>> is much smaller than the ColumnarSerDe table, and as a result any
>>> performance gains are lost.
>>>
>>> Any ideas?
>>>
>>> Thank you,
>>> Edward
>>
>
> Thank you! That was an RTFM question.
>
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.compress.output=true;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> I was unclear about 'STORED AS RCFile', since normally you would need
> to use 'STORED AS SEQUENCEFILE'. However,
> http://hive.apache.org/docs/r0.6.0/api/org/apache/hadoop/hive/ql/io/RCFile.html
> explains this well: RCFile is a special type of sequence file.
>
> I did get it working, and it looks good: with compression, my table came out smaller
> than the gzip BLOCK-compressed sequence file, and query time was slightly better in
> limited testing. Cool stuff.
>
> Edward
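For anyone following along, here is a minimal sketch of the recipe described above. The table names (source_logs, rc_logs) and columns are placeholders, not Edward's actual schema, and the dynamic-partition setting is omitted since this example is unpartitioned:

-- Columnar destination table (only a few of the 100+ columns shown)
CREATE TABLE rc_logs (host STRING, status INT, bytes BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE;

-- Compress the output of the load, as Yongqiang suggested
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Load from the existing gzip sequence-file table
INSERT OVERWRITE TABLE rc_logs
SELECT host, status, bytes FROM source_logs;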
Do RCFiles support a block size for compression, like other compressed sequence files?