On Mon, Jan 24, 2011 at 4:42 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 4:14 PM, yongqiang he <heyongqiang...@gmail.com> wrote:
>> How did you upload the data to the new table?
>> You can get the data compressed by doing an insert overwrite to the
>> destination table with "hive.exec.compress.output" set to true.
>>
>> Thanks
>> Yongqiang
>>
>> On Mon, Jan 24, 2011 at 12:30 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> I am trying to explore some use cases that I believe are perfect for
>>> the ColumnarSerDe: tables with 100+ columns where only one or two are
>>> selected in a particular query.
>>>
>>> CREATE TABLE (....)
>>> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
>>> STORED AS RCFile;
>>>
>>> My issue is that the data in our source table, stored as gzip sequence files,
>>> is much smaller than the ColumnarSerDe table, and as a result any
>>> performance gains are lost.
>>>
>>> Any ideas?
>>>
>>> Thank you,
>>> Edward
>>
>
> Thank you! That was an RTFM question.
>
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.compress.output=true;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> I was unclear about 'STORED AS RCFile', since normally you would need
> to use 'STORED AS SEQUENCEFILE'. However,
> http://hive.apache.org/docs/r0.6.0/api/org/apache/hadoop/hive/ql/io/RCFile.html
> explains this well: RCFile is a special type of sequence file.
>
> I did get it working, and it looks good: with compression, my table came out smaller
> than the gzip BLOCK-compressed sequence file, and query time was slightly better in
> limited testing. Cool stuff.
>
> Edward
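For anyone following along, here is a minimal sketch of the recipe described above. The table names (source_logs, rc_logs) and columns are placeholders, not Edward's actual schema, and the dynamic-partition setting is omitted since this example is unpartitioned:

-- Columnar destination table (only a few of the 100+ columns shown)
CREATE TABLE rc_logs (host STRING, status INT, bytes BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE;

-- Compress the output of the load, as Yongqiang suggested
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Load from the existing gzip sequence-file table
INSERT OVERWRITE TABLE rc_logs
SELECT host, status, bytes FROM source_logs;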
Do RCFiles support a block size for compression, like other compressed sequence files?