This will not work:

    set hive.merge.size.per.task=28*1024*1024;

It has to be a number. Hive does not evaluate arithmetic expressions in SET
statements, and the default for hive.merge.size.per.task is 256000000
(~256MB), which is most likely why your output files always come out at
256MB.
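A corrected version would spell the size out as a literal byte count
(assuming 28MB really is the intended merge size; 29360128 = 28*1024*1024):

    -- Hive will not compute 28*1024*1024; pass the product yourself
    set hive.merge.size.per.task=29360128;
    -- merge job is triggered when the average output file size is below this
    set hive.merge.smallfiles.avgsize=100000000;

With the literal value in place, the merged files should come out near 28MB
instead of the 256MB default.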
On Mon, Jun 18, 2012 at 2:46 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
> I am trying to use the Hive merge options to merge small files into larger
> files using the following query. It works well except that I cannot control
> the output file size. I cannot explain why the output files are always
> 256MB with the following hive.merge.size.per.task and
> hive.merge.smallfiles.avgsize settings. I tried 56MB for
> hive.merge.size.per.task; the size is still 256MB.
>
> "omniture_hit" is an uncompressed CSV-format Hive table. I want to convert
> it into RCFile format. The problem is that a lot of small RCFiles are
> created, much smaller than our default block size of 128MB, if I just do a
> simple select * and insert into the new table.
>
> Another problem is that I want to change hive.io.rcfile.record.size to 8MB
> to see if I get a better compression ratio for my data. But the result
> seems similar compared with 4MB. The data pattern could be like what the
> RCFile paper describes. But how can I verify that my 8MB setting works?
>
> Thanks.
>
> Ben
>
> SET hive.exec.compress.output=true;
> SET hive.exec.compress.intermediate=true;
>
> set hive.merge.size.per.task=28*1024*1024;
> set hive.merge.smallfiles.avgsize=100000000;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.max.dynamic.partitions.pernode=10000;
> SET hive.exec.max.dynamic.partitions=10000;
> SET hive.exec.max.created.files=150000;
>
> create table omniture_hit_rc like omniture_hit;
>
> insert overwrite table omniture_hit_rc partition (local_dt) select *
> from omniture_hit where local_dt>='2012-06-01';