This will not work:

    set hive.merge.size.per.task=28*1024*1024;

It has to be a number. Hive does not evaluate arithmetic expressions in SET
statements, and the default for hive.merge.size.per.task is 256000000
(~256MB), which is most likely why your output files always come out at
256MB.
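A corrected version would spell the size out as a literal byte count
(assuming 28MB really is the intended merge size; 29360128 = 28*1024*1024):

    -- Hive will not compute 28*1024*1024; pass the product yourself
    set hive.merge.size.per.task=29360128;
    -- merge job is triggered when the average output file size is below this
    set hive.merge.smallfiles.avgsize=100000000;

With the literal value in place, the merged files should come out near 28MB
instead of the 256MB default.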
On Mon, Jun 18, 2012 at 2:46 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
> I am trying to use the Hive merge options to merge small files into larger
> files using the following query. It works well except that I cannot control
> the output file size. I cannot explain why the output files are always
> 256MB with the following hive.merge.size.per.task and
> hive.merge.smallfiles.avgsize settings. I tried 56MB for
> hive.merge.size.per.task; the size is still 256MB.
>
> "omniture_hit" is an uncompressed CSV-format Hive table. I want to convert
> it into RCFile format. The problem is that a lot of small RCFiles are
> created, much smaller than our default block size of 128MB, if I just do a
> simple select * and insert into the new table.
>
> Another problem is that I want to change hive.io.rcfile.record.size to 8MB
> to see if I get a better compression ratio for my data. But the result
> seems similar compared with 4MB. The data pattern could be like what the
> RCFile paper describes. But how can I verify that my 8MB setting works?
>
> Thanks.
>
> Ben
>
> SET hive.exec.compress.output=true;
> SET hive.exec.compress.intermediate=true;
>
> set hive.merge.size.per.task=28*1024*1024;
> set hive.merge.smallfiles.avgsize=100000000;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.max.dynamic.partitions.pernode=10000;
> SET hive.exec.max.dynamic.partitions=10000;
> SET hive.exec.max.created.files=150000;
>
> create table omniture_hit_rc like omniture_hit;
>
> insert overwrite table omniture_hit_rc partition (local_dt) select *
> from omniture_hit where local_dt>='2012-06-01';