Hi Avrilia,

In org.apache.hadoop.hive.ql.io.orc.WriterImpl, the block size is
determined by Math.min(1.5GB, 2 * stripeSize). Also, you can set
"orc.block.padding" as a table property to control whether the writer
pads HDFS blocks to prevent stripes from straddling blocks. The default
value of this flag is true.
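As a rough illustration of that rule, the block size computation can be sketched like this (a minimal sketch only; the class and constant names here are illustrative, not the actual fields in WriterImpl):

```java
// Sketch of the ORC block-size rule described above.
// Assumption: sizes are plain byte counts; MAX_BLOCK_SIZE is the 1.5 GB cap.
public class OrcBlockSizeSketch {
    static final long MAX_BLOCK_SIZE = (long) (1.5 * 1024 * 1024 * 1024);

    // block size = min(1.5 GB, 2 * stripe size)
    static long blockSizeFor(long stripeSize) {
        return Math.min(MAX_BLOCK_SIZE, 2 * stripeSize);
    }

    public static void main(String[] args) {
        // 128 MB stripes -> 256 MB blocks (well under the 1.5 GB cap)
        System.out.println(blockSizeFor(128L * 1024 * 1024));
        // 250 MB default stripes -> 500 MB blocks
        System.out.println(blockSizeFor(250L * 1024 * 1024));
    }
}
```

So with the default 250 MB stripe size, each HDFS block can hold two stripes, and the padding flag keeps a stripe from being split across two blocks.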

Thanks,

Yin

On Sun, Dec 29, 2013 at 11:36 PM, Avrilia Floratou <
avrilia.flora...@gmail.com> wrote:

> best stripe size to use. The default one (250MB) is larger than the block
> size. Is each stripe splittable, or will each map task in this case have to
> access data over the network? I also tried to set the stripe size to 128 MB
> (same as the block size) using the tblproperties in the create table
> statement, but noticed that for a file of about 544GB, 2026 map tasks are
> launched, which means that each split corresponds to about 250 MB. Is there
> anything else I should do to align the block size, stripe size, and split
> size in the ORC file?
