Re: ORC split

Prasanth Jayachandran Fri, 31 Mar 2017 10:00:57 -0700

Please find answers inline.

Thanks
Prasanth
_____________________________
From: Alberto Ramón <a.ramonporto...@gmail.com>
Sent: Friday, March 31, 2017 9:32 AM
Subject: ORC split
To: <user@hive.apache.org>



Some questions about ORC:


1- Is hive.exec.orc.default.buffer.size used for read or write?

It is configurable only during write. The writer records this buffer size in 
the footer, and readers use that value during decompression.

2- Is orc.stripe.size compressed or uncompressed?

Both. The stripe size is essentially the sum of all buffers of all columns 
(plus the dictionary size) held in memory.

3- Must orc.stripe.size be a multiple of the HDFS block size?

It is optimal for the stripe size to divide the HDFS block size evenly. 
Otherwise the writer will either adjust the size of the last stripe within a 
block so that it does not straddle the HDFS block boundary, or pad the 
remaining space if it is less than 5% of the block size. Note that the HDFS 
block size used for ORC files is configurable via orc.block.size and is 
independent of the cluster-wide block size. The default stripe size is 64 MB 
and the default block size is 256 MB.
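
The arithmetic behind the stripe and block sizes can be sketched in a few lines of Python. This is only an illustration of the rule as described in this reply, not the ORC writer's actual code: the function names are made up for the example, and the 5% threshold is taken relative to the block size because that is how it is stated above.

```python
MB = 1024 * 1024

def block_layout(block_size, stripe_size):
    """Return (number of full stripes per block, leftover bytes in the block)."""
    full_stripes = block_size // stripe_size
    leftover = block_size - full_stripes * stripe_size
    return full_stripes, leftover

def leftover_action(leftover, block_size, pad_fraction=0.05):
    """Simplified decision for the space left at the end of a block.

    Assumption: follows the description in this thread (pad when the
    leftover is under 5% of the block size, otherwise shrink the last
    stripe so it fits). The real writer logic may differ in detail.
    """
    if leftover == 0:
        return "aligned"
    if leftover < pad_fraction * block_size:
        return "pad"
    return "shrink last stripe"

# Defaults from the reply: 64 MB stripes, 256 MB blocks.
default_stripes, default_leftover = block_layout(256 * MB, 64 * MB)
```

With the defaults, four stripes fill a block exactly and nothing is left over. A hypothetical 125 MB stripe would leave 6 MB per block, which falls under the 5% threshold and would be padded; a 100 MB stripe would leave 56 MB, so the last stripe would be adjusted instead.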


4- When reading an ORC file, does the number of mappers depend on the HDFS 
blocks or on the number of stripes?

It depends. If predicate pushdown is enabled, each split can contain one or 
more stripes. If predicate pushdown is disabled, adjacent stripes are grouped 
together, up to the HDFS block size, to form a single split.

Let's say we have 3 stripes and the 2nd stripe does not satisfy the predicate. 
Then the 1st and 3rd stripes will become 2 separate splits and the 2nd stripe 
will be ignored. If predicate pushdown is disabled, all 3 stripes will together 
form a single split, because their combined size is less than the block size.

The number of splits will also vary based on the input format and the 
execution engine.
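
The grouping described in this answer can be mimicked with a small Python sketch. It is a simplified model of the behaviour explained here, not Hive's split generation code: the Stripe class, the group_stripes function and the "matches" flag (standing in for the result of checking stripe statistics against the predicate) are all invented for the illustration.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024

@dataclass
class Stripe:
    index: int
    size: int             # stripe size in bytes
    matches: bool = True  # hypothetical flag: stripe may satisfy the predicate

def group_stripes(stripes: List[Stripe], block_size: int,
                  pushdown: bool) -> List[List[int]]:
    """Group adjacent stripes into splits, returning stripe indexes per split."""
    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for stripe in stripes:
        if pushdown and not stripe.matches:
            # An eliminated stripe is skipped and breaks adjacency,
            # so the split being built is closed here.
            if current:
                splits.append(current)
            current, current_size = [], 0
            continue
        if current and current_size + stripe.size > block_size:
            # Adding this stripe would exceed the block size: start a new split.
            splits.append(current)
            current, current_size = [], 0
        current.append(stripe.index)
        current_size += stripe.size
    if current:
        splits.append(current)
    return splits

# The 3-stripe example from the answer: the 2nd stripe fails the predicate.
example = [Stripe(1, 64 * MB), Stripe(2, 64 * MB, matches=False), Stripe(3, 64 * MB)]
with_ppd = group_stripes(example, 256 * MB, pushdown=True)
without_ppd = group_stripes(example, 256 * MB, pushdown=False)
```

For the 3-stripe example, with_ppd holds two splits (stripe 1 and stripe 3) and without_ppd holds one split containing all three stripes, matching the description above. Real split counts also depend on the split strategy, input format and execution engine, which this model leaves out.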

5- Is hive.exec.orc.split.strategy used for read?

Yes.
