Hi,

We have an external ORC table made up of ~200 relatively small ORC
files (each less than 256MB). When we query the table with a selective SARG
predicate (EXPLAIN shows that the predicate qualifies for pushdown), we
expect only a few splits to be generated, pruned according to the predicate,
so that only a few files are scanned. However, predicate pushdown does not
seem to take effect at all: every file is scanned in the MR job, and the
SARG does not even show up in the MR job config.

After digging further into the Hive code (version 0.14), it looks like split
pruning only happens for the stripes within each file. If the file is
smaller than the default split size, the SARG is not considered. Here is the
code we are referring to:
https://github.com/apache/hive/blob/release-0.14.0/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L656
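
To make the difference concrete, here is a simplified sketch of the behavior
we think we are seeing next to the behavior we expected. This is not the
actual Hive code: the class, the Stripe and OrcFile types, and the method
names are all hypothetical, the predicate is reduced to a single
"col = value" check against per-stripe min/max statistics, and a split is
modeled as one entry per file rather than per stripe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model only; none of these names come from Hive.
public class SplitPruningSketch {

    // Stand-in for the min/max statistics of one column in one stripe.
    static final class Stripe {
        final long min;
        final long max;
        Stripe(long min, long max) { this.min = min; this.max = max; }
    }

    // Stand-in for one ORC file: its name, its size in MB, and its stripes.
    static final class OrcFile {
        final String name;
        final long lengthMb;
        final List<Stripe> stripes;
        OrcFile(String name, long lengthMb, List<Stripe> stripes) {
            this.name = name;
            this.lengthMb = lengthMb;
            this.stripes = stripes;
        }
    }

    // "col = value" can only match a stripe whose [min, max] range contains value.
    static boolean anyStripeMatches(OrcFile file, long value) {
        for (Stripe s : file.stripes) {
            if (s.min <= value && value <= s.max) {
                return true;
            }
        }
        return false;
    }

    // What we observe: a file smaller than the max split size becomes a split
    // without the predicate ever being checked against its stripe statistics.
    static List<String> observedSplits(List<OrcFile> files, long maxSplitMb, long value) {
        List<String> splits = new ArrayList<>();
        for (OrcFile f : files) {
            if (f.lengthMb < maxSplitMb || anyStripeMatches(f, value)) {
                splits.add(f.name);
            }
        }
        return splits;
    }

    // What we expected: a file is skipped whenever none of its stripes can match,
    // regardless of the file size.
    static List<String> expectedSplits(List<OrcFile> files, long value) {
        List<String> splits = new ArrayList<>();
        for (OrcFile f : files) {
            if (anyStripeMatches(f, value)) {
                splits.add(f.name);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<OrcFile> files = Arrays.asList(
            new OrcFile("a.orc", 100, Arrays.asList(new Stripe(0, 10))),
            new OrcFile("b.orc", 120, Arrays.asList(new Stripe(50, 60))),
            new OrcFile("c.orc", 600, Arrays.asList(new Stripe(0, 10), new Stripe(100, 200))));
        System.out.println("observed: " + observedSplits(files, 256, 55));
        System.out.println("expected: " + expectedSplits(files, 55));
    }
}
```

In this made-up example the small files are always kept by observedSplits,
while expectedSplits keeps only the file whose stripe range contains the
value.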


Any idea why the SARG is ignored in this scenario? Also, can split pruning
filter out files in which no stripe satisfies the SARG condition?
Thanks for any help, it is really appreciated.

Jessica
