Hi,

We have an external ORC table that consists of ~200 relatively small ORC files (each less than 256MB). When querying the table with a selective SARG predicate (EXPLAIN shows that the predicate qualifies for pushdown), we expect only a few splits to be generated, with the rest pruned based on the predicate condition, so that only a few files are scanned. However, predicate pushdown does not seem to be in effect at all: all the files are scanned in the MR job, and the SARG does not even show up in the MR job config.
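To make the expectation concrete, here is a simplified sketch of the file-level pruning we had in mind. It is not Hive code: the class and method names (`SplitPruningSketch`, `StripeStats`, `OrcFile`, `filesToScan`) are made up for illustration, and it assumes an equality predicate on a single numeric column with per-stripe min/max statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model, not Hive code: every class and method name here is hypothetical.
public class SplitPruningSketch {

    // Min/max statistics of the predicate column for one stripe.
    static final class StripeStats {
        final long min;
        final long max;

        StripeStats(long min, long max) {
            this.min = min;
            this.max = max;
        }

        // True if a row with column == value could exist in this stripe.
        boolean mayContain(long value) {
            return min <= value && value <= max;
        }
    }

    // One ORC file, modeled only by its name and its stripe statistics.
    static final class OrcFile {
        final String name;
        final List<StripeStats> stripes;

        OrcFile(String name, StripeStats... stripes) {
            this.name = name;
            this.stripes = Arrays.asList(stripes);
        }
    }

    // Keep a file only if at least one of its stripes may satisfy "column = value".
    static List<String> filesToScan(List<OrcFile> files, long value) {
        List<String> kept = new ArrayList<>();
        for (OrcFile f : files) {
            for (StripeStats s : f.stripes) {
                if (s.mayContain(value)) {
                    kept.add(f.name);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<OrcFile> files = Arrays.asList(
            new OrcFile("part-0.orc", new StripeStats(0, 99), new StripeStats(100, 199)),
            new OrcFile("part-1.orc", new StripeStats(200, 299)),
            new OrcFile("part-2.orc", new StripeStats(300, 399)));
        System.out.println(filesToScan(files, 250)); // prints [part-1.orc]
    }
}
```

In this toy model only `part-1.orc` would produce a split for the predicate `column = 250`; the other two files would be skipped entirely, regardless of their size.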
After digging further into the Hive code (version 0.14), it looks like split pruning only happens for the stripes within each file. If the file size is smaller than the default split size, the SARG is not considered. Here is the code we are referring to: https://github.com/apache/hive/blob/release-0.14.0/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L656

Any idea why the SARG is ignored in this scenario? Also, can split pruning filter out files in which no stripe satisfies the SARG condition?

Thanks for any help, really appreciated.

Jessica