----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41984/ -----------------------------------------------------------
Review request for hive, Sergio Pena and Xuefu Zhang. Repository: hive-git Description ------- HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true Diffs ----- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9a7d990baaabfde8e564f00bb1fcfe30cd16dc90 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 017676bec2163f04fb95f43224a4f8743fa49f55 ql/src/test/queries/clientpositive/parquet_join2.q PRE-CREATION ql/src/test/results/clientpositive/parquet_join2.q.out PRE-CREATION storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java 577d95d1a15a54c2804349b3e5e68d83b72df664 storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java eeff131cbc14d7ef554517109612ae7d891f8003 Diff: https://reviews.apache.org/r/41984/diff/ Testing ------- We have two issues: 1. We are filtering the parquet columns based on the last filter condition in the query. So if the query contains multiple instances of the same table, e.g., join on the same table with different filter conditions, then we could get incorrect result; 2. rewriteLeaves implementation in SearchArgumentImpl is not accurate since the different leaves could be sharing the same object. The current implementation could change the leave index multiple times to an incorrect value. The patch will merge all the filter conditions (create OR expression on all the filters) so that the columns which will be used during operator won't be filtered during earlier splitting stage. rewriteLeaves is reimplemented to get all the unique leaves first and replace in place. Thanks, Aihua Xu