-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28500/
-----------------------------------------------------------

(Updated Dec. 2, 2014, 1:34 a.m.)


Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.


Changes
-------

Fix the algorithm and clean up after discussion with Xuefu. The original code was too aggressive in incorporating connected mapjoins into its size calculation; the new code only looks at the big table's connected mapjoins.


Bugs: HIVE-8943
    https://issues.apache.org/jira/browse/HIVE-8943


Repository: hive-git


Description
-------

SparkMapJoinOptimizer by default combines nested mapjoins into one work, because the ReduceSink (RS) for the big table is removed. So we need to enhance the mapjoin check to calculate whether all the mapjoins in that work (Spark stage) will fit into memory; otherwise they might overwhelm the memory of that particular Spark executor.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 819eef1
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 0c339a5
  ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION
  ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/28500/diff/


Testing
-------

Added two unit tests:
1. auto_join_stats, which sets a memory limit and checks that the algorithm does not put more than one mapjoin in one BaseWork.
2. auto_join_stats2, which runs the same query without a memory limit and checks that the algorithm puts all mapjoins in one BaseWork because it can.


Thanks,

Szehon Ho
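

For readers who want a concrete picture of the size check described under Description, here is a rough sketch. It is not the code in SparkMapJoinOptimizer: JoinNode, connectedMapJoinBytes, tryConvert and the byte figures are all hypothetical names and values made up for illustration. It assumes that a join left unconverted keeps its reduce sink and therefore ends the current work, so the walk along the big-table branch stops there.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical stand-ins, not Hive classes.
final class MapJoinMemoryCheckSketch {

    /** A join in the plan, reduced to the fields the size check needs. */
    static final class JoinNode {
        final String name;
        final long smallTableBytes;     // estimated size of this join's small-table inputs
        final JoinNode bigTableParent;  // join feeding the big-table branch, or null
        boolean convertedToMapJoin;     // set by tryConvert

        JoinNode(String name, long smallTableBytes, JoinNode bigTableParent) {
            this.name = name;
            this.smallTableBytes = smallTableBytes;
            this.bigTableParent = bigTableParent;
        }
    }

    /**
     * Sums the small-table sizes of mapjoins already connected through the
     * big-table branch. Assumption: an unconverted join is a work boundary,
     * so the walk stops at the first parent that is not a mapjoin.
     */
    static long connectedMapJoinBytes(JoinNode join) {
        long total = 0L;
        JoinNode parent = join.bigTableParent;
        while (parent != null && parent.convertedToMapJoin) {
            total += parent.smallTableBytes;
            parent = parent.bigTableParent;
        }
        return total;
    }

    /** Converts the join only if it and its connected mapjoins fit within maxBytes. */
    static boolean tryConvert(JoinNode join, long maxBytes) {
        long needed = join.smallTableBytes + connectedMapJoinBytes(join);
        join.convertedToMapJoin = needed <= maxBytes;
        return join.convertedToMapJoin;
    }

    public static void main(String[] args) {
        // Three nested joins, each with a 100-byte small-table side (made-up numbers).
        JoinNode j1 = new JoinNode("j1", 100L, null);
        JoinNode j2 = new JoinNode("j2", 100L, j1);
        JoinNode j3 = new JoinNode("j3", 100L, j2);
        List<JoinNode> joins = Arrays.asList(j1, j2, j3);
        for (JoinNode j : joins) {
            System.out.println(j.name + " -> mapjoin: " + tryConvert(j, 150L));
        }
    }
}
```

With the 150-byte limit used in main, j1 and j3 are converted but j2 is not, so no work ends up holding more than one mapjoin, which is the behaviour auto_join_stats is meant to check. With an effectively unlimited budget all three are converted, matching auto_join_stats2.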