-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30443/
-----------------------------------------------------------
Review request for hive and Xuefu Zhang.


Repository: hive-git


Description
-------

This patch refactors the SMB MapJoin optimization in Spark to run in a single pass.

The main part of the SMB MapJoin optimization is to annotate the MapWork with information from the SMBMapJoinOperator and its roots (TableScans). Instead of initializing and annotating the MapWork in SparkSortMergeJoinFactory during a second pass, both GenSparkWork and SparkSortMergeJoinFactory now collect the required information. After that single pass, we go through all the SMBMapJoinOperators and annotate their MapWorks.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java 6e0ac38
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 773cfbd
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 0eac6e1
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java cb5d4fe
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 3a7477a
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkSMBMapJoinInfo.java PRE-CREATION

Diff: https://reviews.apache.org/r/30443/diff/


Testing
-------


Thanks,

Szehon Ho