-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27265/
-----------------------------------------------------------

(Updated Oct. 30, 2014, 7:02 p.m.)


Review request for hive.


Changes
-------

Rebase with mapjoin operator changes, and address review comments.


Repository: hive-git


Description
-------

This change re-uses the SMBJoinOperator for Spark.

Background: the logical layer already converts joins to SMB joins. This change just introduces a class called "SparkSortMergeJoinFactory" on the Spark compile path, which attaches the data structures (like local work and bucket info) to the MapWork for the SMBJoinOperator to consume. It is largely based on the MapReduce class "MapJoinFactory". However, on the Spark path it is activated only for SMB joins and not for map-joins, as we have another strategy for map-joins. That is why there is a new optimizer rule called "TypeRule": it ensures this processor is run only on SMBJoinOperators (which share the same name as MapJoinOperators, as is needed for the logical optimizers that deal with hints).

One major assumption behind the whole SMB concept is that both tables have corresponding buckets. While testing with large numbers of buckets (like auto_sortmerge_join_16), I found that "insert" into a bucketed table wasn't putting the same keys in corresponding buckets. I activated MR-style shuffle (hash shuffle instead of total-order shuffle), and that seemed to solve the issue.

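To illustrate the bucket-correspondence assumption above, here is a minimal, self-contained sketch. It is not Hive or Spark code: the class "BucketShuffleSketch" and its methods are hypothetical, and the range-based assignment is only one plausible way to picture how a total-order shuffle could place the same key in different bucket numbers for two tables. The description above reports only the observed symptom and that hash shuffle seemed to fix it.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hive.
public class BucketShuffleSketch {

    // Hash-style assignment: the bucket depends only on the key and the
    // bucket count, so every table maps a given key to the same bucket.
    public static int hashBucket(int key, int numBuckets) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    // Range-style assignment (assumed model of a total-order shuffle):
    // boundaries are derived from the keys of the table being written.
    public static int[] boundaries(int[] keys, int numBuckets) {
        int[] sorted = keys.clone();
        Arrays.sort(sorted);
        int[] bounds = new int[numBuckets - 1];
        for (int i = 1; i < numBuckets; i++) {
            bounds[i - 1] = sorted[i * sorted.length / numBuckets];
        }
        return bounds;
    }

    // The bucket is the number of boundaries that are <= key.
    public static int rangeBucket(int key, int[] bounds) {
        int bucket = 0;
        for (int b : bounds) {
            if (b <= key) {
                bucket++;
            }
        }
        return bucket;
    }

    public static void main(String[] args) {
        int numBuckets = 2;
        int[] tableA = {1, 2, 3, 4};
        int[] tableB = {3, 4, 5, 6};
        int[] boundsA = boundaries(tableA, numBuckets);
        int[] boundsB = boundaries(tableB, numBuckets);
        int key = 3; // present in both tables

        System.out.println("hash bucket (either table): " + hashBucket(key, numBuckets));
        System.out.println("range bucket in table A: " + rangeBucket(key, boundsA));
        System.out.println("range bucket in table B: " + rangeBucket(key, boundsB));
    }
}
```

In this toy model, key 3 gets the same hash bucket no matter which table it is written to, while its range bucket differs between the two tables because each table's boundaries come from its own data. A sort-merge bucket join that pairs bucket i of one table with bucket i of the other would then miss the match.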
Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties c429799
  ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java da764cf
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d33d877
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 3d08d49
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java b94db6b
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 2d7a134
  ql/src/test/results/clientpositive/spark/auto_join32.q.out 8d83188
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out e64d4fb
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 9158d65
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out a5a281b
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 2fc3bb6
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 74cbd7c
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out d1bb7a0
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out d57a1d7
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 8244c50
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 2ab1bca
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out bc4a163
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 16ef3ae
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 9fd3e5a
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out a7f994f
  ql/src/test/results/clientpositive/spark/bucket2.q.out b1b2997
  ql/src/test/results/clientpositive/spark/bucket3.q.out 019c11a
  ql/src/test/results/clientpositive/spark/bucket4.q.out 2cbab11
  ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 590b265
  ql/src/test/results/clientpositive/spark/load_dyn_part2.q.out f8f8971
  ql/src/test/results/clientpositive/spark/parquet_join.q.out d5a8684
  ql/src/test/results/clientpositive/spark/script_pipe.q.out 5b966ff
  ql/src/test/results/clientpositive/spark/skewjoin.q.out d674d04
  ql/src/test/results/clientpositive/spark/skewjoin_noskew.q.out d45cdd3
  ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 482268c
  ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out efa38d4
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 9254944

Diff: https://reviews.apache.org/r/27265/diff/


Testing
-------

Ran existing auto_sortmerge_* tests.


Thanks,

Szehon Ho