-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27265/
-----------------------------------------------------------

(Updated Oct. 30, 2014, 7:02 p.m.)


Review request for hive.


Changes
-------

Rebase with mapjoin operator changes, and address review comments.


Repository: hive-git


Description
-------

This change reuses the SMBJoinOperator for Spark.  Background: the logical 
layer already converts joins to SMB joins.  This change just introduces a 
class called "SparkSortMergeJoinFactory" on the Spark compile path, which 
attaches the data structures (like local work, bucket info) to the MapWork for 
the SMBJoinOperator to consume.  It is largely based on the MapReduce class 
"MapJoinFactory".

However, on the Spark path, it is activated only for SMB joins and not for 
map-joins, as we have another strategy for map-joins.  That is why there is a 
new optimizer rule called "TypeRule", so that this processor runs only on 
SMBJoinOperators (which share the same name as MapJoinOperators, as needed by 
the logical optimizers that deal with hints).
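
To illustrate why a name-based rule is not enough here, below is a minimal, 
self-contained sketch.  It does not use the real Hive classes: the operator 
types, the "MAPJOIN" name string, and both matcher methods are hypothetical 
stand-ins, and the actual TypeRule in the diff may differ.  It only shows the 
idea of matching on the runtime type of the operator on top of the walker 
stack instead of on its name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TypeRuleSketch {
    // Hypothetical stand-ins for the operator hierarchy (not Hive's classes).
    static abstract class Operator {
        abstract String getName();
    }

    static class MapJoinOp extends Operator {
        @Override String getName() { return "MAPJOIN"; }
    }

    // Assumption: the SMB join operator reports the same name as the map-join.
    static class SmbJoinOp extends Operator {
        @Override String getName() { return "MAPJOIN"; }
    }

    // Name-based matching: cannot tell the two operators apart.
    static boolean matchesByName(String name, Deque<Operator> stack) {
        return !stack.isEmpty() && name.equals(stack.peek().getName());
    }

    // Type-based matching: fires only when the top of the stack is an
    // instance of the requested class.
    static boolean matchesByType(Class<?> type, Deque<Operator> stack) {
        return !stack.isEmpty() && type.isInstance(stack.peek());
    }

    public static void main(String[] args) {
        Deque<Operator> stack = new ArrayDeque<>();
        stack.push(new MapJoinOp());
        System.out.println("by name: " + matchesByName("MAPJOIN", stack));
        System.out.println("by type: " + matchesByType(SmbJoinOp.class, stack));
    }
}
```

With a plain map-join on the stack, the name-based check matches while the 
type-based check for the SMB class does not, which is the behavior wanted for 
a processor that should only see SMB joins.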

One major assumption behind the whole SMB concept is that both tables have 
corresponding buckets.  While testing with large numbers of buckets (like 
auto_sortmerge_join_16), I found that "insert" into a bucketed table wasn't 
putting the same keys in corresponding buckets.  I activated MR-style shuffle 
(hash-shuffle instead of total-order shuffle), and that seemed to solve the 
issue.
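
As a rough illustration of one plausible reason for this (not a confirmed 
diagnosis, and not Hive's or Spark's actual partitioner code), the sketch 
below contrasts two hypothetical bucket-assignment functions.  With 
hash-modulo assignment the bucket depends only on the key and the bucket 
count, so two tables agree.  With a range-style assignment the boundaries 
depend on the other keys being written, so the same key can land in different 
buckets in different tables.  The method names and the sample keys are made 
up for the example.

```java
import java.util.Arrays;

class BucketSketch {
    // Hash-style assignment: a function of the key and bucket count only.
    static int hashBucket(int key, int numBuckets) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    // Range-style assignment (simplified): the bucket is the key's rank
    // among this table's sorted keys, scaled to the number of buckets.
    static int rangeBucket(int key, int[] sortedKeys, int numBuckets) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        if (pos < 0) {
            pos = -pos - 1; // insertion point when the key is absent
        }
        pos = Math.min(pos, sortedKeys.length - 1);
        return (int) ((long) pos * numBuckets / sortedKeys.length);
    }

    public static void main(String[] args) {
        int[] tableA = {1, 2, 3, 4};     // hypothetical keys, already sorted
        int[] tableB = {3, 4, 100, 200}; // hypothetical keys, already sorted
        int numBuckets = 2;
        int key = 3;                     // present in both tables

        System.out.println("hash:    " + hashBucket(key, numBuckets));
        System.out.println("range A: " + rangeBucket(key, tableA, numBuckets));
        System.out.println("range B: " + rangeBucket(key, tableB, numBuckets));
    }
}
```

In this toy data set, key 3 gets the same hash bucket for both tables but 
different range buckets, which would break the corresponding-buckets 
assumption that an SMB join relies on.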


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties c429799 
  ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java da764cf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d33d877 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 3d08d49 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java b94db6b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 2d7a134 
  ql/src/test/results/clientpositive/spark/auto_join32.q.out 8d83188 
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out e64d4fb 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 9158d65 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out a5a281b 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 2fc3bb6 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 74cbd7c 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out d1bb7a0 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out d57a1d7 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 8244c50 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 2ab1bca 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out bc4a163 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 16ef3ae 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 9fd3e5a 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out a7f994f 
  ql/src/test/results/clientpositive/spark/bucket2.q.out b1b2997 
  ql/src/test/results/clientpositive/spark/bucket3.q.out 019c11a 
  ql/src/test/results/clientpositive/spark/bucket4.q.out 2cbab11 
  ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 590b265 
  ql/src/test/results/clientpositive/spark/load_dyn_part2.q.out f8f8971 
  ql/src/test/results/clientpositive/spark/parquet_join.q.out d5a8684 
  ql/src/test/results/clientpositive/spark/script_pipe.q.out 5b966ff 
  ql/src/test/results/clientpositive/spark/skewjoin.q.out d674d04 
  ql/src/test/results/clientpositive/spark/skewjoin_noskew.q.out d45cdd3 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 482268c 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out efa38d4 
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 9254944 

Diff: https://reviews.apache.org/r/27265/diff/


Testing
-------

Ran existing auto_sortmerge_* tests.


Thanks,

Szehon Ho
