-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27265/
-----------------------------------------------------------

(Updated Oct. 30, 2014, 10:44 p.m.)


Review request for hive.


Changes
-------

Rebase again after the downmerge.


Repository: hive-git


Description
-------

This change re-uses the SMBJoinOperator for Spark.  Background: the logical 
layer already converts joins to SMB joins.  This change just introduces a 
class called "SparkSortMergeJoinFactory" on the Spark compile path, which 
attaches the data structures (such as local work and bucket info) to the 
MapWork for the SMBJoinOperator to consume.  It is largely based on the 
MapReduce class "MapJoinFactory".
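For context on the operator being reused: within one pair of corresponding buckets, a sort-merge join walks two key-sorted inputs in lockstep, so neither side has to be loaded into a hash table. The following is a minimal Java sketch under simplifying assumptions (integer keys, in-memory lists, inner join of exactly two inputs). SortMergeJoinSketch and Row are hypothetical names; the real SMBJoinOperator streams rows and supports more tables and join types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: inner-joins two bucket files that are already sorted by key.
final class SortMergeJoinSketch {

    // Simplified row: a join key plus one payload column.
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Both inputs must be sorted ascending by key; output is "key:leftValue,rightValue".
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = left.get(i).key;
            int rk = right.get(j).key;
            if (lk < rk) {
                i++; // current left row is smaller than every remaining right key
            } else if (lk > rk) {
                j++; // symmetric case: skip the current right row
            } else {
                // Find the run of rows with this key on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).key == lk) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key == rk) {
                    jEnd++;
                }
                // Emit the cross product of the two equal-key runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(lk + ":" + left.get(a).value + "," + right.get(b).value);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Because both buckets are sorted on the join key, each input is read once, which is what makes the bucket-by-bucket SMB approach attractive for large tables.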

However, on the Spark path it is activated only for SMB joins and not for 
map-joins, as we have another strategy for map-joins.  SMBJoinOperators share 
the same name as MapJoinOperators (which the logical optimizers dealing with 
hints need), so a name-based rule cannot tell them apart.  That is why there is 
a new optimizer rule called "TypeRule", which ensures this processor runs only 
on SMBJoinOperators.
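To illustrate the distinction, here is a minimal Java sketch contrasting name-based and type-based rule matching. All classes are hypothetical stand-ins (Hive's actual Rule interface and operator classes have different signatures); the sketch only assumes, as described above, that the SMB join and map-join operators report the same name while being distinct classes.

```java
import java.util.Deque;

// Hypothetical stand-ins for the operator classes; not Hive's real types.
abstract class OperatorStub {
    abstract String getName();
}

class MapJoinOperatorStub extends OperatorStub {
    @Override
    String getName() {
        return "MAPJOIN";
    }
}

// Same name as the map-join stub, but a different class.
class SMBJoinOperatorStub extends OperatorStub {
    @Override
    String getName() {
        return "MAPJOIN";
    }
}

// Matches when the operator on top of the walk stack has the given name.
final class NameRuleSketch {
    private final String name;

    NameRuleSketch(String name) {
        this.name = name;
    }

    boolean matches(Deque<OperatorStub> stack) {
        OperatorStub top = stack.peek(); // null when the stack is empty
        return top != null && top.getName().equals(name);
    }
}

// Matches when the operator on top of the walk stack is an instance of the given class.
final class TypeRuleSketch {
    private final Class<?> nodeClass;

    TypeRuleSketch(Class<?> nodeClass) {
        this.nodeClass = nodeClass;
    }

    boolean matches(Deque<OperatorStub> stack) {
        return nodeClass.isInstance(stack.peek()); // isInstance(null) is false
    }
}
```

A rule built on "MAPJOIN" fires for both stubs, while the type-based rule fires only for SMBJoinOperatorStub, so the shared name can stay in place for the hint-handling optimizers.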

One major assumption behind the whole SMB concept is that both tables have 
corresponding buckets, i.e. a given key lands in the same bucket number in 
each table.  While testing with large numbers of buckets (as in 
auto_sortmerge_join_16), I found that "insert" into a bucketed table wasn't 
putting the same keys in corresponding buckets.  I activated MR-style shuffle 
(hash shuffle instead of total-order shuffle), and that seemed to solve the 
issue.
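A minimal Java sketch of why hash shuffle keeps buckets aligned, under stated assumptions: the hash bucket uses the same formula as Hadoop's HashPartitioner, while the range split points in the usage below are made-up values standing in for what a sampling range partitioner (such as Spark's RangePartitioner) might choose for two differently distributed tables. BucketingSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch comparing two ways of assigning a row's key to a bucket.
final class BucketingSketch {

    // Hash bucketing, same formula as Hadoop's HashPartitioner:
    // the result depends only on the key and the bucket count.
    static int hashBucket(Object key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Range bucketing: bucket i holds keys in [splitPoints[i-1], splitPoints[i]),
    // unbounded at both ends. splitPoints must be sorted and distinct;
    // n split points yield n + 1 buckets.
    static int rangeBucket(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // A hit at index pos means key == splitPoints[pos], the first key of bucket pos + 1;
        // a miss returns -(insertionPoint) - 1, and the insertion point is the bucket.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With four buckets, hashBucket(12, 4) is 0 for both tables. With hypothetical per-table split points {25, 50, 75} and {5, 10, 15}, key 12 lands in bucket 0 of the first table but bucket 2 of the second, so an SMB join pairing bucket i with bucket i would miss the match.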


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties c178856 
  ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 778a68a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d33d877 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 3d08d49 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java b94db6b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 2d7a134 
  ql/src/test/results/clientpositive/spark/auto_join32.q.out 8d83188 
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 511231b 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 3b992d5 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out fe7f7a9 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out c325539 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out ed53363 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 451aedb 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out 9897e11 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 51d68d8 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out ff3a23f 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out 22a11ab 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 83739a7 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 123923e 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out deef226 
  ql/src/test/results/clientpositive/spark/bucket2.q.out 2fec831 
  ql/src/test/results/clientpositive/spark/bucket3.q.out 366e5e4 
  ql/src/test/results/clientpositive/spark/bucket4.q.out 2eda107 
  ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 0c7160d 
  ql/src/test/results/clientpositive/spark/load_dyn_part2.q.out e52588c 
  ql/src/test/results/clientpositive/spark/parquet_join.q.out da2ce07 
  ql/src/test/results/clientpositive/spark/script_pipe.q.out 5b966ff 
  ql/src/test/results/clientpositive/spark/skewjoin.q.out c92415c 
  ql/src/test/results/clientpositive/spark/skewjoin_noskew.q.out 9cd9990 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 1ec11d8 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out 131f72d 
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 9254944 

Diff: https://reviews.apache.org/r/27265/diff/


Testing
-------

Ran existing auto_sortmerge_* tests.


Thanks,

Szehon Ho
