Re: Review Request 29281: HIVE-8640 : Support hints of SMBJoin [Spark Branch]

Szehon Ho Fri, 19 Dec 2014 19:33:07 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29281/
-----------------------------------------------------------


(Updated Dec. 20, 2014, 3:32 a.m.)


Review request for hive.


Changes
-------

Removed the unnecessary type check.


Bugs: HIVE-8640
    https://issues.apache.org/jira/browse/HIVE-8640


Repository: hive-git


Description
-------

This change follows the same principle as the refactoring in HIVE-8639. The goal
is to move as much of the join optimization as possible into the same traversal,
and in fact into the same process(joinOp) method, both to simplify the logic and
to improve compiler performance.

While it is too hard to bring SparkMapJoinProcessor (for mapjoin hints) to the
same level because of the way it was written (see HIVE-8911), it is possible to
bring the Bucket join and SMB join hints to that level. This change introduces
a parallel processor called 'SparkJoinHintOptimizer', which takes a mapjoin
already converted by SparkMapJoinProcessor as input and converts it to a
Bucket/SMB join as appropriate. It runs alongside 'SparkJoinOptimizer', which
takes a common join operator and handles the auto-conversion to
mapjoin/bucketJoin/SMBJoin.
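To illustrate the single-traversal idea, here is a rough sketch of one
process(joinOp)-style entry point that handles both a hinted mapjoin and a
common join, with both paths sharing the same bucket/SMB upgrade step. The
Origin/Plan enums, process(), upgrade() and the simplified conversion rules
(bucketed and sorted gives SMB, bucketed only gives bucket mapjoin, and
auto-conversion decided purely by whether the small tables fit) are all
hypothetical assumptions, not Hive's actual classes or cost logic.

```java
// Hypothetical, simplified model of the routing described above; these are
// not Hive's real operator or optimizer classes, and the rules are illustrative.
public class JoinRoutingSketch {

    /** Where a join came from when the traversal reaches it. */
    enum Origin { COMMON_JOIN, HINTED_MAPJOIN }

    /** The physical join chosen for it. */
    enum Plan { COMMON_JOIN, MAPJOIN, BUCKET_MAPJOIN, SMB_JOIN }

    // Single process(joinOp)-style entry point (hypothetical).
    static Plan process(Origin origin, boolean bucketed, boolean sortedOnJoinKeys,
                        boolean smallTablesFit) {
        boolean isMapJoin;
        if (origin == Origin.HINTED_MAPJOIN) {
            // A hint already made this a mapjoin (SparkMapJoinProcessor's job).
            isMapJoin = true;
        } else {
            // Assumed auto-conversion rule: convert only if small tables fit.
            isMapJoin = smallTablesFit;
        }
        if (!isMapJoin) {
            return Plan.COMMON_JOIN;
        }
        // Both paths share the same upgrade step instead of separate traversals.
        return upgrade(bucketed, sortedOnJoinKeys);
    }

    // Shared, simplified upgrade rules (assumption).
    static Plan upgrade(boolean bucketed, boolean sortedOnJoinKeys) {
        if (bucketed && sortedOnJoinKeys) return Plan.SMB_JOIN;
        if (bucketed) return Plan.BUCKET_MAPJOIN;
        return Plan.MAPJOIN;
    }

    public static void main(String[] args) {
        System.out.println(process(Origin.HINTED_MAPJOIN, true, true, false)); // SMB_JOIN
        System.out.println(process(Origin.COMMON_JOIN, true, false, true));    // BUCKET_MAPJOIN
        System.out.println(process(Origin.COMMON_JOIN, false, false, false));  // COMMON_JOIN
    }
}
```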

As Chao found, the one difference between mapjoin/bucketJoin and SMB join is
that Spark mapjoins and bucket joins expect a ReduceSink (RS) operator on each
small-table branch, whereas SMB join does not. So I added a class,
SparkSMBJoinHintOptimizer, that first removes these RS operators before re-using
the rest of the existing code.
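A minimal sketch of the RS-removal step, using a hypothetical toy operator
graph: the Op class, connect() and removeReduceSinkParents() are illustrative
only, not Hive's Operator API or the actual SparkSMBJoinHintOptimizer code.
Each RS parent of the join is spliced out, and its upstream operator is attached
to the join at the same parent position so the branch order is unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal operator DAG; not Hive's Operator API.
public class RemoveReduceSinkSketch {

    static final class Op {
        final String name;  // e.g. "TS_small", "RS", "MAPJOIN"
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    static void connect(Op parent, Op child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    /** Splices out every "RS" parent of the join; returns how many were removed. */
    static int removeReduceSinkParents(Op join) {
        int removed = 0;
        for (int i = 0; i < join.parents.size(); i++) {
            Op rs = join.parents.get(i);
            if (!rs.name.equals("RS")) continue;
            // Assumption of this toy model: an RS has exactly one upstream parent.
            Op upstream = rs.parents.get(0);
            upstream.children.set(upstream.children.indexOf(rs), join);
            join.parents.set(i, upstream);  // same parent position as the old RS
            rs.parents.clear();
            rs.children.clear();
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Op big = new Op("TS_big");
        Op small = new Op("TS_small");
        Op rs = new Op("RS");
        Op join = new Op("MAPJOIN");
        connect(big, join);  // big-table branch: no RS
        connect(small, rs);  // small-table branch: TS_small -> RS -> MAPJOIN
        connect(rs, join);
        int n = removeReduceSinkParents(join);
        System.out.println(n + " " + join.parents.get(1).name);  // prints "1 TS_small"
    }
}
```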

I also found an issue in NonBlockingOpDeDupProc that corrupts the
'mapJoinContext' data structure in the parse context. A fix is offered in
HIVE-9117; that should be committed to trunk and merged first, but it is
included here for reference.


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties fd732c1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java 5e0959a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinHintOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java 6a47513
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 5227d92 
  ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out b18e02f 
  ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out bb7214c 
  ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out c0adef4 
  ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 98d0706 
  ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out ea763c7 
  ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 1b31561 
  ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out 97d2d74 
  ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 94952a1 
  ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out ca59d02 
  ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out f419eaf 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out b954feb
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out bfe5438
  ql/src/test/results/clientpositive/spark/smb_mapjoin9.q.out d769ebe 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_1.q.out 8d0527e 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_10.q.out 2df87cf 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_12.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out 5637206 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_14.q.out 3aed084 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out 6ed680d 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out a4fd7c3 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 6293450 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_2.q.out 1cf144b 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_3.q.out 6b44d2c 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out d07d65a 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out 607b1f0 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_6.q.out 30746ff 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out c48ed6d 

Diff: https://reviews.apache.org/r/29281/diff/


Testing
-------

Re-enabled all the smb_mapjoin.* tests.

I saw that many of the tests are again not alphabetized, so I re-ran the script
to alphabetize them. In doing so, I realized that some tests, such as
'bucket_map_join_spark.*' and 'join_empty', were missing the comma delimiter
separating them from the next test and were probably not being run. I also
fixed windowing.q, which is the last test. This is all unrelated to this change,
but because these tests may have been unintentionally disabled, I am not sure
whether re-enabling them will trigger additional test failures.
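For anyone tidying these lists later, a hypothetical checker (not part of
Hive's build) might look like the following. It loads a properties snippet with
java.util.Properties, which joins backslash-continued lines, flags entries where
a missing comma left two .q names in a single entry, and prints a sorted list.
The key 'query.files', the class name and the helper methods are assumptions
for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical checker for a comma-separated list of .q test files.
public class TestListCheck {

    static int countOccurrences(String s, String sub) {
        int count = 0;
        for (int i = s.indexOf(sub); i >= 0; i = s.indexOf(sub, i + sub.length())) count++;
        return count;
    }

    /** Entries that look like two .q names merged because a comma is missing. */
    static List<String> findMergedEntries(String value) {
        List<String> merged = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) continue;  // e.g. after a trailing comma
            // A valid entry is one token with exactly one ".q"; a merged entry
            // such as "a.q b.q" matches no file, so the tests never run.
            boolean multipleTokens = entry.split("\\s+").length > 1;
            if (multipleTokens || countOccurrences(entry, ".q") != 1) merged.add(entry);
        }
        return merged;
    }

    /** Splits on commas and whitespace; returns sorted, de-duplicated names. */
    static List<String> alphabetize(String value) {
        TreeSet<String> names = new TreeSet<>();
        for (String token : value.split("[,\\s]+")) {
            if (!token.isEmpty()) names.add(token);
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) throws IOException {
        // The comma after bucket_map_join_spark1.q is deliberately missing.
        String file = "query.files=smb_mapjoin_2.q,\\\n"
                + "  bucket_map_join_spark1.q \\\n"
                + "  join_empty.q,\\\n"
                + "  windowing.q\n";
        Properties props = new Properties();
        props.load(new StringReader(file));
        String value = props.getProperty("query.files");
        System.out.println(findMergedEntries(value));  // the merged spark1/join_empty entry
        System.out.println(alphabetize(value));
    }
}
```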


Thanks,

Szehon Ho
