-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29281/
-----------------------------------------------------------
(Updated Dec. 20, 2014, 3:32 a.m.)
Review request for hive.
Changes
-------
Removed the unnecessary type check.
Bugs: HIVE-8640
https://issues.apache.org/jira/browse/HIVE-8640
Repository: hive-git
Description
-------
This change follows the same principle as the refactoring in HIVE-8639. The
goal is to move as much of the join optimization as possible into the same
traversal, and in fact into the same process(joinOp) method, both to simplify
the logic and to improve compiler performance.
Although it is too hard to bring SparkMapJoinProcessor (for mapjoin hints) to
the same level because of the way it was written (see HIVE-8911), it is
possible to bring the Bucket join and SMB join hints to the same level. This
change introduces a parallel processor called 'SparkJoinHintOptimizer', which
takes a mapjoin already converted by SparkMapJoinProcessor as input and
converts it to a Bucket/SMB join accordingly. It runs alongside
'SparkJoinOptimizer', which takes a common join operator and handles the
auto-conversion to mapjoin/bucketJoin/SMBJoin.
The one difference between mapjoin/bucketJoin and SMB join, as Chao found, is
that Spark expects an RS on the small-table branches for mapjoin/bucketJoin,
but not for SMB join. So I added a class, SparkSMBJoinHintOptimizer, that first
removes these RS operators before re-using the rest of the existing code.
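
To illustrate the RS removal described above, here is a minimal sketch. It is
not the code in this patch: the Op class, the operator kind strings, and
removeSmallTableReduceSinks are hypothetical stand-ins for Hive's operator
tree, and it assumes each RS has exactly one parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator model; this is NOT Hive's Operator API.
class SmbHintSketch {
    static class Op {
        final String kind; // e.g. "TS", "RS", "MAPJOIN" (illustrative labels)
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();

        Op(String kind) {
            this.kind = kind;
        }
    }

    // Link parent -> child in both directions.
    static void connect(Op parent, Op child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    // Splice out every RS sitting directly above the join, so that the RS's
    // parent feeds the join directly. The join's input order is preserved.
    static void removeSmallTableReduceSinks(Op join) {
        for (int i = 0; i < join.parents.size(); i++) {
            Op rs = join.parents.get(i);
            if (!"RS".equals(rs.kind)) {
                continue; // big-table branch: nothing to remove
            }
            Op grandParent = rs.parents.get(0); // assumption: single parent
            int pos = grandParent.children.indexOf(rs);
            grandParent.children.set(pos, join);
            join.parents.set(i, grandParent);
        }
    }

    public static void main(String[] args) {
        Op big = new Op("TS");
        Op small = new Op("TS");
        Op rs = new Op("RS");
        Op join = new Op("MAPJOIN");
        connect(big, join);
        connect(small, rs);
        connect(rs, join);

        removeSmallTableReduceSinks(join);
        // The small-table scan is now a direct parent of the join.
        System.out.println(join.parents.get(1) == small);
    }
}
```

Once the RS operators are spliced out, the operator tree has the shape the
existing SMB conversion code expects, which is why the rest of that code can be
re-used.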
Another issue was found in NonBlockingOpDeDupProc, which corrupts the
'mapJoinContext' data structure in the parse context. A fix is offered in
HIVE-9117; it should be committed to trunk and merged first, but it is
included here for reference.
Diffs (updated)
-----
itests/src/test/resources/testconfiguration.properties fd732c1
ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
5e0959a
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinHintOptimizer.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java
6a47513
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 5227d92
ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out b18e02f
ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out bb7214c
ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out c0adef4
ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 98d0706
ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out ea763c7
ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 1b31561
ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out 97d2d74
ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 94952a1
ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out ca59d02
ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out f419eaf
ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out
b954feb
ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out
bfe5438
ql/src/test/results/clientpositive/spark/smb_mapjoin9.q.out d769ebe
ql/src/test/results/clientpositive/spark/smb_mapjoin_1.q.out 8d0527e
ql/src/test/results/clientpositive/spark/smb_mapjoin_10.q.out 2df87cf
ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/smb_mapjoin_12.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out 5637206
ql/src/test/results/clientpositive/spark/smb_mapjoin_14.q.out 3aed084
ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out 6ed680d
ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out a4fd7c3
ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 6293450
ql/src/test/results/clientpositive/spark/smb_mapjoin_2.q.out 1cf144b
ql/src/test/results/clientpositive/spark/smb_mapjoin_3.q.out 6b44d2c
ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out d07d65a
ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out 607b1f0
ql/src/test/results/clientpositive/spark/smb_mapjoin_6.q.out 30746ff
ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out c48ed6d
Diff: https://reviews.apache.org/r/29281/diff/
Testing
-------
Re-enabled all the smb_mapjoin.* tests.
I saw that a lot of the tests are again not alphabetized, so I re-ran the
script to alphabetize them. As part of that, I realized that some tests, like
'bucket_map_join_spark.*' and 'join_empty', were missing the comma delimiter
that separates them from the next test, and so were probably not being run. I
also fixed windowing.q, which is the last test. This is all unrelated to the
change itself, but I am not sure whether it will trigger additional test
failures if these tests were unintentionally disabled.
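
For reference, a check for the missing-delimiter problem could look roughly
like the sketch below. It is not the script mentioned above. It assumes the
test list is a single property value continued across lines with a trailing
backslash, one test per line; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the Hive test tooling.
class TestListCheck {
    // Returns the content of every continued line (one ending in "\") that
    // lacks a comma before the backslash. Without the comma, the entry is
    // merged with the next one when the property is parsed.
    static List<String> missingCommas(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.endsWith("\\")) {
                continue; // last line of the value needs no comma
            }
            String body = line.substring(0, line.length() - 1).trim();
            if (!body.endsWith(",")) {
                bad.add(body);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("some.query.files=a.q, \\"); // illustrative property name
        lines.add("  join_empty.q \\");        // comma missing here
        lines.add("  windowing.q");
        System.out.println(missingCommas(lines));
    }
}
```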
Thanks,
Szehon Ho