-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28500/
-----------------------------------------------------------

(Updated Dec. 2, 2014, 1:34 a.m.)


Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.


Changes
-------

Fix the algorithm and clean up after discussion with Xuefu.  The original code 
was too aggressive in incorporating connected mapjoins into its size 
calculation; the new code only looks at the big table's connected mapjoins.


Bugs: HIVE-8943
    https://issues.apache.org/jira/browse/HIVE-8943


Repository: hive-git


Description
-------

By default, SparkMapJoinOptimizer combines nested mapjoins into one work 
because the RS for the big table is removed. So we need to enhance the mapjoin 
check to calculate whether all the MapJoins in that work (Spark stage) will fit 
into memory; otherwise they might overwhelm the memory of that particular Spark 
executor.
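
As a rough illustration of the kind of check described above (not the code in 
the patch), the sketch below sums the small-table sizes of a candidate mapjoin 
and of the mapjoins connected to it through the big-table side, then compares 
the total with a limit. JoinNode, MapJoinSizeCheck, and all field and method 
names are hypothetical, and the walk is simplified to a single upstream chain.

```java
/** Hypothetical stand-in for a mapjoin operator; this is not Hive's MapJoinOperator. */
final class JoinNode {
    final String name;
    /** Estimated combined size, in bytes, of the small tables this join hashes. */
    final long smallTableBytes;
    /** Mapjoin feeding this join's big-table side in the same work, or null if none. */
    final JoinNode bigTableParent;

    JoinNode(String name, long smallTableBytes, JoinNode bigTableParent) {
        this.name = name;
        this.smallTableBytes = smallTableBytes;
        this.bigTableParent = bigTableParent;
    }
}

/** Hypothetical helper showing the size check; names are assumptions for illustration. */
final class MapJoinSizeCheck {
    private MapJoinSizeCheck() {
    }

    /** Sums small-table sizes for the candidate and every mapjoin up its big-table chain. */
    static long connectedSize(JoinNode candidate) {
        long total = 0L;
        for (JoinNode n = candidate; n != null; n = n.bigTableParent) {
            total += n.smallTableBytes;
        }
        return total;
    }

    /** Returns true if the whole chain of small tables stays within maxBytes. */
    static boolean fitsInMemory(JoinNode candidate, long maxBytes) {
        return connectedSize(candidate) <= maxBytes;
    }
}
```

With a limit smaller than the combined total, the candidate would not be 
converted into a mapjoin in the same work; with a large enough limit, every 
join in the chain can share one work.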


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 819eef1 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 0c339a5 
  ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION 
  ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/28500/diff/


Testing
-------

Added two unit tests:

1.  auto_join_stats, which sets a memory limit and checks that the algorithm 
does not put more than one mapjoin in one BaseWork.
2.  auto_join_stats2, which runs the same query without the memory limit and 
checks that the algorithm puts all mapjoins in one BaseWork because it can.


Thanks,

Szehon Ho
