-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28889/
-----------------------------------------------------------

Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8911
    https://issues.apache.org/jira/browse/HIVE-8911


Repository: hive-git


Description
-------

The basic idea is to reuse as much code from MR as possible. The issue is that in MR's MapJoinProcessor, after a join op is converted to a mapjoin op, all of its parent ReduceSinkOperators are removed. In our Spark branch, however, we need to preserve them, because they serve as boundaries between BaseWorks and SparkReduceSinkMapJoinProc triggers on them.

Initially I tried moving this logic to SparkMapJoinOptimizer, which runs at a later stage. Although this works, I'm worried that it may have too much effect on the SMB join with hints, because we would then have to move that logic to SparkMapJoinOptimizer too. In general, I want to minimize the effect on the existing code path.

This patch makes changes to MapJoinProcessor. I created a separate method, convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for small tables. The transform method then decides which method to call based on the execution engine.

I also had to disable several tests related to SMB join with hints. They can be re-enabled once HIVE-8640 is resolved.

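
To make the dispatch described above concrete, here is a minimal sketch. It is not the code in the patch: Op and MapJoinSketch are hypothetical, simplified stand-ins for Hive's operator classes, the engine is passed as a plain string rather than read from the Hive configuration, and the bigTablePos parameter is an illustrative assumption. The sketch also assumes that the Spark variant treats the big-table parent the same way MR does; the description only states that the ReduceSinkOperators of the small tables are kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator node; NOT Hive's real Operator class.
class Op {
    final String kind;                      // "TS", "RS", "JOIN" or "MAPJOIN"
    final List<Op> parents = new ArrayList<>();

    Op(String kind, Op... parents) {
        this.kind = kind;
        for (Op p : parents) {
            this.parents.add(p);
        }
    }
}

class MapJoinSketch {
    // MR-style conversion: every ReduceSink (RS) parent is bypassed, so the
    // new mapjoin op is wired directly to the operator above each RS.
    static Op convertMapJoin(Op join) {
        Op mapJoin = new Op("MAPJOIN");
        for (Op parent : join.parents) {
            mapJoin.parents.add(skipReduceSink(parent));
        }
        return mapJoin;
    }

    // Spark-style conversion: RS parents of the small tables are kept so they
    // can still mark boundaries between works. Bypassing the big-table RS is
    // an assumption of this sketch, not something the description states.
    static Op convertMapJoinForSpark(Op join, int bigTablePos) {
        Op mapJoin = new Op("MAPJOIN");
        for (int pos = 0; pos < join.parents.size(); pos++) {
            Op parent = join.parents.get(pos);
            mapJoin.parents.add(pos == bigTablePos ? skipReduceSink(parent) : parent);
        }
        return mapJoin;
    }

    // Chooses the conversion based on the execution engine name.
    static Op transform(Op join, String engine, int bigTablePos) {
        return "spark".equals(engine)
                ? convertMapJoinForSpark(join, bigTablePos)
                : convertMapJoin(join);
    }

    // Returns the operator above an RS, or the operator itself if it is not an RS.
    private static Op skipReduceSink(Op op) {
        return "RS".equals(op.kind) ? op.parents.get(0) : op;
    }

    public static void main(String[] args) {
        Op join = new Op("JOIN", new Op("RS", new Op("TS")), new Op("RS", new Op("TS")));
        for (String engine : new String[] {"mr", "spark"}) {
            StringBuilder kinds = new StringBuilder();
            for (Op p : transform(join, engine, 0).parents) {
                kinds.append(p.kind).append(' ');
            }
            System.out.println(engine + ": " + kinds.toString().trim());
        }
    }
}
```

Keeping the two conversions as separate methods behind one dispatch point mirrors the intent stated in the description: the MR path stays untouched, and only the Spark path changes how ReduceSink parents are handled.
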

Diffs
-----

  data/conf/spark/hive-site.xml 44eac86
  itests/src/test/resources/testconfiguration.properties d6f8267
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827
  ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73
  ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b
  ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151
  ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77
  ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5
  ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6
  ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6
  ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef
  ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05
  ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e
  ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12
  ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3
  ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c
  ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566
  ql/src/test/results/clientpositive/spark/join25.q.out 284c97d
  ql/src/test/results/clientpositive/spark/join26.q.out e271184
  ql/src/test/results/clientpositive/spark/join27.q.out d31f29e
  ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa
  ql/src/test/results/clientpositive/spark/join36.q.out f1317ea
  ql/src/test/results/clientpositive/spark/join37.q.out 448e983
  ql/src/test/results/clientpositive/spark/join38.q.out 735d7ea
  ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b
  ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d
  ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99
  ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9
  ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c
  ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437
  ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24
  ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b
  ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be

Diff: https://reviews.apache.org/r/28889/diff/


Testing
-------

bucket_map_join_1.q
bucket_map_join_2.q
bucketmapjoin1.q
bucketmapjoin10.q
bucketmapjoin11.q
bucketmapjoin12.q
bucketmapjoin13.q
bucketmapjoin2.q
bucketmapjoin3.q
bucketmapjoin4.q
bucketmapjoin5.q
bucketmapjoin7.q
bucketmapjoin8.q
bucketmapjoin9.q
bucketmapjoin_negative.q
bucketmapjoin_negative2.q
column_access_stats.q
join25.q
join26.q
join27.q
join30.q
join36.q
join37.q
join38.q
join39.q
join40.q
join_empty.q
join_filters_overlap.q
join_map_ppr.q
mapjoin1.q
mapjoin_distinct.q
mapjoin_filter_on_outerjoin.q
mapjoin_hook.q
mapjoin_test_outer.q
semijoin.q
skewjoin.q
table_access_keys_stats.q


Thanks,

Chao Sun