> On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java > > Lines 167 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036590#file2036590line167> > > > > This should be ok to do always (regardless of whether it's map or > > reduce)? If not can you explain why?
It maybe, however, I just wanted to isolate the cases which fail and to not introduce unwanted bugs. There are 3 types of joins using this operator 1. Shuffle Join on reducer does not need to flush out the last record as it gets it as soon as the record is read. 2. Mapside SMB also gets the records as soon as read as the TS Op does not keep them holding. 3. ReduceSide SMB is the only one which is prevented from the last record as it is held by GBY Op and only flushed out when Op is closed leading to potential join misses. > On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java > > Lines 547 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036590#file2036590line548> > > > > please don't do this in the inner loop. this has to be removed or > > changed. if you want a safety check, just write an if loop. Sure, will do that. > On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java > > Line 246 (original), 249 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036592#file2036592line249> > > > > does this need a parallel fix for the vectorized path? There is no vectorized path for this operator. > On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java > > Lines 261 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036592#file2036592line261> > > > > You're only doing this for the "reducer". The Operator<> class > > implements this as an empty method. Theoretically if you have a plan like: > > > > Fil -> Gby -> Join > > > > This fix wouldn't work, because the Fil would be the reducer and it > > doesn't forward the flush to Gby. > > > > Unfortunately DemuxOperator uses flush and I don't know if recursive > > would work for it. So maybe add a > > > > flushRecursive() { > > flush(); > > for all children: > > child.flushRecursive() > > } > > > > in Operator and use it? Thanks. It definitely looks more fool proof. > On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java > > Lines 518 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036592#file2036592line518> > > > > Stylistically I think "setFlushLastRecord(boolean flushLastRecord)" > > would be nicer. the idea is to prevent user from setting it to false. Anyway, I will do it. > On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java > > Lines 611 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036593#file2036593line611> > > > > Open follow up jira? Still doesn't make sense that this is needed given > > the code below. https://issues.apache.org/jira/browse/HIVE-19840 > On June 9, 2018, 6:02 a.m., Gunther Hagleitner wrote: > > ql/src/test/results/clientpositive/llap/mrr.q.out > > Line 460 (original), 460 (patched) > > <https://reviews.apache.org/r/67296/diff/4/?file=2036608#file2036608line460> > > > > these diffs are weird... but ok. yeah, I got a merge conflict due to these when rebased. Weird. - Deepak ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67296/#review204524 ----------------------------------------------------------- On June 9, 2018, 2:19 a.m., Deepak Jaiswal wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67296/ > ----------------------------------------------------------- > > (Updated June 9, 2018, 2:19 a.m.) > > > Review request for hive, Gunther Hagleitner and Jason Dere. > > > Bugs: HIVE-18875 > https://issues.apache.org/jira/browse/HIVE-18875 > > > Repository: hive-git > > > Description > ------- > > Fixed various issues with SMB, mostly on the Reducer side join. > GBY Op now uses inputObjectInspector[0] all the time as it is the only OI it > has. The tag is irrelevant here. Was causing problem with SMB. > Disabled SMB in spark on hive tests as the same config for Tez was enabling > it there. > Some SMB specific tests were designed to first run without SMB and then with > SMB. With SMB enabled by default, it is explicitely turned off to make sure > the behavior is maintained. > > Please go through JIRA comments as they may clear out some questions. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b5e2d86e62 > itests/src/test/resources/testconfiguration.properties b584c72650 > ql/src/java/org/apache/hadoop/hive/ql/exec/CommonMergeJoinOperator.java > aefaa0586e > ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4b766382ef > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java > fca783c35e > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java > 4019f132d3 > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java > 9e5446566b > ql/src/test/queries/clientpositive/auto_sortmerge_join_11.q 7416eb0ec0 > ql/src/test/queries/clientpositive/skewjoinopt19.q 02cadda7f5 > ql/src/test/queries/clientpositive/skewjoinopt20.q 160e5b82d9 > ql/src/test/queries/clientpositive/smb_mapjoin_11.q 6ce49b83c2 > ql/src/test/queries/clientpositive/smb_mapjoin_12.q 753e4d3c9a > ql/src/test/queries/clientpositive/smb_mapjoin_17.q d68f5f3139 > ql/src/test/queries/clientpositive/subquery_notin.q 64940277bb > ql/src/test/queries/clientpositive/tez_smb_reduce_side.q PRE-CREATION > ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out > 8e17d952d4 > ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out > 9e424c2f16 > ql/src/test/results/clientpositive/llap/explainuser_1.q.out 0ebd5caf28 > ql/src/test/results/clientpositive/llap/limit_pushdown.q.out fe8b98f21f > ql/src/test/results/clientpositive/llap/mergejoin.q.out 832ed487ec > ql/src/test/results/clientpositive/llap/mrr.q.out cb25b8c2f9 > ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out > ca0de47b5a > ql/src/test/results/clientpositive/llap/smb_cache.q.out 7c885d1ffa > ql/src/test/results/clientpositive/llap/smb_mapjoin_14.q.out c334b9386b > ql/src/test/results/clientpositive/llap/smb_mapjoin_15.q.out 21aac455f2 > ql/src/test/results/clientpositive/llap/smb_mapjoin_4.q.out 4b8728fbff > ql/src/test/results/clientpositive/llap/smb_mapjoin_5.q.out a1313696f0 > ql/src/test/results/clientpositive/llap/smb_mapjoin_6.q.out 3e5acd08a7 > ql/src/test/results/clientpositive/llap/subquery_in_having.q.out b4ce6f8777 > ql/src/test/results/clientpositive/llap/subquery_notin.q.out 21a0f84f33 > ql/src/test/results/clientpositive/llap/tez_smb_reduce_side.q.out > PRE-CREATION > ql/src/test/results/clientpositive/llap/vectorized_bucketmapjoin1.q.out > 61c5051bb9 > ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out a79a8c466a > ql/src/test/results/clientpositive/spark/smb_mapjoin_14.q.out 1fd4490ac4 > ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out 6ca577fdbb > ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out 629a6c428a > ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out 7d0934010e > ql/src/test/results/clientpositive/spark/smb_mapjoin_6.q.out 7445135159 > ql/src/test/results/clientpositive/spark/subquery_notin.q.out a53c31353b > > > Diff: https://reviews.apache.org/r/67296/diff/4/ > > > Testing > ------- > > > Thanks, > > Deepak Jaiswal > >