> On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java, > > line 519 > > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line519> > > > > This is already called, before a call to procesDynP.. > > Do we need it again?
They are different methods: - _checkValidDynPartSortDedup_ checks whether the we have a valid chain of operators between pRS and cRS, and - _checkDynPartSortDedupPossible_ makes all the necessary checks to ensure that the transformation can be done. Nevertheless, checking back the code it seems like I should merge both methods. I have done that in the new patch and I have restructed a bit the code. > On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java, > > line 538 > > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line538> > > > > Can you add comments what are permissible operator sequences here? Currently the only valid sequence is pRS-SEL*-cRS. > On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java, > > line 584 > > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line584> > > > > Currently, _buceket_number_ comes in as constant when it really its > > not. Will that break assumption about ignoring constant? No, precisely that is the reason why we ignore constants. Also, because ordering by a constant will not change the order of the records. > On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java, > > lines 740-744 > > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line740> > > > > Should it be other way round? That we first try normal dedup and then > > extended. Extended is more aggressive; I think we should try being aggressive first. > On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java, > > line 556 > > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line556> > > > > Can you also add comment why this is valid only for RS added by SPDO > > and not in general? I was thinking further about this, and indeed I made many assumptions thinking that SPDO will not add the sort columns at the end of the additional RS keys, so then we need to make sure the ordering is correct... but this is not true, as it will add them. I think this optimization can be done general, as ordering before ordering again is a noop. I have changed the code accordingly, I would like your opinion. Am I missing something? Let's see what QA returns... - Jesús ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47554/#review133844 ----------------------------------------------------------- On May 18, 2016, 9:42 p.m., Jesús Camacho Rodríguez wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47554/ > ----------------------------------------------------------- > > (Updated May 18, 2016, 9:42 p.m.) > > > Review request for hive and Ashutosh Chauhan. > > > Bugs: HIVE-13750 > https://issues.apache.org/jira/browse/HIVE-13750 > > > Repository: hive-git > > > Description > ------- > > HIVE-13750 > > > Diffs > ----- > > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java > 010c89ed978296709b052cc7bc80256a27658e2b > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java > 733620b84657a21829248afe72ab16ad9692f37e > ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java > d7e404c9946461e20357ed53dd8da468590683c6 > ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out > d03bfe422743d9a5a6b85f9a6198e1e27024f129 > ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out > dec872ab0eef54bd92d5c2bc068e2805cc14e272 > ql/src/test/results/clientpositive/dynpart_sort_optimization_acid.q.out > 832580325873dee741ba86239ee571873994a808 > ql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out > a90e3f63b4646cf0ade9785a501ebd1a6b2a3406 > ql/src/test/results/clientpositive/tez/dynpart_sort_optimization.q.out > 723e8192f2735059005fc3c5c96732a2c4be49c1 > > Diff: https://reviews.apache.org/r/47554/diff/ > > > Testing > ------- > > > Thanks, > > Jesús Camacho Rodríguez > >
