> On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java,
> >  line 556
> > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line556>
> >
> >     Can you also add comment why this is valid only for RS added by SPDO 
> > and not in general?
> 
> Jesús Camacho Rodríguez wrote:
>     I was thinking further about this, and indeed I made many assumptions 
> thinking that SPDO will not add the sort columns at the end of the additional 
> RS keys, so then we need to make sure the ordering is correct... but this is 
> not true, as it will add them.
>     
>     I think this optimization can be done general, as ordering before 
> ordering again is a noop. I have changed the code accordingly, I would like 
> your opinion. Am I missing something? Let's see what QA returns...
> 
> Ashutosh Chauhan wrote:
>     yeah.. thats what I think.. this is not specific to SPDO at logical 
> layer. 
>     Thing I am not 100% sure about is in FileSink the way SPDO is implemented 
> I *think* it assumes all rows for a particular key come in sorted and in one 
> batch. Now, when we merge and sort by two keys and if there is a case that FS 
> was expecting all rows sorted on second key only, now will get them sorted on 
> 2 keys and it may have to close ORC writer and then reopen again, as oppose 
> to write it once and never opening it again.
> 
> Jesús Camacho Rodríguez wrote:
>     I am trying to understand this part... The way the optimization works, we 
> will keep the second RS with its keys, we just remove the first RS... Thus, 
> this should not be a problem? Or maybe I am not understanding it correctly
> 
> Ashutosh Chauhan wrote:
>     I think you are correct. Since relevant FS is active in last reducer 
> containing RS which we is not changing, we are not changing any assumptions 
> for FS.
> 
> Ashutosh Chauhan wrote:
>     On a second thought I think there is more to it. e.g. we are transforming 
> RS (a) followed by RS (a,b) to RS(a,b) Than this is not valid transformation 
> in general, only if second RS is introduced by SPDO. e.g, if these two RS are 
> part of two GBYs (or a join & GBy), than its invalid because partitioning on 
> (a,b) != partitioning on (a)

But aggressiveDedup is really restrictive: only pRS-SEL*-cRS, thus we would 
bail out in those specific cases.


- Jesús


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47554/#review133844
-----------------------------------------------------------


On May 19, 2016, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47554/
> -----------------------------------------------------------
> 
> (Updated May 19, 2016, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-13750
>     https://issues.apache.org/jira/browse/HIVE-13750
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-13750
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  010c89ed978296709b052cc7bc80256a27658e2b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
>  733620b84657a21829248afe72ab16ad9692f37e 
>   ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out 
> d03bfe422743d9a5a6b85f9a6198e1e27024f129 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out 
> dec872ab0eef54bd92d5c2bc068e2805cc14e272 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization_acid.q.out 
> 832580325873dee741ba86239ee571873994a808 
>   ql/src/test/results/clientpositive/reducesink_dedup.q.out 
> b89df52f965385b85894757896eee487b29c52ae 
>   ql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out 
> a90e3f63b4646cf0ade9785a501ebd1a6b2a3406 
>   ql/src/test/results/clientpositive/tez/dynpart_sort_optimization.q.out 
> 723e8192f2735059005fc3c5c96732a2c4be49c1 
> 
> Diff: https://reviews.apache.org/r/47554/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Reply via email to