> On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java,
> >  line 556
> > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line556>
> >
> >     Can you also add comment why this is valid only for RS added by SPDO 
> > and not in general?
> 
> Jesús Camacho Rodríguez wrote:
>     I was thinking further about this, and indeed I made many assumptions 
> thinking that SPDO will not add the sort columns at the end of the additional 
> RS keys, so then we need to make sure the ordering is correct... but this is 
> not true, as it will add them.
>     
>     I think this optimization can be done general, as ordering before 
> ordering again is a noop. I have changed the code accordingly, I would like 
> your opinion. Am I missing something? Let's see what QA returns...

yeah.. thats what I think.. this is not specific to SPDO at logical layer. 
Thing I am not 100% sure about is in FileSink the way SPDO is implemented I 
*think* it assumes all rows for a particular key come in sorted and in one 
batch. Now, when we merge and sort by two keys and if there is a case that FS 
was expecting all rows sorted on second key only, now will get them sorted on 2 
keys and it may have to close ORC writer and then reopen again, as oppose to 
write it once and never opening it again.


> On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java,
> >  line 584
> > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line584>
> >
> >     Currently, _buceket_number_ comes in as constant when it really its 
> > not. Will that break assumption about ignoring constant?
> 
> Jesús Camacho Rodríguez wrote:
>     No, precisely that is the reason why we ignore constants. Also, because 
> ordering by a constant will not change the order of the records.

Problem is although type of _bucket_number_ is ExprNodeConstantDesc, its *not* 
a constant. Its created as a placeholder and value of this column is calculated 
on a per-row basis in ReduceSinkOperator::computeBucketNumber() which means at 
run time its not a constant. Correct fix for this is create _bucket_number_ as 
ExprNodeColumnDesc in SPDO.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47554/#review133844
-----------------------------------------------------------


On May 19, 2016, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47554/
> -----------------------------------------------------------
> 
> (Updated May 19, 2016, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-13750
>     https://issues.apache.org/jira/browse/HIVE-13750
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-13750
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  010c89ed978296709b052cc7bc80256a27658e2b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
>  733620b84657a21829248afe72ab16ad9692f37e 
>   ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out 
> d03bfe422743d9a5a6b85f9a6198e1e27024f129 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out 
> dec872ab0eef54bd92d5c2bc068e2805cc14e272 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization_acid.q.out 
> 832580325873dee741ba86239ee571873994a808 
>   ql/src/test/results/clientpositive/reducesink_dedup.q.out 
> b89df52f965385b85894757896eee487b29c52ae 
>   ql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out 
> a90e3f63b4646cf0ade9785a501ebd1a6b2a3406 
>   ql/src/test/results/clientpositive/tez/dynpart_sort_optimization.q.out 
> 723e8192f2735059005fc3c5c96732a2c4be49c1 
> 
> Diff: https://reviews.apache.org/r/47554/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Reply via email to