-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68109/#review206607
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 354 (original), 355 (patched)
<https://reviews.apache.org/r/68109/#comment289636>

    Add a comment here:
    We assume columns are uncorrelated, that is, filters on different columns
    filter out different rows. So we scale down the NDV of a column only when the
    row count is decreased by a filter on that same column. Under a correlated
    assumption, we would scale down the NDV of every column for every filter
    condition; we don't do that.
    This makes our estimates more conservative than they need to be, which is
    good: when we are wrong we overestimate, and we avoid the OOMs we could hit
    had we chosen the other assumption. In the future, we need to capture the
    correlation between columns in the metadata so that we can account for it.
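    A minimal sketch of the uncorrelated-columns rule described above, assuming
    a simple map from column name to NDV. The class `NdvScalingSketch`, the
    method `rescaleNdv`, and the proportional (ceiling) scaling formula are
    hypothetical illustrations, not the Hive code in the patch. Only columns
    referenced by the filter are scaled; every NDV is additionally capped at the
    new row count.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class NdvScalingSketch {

    /**
     * Hypothetical helper: rescales NDVs after a filter reduced the row count
     * from oldRows to newRows. Only columns in filteredColumns are scaled,
     * reflecting the assumption that columns are uncorrelated.
     */
    static Map<String, Long> rescaleNdv(Map<String, Long> ndvByColumn,
                                        Set<String> filteredColumns,
                                        long oldRows, long newRows) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : ndvByColumn.entrySet()) {
            long ndv = e.getValue();
            if (oldRows > 0 && filteredColumns.contains(e.getKey())) {
                // The filter acted on this column: shrink its NDV in proportion
                // to the row-count reduction, rounding up (integer ceiling division).
                ndv = (ndv * newRows + oldRows - 1) / oldRows;
            }
            // A column can never have more distinct values than there are rows.
            ndv = Math.min(ndv, newRows);
            result.put(e.getKey(), ndv);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> ndv = new LinkedHashMap<>();
        ndv.put("a", 100L);
        ndv.put("b", 50L);
        // A filter on column "a" keeps 1000 of 10000 rows; "b" is untouched.
        System.out.println(rescaleNdv(ndv, Set.of("a"), 10_000, 1_000)); // {a=10, b=50}
    }
}
```

    Under a correlated assumption, "b" would also have been scaled down to 5;
    keeping it at 50 is the conservative overestimate the comment argues for.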



ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 2633 (original), 2647 (patched)
<https://reviews.apache.org/r/68109/#comment289635>

    Add assert newNDV <= newNumRows.
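    A small sketch of where such an assertion could sit, assuming the new NDV is
    derived by scaling the old NDV with the row-count ratio. The class and
    method names here are hypothetical; only the invariant itself comes from the
    review. Note that Java `assert` statements are evaluated only when the JVM
    runs with assertions enabled (`java -ea`).

```java
public class NdvInvariantSketch {

    /**
     * Hypothetical helper: scales a column's NDV by the filter's row-count
     * ratio and checks the invariant newNDV <= newNumRows.
     */
    static long scaledNdv(long oldNDV, long oldNumRows, long newNumRows) {
        // Ceiling of oldNDV * newNumRows / oldNumRows, using integer arithmetic.
        long newNDV = oldNumRows == 0
                ? oldNDV
                : (oldNDV * newNumRows + oldNumRows - 1) / oldNumRows;
        // Fails fast (with -ea) if the estimate ever claims more distinct
        // values than rows.
        assert newNDV <= newNumRows : "newNDV " + newNDV + " > newNumRows " + newNumRows;
        return newNDV;
    }
}
```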


- Ashutosh Chauhan


On July 30, 2018, 4:17 p.m., Zoltan Haindrich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68109/
> -----------------------------------------------------------
> 
> (Updated July 30, 2018, 4:17 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20260
>     https://issues.apache.org/jira/browse/HIVE-20260
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> * keep track of used columns, and only rescale the affected columns
> * much more conservative than the old logic - possibly too much...
> * WIP patch
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/AnnotateStatsProcCtx.java 47ee949fbcfa9391c640719a57fab39279c009db
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 3c2b0854269d5426153958096a8b5b5ad3612c0f
>   ql/src/test/queries/clientpositive/stat_estimate_drill.q PRE-CREATION
>   ql/src/test/queries/clientpositive/stat_estimate_related_col.q 52da2f759a009daa372a53446e2f0fd4a88152be
>   ql/src/test/results/clientpositive/stat_estimate_drill.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/stat_estimate_related_col.q.out 669adafda3a45f7846face3d99817cd1b9cb3664
> 
> 
> Diff: https://reviews.apache.org/r/68109/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>
