-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68109/#review206607
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 354 (original), 355 (patched)
<https://reviews.apache.org/r/68109/#comment289636>

    Add a comment here: We assume columns are uncorrelated. That is, filters on different columns filter out different rows. So we scale down the NDV of a column only when the row count is decreased by its own filter. Under a correlated assumption, we would scale down the NDV of every column for every filter condition. We don't do that. This makes our estimates more conservative than they need to be, which is good: when we are wrong we overestimate, but we avoid the OOMs we could have hit had we chosen the other assumption. In the future, we need to capture the correlation between columns in metadata so that we can account for it.


ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 2633 (original), 2647 (patched)
<https://reviews.apache.org/r/68109/#comment289635>

    Add an assert that newNDV <= newNumRows.


- Ashutosh Chauhan


On July 30, 2018, 4:17 p.m., Zoltan Haindrich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68109/
> -----------------------------------------------------------
>
> (Updated July 30, 2018, 4:17 p.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-20260
>     https://issues.apache.org/jira/browse/HIVE-20260
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> * keep track of used columns, and only rescale affected columns
> * much more conservative than old logic - possibly too much...
> * wip patch
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/AnnotateStatsProcCtx.java 47ee949fbcfa9391c640719a57fab39279c009db
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 3c2b0854269d5426153958096a8b5b5ad3612c0f
>   ql/src/test/queries/clientpositive/stat_estimate_drill.q PRE-CREATION
>   ql/src/test/queries/clientpositive/stat_estimate_related_col.q 52da2f759a009daa372a53446e2f0fd4a88152be
>   ql/src/test/results/clientpositive/stat_estimate_drill.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/stat_estimate_related_col.q.out 669adafda3a45f7846face3d99817cd1b9cb3664
>
>
> Diff: https://reviews.apache.org/r/68109/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Zoltan Haindrich
>
>
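
For illustration only, the two review comments above (rescale the NDV only for columns the filter actually references, and keep newNDV <= newNumRows) might look roughly like the sketch below. This is not the patch: the class NdvScalingSketch, its method names, and the choice to clamp every column's NDV to the new row count are assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hive: illustrates the uncorrelated-columns rule.
class NdvScalingSketch {

    /** Scale one column's NDV by the row-count ratio, never exceeding the new row count. */
    static long scaleNdv(long oldNdv, long oldNumRows, long newNumRows) {
        if (oldNumRows <= 0) {
            return 0L;
        }
        double ratio = (double) newNumRows / (double) oldNumRows;
        long newNdv = Math.min((long) Math.ceil(oldNdv * ratio), newNumRows);
        assert newNdv <= newNumRows : "NDV cannot exceed row count";
        return newNdv;
    }

    /**
     * Rescale NDVs after a filter. Only columns referenced by the filter are
     * scaled down (uncorrelated assumption); the others keep their NDV, apart
     * from the clamp to the new row count (an assumption of this sketch).
     */
    static Map<String, Long> rescale(Map<String, Long> ndvByColumn,
                                     Set<String> filteredColumns,
                                     long oldNumRows,
                                     long newNumRows) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : ndvByColumn.entrySet()) {
            long newNdv;
            if (filteredColumns.contains(e.getKey())) {
                newNdv = scaleNdv(e.getValue(), oldNumRows, newNumRows);
            } else {
                newNdv = Math.min(e.getValue(), newNumRows);
            }
            assert newNdv <= newNumRows : "NDV cannot exceed row count";
            result.put(e.getKey(), newNdv);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> ndv = new LinkedHashMap<>();
        ndv.put("a", 100L); // referenced by the filter
        ndv.put("b", 800L); // not referenced, but larger than the new row count
        ndv.put("c", 10L);  // not referenced
        // The filter on column "a" reduces 1000 rows to 500.
        System.out.println(rescale(ndv, Collections.singleton("a"), 1000L, 500L));
        // prints {a=50, b=500, c=10}
    }
}
```

Under a correlated assumption the `else` branch would also call scaleNdv, which shrinks every column's NDV on every filter and produces the lower estimates the first comment warns about.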