[ https://issues.apache.org/jira/browse/HIVE-20260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562086#comment-16562086 ]
Zoltan Haindrich commented on HIVE-20260:
-----------------------------------------

I feel that this logic might need to be rethought at some point, basing the calculation more on the column stats; however, I'm afraid that won't be possible, since they might not always be available.

I've introduced some logic to keep track of the affected columns; this makes it much better. However, a full test run is needed to see whether it causes any trouble for other queries.

https://reviews.apache.org/r/68109/

> NDV of a column shouldn't be scaled when row count is changed by filter on
> another column
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-20260
>                 URL: https://issues.apache.org/jira/browse/HIVE-20260
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20260.01wip01.patch, HIVE-20260.01wip02.patch
>
>
> HIVE-17465 introduced progressive scaling of row counts in the presence of
> multiple filters. HIVE-19500 improved on that by also scaling column stats
> (NDV) in such scenarios. However, it should pay attention to the column used
> in the filter expression and not scale for all filters. E.g., consider the
> filter a = 1 and b = 2: the NDV of column b should not be scaled down by
> row count changes caused by a = 1.
> Another way to say this is that the NDV of a particular column should be
> updated at the end of the computation of the row count for that operator.
> Here are the possible cases where our estimates can be accurate (or close
> to it):
> {code}
> case 1 - (d_year = 2001 and d_moy=1)
> case 2 - (d_year = 2001 and d_year IN (2001, 2002))
> case 3 - (d_year = 2001 and d_moy = 1 and d_dom = 1)
> case 4 - (d_date IN ('1999-01-02', '1999-01-02'))
> case 5 - (d_date = '1999-01-01')
> {code}
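
The rule requested in the issue description (scale a column's NDV only for predicates that reference that column) can be illustrated with a small standalone sketch. This is not the code from the patch or the review request: the `Estimate` class, the `applyFilter` method, and the row count and NDV figures are all hypothetical, and capping unaffected columns at the new row count is an assumption made here so that an NDV never exceeds the number of rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NdvScalingSketch {

    /** Hypothetical holder for running estimates; not a Hive class. */
    static final class Estimate {
        long rows;
        final Map<String, Long> ndv = new LinkedHashMap<>();

        Estimate(long rows) {
            this.rows = rows;
        }
    }

    /**
     * Applies one predicate that references a single column.
     * Only that column's NDV is scaled by the row-count ratio. Every other
     * column keeps its NDV, capped at the new row count (assumption).
     */
    static void applyFilter(Estimate e, String column, double selectivity) {
        long oldRows = e.rows;
        long newRows = Math.max(1L, Math.round(oldRows * selectivity));
        double ratio = (double) newRows / oldRows;

        for (Map.Entry<String, Long> entry : e.ndv.entrySet()) {
            long current = entry.getValue();
            long updated = entry.getKey().equals(column)
                    ? Math.max(1L, Math.round(current * ratio)) // affected column
                    : Math.min(current, newRows);               // unaffected column
            entry.setValue(updated);
        }
        e.rows = newRows;
    }

    public static void main(String[] args) {
        // Made-up statistics for a date dimension table.
        Estimate e = new Estimate(73_000L);
        e.ndv.put("d_year", 200L);
        e.ndv.put("d_moy", 12L);

        // case 1 from the description: d_year = 2001 and d_moy = 1
        applyFilter(e, "d_year", 1.0 / 200);
        System.out.println("after d_year filter: rows=" + e.rows + " ndv=" + e.ndv);

        applyFilter(e, "d_moy", 1.0 / 12);
        System.out.println("after d_moy filter:  rows=" + e.rows + " ndv=" + e.ndv);
    }
}
```

In this sketch the NDV of `d_moy` is still 12 after the `d_year` predicate has been applied, whereas scaling every column by the row-count ratio would already have collapsed it to 1 at that point.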