[ https://issues.apache.org/jira/browse/HIVE-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877945#comment-16877945 ]
Jesus Camacho Rodriguez edited comment on HIVE-21928 at 7/3/19 4:08 PM:
------------------------------------------------------------------------

[~kgyrtkirk], can you take a look at https://github.com/apache/hive/pull/700 ?

The current patch scales the ndv of every column involved in AND clauses proportionally to the reduction in the number of rows. As the q file changes show, the net effect is an increase in the estimated number of rows for joins that follow another join.

bq. I think the following continue block should be removed; even though the rowcount is not changed, the affectedcolumns might have. Is there any reason I don't see why we should do it?

If you consider each column independent, then we could skip... but maybe this should only be done once we compute the reduction ratio per column as discussed above, since currently we just scale the ndv for all columns involved in the predicate proportionally. I will create a follow-up.
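To illustrate the scaling described in the comment, here is a minimal sketch in Python. It is not the code from the pull request: `scale_ndv` and `join_estimate` are hypothetical helpers, the numbers are made up, and the join estimate uses the textbook formula `|R| * |S| / max(ndv_R, ndv_S)`, which is only assumed to approximate what Hive does. The sketch shows why a smaller ndv on a filtered column leads to a larger row estimate for a later join on that column.

```python
def scale_ndv(ndv: int, old_rows: int, new_rows: int) -> int:
    """Scale a column's ndv by new_rows / old_rows (hypothetical helper).

    Integer ceiling division avoids floating-point rounding surprises.
    """
    if old_rows <= 0:
        return ndv
    if new_rows <= 0:
        return 0
    if new_rows >= old_rows:
        return ndv  # no reduction in rows, so nothing to scale
    scaled = (ndv * new_rows + old_rows - 1) // old_rows
    return max(1, scaled)  # surviving rows imply at least one distinct value


def join_estimate(left_rows: int, left_ndv: int,
                  right_rows: int, right_ndv: int) -> int:
    """Textbook equi-join estimate; assumed here, not taken from Hive."""
    return (left_rows * right_rows) // max(left_ndv, right_ndv, 1)


# Made-up numbers: a filter keeps 100 of 1000 rows; the join key had ndv 500.
scaled = scale_ndv(500, 1000, 100)
unscaled_estimate = join_estimate(100, 500, 200, 50)   # ndv left untouched
scaled_estimate = join_estimate(100, scaled, 200, 50)  # ndv scaled with rows
print(scaled, unscaled_estimate, scaled_estimate)
```

With the ndv left at its pre-filter value the join estimate is divided by a number that is too large, so scaling the ndv down raises the estimate, which matches the direction of the q file changes mentioned in the comment.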
> Fix for statistics annotation in nested AND expressions
> -------------------------------------------------------
>
>                 Key: HIVE-21928
>                 URL: https://issues.apache.org/jira/browse/HIVE-21928
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: HIVE-21928.01.patch, HIVE-21928.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Discovered while working on HIVE-21867. Having predicates with nested AND
> expressions may result in different stats, even if predicates are basically
> similar (from stats estimation standpoint).
> For instance, stats for {{AND(x=5, true, true)}} are different from {{x=5}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
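The property the issue description asks for can be sketched as follows. This is a Python illustration only, not Hive's implementation: the tuple-based expression encoding, `conjuncts`, and `estimate_rows` are all hypothetical, and the `rows / ndv` selectivity for an equality predicate is an assumed simplification. The idea is that flattening nested ANDs and dropping constant `true` operands before estimating makes `AND(x=5, true, true)` and `x=5` produce the same stats.

```python
def conjuncts(expr):
    """Flatten nested ANDs into a list of non-trivial predicates.

    Expressions are hypothetical tuples such as ("AND", e1, e2, ...) or
    ("EQ", column, value); the literal True stands for a constant predicate.
    """
    if expr is True:
        return []  # a constant true does not filter anything
    if isinstance(expr, tuple) and expr[0] == "AND":
        flat = []
        for child in expr[1:]:
            flat.extend(conjuncts(child))
        return flat
    return [expr]


def estimate_rows(expr, num_rows, ndv):
    """Estimate output rows, assuming each equality keeps rows / ndv rows."""
    rows = num_rows
    for op, column, _value in conjuncts(expr):
        if op != "EQ":
            raise ValueError("this sketch only handles equality predicates")
        rows = max(1, rows // max(ndv[column], 1))
    return rows


ndv = {"x": 10}
plain = estimate_rows(("EQ", "x", 5), 1000, ndv)
padded = estimate_rows(("AND", ("EQ", "x", 5), True, True), 1000, ndv)
nested = estimate_rows(("AND", ("AND", ("EQ", "x", 5), True), True), 1000, ndv)
print(plain, padded, nested)
```

All three calls go through the same flattened list of predicates, so the estimate cannot depend on how the AND expression happens to be nested.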