[ https://issues.apache.org/jira/browse/HIVE-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez updated HIVE-22239:
-------------------------------------------
    Attachment: HIVE-22239.05.patch

> Scale data size using column value ranges
> -----------------------------------------
>
>                 Key: HIVE-22239
>                 URL: https://issues.apache.org/jira/browse/HIVE-22239
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22239.01.patch, HIVE-22239.02.patch,
>                      HIVE-22239.03.patch, HIVE-22239.04.patch, HIVE-22239.04.patch,
>                      HIVE-22239.05.patch, HIVE-22239.patch
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently, min/max values for columns are only used to determine whether a
> certain range filter falls out of range and thus filters all rows or none at
> all. If it does not, we just use a heuristic that the condition will filter
> 1/3 of the input rows. Instead of using that heuristic, we can use another
> one that assumes that data will be uniformly distributed across that range,
> and calculate the selectivity for the condition accordingly.
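
The uniform-distribution heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class `RangeSelectivity` and its methods are hypothetical names, the column statistics are passed in as plain doubles, and the 1/3 fallback is interpreted here as the fraction of rows kept, which the description does not spell out.

```java
// Hypothetical helper for illustration only; this is NOT Hive's actual API.
final class RangeSelectivity {

    // Fallback when min/max statistics are missing or inconsistent
    // (the fixed 1/3 heuristic mentioned in the issue description).
    static final double DEFAULT_SELECTIVITY = 1.0 / 3.0;

    // Estimated fraction of rows satisfying "col < value", assuming the
    // column values are uniformly distributed over [min, max].
    static double lessThan(double value, double min, double max) {
        if (Double.isNaN(min) || Double.isNaN(max) || min > max) {
            return DEFAULT_SELECTIVITY;
        }
        if (value <= min) {
            return 0.0; // bound is at or below the minimum: no row qualifies
        }
        if (value > max) {
            return 1.0; // bound is above the maximum: every row qualifies
        }
        // Here min < value <= max, so max - min is strictly positive.
        return (value - min) / (max - min);
    }

    // Estimated fraction of rows satisfying "col > value", same assumption.
    static double greaterThan(double value, double min, double max) {
        if (Double.isNaN(min) || Double.isNaN(max) || min > max) {
            return DEFAULT_SELECTIVITY;
        }
        if (value >= max) {
            return 0.0; // bound is at or above the maximum: no row qualifies
        }
        if (value < min) {
            return 1.0; // bound is below the minimum: every row qualifies
        }
        // Here min <= value < max, so max - min is strictly positive.
        return (max - value) / (max - min);
    }

    // Scales an input row count by the estimated selectivity.
    static long scaleRows(long inputRows, double selectivity) {
        return Math.round(inputRows * selectivity);
    }

    public static void main(String[] args) {
        // Column with min = 0 and max = 100, predicate "col < 25".
        double sel = lessThan(25, 0, 100);
        System.out.println("selectivity = " + sel);
        System.out.println("estimated rows = " + scaleRows(1000, sel));
    }
}
```

For a column whose statistics report min = 0 and max = 100, the predicate `col < 25` gets an estimated selectivity of 0.25 under this sketch instead of the fixed 1/3. Treating the values as continuous is a simplification: the sketch ignores the distinction between `<` and `<=` and does not use the number of distinct values.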