[jira] [Commented] (HIVE-22238) PK/FK selectivity estimation underscales estimations

Jesus Camacho Rodriguez (Jira) Mon, 21 Oct 2019 13:27:42 -0700


    [ https://issues.apache.org/jira/browse/HIVE-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956422#comment-16956422 ]


Jesus Camacho Rodriguez commented on HIVE-22238:
------------------------------------------------

[~kgyrtkirk], `getSelectivitySimpleTree` looks for the TS (TableScan) that is
below that operator. Does it find it, or do we go into the logic for multiple
operators? If it does find it, maybe we should exclude from the estimate the
predicates that have already been accounted for on the PK side (filter
conditions on join keys). Does that make sense? Skipping any reduction
performed by a join seems too radical (for instance, if we filter by year but
join on some other key, we will not predict any reduction due to the join).
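A minimal sketch of the proposal above, not Hive's actual code: the PK-side selectivity is taken as the product of the per-predicate selectivities (an independence assumption), skipping filters on join keys because those may already have been transferred to the FK side. The `Predicate` record, `residualSelectivity` method and the column names `d_year` / `d_date_sk` are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Set;

public class PkSideSelectivity {
    // Hypothetical representation of a filter on the PK-side table scan.
    record Predicate(String column, double selectivity) {}

    // Multiplies per-predicate selectivities (assumes independence),
    // skipping filters on join keys: those may have been pushed to the
    // FK side already and so are reflected in its incoming row count.
    static double residualSelectivity(List<Predicate> pkFilters, Set<String> joinKeys) {
        double sel = 1.0;
        for (Predicate p : pkFilters) {
            if (!joinKeys.contains(p.column())) {
                sel *= p.selectivity();
            }
        }
        return sel;
    }

    public static void main(String[] args) {
        List<Predicate> filters = List.of(new Predicate("d_year", 0.1));
        // Joined on d_year: the filter can move to the FK side, so no extra scaling.
        System.out.println(residualSelectivity(filters, Set.of("d_year")));    // 1.0
        // Joined on d_date_sk: the year filter still reduces the join output.
        System.out.println(residualSelectivity(filters, Set.of("d_date_sk"))); // 0.1
    }
}
```

This keeps the reduction from non-join-key filters (the year example), while avoiding a second reduction for filters that were already applied upstream.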

> PK/FK selectivity estimation underscales estimations
> ----------------------------------------------------
>
>                 Key: HIVE-22238
>                 URL: https://issues.apache.org/jira/browse/HIVE-22238
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-22238.01.patch, HIVE-22238.02.patch, 
> HIVE-22238.03.patch
>
>
> At [this 
> point|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2182]
>  the parent operator's rownum is scaled according to pkfkselectivity;
> however, [pkfkselectivity is 
> computed|https://github.com/apache/hive/blob/5098d155a1e6a164253f5fa98755273bc34085df/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2157]
>  on a whole subtree.
> Scaling by that amount counts again reductions that were already applied when
> parentstats was calculated, so, depending on the number of upstream joins,
> this may lead to severe underestimations.
> What happened was:
> * the optimizer was able to push the filter to the other side of the join
> * as a result, the incoming data was already filtered
> * the scaling down by the PK selectivity was therefore already in place, but a
> new "scaling" was applied on top of it
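A numeric illustration of the double scaling described above, under assumed row counts (1,000,000 FK-side rows, 1,000 PK-side rows, 10 PK-side rows surviving a filter). The class and method names are hypothetical; this does not reproduce the code in `StatsRulesProcFactory`, it only shows the arithmetic.

```java
public class PkFkDoubleScaling {
    // Scales a row count by a selectivity factor (hypothetical helper).
    static long scale(long rows, double selectivity) {
        return Math.round(rows * selectivity);
    }

    public static void main(String[] args) {
        long factRows = 1_000_000L;  // FK side before filtering (assumed)
        long dimRows = 1_000L;       // PK side before filtering (assumed)
        long dimRowsFiltered = 10L;  // PK side after the filter (assumed)
        double pkSelectivity = (double) dimRowsFiltered / dimRows; // 0.01

        // The filter was pushed to the FK side, so parent stats already carry the reduction.
        long parentRows = scale(factRows, pkSelectivity);

        // Applying pkSelectivity again counts the same reduction a second time.
        long doubleScaled = scale(parentRows, pkSelectivity);

        System.out.println("parentRows=" + parentRows + " doubleScaled=" + doubleScaled);
    }
}
```

With these assumed inputs the estimate drops from 10,000 rows to 100 rows, and each further upstream join that is rescaled the same way compounds the error.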



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

