[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141440#comment-14141440 ]
Mostafa Mokhtar commented on HIVE-8196: --------------------------------------- [~hagleitn] > Joining on partition columns with fetch column stats enabled results it very > small CE which negatively affects query performance > --------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-8196 > URL: https://issues.apache.org/jira/browse/HIVE-8196 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Prasanth J > Priority: Critical > Labels: performance > Fix For: 0.14.0 > > > To make the best out of dynamic partition pruning joins should be on the > partitioning columns which results in dynamically pruning the partitions from > the fact table based on the qualifying column keys from the dimension table, > this type of joins negatively effects on cardinality estimates with fetch > column stats enabled. > Currently we don't have statistics for partition columns and as a result NDV > is set to row count, doing that negatively affects the estimated join > selectivity from the join. > Workaround is to capture statistics for partition columns or use number of > partitions incase dynamic partitioning is used. > In StatsUtils.getColStatisticsFromExpression is where count distincts gets > set to row count > {code} > if (encd.getIsPartitionColOrVirtualCol()) { > // vitual columns > colType = encd.getTypeInfo().getTypeName(); > countDistincts = numRows; > oi = encd.getWritableObjectInspector(); > {code} > Query used to repro the issue : > {code} > set hive.stats.fetch.column.stats=ture; > set hive.tez.dynamic.partition.pruning=true; > explain select d_date > from store_sales, date_dim > where > store_sales.ss_sold_date_sk = date_dim.d_date_sk and > date_dim.d_year = 1998; > {code} > Plan -- This message was sent by Atlassian JIRA (v6.3.4#6332)