[ https://issues.apache.org/jira/browse/HIVE-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384917#comment-14384917 ]
Mostafa Mokhtar commented on HIVE-10122: ---------------------------------------- [~sershe] Ok, now I understand what is going on, this is the problematic metastore query {code} SELECT A0.PART_NAME AS NUCORDER0 FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE C0.NAME = 'tpcds_bin_partitioned_orc_30000' AND B0.TBL_NAME = 'store_sales' ORDER BY NUCORDER0; {code} Then the remaining ones use the PART_NAME and PART_ID from the previous query {code} select PARTITIONS.PART_ID from PARTITIONS inner join TBLS ON PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = 'store_sales' inner join DBS ON TBLS.DB_ID = DBS.DB_ID and DBS.NAME = 'tpcds_bin_partitioned_orc_30000' where PARTITIONS.PART_NAME in ('ss_sold_date_sk=2450816' , 'ss_sold_date_sk=2450817'); {code} {code} select PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS left outer join SDS ON PARTITIONS.SD_ID = SDS.SD_ID left outer join SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (59203 , 58422) order by PART_NAME asc; {code} If filters are on the partitioned column only as in {code} select ss_item_sk rowcount from store_sales where ss_sold_date_sk between 2450816 and 2450817 ; {code} Then PARTITIONS table is queried with a partition filter. {code} select PARTITIONS.PART_ID from PARTITIONS inner join TBLS ON PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = 'store_sales' inner join DBS ON TBLS.DB_ID = DBS.DB_ID and DBS.NAME = 'tpcds_bin_partitioned_orc_30000' inner join PARTITION_KEY_VALS FILTER0 ON FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where ((((case when TBLS.TBL_NAME = 'store_sales' and DBS.NAME = 'tpcds_bin_partitioned_orc_30000' then cast(FILTER0.PART_KEY_VAL as decimal (21 , 0 )) else null end) >= 2450816) and ((case when TBLS.TBL_NAME = 'store_sales' and DBS.NAME = 'tpcds_bin_partitioned_orc_30000' then cast(FILTER0.PART_KEY_VAL as decimal (21 , 0 )) else null end) <= 2450817))); {code} For 2K partitions there is no measurable performance difference between {code} explain select ss_item_sk rowcount from store_sales where ss_sold_date_sk between 2450816 and 2450817 ; {code} and {code} explain select ss_item_sk rowcount from store_sales where ss_sold_date_sk between 2450816 and 2450817 and ss_ticket_number > 100000000 and ss_item_sk > 50; {code} > Hive metastore filter-by-expression is broken for non-partition expressions > --------------------------------------------------------------------------- > > Key: HIVE-10122 > URL: https://issues.apache.org/jira/browse/HIVE-10122 > Project: Hive > Issue Type: Bug > Components: Metastore > Affects Versions: 0.14.0, 1.0.0, 1.1.0 > Reporter: Sergey Shelukhin > > See > https://issues.apache.org/jira/browse/HIVE-10091?focusedCommentId=14382413&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14382413 > These two lines of code > {noformat} > // Replace virtual columns with nulls. See javadoc for details. > prunerExpr = removeNonPartCols(prunerExpr, extractPartColNames(tab), > partColsUsedInFilter); > // Remove all parts that are not partition columns. See javadoc for > details. > ExprNodeDesc compactExpr = compactExpr(prunerExpr.clone()); > {noformat} > are supposed to take care of this; I see there were bunch of changes to this > code over some time, and now it appears to be broken. > Thanks to [~thejas] for info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)