[ 
https://issues.apache.org/jira/browse/HIVE-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384917#comment-14384917
 ] 

Mostafa Mokhtar commented on HIVE-10122:
----------------------------------------

[~sershe]
Ok, now I understand what is going on, this is the problematic metastore query 

{code}
SELECT 
    A0.PART_NAME AS NUCORDER0
FROM
    PARTITIONS A0
        LEFT OUTER JOIN
    TBLS B0 ON A0.TBL_ID = B0.TBL_ID
        LEFT OUTER JOIN
    DBS C0 ON B0.DB_ID = C0.DB_ID
WHERE
    C0.NAME = 'tpcds_bin_partitioned_orc_30000'
        AND B0.TBL_NAME = 'store_sales'
ORDER BY NUCORDER0;
{code}

Then the remaining ones use the PART_NAME and PART_ID from the previous query
{code}
select 
    PARTITIONS.PART_ID
from
    PARTITIONS
        inner join
    TBLS ON PARTITIONS.TBL_ID = TBLS.TBL_ID
        and TBLS.TBL_NAME = 'store_sales'
        inner join
    DBS ON TBLS.DB_ID = DBS.DB_ID
        and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
where
    PARTITIONS.PART_NAME in ('ss_sold_date_sk=2450816' , 
'ss_sold_date_sk=2450817');
{code}
{code}
select 
    PARTITIONS.PART_ID,
    SDS.SD_ID,
    SDS.CD_ID,
    SERDES.SERDE_ID,
    PARTITIONS.CREATE_TIME,
    PARTITIONS.LAST_ACCESS_TIME,
    SDS.INPUT_FORMAT,
    SDS.IS_COMPRESSED,
    SDS.IS_STOREDASSUBDIRECTORIES,
    SDS.LOCATION,
    SDS.NUM_BUCKETS,
    SDS.OUTPUT_FORMAT,
    SERDES.NAME,
    SERDES.SLIB
from
    PARTITIONS
        left outer join
    SDS ON PARTITIONS.SD_ID = SDS.SD_ID
        left outer join
    SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID
where
    PART_ID in (59203 , 58422)
order by PART_NAME asc;
{code}


If filters are on the partitioned column only as in 
{code} 
select ss_item_sk  rowcount from store_sales where ss_sold_date_sk between 
2450816 and 2450817  ;
{code}

Then PARTITIONS table is queried with a partition filter.
{code}

select 
    PARTITIONS.PART_ID
from
    PARTITIONS
        inner join
    TBLS ON PARTITIONS.TBL_ID = TBLS.TBL_ID
        and TBLS.TBL_NAME = 'store_sales'
        inner join
    DBS ON TBLS.DB_ID = DBS.DB_ID
        and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
        inner join
    PARTITION_KEY_VALS FILTER0 ON FILTER0.PART_ID = PARTITIONS.PART_ID
        and FILTER0.INTEGER_IDX = 0
where
    ((((case
        when
            TBLS.TBL_NAME = 'store_sales'
                and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
        then
            cast(FILTER0.PART_KEY_VAL as decimal (21 , 0 ))
        else null
    end) >= 2450816)
        and ((case
        when
            TBLS.TBL_NAME = 'store_sales'
                and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
        then
            cast(FILTER0.PART_KEY_VAL as decimal (21 , 0 ))
        else null
    end) <= 2450817)));
{code}


For 2K partitions there is no measurable performance difference between 
{code}
 explain select ss_item_sk  rowcount from store_sales where ss_sold_date_sk 
between 2450816 and 2450817  ;
{code}

and 
{code}
explain select ss_item_sk  rowcount from store_sales where ss_sold_date_sk 
between 2450816 and 2450817  and ss_ticket_number  > 100000000 and ss_item_sk > 
50;
{code}

> Hive metastore filter-by-expression is broken for non-partition expressions
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-10122
>                 URL: https://issues.apache.org/jira/browse/HIVE-10122
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.14.0, 1.0.0, 1.1.0
>            Reporter: Sergey Shelukhin
>
> See 
> https://issues.apache.org/jira/browse/HIVE-10091?focusedCommentId=14382413&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14382413
> These two lines of code
> {noformat}
>     // Replace virtual columns with nulls. See javadoc for details.
>     prunerExpr = removeNonPartCols(prunerExpr, extractPartColNames(tab), 
> partColsUsedInFilter);
>     // Remove all parts that are not partition columns. See javadoc for 
> details.
>     ExprNodeDesc compactExpr = compactExpr(prunerExpr.clone());
> {noformat}
> are supposed to take care of this; I see there were bunch of changes to this 
> code over some time, and now it appears to be broken.
> Thanks to [~thejas] for info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to