Yu-Wen Lai created HIVE-23796:
---------------------------------

             Summary: Multiple insert overwrite into a partitioned table doesn't gather column statistics for all partitions
                 Key: HIVE-23796
                 URL: https://issues.apache.org/jira/browse/HIVE-23796
             Project: Hive
          Issue Type: Bug
          Components: Statistics
         Environment: Hive 3.1
            Reporter: Yu-Wen Lai


Here is a simplified sample that illustrates the issue.
When a statement contains multiple insert overwrite clauses, only the partitions
written by the last clause get column statistics. In the sample below, only the
partition {{ws_sold_date_sk=__HIVE_DEFAULT_PARTITION__}}, which is written by the
last insert clause, has column statistics.

Because "hive.stats.column.autogather" is true by default, we expect column
statistics to be computed for all partitions.
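{code:sql}
-- Fully dynamic partition inserts need nonstrict mode (the Hive 3.x default is strict)
set hive.exec.dynamic.partition.mode=nonstrict;

create table web_sales
(
    ws_sold_time_sk           bigint,
    ws_ship_date_sk           bigint,
    ws_item_sk                bigint
)
partitioned by (ws_sold_date_sk bigint)
stored as orc;

-- Multi-insert: both clauses overwrite web_sales using dynamic partitioning
from anotherdb.web_sales ws
insert overwrite table web_sales partition (ws_sold_date_sk)
select
        ws.ws_sold_time_sk,
        ws.ws_ship_date_sk,
        ws.ws_item_sk,
        ws.ws_sold_date_sk
where ws.ws_sold_date_sk is not null
-- Only partitions written by this last clause end up with column statistics
insert overwrite table web_sales partition (ws_sold_date_sk)
select
        ws.ws_sold_time_sk,
        ws.ws_ship_date_sk,
        ws.ws_item_sk,
        ws.ws_sold_date_sk
where ws.ws_sold_date_sk is null
sort by ws.ws_sold_date_sk
;
{code}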
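One way to observe the symptom after running the statements above is to inspect per-partition column statistics with DESCRIBE FORMATTED. This is a sketch under assumptions: the partition value 2450816 is a hypothetical placeholder for any non-null ws_sold_date_sk written by the first insert clause, and the ANALYZE statement is only a possible manual workaround, not a fix for the bug.

{code:sql}
-- Hypothetical partition value: substitute any non-null ws_sold_date_sk
-- produced by the first insert clause. Per this report, no column
-- statistics are expected to appear for such a partition.
describe formatted web_sales partition (ws_sold_date_sk=2450816) ws_item_sk;

-- Possible workaround (assumption, not verified against this bug):
-- explicitly recompute column statistics for all partitions.
analyze table web_sales partition (ws_sold_date_sk) compute statistics for columns;
{code}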



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
