Rajesh Balamohan created HIVE-27014:
---------------------------------------

             Summary: Iceberg: getSplits/planTasks should filter out relevant 
folders instead of scanning entire table
                 Key: HIVE-27014
                 URL: https://issues.apache.org/jira/browse/HIVE-27014
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Rajesh Balamohan


With dynamic partition pruning (DPP), only relevant folders in fact tables are 
scanned.

In Tez, DynamicPartitionPruner sets the relevant filters. In Iceberg, these 
filters are applied only after "Table:planTasks()" is invoked. This forces the 
entire table metadata to be scanned before the unwanted partitions are 
discarded.

This makes split computation expensive (e.g., for store_sales, it has to look at 
all 1800+ partitions and then discard the unwanted ones).
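
To illustrate the cost difference, here is a minimal toy sketch in Python. It 
is not the Hive or Iceberg API: ToyTable, plan_tasks, partition_filter and 
entries_read are hypothetical names, and the 1800 partitions and the three 
wanted keys are illustrative values only. It compares filtering after planning 
with passing the filter into planning.

```python
class ToyTable:
    """Hypothetical stand-in for a partitioned table's metadata."""

    def __init__(self, num_partitions, files_per_partition=2):
        # One metadata entry per partition: key -> list of data file paths.
        self.entries = {
            key: [f"part={key}/file-{i}.parquet" for i in range(files_per_partition)]
            for key in range(num_partitions)
        }
        self.entries_read = 0  # number of partition entries expanded so far

    def plan_tasks(self, partition_filter=None):
        """Return (partition_key, file) pairs, counting expanded entries."""
        tasks = []
        for key, files in self.entries.items():
            if partition_filter is not None and key not in partition_filter:
                continue  # pruned before its file list is expanded
            self.entries_read += 1
            tasks.extend((key, path) for path in files)
        return tasks


wanted = {10, 11, 12}  # assumed partitions selected by the pruner

# Filter applied after planning: every partition entry is expanded first.
late = ToyTable(1800)
late_tasks = [task for task in late.plan_tasks() if task[0] in wanted]

# Filter passed into planning: only the wanted entries are expanded.
early = ToyTable(1800)
early_tasks = early.plan_tasks(partition_filter=wanted)

print(late.entries_read, early.entries_read)  # prints "1800 3"
```

Both approaches return the same tasks; only the amount of metadata touched 
during planning differs, which is the cost this ticket is about.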

For short-running queries, split computation takes 3-5+ seconds. 
Creating this ticket as a placeholder to make use of the relevant filters from 
DPP.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
