Rajesh Balamohan created HIVE-27014:
---------------------------------------
Summary: Iceberg: getSplits/planTasks should filter out relevant
folders instead of scanning entire table
Key: HIVE-27014
URL: https://issues.apache.org/jira/browse/HIVE-27014
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Rajesh Balamohan
With dynamic partition pruning, only relevant folders in fact tables are
scanned.
In Tez, DynamicPartitionPruner sets the relevant filters. In Iceberg, however,
these filters are applied only after "Table:planTasks()" has been invoked. This
forces the entire table metadata to be scanned before the unwanted partitions
are discarded.
This makes split computation expensive (e.g., for store_sales, it has to look
at all 1800+ partitions and then discard the unwanted ones).
For short-running queries, split computation takes 3-5+ seconds.
Creating this ticket as a placeholder to make use of the relevant filters from
DPP.
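
A rough, self-contained sketch of the difference between pruning after
planning and pruning during planning. This is not Hive or Iceberg code: the
Manifest and Plan types, the method names, and the 100-partitions-per-manifest
layout are all hypothetical, chosen only to show why knowing the filter before
planning lets whole groups of partitions be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SplitPlanningSketch {

    // Hypothetical stand-in for a metadata file that lists a group of
    // partitions together with the min/max partition key it covers.
    public record Manifest(int minKey, int maxKey, List<Integer> partitionKeys) {}

    // Outcome of planning: the partitions kept and how many manifests were read.
    public record Plan(List<Integer> partitions, int manifestsOpened) {}

    // Behaviour described in this ticket: plan over the whole table first,
    // then discard the partitions the DPP filter does not want.
    public static Plan planThenFilter(List<Manifest> manifests, Set<Integer> wanted) {
        List<Integer> kept = new ArrayList<>();
        int opened = 0;
        for (Manifest m : manifests) {
            opened++; // every manifest is read, relevant or not
            for (int key : m.partitionKeys()) {
                if (wanted.contains(key)) {
                    kept.add(key);
                }
            }
        }
        return new Plan(kept, opened);
    }

    // Desired behaviour: the filter is known before planning, so manifests
    // whose key range cannot contain a wanted partition are never read.
    public static Plan filterThenPlan(List<Manifest> manifests, Set<Integer> wanted) {
        List<Integer> kept = new ArrayList<>();
        int opened = 0;
        for (Manifest m : manifests) {
            boolean mayMatch = false;
            for (int key : wanted) {
                if (key >= m.minKey() && key <= m.maxKey()) {
                    mayMatch = true;
                    break;
                }
            }
            if (!mayMatch) {
                continue; // skipped using the min/max summary alone
            }
            opened++;
            for (int key : m.partitionKeys()) {
                if (wanted.contains(key)) {
                    kept.add(key);
                }
            }
        }
        return new Plan(kept, opened);
    }

    // Builds a toy table: partition keys 0..partitions-1, grouped into manifests.
    public static List<Manifest> sampleTable(int partitions, int perManifest) {
        List<Manifest> manifests = new ArrayList<>();
        for (int start = 0; start < partitions; start += perManifest) {
            int end = Math.min(start + perManifest, partitions) - 1;
            List<Integer> keys = new ArrayList<>();
            for (int k = start; k <= end; k++) {
                keys.add(k);
            }
            manifests.add(new Manifest(start, end, keys));
        }
        return manifests;
    }

    public static void main(String[] args) {
        List<Manifest> table = sampleTable(1800, 100); // 18 toy manifests
        Set<Integer> wanted = Set.of(42, 43, 1700);    // partitions chosen by DPP
        System.out.println("plan then filter: " + planThenFilter(table, wanted));
        System.out.println("filter then plan: " + filterThenPlan(table, wanted));
    }
}
```

With this toy layout both approaches keep the same three partitions, but the
first reads all 18 manifests while the second reads only 2.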
--
This message was sent by Atlassian Jira
(v8.20.10#820010)