[ https://issues.apache.org/jira/browse/IMPALA-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riza Suminto resolved IMPALA-14075.
-----------------------------------
Fix Version/s: Impala 5.0.0
Target Version: Impala 5.0.0
Resolution: Fixed
> Parallelize delete operations of EXPIRE_SNAPSHOTS
> -------------------------------------------------
>
> Key: IMPALA-14075
> URL: https://issues.apache.org/jira/browse/IMPALA-14075
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Assignee: Riza Suminto
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 5.0.0
>
>
> Currently Impala executes the EXPIRE_SNAPSHOTS operation on a single thread.
> This can be really slow on cloud storage systems, especially if the operation
> needs to remove lots of files.
> It is possible to run the delete operations in parallel by passing an
> ExecutorService object to ExpireSnapshots:
> {noformat}
> ExpireSnapshots executeDeleteWith(ExecutorService executorService);
> {noformat}
> [https://github.com/apache/iceberg/blob/31c315f695aad544a096a5a2ffdde54a97b90b28/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L100]
> For reference, Hive uses 4 threads to execute the deletes:
> [https://github.com/apache/hive/blob/08067725bc6e8810579324736a0aac453c06bf7b/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2239-L2241]
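
To illustrate the idea, here is a minimal, self-contained sketch of fanning file deletes out over a fixed-size thread pool. It is not the Impala patch: the class ParallelDeleteSketch, the deleteAll helper, and the sample paths are hypothetical stand-ins so that the example runs without Iceberg on the classpath. The pool size of 4 simply mirrors the Hive default mentioned above. With Iceberg available, the same pool would presumably be handed to ExpireSnapshots through executeDeleteWith, as noted in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelDeleteSketch {

    // Hypothetical helper: submits one delete task per path to the pool and
    // waits for all of them. Returns the number of completed deletes.
    static int deleteAll(List<String> paths, Consumer<String> deleteFunc,
                         ExecutorService pool) throws Exception {
        List<Future<?>> pending = new ArrayList<>();
        for (String path : paths) {
            pending.add(pool.submit(() -> deleteFunc.accept(path)));
        }
        int completed = 0;
        for (Future<?> f : pending) {
            f.get(); // rethrows (wrapped) if a delete task failed
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: 4 threads, matching the Hive default cited in the issue.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Stand-in for real storage: record "deleted" paths in a thread-safe set.
            Set<String> deleted = ConcurrentHashMap.newKeySet();
            List<String> paths = List.of("s3a://bucket/t/f1.parquet",
                                         "s3a://bucket/t/f2.parquet",
                                         "s3a://bucket/t/f3.parquet");
            int n = deleteAll(paths, deleted::add, pool);
            System.out.println("deleted " + n + " files");

            // With Iceberg on the classpath, the pool would instead be passed along:
            //   table.expireSnapshots().executeDeleteWith(pool).commit();
        } finally {
            pool.shutdown();
        }
    }
}
```

A fixed-size pool bounds the number of concurrent requests against the storage system, and shutting it down in a finally block avoids leaking threads if a delete fails.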