[
https://issues.apache.org/jira/browse/HADOOP-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833169#comment-17833169
]
ASF GitHub Bot commented on HADOOP-18656:
-----------------------------------------
anujmodi2021 commented on PR #6409:
URL: https://github.com/apache/hadoop/pull/6409#issuecomment-2031907051
> hey, in #6494 i'm drafting a bulk delete API where the caller (iceberg
> etc) can give a list of file paths for deletion with no guarantees about safety
> checks, parent dirs existing afterwards etc. Would this work here too?
From what I understood from your PR, this seems a bit different from bulk
delete. Paginated delete will be supported here, but the caller won't be able to
specify a list of paths to delete. Only a single path, which must be a
directory path, can be passed, and everything inside that directory will be
deleted.
Pagination here applies only to the ACL checks, not to the actual delete.
The delete is still a single operation, performed after the ACL checks have
completed on the whole directory listing.
If the ACL checks fail partway through, after a few pages, the whole delete
operation fails and no object is deleted. In that sense the delete is atomic.
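
To make the all-or-nothing behaviour concrete, here is a toy in-memory model. It is only a sketch of the semantics described above, not the backend implementation: the function name, the page lists, the `acl_allows` callback and the `store` set are all hypothetical.

```python
from typing import Callable, Iterable, List, Set


def delete_if_all_checks_pass(
    pages: Iterable[List[str]],
    acl_allows: Callable[[str], bool],
    store: Set[str],
) -> bool:
    """Check ACLs page by page; delete only if every check passed."""
    checked: List[str] = []
    for page in pages:
        for path in page:
            if not acl_allows(path):
                return False  # a failed check aborts; the store is untouched
            checked.append(path)
    store.difference_update(checked)  # the single delete step
    return True


store = {"/d/a", "/d/b", "/d/c", "/d/d"}
pages = [["/d/a", "/d/b"], ["/d/c", "/d/d"]]

# A check fails on the second page: nothing is removed.
print(delete_if_all_checks_pass(pages, lambda p: p != "/d/c", store), len(store))
# Every check passes: everything is removed in one step.
print(delete_if_all_checks_pass(pages, lambda p: True, store), len(store))
```

The first call reports failure with all four entries still present, even though the first page passed its checks; the second call removes everything.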
> ABFS: Support for Pagination in Recursive Directory Delete
> -----------------------------------------------------------
>
> Key: HADOOP-18656
> URL: https://issues.apache.org/jira/browse/HADOOP-18656
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.5
> Reporter: Sree Bhattacharyya
> Assignee: Anuj Modi
> Priority: Minor
> Labels: pull-request-available
>
> Today, when a recursive delete is issued for a large directory in an ADLS Gen2
> (HNS) account, the directory deletion happens in O(1), but in the backend ACL
> checks are done recursively for each object inside that directory, which for
> a large directory could lead to a request timeout. Pagination is
> introduced in the Azure Storage backend for these ACL checks.
> More information on how pagination works can be found in the public documentation
> of [Azure Delete Path
> API|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/delete?view=rest-storageservices-datalakestoragegen2-2019-12-12].
> This PR contains changes to support this from the client side. To trigger
> pagination, the client needs to add a new query parameter "paginated" and set it
> to true, along with recursive set to true. In return, if the directory is
> large, the server might return a continuation token to the caller. If the caller
> gets back a continuation token, it has to call the delete API again with the
> continuation token, with recursive and paginated both set to true. This is
> similar to directory delete of an FNS account.
> Pagination is available only from version "2023-08-03" onwards.
> The PR also contains functional tests to verify that the driver works well with
> different combinations of the recursive and pagination features for HNS.
> Full E2E testing of pagination requires a large dataset to be created and hence
> is not added as part of the driver test suite. But extensive E2E testing has been
> performed.
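
The client-side loop described in the issue text above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the ABFS driver code: `send_delete` is a hypothetical transport callback, `make_fake_server` is a stand-in for the backend, and the `continuation` parameter name is assumed here. Only `recursive` and `paginated` are named in the description.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical transport callback: sends one delete request with the given
# query parameters and returns the continuation token from the response,
# or None when the server returned no token.
SendDelete = Callable[[Dict[str, str]], Optional[str]]


def delete_directory_paginated(send_delete: SendDelete) -> int:
    """Call delete repeatedly until no continuation token comes back.

    Returns the number of requests issued.
    """
    params = {"recursive": "true", "paginated": "true"}
    requests_sent = 0
    while True:
        token = send_delete(dict(params))  # pass a copy per request
        requests_sent += 1
        if not token:
            return requests_sent
        # Assumed parameter name for echoing the token back to the server.
        params["continuation"] = token


def make_fake_server(calls_needed: int, log: List[Dict[str, str]]) -> SendDelete:
    """Stand-in backend: hands out tokens until `calls_needed` calls are made."""
    def send(params: Dict[str, str]) -> Optional[str]:
        log.append(params)
        return None if len(log) >= calls_needed else f"token-{len(log)}"
    return send


log: List[Dict[str, str]] = []
count = delete_directory_paginated(make_fake_server(3, log))
print(count, [p.get("continuation") for p in log])
```

Every request carries both `recursive` and `paginated` set to true, and each follow-up request echoes the token from the previous response. For a small directory the server returns no token and the loop ends after a single call.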