[ 
https://issues.apache.org/jira/browse/HADOOP-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833108#comment-17833108
 ] 

ASF GitHub Bot commented on HADOOP-18656:
-----------------------------------------

anujmodi2021 commented on PR #6409:
URL: https://github.com/apache/hadoop/pull/6409#issuecomment-2031631438

   > how does the pagination work here? do repeated calls need to be made? if 
so, where is this done? it wasn't immediately obvious to me.
   
   Paginated delete works in much the same way as recursive delete does for 
FNS accounts.
   
   For HNS accounts, recursive delete is supposed to be an O(1) operation, i.e. 
deleting the folder itself. But before deleting the whole folder, the server 
has to run ACL checks on all the children of that folder, which is not O(1). 
If the directory is large, these ACL checks can take long enough for the 
request to time out. To avoid this, the server returns a continuation token if 
ACL checks are still pending. The client needs to repeat the delete call with 
this continuation token, just as it does for recursive delete on an FNS account.
   
   The difference is that for FNS every call deletes some objects, whereas for 
HNS each call only performs ACL checks and the actual delete of the directory 
happens on the last delete call of the loop.
   
   Hope that explains it.
   
   The repeated calls are made in abfsStore.deletePath(), where the 
continuation token check is present. This code is common to FNS and HNS.
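
   As a rough illustration of that loop, here is a minimal, self-contained 
sketch. It is not the driver code: `PaginatedDeleteSketch`, `FakeHnsServer` and 
`deleteRecursively` are hypothetical names, and the fake server only simulates 
"ACL checks still pending" by handing back a token a fixed number of times 
before it performs the delete.

```java
/** Hypothetical sketch of a continuation-token delete loop; not the ABFS driver code. */
class PaginatedDeleteSketch {

    /** Stand-in for the HNS endpoint (assumption: ACL checks take a fixed number of calls). */
    static final class FakeHnsServer {
        private int pendingAclPages;
        private boolean deleted = false;

        FakeHnsServer(int pendingAclPages) {
            this.pendingAclPages = pendingAclPages;
        }

        /**
         * Returns a continuation token while ACL checks are still pending,
         * or null once the checks are done and the directory has been deleted.
         */
        String delete(String path, boolean recursive, boolean paginated, String continuation) {
            if (pendingAclPages > 0) {
                pendingAclPages--;              // this call only did ACL checks
                return "token-" + pendingAclPages;
            }
            deleted = true;                     // last call: the directory itself is removed
            return null;
        }

        boolean isDeleted() {
            return deleted;
        }
    }

    /** Repeats the delete call until no continuation token comes back; returns the call count. */
    static int deleteRecursively(FakeHnsServer server, String path) {
        String continuation = null;
        int calls = 0;
        do {
            // recursive and paginated are both set to true on every call of the loop
            continuation = server.delete(path, true, true, continuation);
            calls++;
        } while (continuation != null && !continuation.isEmpty());
        return calls;
    }

    public static void main(String[] args) {
        FakeHnsServer server = new FakeHnsServer(3);
        int calls = deleteRecursively(server, "/big/dir");
        System.out.println("calls=" + calls + " deleted=" + server.isDeleted());
    }
}
```

   With three pending pages of ACL checks, the sketch makes four calls: three 
that return a token and a final one that deletes the directory.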




> ABFS: Support for Pagination in Recursive Directory Delete 
> -----------------------------------------------------------
>
>                 Key: HADOOP-18656
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18656
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.5
>            Reporter: Sree Bhattacharyya
>            Assignee: Anuj Modi
>            Priority: Minor
>              Labels: pull-request-available
>
> Today, when a recursive delete is issued for a large directory in an ADLS 
> Gen2 (HNS) account, the directory deletion itself happens in O(1), but the 
> backend performs ACL checks recursively for each object inside that 
> directory, which for a large directory can lead to a request timeout. 
> Pagination has been introduced in the Azure Storage backend for these ACL 
> checks.
> More information on how pagination works can be found on public documentation 
> of [Azure Delete Path 
> API|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/delete?view=rest-storageservices-datalakestoragegen2-2019-12-12].
> This PR contains changes to support this from the client side. To trigger 
> pagination, the client needs to add a new query parameter "paginated" and 
> set it to true, along with recursive set to true. In return, if the 
> directory is large, the server might return a continuation token to the 
> caller. If the caller gets back a continuation token, it has to call the 
> delete API again with the continuation token, along with recursive and 
> paginated set to true. This is similar to directory delete on an FNS account.
> Pagination is available only from version "2023-08-03" onwards.
> The PR also contains functional tests to verify that the driver works well 
> with different combinations of the recursive and pagination features for HNS.
> Full E2E testing of pagination requires a large dataset to be created and is 
> hence not added as part of the driver test suite, but extensive E2E testing 
> has been performed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
