[
https://issues.apache.org/jira/browse/HADOOP-17881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902661#comment-17902661
]
Steve Loughran commented on HADOOP-17881:
-----------------------------------------
I think this has been assisted by HADOOP-18679, except in directory delete
itself. That API *is* throttled, and it pushes all scheduling issues up to the
callers.
rename() and delete() are not yet throttled, but they could be moved to this
API, adding their lists of deletes (size < page size) to a queue which then
submits them through a small worker pool.
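As a rough sketch of that queue-and-submit idea (hypothetical names, not the
actual S3A classes): split the keys to delete into pages no larger than the
bulk delete page size, then hand each page to a small fixed-size worker pool so
only a few POSTs are in flight at once.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PagedDeleteSubmitter {

  /** Hypothetical stand-in for the bulk DeleteObjects POST. */
  interface PageDeleter {
    void postDeletePage(List<String> keys);
  }

  static void deleteInPages(List<String> keys, int pageSize, int poolSize,
      PageDeleter deleter) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<CompletableFuture<Void>> pending = new ArrayList<>();
      // One bulk-delete POST per page of keys, at most poolSize in flight.
      for (int start = 0; start < keys.size(); start += pageSize) {
        List<String> page =
            keys.subList(start, Math.min(start + pageSize, keys.size()));
        pending.add(CompletableFuture.runAsync(
            () -> deleter.postDeletePage(page), pool));
      }
      // Wait for all pages; the first failure propagates to the caller.
      CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    } finally {
      pool.shutdown();
    }
  }
}
{code}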
> S3A DeleteOperation to parallelize POSTing of bulk deletes
> ----------------------------------------------------------
>
> Key: HADOOP-17881
> URL: https://issues.apache.org/jira/browse/HADOOP-17881
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Priority: Major
>
> Once the need to update the DDB tables is removed, we can go from a single
> POSTed delete at a time to posting a large set of bulk delete operations in
> parallel.
> The current design supports incremental updates of the S3Guard tables,
> including handling of partial failures; that is no longer a concern.
> This will significantly improve delete() performance on directory trees with
> many children/descendants, as it goes from a sequence of (child count / 1000)
> POSTs to parallel writes. As each deleted file is still throttled, we will be
> limited to 3500 deletes/second, so throwing a large pool of workers at the
> problem would be counter-productive and could cause problems for other
> applications writing to the same directory tree. But we can do better than
> one POST at a time.
> Proposed:
> * if parallel delete is off: no limit
> * if parallel delete is on: limit the number of parallel POSTs to
> 3000/page-size, so you will never have more updates pending than the write
> limit of a single shard (see the sketch below).
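> A minimal sketch of that cap, assuming hypothetical names (nothing here is an
> actual S3A option or method):
> {code:java}
> // Cap on the number of bulk-delete POSTs in flight at once.
> // With parallel delete disabled there is no extra cap; with it enabled,
> // 3000 / page-size keeps pending updates within a single shard's write limit.
> static int maxInFlightDeletePosts(boolean parallelDeleteEnabled, int pageSize) {
>   if (!parallelDeleteEnabled) {
>     return Integer.MAX_VALUE;  // "no limit"
>   }
>   // e.g. page size 250 -> at most 12 concurrent bulk-delete POSTs
>   return Math.max(1, 3000 / pageSize);
> }
> {code}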