Manish Bhatt created HADOOP-19381:
-------------------------------------

             Summary: Support Rename and Delete operation over FNS-Blob endpoint
                 Key: HADOOP-19381
                 URL: https://issues.apache.org/jira/browse/HADOOP-19381
             Project: Hadoop Common
          Issue Type: New Feature
          Components: fs/azure
    Affects Versions: 3.5.0
            Reporter: Manish Bhatt
            Assignee: Manish Bhatt


Currently, rename and delete operations are supported only on the DFS endpoint. 
Supporting them on the Blob endpoint needs additional work because the Blob 
endpoint does not account for hierarchy, yet the HDFS contracts must still be 
maintained when performing rename and delete operations. Renaming or deleting 
a directory over the Blob endpoint therefore requires the client to handle the 
orchestration and rename or delete every blob within the specified directory.
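
As a rough illustration of what client-side orchestration means on a flat namespace, the sketch below models blobs as a sorted map of names and renames or deletes a "directory" by acting on every blob that shares its prefix. {{FlatBlobStore}} and all of its methods are hypothetical names for this example and are not part of the ABFS driver. HDFS contract prechecks (for example, rejecting a destination that lies under the source) are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a flat blob namespace; not the ABFS client. */
class FlatBlobStore {
  // Blob name -> content. Names may contain '/', but no directory objects exist.
  private final TreeMap<String, String> blobs = new TreeMap<>();

  void put(String name, String content) { blobs.put(name, content); }

  boolean exists(String name) { return blobs.containsKey(name); }

  private static String asPrefix(String dir) {
    return dir.endsWith("/") ? dir : dir + "/";
  }

  /** Returns every blob name that starts with the given prefix. */
  List<String> listByPrefix(String prefix) {
    List<String> result = new ArrayList<>();
    // Keys are sorted, so all names sharing the prefix are contiguous.
    for (String name : blobs.tailMap(prefix, true).keySet()) {
      if (!name.startsWith(prefix)) {
        break;
      }
      result.add(name);
    }
    return result;
  }

  /** Renames a directory by copying each blob to its new name, then deleting the source. */
  int renameDirectory(String srcDir, String dstDir) {
    String srcPrefix = asPrefix(srcDir);
    String dstPrefix = asPrefix(dstDir);
    List<String> sources = listByPrefix(srcPrefix);
    for (String src : sources) {
      String dst = dstPrefix + src.substring(srcPrefix.length());
      blobs.put(dst, blobs.get(src)); // copy to the destination name
      blobs.remove(src);              // delete the source blob
    }
    return sources.size();
  }

  /** Deletes a directory by deleting every blob under its prefix. */
  int deleteDirectory(String dir) {
    List<String> targets = listByPrefix(asPrefix(dir));
    for (String name : targets) {
      blobs.remove(name);
    }
    return targets.size();
  }
}
```

Because each blob is handled separately, a failure partway through leaves the directory half-renamed, which is why the atomic rename tracking described below matters.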
 
This task outlines the considerations for implementing rename and delete 
operations on the FNS-Blob endpoint while staying compatible with HDFS 
contracts.
 * {*}Blob Endpoint Usage{*}: The task addresses the need for abstraction in 
the code to maintain HDFS contracts while performing rename and delete 
operations on the blob endpoint, which does not support hierarchy.
 * {*}Rename Operations{*}: The {{AzureBlobFileSystem#rename()}} method will 
use a {{RenameHandler}} instance to handle rename operations, with separate 
handlers for the DFS and blob endpoints. This method includes prechecks, 
destination adjustments, and orchestration of directory renaming for blobs.
 * {*}Atomic Rename{*}: Atomic rename needs explicit support on the blob 
endpoint, because a directory rename there is an orchestration that copies and 
deletes each blob within the directory. A configuration will allow developers 
to specify directories for atomic renaming, with a JSON file to track the 
status of renames.
 * {*}Delete Operations{*}: Delete operations are simpler than renames, 
requiring fewer HDFS contract checks. For blob endpoints, the client must 
handle orchestration, including managing orphaned directories created by 
AzCopy.
 * {*}Orchestration for Rename/Delete{*}: Orchestration for rename and delete 
operations over blob endpoints involves listing blobs and performing actions on 
each blob. The process must be optimized to handle large numbers of blobs 
efficiently.
 * {*}Need for Optimization{*}: Optimization is crucial because the 
{{ListBlob}} API can return a maximum of 5000 blobs at once, necessitating 
multiple calls for large directories. The task proposes a producer-consumer 
model to handle blobs in parallel, thereby reducing processing time and memory 
usage.
 * {*}Producer-Consumer Design{*}: The proposed design includes a producer to 
list blobs, a queue to store the blobs, and a consumer to process them in 
parallel. This approach aims to improve efficiency and mitigate memory issues.
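
The producer-consumer design above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: {{BlobOrchestrator}}, {{processAll}}, and the end-of-stream marker are hypothetical, and an in-memory list stands in for paged {{ListBlob}} results. The bounded queue is what keeps memory use flat, since the producer blocks once the queue is full. Error handling is deliberately thin; a real version would need to stop the producer when a consumer fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of list-and-process orchestration; not ABFS code. */
class BlobOrchestrator {
  // End-of-stream marker; assumed never to collide with a real blob name.
  private static final String END = "\u0000END";

  static int processAll(List<String> allBlobs, int pageSize, int consumers,
      int queueCapacity, Consumer<String> action) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
    // One thread for the producer plus one per consumer.
    ExecutorService pool = Executors.newFixedThreadPool(consumers + 1);
    AtomicInteger processed = new AtomicInteger();
    try {
      List<Future<?>> futures = new ArrayList<>();
      // Producer: walks the listing one page at a time and enqueues each name.
      futures.add(pool.submit(() -> {
        for (int start = 0; start < allBlobs.size(); start += pageSize) {
          int end = Math.min(start + pageSize, allBlobs.size());
          for (String name : allBlobs.subList(start, end)) {
            queue.put(name); // blocks while the queue is full
          }
        }
        for (int i = 0; i < consumers; i++) {
          queue.put(END); // one marker per consumer so each one stops
        }
        return null;
      }));
      // Consumers: take names off the queue and apply the action in parallel.
      for (int i = 0; i < consumers; i++) {
        futures.add(pool.submit(() -> {
          while (true) {
            String name = queue.take();
            if (END.equals(name)) {
              break;
            }
            action.accept(name);
            processed.incrementAndGet();
          }
          return null;
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for completion and surface any failure
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("orchestration failed", e);
    } finally {
      pool.shutdownNow();
    }
    return processed.get();
  }
}
```

With a page size of 5000, a directory of 12000 blobs would take three listing pages, while the consumers begin work as soon as the first names are queued instead of waiting for the full listing.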



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
