Devesh Kumar Singh created HDDS-12707:
-----------------------------------------

             Summary: Recon - Streaming-Based Approach for Fetch and Extraction 
of Recon OM DB Snapshot Tar SST files
                 Key: HDDS-12707
                 URL: https://issues.apache.org/jira/browse/HDDS-12707
             Project: Apache Ozone
          Issue Type: Improvement
          Components: Ozone Recon
            Reporter: Devesh Kumar Singh
            Assignee: Devesh Kumar Singh


Instead of having Recon store the full TAR file and wait for the transfer 
to complete, let's *extract files as they arrive* using 
{{TarArchiveInputStream}}.
This will:
 * Reduce disk I/O

 * Start processing sooner

 * Avoid extra storage needs

h4. Why This is More Efficient
h5. No Temporary TAR File
 * Directly extracts files *while streaming*, eliminating the need to store 
the full TAR.

h5. Starts Extracting Immediately
 * No waiting for the full file to be received; extraction happens *as data 
arrives*.

h5. Lower Disk I/O & Storage Needs
 * Removes the unnecessary {{FileUtils.copyInputStreamToFile()}} call.

 * Avoids writing and re-reading the TAR file.

h5. Handles Both Files & Directories
 * Ensures correct directory structure before writing files.
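
The per-entry step described above could look roughly like the sketch below. It uses only the JDK so it can be tried in isolation; the class name {{StreamingEntryWriter}}, the method {{writeEntry}} and the path-escape check are illustrative assumptions, not existing Recon code. The commented-out driver assumes Apache Commons Compress, where {{TarArchiveInputStream}} reports end-of-stream at the end of the current entry, so the archive stream itself can be passed in as the entry content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamingEntryWriter {

  /**
   * Writes one archive entry under destDir, creating directories first.
   * The content stream is read to its end but is not closed here.
   */
  public static Path writeEntry(Path destDir, String entryName,
      boolean isDirectory, InputStream content) throws IOException {
    Path root = destDir.toAbsolutePath().normalize();
    Path target = root.resolve(entryName).normalize();
    // Reject names such as "../x" that would land outside the destination.
    if (!target.startsWith(root)) {
      throw new IOException("Entry escapes destination: " + entryName);
    }
    if (isDirectory) {
      Files.createDirectories(target);
    } else {
      // Make sure the parent directory exists before writing the file.
      Files.createDirectories(target.getParent());
      Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
    }
    return target;
  }

  // Hypothetical driver (assumes Apache Commons Compress on the classpath
  // and an InputStream "body" carrying the snapshot TAR):
  //
  //   try (TarArchiveInputStream tar = new TarArchiveInputStream(body)) {
  //     ArchiveEntry e;
  //     while ((e = tar.getNextEntry()) != null) {
  //       writeEntry(destDir, e.getName(), e.isDirectory(), tar);
  //     }
  //   }
}
```

With this shape no intermediate TAR file is written: each entry goes from the network stream straight to its final location.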

h4. Using Multithreading for Parallel Extraction

To extract files in *parallel*, we need to:
 # *Use a thread pool* to process multiple files at the same time.

 # *Extract files asynchronously* while maintaining order and efficiency.

 # *Ensure correct handling of directories before writing files.*
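
One possible reading of the three steps above is sketched below. It assumes that a single thread still reads the TAR stream sequentially and that only the disk writes are handed to a thread pool; that split is an assumption of this sketch, not something the proposal specifies. The class name {{ParallelEntryWriter}} and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEntryWriter implements AutoCloseable {
  private final Path root;
  private final ExecutorService pool;
  private final List<Future<Path>> pending = new ArrayList<>();

  public ParallelEntryWriter(Path destDir, int threads) {
    this.root = destDir.toAbsolutePath().normalize();
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Called by the single reader thread once an entry is fully buffered. */
  public void submit(String entryName, byte[] data) {
    pending.add(pool.submit(() -> {
      Path target = root.resolve(entryName).normalize();
      if (!target.startsWith(root)) {
        throw new IOException("Entry escapes destination: " + entryName);
      }
      // createDirectories tolerates a directory that already exists,
      // so several workers may ask for the same parent.
      Files.createDirectories(target.getParent());
      return Files.write(target, data);
    }));
  }

  /** Waits for all queued writes; results are in submission order. */
  public List<Path> awaitAll() throws IOException, InterruptedException {
    List<Path> written = new ArrayList<>();
    for (Future<Path> f : pending) {
      try {
        written.add(f.get());
      } catch (ExecutionException e) {
        throw new IOException("Failed to write entry", e.getCause());
      }
    }
    pending.clear();
    return written;
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}
```

Because each entry is buffered in memory before it is submitted, a real implementation would probably need to bound the number of in-flight entries, since SST files can be large.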



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
