Devesh Kumar Singh created HDDS-12707:
-----------------------------------------
Summary: Recon - Streaming-Based Approach for Fetch and Extraction
of Recon OM DB Snapshot Tar SST files
Key: HDDS-12707
URL: https://issues.apache.org/jira/browse/HDDS-12707
Project: Apache Ozone
Issue Type: Improvement
Components: Ozone Recon
Reporter: Devesh Kumar Singh
Assignee: Devesh Kumar Singh
Instead of having Recon store the full TAR file and wait for the transfer
to complete, let's *extract files as they arrive* using
{{TarArchiveInputStream}}.
This will:
* Reduce disk I/O
* Start processing sooner
* Avoid extra storage needs
h4. Why This Is More Efficient
h5. No Temporary TAR File
* Directly extracts files {*}while streaming{*}, eliminating the need to store
the full TAR.
h5. Starts Extracting Immediately
* No waiting for the full file to be received; extraction happens {*}as data
arrives{*}.
h5. Lower Disk I/O & Storage Needs
* Removes the unnecessary {{FileUtils.copyInputStreamToFile()}} call.
* Avoids writing and re-reading the TAR file.
h5. Handles Both Files & Directories
* Creates the required directory structure before writing each file.
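The streaming idea can be sketched in a few lines. This is only an illustration, not the Recon implementation: it uses Python's standard {{tarfile}} module in stream mode ({{mode="r|"}}) as a stand-in for wrapping the response stream in {{TarArchiveInputStream}}. The function name {{extract_stream}}, the 64 KiB chunk size, and the entry-name safety check are assumptions added for the sketch.

```python
import os
import tarfile


def extract_stream(stream, dest_dir):
    """Extract a tar archive from a forward-only stream as the bytes arrive.

    Returns the names of the regular files written, in archive order.
    """
    written = []
    dest_root = os.path.realpath(dest_dir)
    # mode="r|" reads the archive strictly front to back, so no temporary
    # TAR file is stored and extraction starts with the first entry.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Assumption for this sketch: reject names that would land
            # outside the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe entry name: {member.name}")
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                # Make sure the parent directory exists before writing.
                os.makedirs(os.path.dirname(target), exist_ok=True)
                source = tar.extractfile(member)
                with open(target, "wb") as out:
                    # Copy in chunks so a large SST file is never held
                    # in memory as a whole.
                    while chunk := source.read(64 * 1024):
                        out.write(chunk)
                written.append(member.name)
    return written
```

The only method the sketch calls on {{stream}} is {{read()}}, which is what makes it suitable for a network response that cannot be rewound. A Java version would presumably follow the same loop shape: fetch the next entry, create directories, copy the entry's bytes, repeat.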
h4. Using Multithreading for Parallel Extraction
To extract files in {*}parallel{*}, we need to:
# *Use a thread pool* to process multiple files at the same time.
# *Extract files asynchronously* while maintaining order and efficiency.
# *Ensure correct handling of directories before writing files.*
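The three steps above can be sketched as follows. A TAR archive is a sequential format, so the stream itself still has to be read by one thread; in this sketch only the disk writes are handed to the pool. As before, this is a Python illustration under stated assumptions, not the Recon code: {{extract_stream_parallel}}, {{_write_file}}, {{_safe_target}}, and the default of 4 workers are hypothetical names and values.

```python
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor


def _safe_target(dest_root, name):
    """Resolve an entry name under dest_root, refusing paths that escape it."""
    target = os.path.realpath(os.path.join(dest_root, name))
    if os.path.commonpath([dest_root, target]) != dest_root:
        raise ValueError(f"unsafe entry name: {name}")
    return target


def _write_file(path, data):
    # Directories are created before the file is written (step 3).
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as out:
        out.write(data)


def extract_stream_parallel(stream, dest_dir, workers=4):
    """Read entries sequentially from the stream, write them in parallel.

    Returns the names of the regular files written, in archive order.
    """
    dest_root = os.path.realpath(dest_dir)
    pending = []  # (entry name, future) pairs, kept in archive order
    with ThreadPoolExecutor(max_workers=workers) as pool:  # step 1
        with tarfile.open(fileobj=stream, mode="r|") as tar:
            for member in tar:
                target = _safe_target(dest_root, member.name)
                if member.isdir():
                    os.makedirs(target, exist_ok=True)
                elif member.isfile():
                    # The entry must be drained before advancing the
                    # stream, so its bytes are buffered and the write is
                    # handed off asynchronously (step 2).
                    data = tar.extractfile(member).read()
                    pending.append(
                        (member.name, pool.submit(_write_file, target, data))
                    )
        # result() re-raises any exception from a worker thread.
        for _, future in pending:
            future.result()
    return [name for name, _ in pending]
```

One trade-off to keep in mind: each entry is buffered fully in memory before it is handed to a worker, so with large SST files the number of in-flight entries would need to be bounded (for example with a semaphore) to keep memory use predictable.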