[
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876525#comment-16876525
]
Mikhail Khludnev commented on SOLR-9961:
----------------------------------------
The design would be:
* {{BackupRepositoryFactory}} holds a shared thread pool
* the thread pool is optionally injected into each {{BackupRepository}} it creates
* Restore (Backup) operations use a dedicated operation, {{listAll(path,
lambda)}} or {{forEach(list/file, lambda)}}
* Repositories that accepted a thread pool invoke the lambda in its threads
* The lambda accepts a repository delegate and is expected to operate through it. This
delegate reuses HDFS and closes/releases it once it's done.
WDYT?
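To make the proposal above more concrete, here is a rough sketch of how the pieces could fit together. It is not the actual Solr API: {{SketchRepositoryFactory}}, {{SketchRepository}}, {{RepoDelegate}}, the pool size of 4 and the in-memory {{read}} are all hypothetical stand-ins, and the real {{BackupRepository}} interface would need its own signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

public class ParallelRepoSketch {

  /** Hypothetical per-task delegate; a real one would wrap the shared HDFS handle. */
  interface RepoDelegate extends AutoCloseable {
    String read(String file);
    @Override
    void close(); // releases whatever the delegate borrowed
  }

  /** Hypothetical repository that may or may not have been given a thread pool. */
  static class SketchRepository {
    private final ExecutorService pool; // null means "no pool injected, run serially"

    SketchRepository(ExecutorService pool) {
      this.pool = pool;
    }

    RepoDelegate openDelegate() {
      return new RepoDelegate() {
        @Override
        public String read(String file) {
          return "contents of " + file; // placeholder for real remote IO
        }

        @Override
        public void close() {
          // placeholder: release the underlying resource here
        }
      };
    }

    /** Applies the lambda to every file, in pool threads if a pool was injected. */
    void forEach(List<String> files, BiConsumer<RepoDelegate, String> action) throws Exception {
      if (pool == null) {
        try (RepoDelegate delegate = openDelegate()) {
          for (String file : files) {
            action.accept(delegate, file);
          }
        }
        return;
      }
      List<Future<?>> futures = new ArrayList<>();
      for (String file : files) {
        futures.add(pool.submit(() -> {
          // each task gets its own delegate and closes it when done
          try (RepoDelegate delegate = openDelegate()) {
            action.accept(delegate, file);
          }
        }));
      }
      for (Future<?> future : futures) {
        future.get(); // wait for completion and surface task failures
      }
    }
  }

  /** Hypothetical factory that owns the pool shared by all repositories it creates. */
  static class SketchRepositoryFactory {
    private final ExecutorService sharedPool = Executors.newFixedThreadPool(4);

    SketchRepository create(boolean parallel) {
      return new SketchRepository(parallel ? sharedPool : null);
    }

    void shutdown() {
      sharedPool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    SketchRepositoryFactory factory = new SketchRepositoryFactory();
    SketchRepository repo = factory.create(true);
    repo.forEach(Arrays.asList("segments_1", "_0.cfs", "_0.si"),
        (delegate, file) -> System.out.println(delegate.read(file)));
    factory.shutdown();
  }
}
```

The point of keeping the pool in the factory is that repositories which ignore it keep working serially, while the restore code only ever talks to {{forEach}}.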
> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
> Issue Type: Improvement
> Components: Backup/Restore
> Affects Versions: 6.2.1
> Reporter: Timothy Potter
> Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch,
> SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think
> this is a general problem) takes 8 minutes ... the restore of the same core
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me
> to parallelize the expensive part of this operation (the IO from the remote
> cloud storage service). We need the option to parallelize the download (like
> distcp).
> Also, I tried downloading the same directory using gsutil and it was very
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to
> consider a two-step approach: 1) download in parallel to a temp dir, 2)
> perform all of the checksum validation against the local temp dir. That
> will save round trips to the remote cloud storage.
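The two-step approach from the issue description could look roughly like the sketch below. It is only an illustration under stated assumptions: the "remote" directory is modelled as a local {{Path}}, plain {{java.util.zip.CRC32}} over the whole file stands in for whatever checksum RestoreCore actually verifies, and {{TwoStepRestoreSketch}}, {{restore}} and the {{expected}} map are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

public class TwoStepRestoreSketch {

  /** Whole-file CRC32; an assumed stand-in for the real index checksum. */
  static long crc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data);
    return crc.getValue();
  }

  /**
   * Step 1 copies every expected file into tempDir in parallel.
   * Step 2 verifies checksums against the local copies only.
   * Returns the names of files whose checksum did not match.
   */
  static List<String> restore(Path remoteDir, Path tempDir,
                              Map<String, Long> expected, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> copies = new ArrayList<>();
      for (String name : expected.keySet()) {
        copies.add(pool.submit(() -> {
          try {
            Files.copy(remoteDir.resolve(name), tempDir.resolve(name));
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        }));
      }
      for (Future<?> copy : copies) {
        copy.get(); // propagate any copy failure before validating
      }
    } finally {
      pool.shutdown();
    }

    // Validation touches only the temp dir, so no further remote round trips.
    List<String> mismatched = new ArrayList<>();
    for (Map.Entry<String, Long> entry : expected.entrySet()) {
      long actual = crc32(Files.readAllBytes(tempDir.resolve(entry.getKey())));
      if (actual != entry.getValue()) {
        mismatched.add(entry.getKey());
      }
    }
    return mismatched;
  }
}
```

Separating the copy from the validation means the slow remote IO happens once per file, and a failed checksum can be retried per file rather than restarting the whole restore.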