[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874521#comment-16874521 ]
Mikhail Khludnev commented on SOLR-9961:
----------------------------------------
Attached a dirty draft. Really dirty. It turns out backup repositories are never
closed anywhere in the code right now. I'm really surprised.
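A minimal sketch of the general fix direction for the unclosed repositories. It assumes a hypothetical BackupRepo interface that extends java.io.Closeable; this is not Solr's actual BackupRepository API, and openRepo, restoreFrom and restoreAndFail are illustrative names only. Opening the repository in try-with-resources guarantees close() runs on every exit path, including when the restore body throws.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RepoCloseSketch {
    // Hypothetical stand-in for a backup repository (NOT Solr's real interface).
    interface BackupRepo extends Closeable {
        String location();
    }

    // Counts closed repositories so the demo can detect leaks.
    static final AtomicInteger CLOSED = new AtomicInteger();

    static BackupRepo openRepo(String location) {
        return new BackupRepo() {
            @Override public String location() { return location; }
            @Override public void close() { CLOSED.incrementAndGet(); }
        };
    }

    // Normal path: close() runs when the try block completes.
    static String restoreFrom(String location) throws IOException {
        try (BackupRepo repo = openRepo(location)) {
            return "restored from " + repo.location();
        }
    }

    // Failure path: close() still runs before the exception propagates.
    static void restoreAndFail(String location) throws IOException {
        try (BackupRepo repo = openRepo(location)) {
            throw new IllegalStateException("simulated failure reading " + repo.location());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(restoreFrom("gs://bucket/backup")); // hypothetical location
        try {
            restoreAndFail("gs://bucket/backup");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("closed repos: " + CLOSED.get()); // closed repos: 2
    }
}
```

The point of the pattern is that whoever opens a repository owns closing it; passing an open repository around without such a scope is how leaks like the one described tend to happen.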
> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
> Issue Type: Improvement
> Components: Backup/Restore
> Affects Versions: 6.2.1
> Reporter: Timothy Potter
> Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google Cloud Storage in this case, but I think
> this is a general problem) takes 8 minutes, while the restore of the same core
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me
> to parallelize the expensive part of this operation (the IO from the remote
> cloud storage service). We need the option to parallelize the download (like
> distcp).
> Also, I tried downloading the same directory using gsutil and it was very
> fast (about 2 minutes), so I know the network pipe isn't what's limiting
> performance here.
> Here's a very rough patch that does the parallelization. We may also want to
> consider a two-step approach: 1) download in parallel to a temp dir, 2)
> perform all of the checksum validation against the local temp dir. That
> would save round trips to the remote cloud storage.
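The two-step approach described above (parallel download into a temp dir, then checksum validation against local files only) could look roughly like the following minimal Java sketch. It is not Solr's RestoreCore code: the "remote" repository is simulated by a local directory, a whole-file CRC32 stands in for whatever checksum the index files actually carry, and downloadAll, verify and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

public class ParallelRestoreSketch {

    // Step 1: copy every file from the (simulated) remote dir into a new temp dir in parallel.
    static Path downloadAll(Path remoteDir, List<String> files, int threads)
            throws IOException, InterruptedException, ExecutionException {
        Path tempDir = Files.createTempDirectory("restore-");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (String name : files) {
                futures.add(pool.submit(() -> Files.copy(remoteDir.resolve(name),
                        tempDir.resolve(name), StandardCopyOption.REPLACE_EXISTING)));
            }
            for (Future<Path> f : futures) {
                f.get(); // waits for each copy; rethrows a failed download as ExecutionException
            }
        } finally {
            pool.shutdownNow();
        }
        return tempDir;
    }

    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    // Step 2: validate checksums using only local reads; returns sorted names of mismatches.
    static List<String> verify(Path tempDir, Map<String, Long> expected) throws IOException {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            if (crc32(tempDir.resolve(e.getKey())) != e.getValue()) {
                bad.add(e.getKey());
            }
        }
        Collections.sort(bad);
        return bad;
    }

    public static void main(String[] args) throws Exception {
        // Simulate the remote backup with a local directory of small files.
        Path remote = Files.createTempDirectory("remote-");
        Map<String, Long> expected = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            String name = "_" + i + ".si";
            Files.write(remote.resolve(name), ("segment " + i).getBytes(StandardCharsets.UTF_8));
            expected.put(name, crc32(remote.resolve(name)));
        }
        Path local = downloadAll(remote, new ArrayList<>(expected.keySet()), 4);
        System.out.println("corrupt files: " + verify(local, expected)); // corrupt files: []
    }
}
```

Splitting the work this way keeps the remote storage busy only with bulk reads that can overlap across threads, while every checksum pass hits local disk. The thread count is a plain parameter here; how a real restore would expose it is left open.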
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)