[ https://issues.apache.org/jira/browse/SOLR-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087041#comment-16087041 ]
Shalin Shekhar Mangar commented on SOLR-11083:
----------------------------------------------
[~noble.paul] suggested that if we build a core admin API to unload a replica
temporarily, i.e. for the next N minutes, then MoveReplica can call that API
first and then add the new replica. Once the N minutes elapse, the old replica
will be loaded again, discover that it has been replaced, and promptly unload
itself. If the overseer fails between the two steps, the new replica won't
exist and the old replica will simply come back online.
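
To make the proposed flow concrete, here is a rough toy simulation in Python.
It is not Solr code and does not use any Solr API. The names Cluster,
unload_temporarily, move_replica and on_timer are hypothetical, and the
replaced_by map is an assumed stand-in for however the old replica would
actually discover that it has been replaced.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster state; an illustration only, not Solr's data model."""
    live: set = field(default_factory=set)          # replicas currently serving
    parked: dict = field(default_factory=dict)      # old replica -> reload time
    replaced_by: dict = field(default_factory=dict) # old replica -> new replica


def unload_temporarily(cluster, replica, now, minutes):
    """Hypothetical 'unload for the next N minutes' core admin call."""
    cluster.live.discard(replica)
    cluster.parked[replica] = now + minutes


def move_replica(cluster, old, new, now, minutes, overseer_fails=False):
    """Park the old replica first, then add the new one."""
    unload_temporarily(cluster, old, now, minutes)
    if overseer_fails:
        # Simulates the overseer restarting between the two operations.
        return False
    cluster.live.add(new)
    cluster.replaced_by[old] = new
    return True


def on_timer(cluster, now):
    """Reload parked replicas whose N minutes have elapsed."""
    for replica, reload_at in list(cluster.parked.items()):
        if now < reload_at:
            continue
        del cluster.parked[replica]
        if replica in cluster.replaced_by:
            # Replaced in the meantime: stay unloaded for good.
            continue
        # No replacement exists: the old replica comes back online.
        cluster.live.add(replica)
```

In this sketch a successful move leaves only the new replica live once the
timer fires, while a move interrupted by an overseer failure ends with the old
replica live again, so the shard is never left without a replica permanently.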
I won't have time to work on this but wanted to write a potential solution here
in case someone else is interested.
> MoveReplica API can lose replicas for shared file systems on overseer restart
> if source node is live
> ----------------------------------------------------------------------------------------------------
>
> Key: SOLR-11083
> URL: https://issues.apache.org/jira/browse/SOLR-11083
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: hdfs, SolrCloud
> Reporter: Shalin Shekhar Mangar
> Fix For: 7.1
>
>
> MoveReplica unloads the old replica and creates a new one for shared file
> systems. But if the overseer restarts between the two operations then the old
> replica is lost. It is up to the user to detect the failure (using the request
> status API) and retry.