[ https://issues.apache.org/jira/browse/SOLR-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419901#comment-15419901 ]

ASF subversion and git services commented on SOLR-9092:
-------------------------------------------------------

Commit 1d9be84cb67ed5e57bcd60ae483f45d3abd09bd5 in lucene-solr's branch 
refs/heads/master from [~varunthacker]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1d9be84 ]

SOLR-9092: In the deletereplica command, add a live check before calling
delete core
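A minimal Java sketch of the guard this commit describes. It assumes the replica's node name and the current live_nodes entries are already available as plain strings. DeleteReplicaSketch, isLive and planDelete are hypothetical names, not Solr classes. The node 192.168.1.101:8983_solr is an invented example; only 192.168.1.101:7574_solr comes from the report below. When the node is not live, the sketch assumes the delete-core request is skipped rather than attempted.

```java
import java.util.Set;

// Hypothetical sketch: only send the core-admin delete request when the
// replica's node appears in live_nodes. Names are illustrative, not Solr's API.
class DeleteReplicaSketch {

    /** Returns true if the node hosting the replica is currently listed in live_nodes. */
    static boolean isLive(String nodeName, Set<String> liveNodes) {
        return liveNodes.contains(nodeName);
    }

    /**
     * Describes the action for one replica. Assumption: for a node that is down,
     * the remote delete-core call is skipped instead of being attempted and
     * reported as a connection failure.
     */
    static String planDelete(String coreNodeName, String nodeName, Set<String> liveNodes) {
        if (isLive(nodeName, liveNodes)) {
            return "delete core " + coreNodeName + " on " + nodeName;
        }
        return "skip delete core for " + coreNodeName + ": " + nodeName + " is not in live_nodes";
    }

    public static void main(String[] args) {
        // Hypothetical live_nodes: only the 8983 node is up.
        Set<String> liveNodes = Set.of("192.168.1.101:8983_solr");
        System.out.println(planDelete("core_node1", "192.168.1.101:8983_solr", liveNodes));
        System.out.println(planDelete("core_node3", "192.168.1.101:7574_solr", liveNodes));
    }
}
```

Running main prints a delete action for core_node1 and a skip for core_node3, whose node is not in live_nodes.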


> Add safety checks to delete replica/shard/collection commands
> -------------------------------------------------------------
>
>                 Key: SOLR-9092
>                 URL: https://issues.apache.org/jira/browse/SOLR-9092
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Minor
>         Attachments: SOLR-9092.patch, SOLR-9092.patch
>
>
> We should verify the delete commands against live_nodes to make sure the API
> can at least be executed correctly.
> Take a two-node cluster with a collection that has 1 shard and 2 replicas.
> Call the delete replica command for the replica whose node is currently down.
> You get an exception:
> {code}
> <response>
>    <lst name="responseHeader">
>       <int name="status">0</int>
>       <int name="QTime">5173</int>
>    </lst>
>    <lst name="failure">
>       <str name="192.168.1.101:7574_solr">org.apache.solr.client.solrj.SolrServerException:Server refused connection at: http://192.168.1.101:7574/solr</str>
>    </lst>
> </response>
> {code}
> At this point the entry for the replica is already gone from state.json. The
> client application retries because an error was returned, but the delete
> command can now never succeed, and an error like this is seen instead:
> {code}
> <response>
>    <lst name="responseHeader">
>       <int name="status">400</int>
>       <int name="QTime">137</int>
>    </lst>
>    <str name="Operation deletereplica caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Invalid replica : core_node3 in shard/collection : shard1/gettingstarted available replicas are core_node1</str>
>    <lst name="exception">
>       <str name="msg">Invalid replica : core_node3 in shard/collection : shard1/gettingstarted available replicas are core_node1</str>
>       <int name="rspCode">400</int>
>    </lst>
>    <lst name="error">
>       <lst name="metadata">
>          <str name="error-class">org.apache.solr.common.SolrException</str>
>          <str name="root-error-class">org.apache.solr.common.SolrException</str>
>       </lst>
>       <str name="msg">Invalid replica : core_node3 in shard/collection : shard1/gettingstarted available replicas are core_node1</str>
>       <int name="code">400</int>
>    </lst>
> </response>
> {code}
> For create collection and add replica, we check the "createNodeSet" and "node"
> params, respectively, against live_nodes to make sure the command has a chance
> of succeeding.
> We should add a check against live_nodes for the delete commands as well.
> Another situation where I saw this become a problem: a second Solr cluster was
> cloned from the first, but the cloning script didn't correctly change the
> hostnames in the state.json file. When a delete command was issued against the
> second cluster, Solr deleted the replica from the first cluster.
> In that case the script was obviously buggy, but if we had verified against
> live_nodes, Solr wouldn't have gone ahead and deleted replicas that don't
> belong to its cluster.
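A minimal Java sketch of the check proposed above for the delete shard and delete collection commands. It assumes the replicas to delete are available as a map from replica name to the node name recorded in state.json. DeletePreconditionSketch, its methods and the host names in main are all hypothetical. The sketch rejects the whole request before any state is changed. Whether a real implementation should reject like this, or only skip the remote call for nodes that are down, is a policy choice the sketch does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: before touching any state, compare the node recorded for
// each replica (as it would appear in state.json) with this cluster's live_nodes.
// Class and method names are illustrative, not part of Solr's API.
class DeletePreconditionSketch {

    /** Returns "replica@node" for every replica whose node is not in live_nodes, sorted by replica name. */
    static List<String> replicasOnNonLiveNodes(Map<String, String> replicaToNode, Set<String> liveNodes) {
        List<String> offending = new ArrayList<>();
        // Iterating a TreeMap copy gives a deterministic order for the error message.
        for (Map.Entry<String, String> e : new TreeMap<>(replicaToNode).entrySet()) {
            if (!liveNodes.contains(e.getValue())) {
                offending.add(e.getKey() + "@" + e.getValue());
            }
        }
        return offending;
    }

    /** Throws before any state change if some replica's node is not live. */
    static void checkDeletable(Map<String, String> replicaToNode, Set<String> liveNodes) {
        List<String> offending = replicasOnNonLiveNodes(replicaToNode, liveNodes);
        if (!offending.isEmpty()) {
            throw new IllegalStateException("Nodes not in live_nodes: " + offending);
        }
    }

    public static void main(String[] args) {
        // Hypothetical live_nodes of the second (cloned) cluster.
        Set<String> liveNodes = Set.of("clusterB-host1:8983_solr");
        // Cloned state.json in which one replica still points at the first cluster's host.
        Map<String, String> replicas = Map.of(
                "core_node1", "clusterB-host1:8983_solr",
                "core_node2", "clusterA-host1:8983_solr");
        try {
            checkDeletable(replicas, liveNodes);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

In main, the replica whose state.json entry still names the first cluster's host is reported instead of being deleted, which corresponds to the cloned-cluster case described above.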



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
