[
https://issues.apache.org/jira/browse/SOLR-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shalin Shekhar Mangar updated SOLR-7869:
----------------------------------------
Attachment: SOLR-7869.patch
Here's a better fix which discards the ZkStateWriter on a BadVersionException
and starts afresh. The previous approach didn't work when an external change
was made to a state.json with no changes to /clusterstate.json. Such changes
could be detected and resolved inside ZkStateWriter, but that would make the
class unnecessarily complex.
ZkStateWriter will put itself into an invalid state upon a BadVersionException
and will disallow all future operations. Callers are expected to discard such
an instance and create a fresh ZkStateWriter instance for future use.
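To illustrate the contract described above, here is a minimal sketch of a
writer that invalidates itself after a version conflict. It is not the code
from the patch: VersionedNode, VersionConflictException and PoisonableWriter
are hypothetical stand-ins for a ZooKeeper node, BadVersionException and
ZkStateWriter, and the real class tracks considerably more state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ZooKeeper's BadVersionException. */
class VersionConflictException extends Exception {
    VersionConflictException(String msg) { super(msg); }
}

/** Hypothetical in-memory node with compare-and-set semantics on its version. */
class VersionedNode {
    private int version = 0;
    private String data = "";

    synchronized int getVersion() { return version; }
    synchronized String getData() { return data; }

    synchronized void setData(String newData, int expectedVersion)
            throws VersionConflictException {
        if (expectedVersion != version) {
            throw new VersionConflictException(
                "expected version " + expectedVersion + " but was " + version);
        }
        data = newData;
        version++;
    }
}

/** Sketch of a writer that refuses all operations after a version conflict. */
class PoisonableWriter {
    private final VersionedNode node;
    private final List<String> pending = new ArrayList<>();
    private int knownVersion;
    private boolean invalid = false;

    PoisonableWriter(VersionedNode node) {
        this.node = node;
        this.knownVersion = node.getVersion(); // snapshot taken at creation time
    }

    void enqueueUpdate(String update) {
        ensureValid();
        pending.add(update);
    }

    void writePendingUpdates() throws VersionConflictException {
        ensureValid();
        if (pending.isEmpty()) {
            return;
        }
        try {
            node.setData(String.join(",", pending), knownVersion);
            knownVersion++;
            pending.clear();
        } catch (VersionConflictException e) {
            invalid = true; // our view of the node is stale; never write again
            throw e;
        }
    }

    private void ensureValid() {
        if (invalid) {
            throw new IllegalStateException(
                "writer is invalid after a version conflict; create a new instance");
        }
    }
}
```

The design choice here is that the writer never tries to reconcile its stale
view; it only reports the conflict and leaves recovery to the caller, which
builds a new instance from the node's current version.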
I added two tests in ZkStateWriterTest which simulate an external change to
/clusterstate.json and to a state.json, and assert that an IllegalStateException
is thrown on any future invocation of enqueueUpdate or writePendingUpdates.
I also added a test in Overseer which asserts that the overseer can keep
processing events on a BadVersionException (indirectly testing that a fresh
ZkStateWriter is created upon said exception).
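The caller side of this recovery can be sketched roughly as follows. This is
an illustration only, not the Overseer code: StateWriter, StaleVersionException
and QueueProcessor are hypothetical names, and retrying the failed message once
with the fresh writer is an assumption of the sketch rather than something
stated in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a version conflict reported by the writer. */
class StaleVersionException extends Exception {
    StaleVersionException(String msg) { super(msg); }
}

/** Hypothetical minimal writer contract; not Solr's actual API. */
interface StateWriter {
    void write(String message) throws StaleVersionException;
}

/** Sketch of a queue loop that replaces its writer after a conflict. */
class QueueProcessor {
    private final Supplier<StateWriter> writerFactory;
    private StateWriter writer;
    private int writersCreated = 0;

    QueueProcessor(Supplier<StateWriter> writerFactory) {
        this.writerFactory = writerFactory;
        this.writer = newWriter();
    }

    private StateWriter newWriter() {
        writersCreated++;
        return writerFactory.get();
    }

    int getWritersCreated() { return writersCreated; }

    /** Processes each message and returns those that were applied. */
    List<String> process(List<String> messages) {
        List<String> applied = new ArrayList<>();
        for (String message : messages) {
            boolean done = false;
            // At most two attempts per message, so a writer that keeps
            // failing cannot trap the loop on a single message forever.
            for (int attempt = 0; attempt < 2 && !done; attempt++) {
                try {
                    writer.write(message);
                    applied.add(message);
                    done = true;
                } catch (StaleVersionException e) {
                    // Discard the stale writer and start afresh.
                    writer = newWriter();
                }
            }
        }
        return applied;
    }
}
```

Bounding the attempts is the point of the sketch: a conflict costs one writer
replacement, and the loop then moves on instead of spinning on the same
exception.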
I also added copious amounts of javadocs to the ZkStateWriter class for future
reference.
> Overseer does not handle BadVersionException correctly
> ------------------------------------------------------
>
> Key: SOLR-7869
> URL: https://issues.apache.org/jira/browse/SOLR-7869
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.2.1
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: difficulty-medium, impact-low
> Fix For: Trunk, 5.4
>
> Attachments: SOLR-7869.patch, SOLR-7869.patch, SOLR-7869.patch
>
>
> If /clusterstate.json is modified externally, the Overseer can go into an
> infinite loop upon a BadVersionException, alternately trying to process the
> main queue and then the work queue:
> {code}
> ERROR - 2015-08-04 18:49:56.224; [ ] org.apache.solr.cloud.Overseer$ClusterStateUpdater; Exception in Overseer work queue loop
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /clusterstate.json
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
> at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
> at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
> at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
> at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:180)
> at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:67)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:286)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:168)
> at java.lang.Thread.run(Thread.java:745)
> INFO - 2015-08-04 18:49:56.224; [ ] org.apache.solr.cloud.Overseer$ClusterStateUpdater; processMessage: queueSize: 1, message = {
> "operation":"state",
> "state":"down",
> "base_url":"http://127.0.1.1:7574/solr",
> "core":"test_shard1_replica1",
> "roles":null,
> "node_name":"127.0.1.1:7574_solr",
> "shard":null,
> "collection":"test",
> "core_node_name":"core_node1"} current state version: 9
> INFO - 2015-08-04 18:49:56.224; [ ] org.apache.solr.cloud.overseer.ReplicaMutator; Update state numShards=null message={
> "operation":"state",
> "state":"down",
> "base_url":"http://127.0.1.1:7574/solr",
> "core":"test_shard1_replica1",
> "roles":null,
> "node_name":"127.0.1.1:7574_solr",
> "shard":null,
> "collection":"test",
> "core_node_name":"core_node1"}
> INFO - 2015-08-04 18:49:56.224; [ ] org.apache.solr.cloud.overseer.ReplicaMutator; shard=shard1 is already registered
> ERROR - 2015-08-04 18:49:56.225; [ ] org.apache.solr.cloud.Overseer$ClusterStateUpdater; Exception in Overseer main queue loop
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /clusterstate.json
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
> at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
> at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
> at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
> at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:180)
> at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:67)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:286)
> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:213)
> at java.lang.Thread.run(Thread.java:745)
> INFO - 2015-08-04 18:49:56.225; [ ] org.apache.solr.common.cloud.ZkStateReader; Updating data for gettingstarted to ver 8
> {code}