[
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090732#comment-15090732
]
Jeff Wartes commented on SOLR-8531:
-----------------------------------
1. A fully upgraded cluster behaves normally.
2. The problem only occurs for collections with replicationFactor > 1, but
by definition, this means you only have problems if you're attempting an HA upgrade.
Upgraded nodes got in line for leader election as normal, but could not determine
the current leader on startup, so they never ran replication recovery and never
became active. If I restarted the 5.3 nodes for a given shard, the 5.4 replica would
eventually be elected leader and publish active state without intervention,
but restarting the 5.4 replica again meant a 5.3 replica got elected, and the
5.4 node would be stuck in 'down' state again. I did not test restarting a 5.3
replica while the 5.4 replica was leader.
In my case I had sufficient production capacity to upgrade half my cluster,
create a new collection in 5.4, copy the data into it, and then upgrade the
rest of the cluster, so I did that.
As mentioned, taking downtime and upgrading the whole cluster at once would
also have worked.
> ZK leader path changed in 5.4
> -----------------------------
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.4
> Reporter: Jeff Wartes
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I
> observed that upgraded nodes would not register their shards as active unless
> they were elected the leader for the shard.
> There were no errors; the shards were fully up and responsive, but would not
> publish any change from the "down" state.
> This appears to be because the recovery process never happens: the ZK node
> containing the current leader can't be found, since the ZK path has changed.
> Specifically, the leader data node changed from:
> <collection>/leaders/<shard>
> to
> <collection>/leaders/<shard>/leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally.
> At the least, the "Migrating to Solr 5.4" section of the README should get
> updated with this info, since it means a rolling upgrade of a collection with
> multiple replicas will suffer serious degradation in the number of active
> replicas as nodes are upgraded. It's entirely possible this will reduce some
> shards to a single active replica.
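To make the failure mode concrete, here is a minimal Python sketch, assuming the ZooKeeper tree is modeled as a plain dict of node path to node data, with paths written relative to the collection exactly as in the report above. `leader_path` and `find_leader` are hypothetical helpers for illustration, not Solr or ZooKeeper APIs, and the leader data strings are placeholders. A lookup that only checks the 5.4 path finds nothing in a tree written by a 5.3 leader, which matches the "stuck in down" behavior; a lookup that also tries the 5.3 path does find it.

```python
from typing import Optional


def leader_path(collection: str, shard: str, new_layout: bool) -> str:
    """Build the leader data node path described in SOLR-8531 (hypothetical helper)."""
    base = f"{collection}/leaders/{shard}"
    # 5.3 kept leader data at <collection>/leaders/<shard>;
    # 5.4 moved it one level down to <collection>/leaders/<shard>/leader.
    return f"{base}/leader" if new_layout else base


def find_leader(zk_tree: dict, collection: str, shard: str) -> Optional[str]:
    """Try the 5.4 path first, then fall back to the 5.3 path.

    zk_tree is a dict standing in for ZooKeeper (an assumption of this sketch).
    Empty data is skipped, because under the 5.4 layout the old path still
    exists as a parent node. Returns None if no leader data is found.
    """
    for new_layout in (True, False):
        data = zk_tree.get(leader_path(collection, shard, new_layout))
        if data:
            return data
    return None


if __name__ == "__main__":
    # Mixed cluster: a 5.3 leader registered itself at the old path.
    tree_53 = {"collection1/leaders/shard1": "leader-props-from-5.3"}

    # A lookup that only knows the 5.4 path finds nothing.
    print(tree_53.get(leader_path("collection1", "shard1", new_layout=True)))  # None

    # The tolerant lookup finds the 5.3 leader.
    print(find_leader(tree_53, "collection1", "shard1"))  # leader-props-from-5.3
```

This is an illustration of the path mismatch only; it is not how Solr itself handles leader lookup.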