Based on the problem description, it seems you might have a workaround:
submit collection admin commands directly to the Overseer node instead?  At
least during the upgrade.
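
A rough sketch of that workaround, in case it helps. It only builds the
Collections API CREATE URL aimed at the Overseer node, using the same
parameters as in your request. Assumptions on my part: the Overseer's node
name (e.g. the "leader" value reported by action=OVERSEERSTATUS) has the
default "host:port_solr" shape, the node speaks plain HTTP, and the class and
method names here are made up for illustration. Untested against a
mixed-version cluster.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class OverseerCreateUrl {

    // Assumption: node names look like "host:port_solr" and the node serves plain HTTP.
    public static String baseUrl(String nodeName) {
        int sep = nodeName.indexOf('_');
        if (sep < 0) {
            throw new IllegalArgumentException("Unexpected node name: " + nodeName);
        }
        return "http://" + nodeName.substring(0, sep) + "/" + nodeName.substring(sep + 1);
    }

    // Builds a CREATE request URL targeted at the given (Overseer) node.
    public static String createUrl(String overseerNode, String name, String configName, int numShards) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("collection.configName", configName);
        params.put("numShards", Integer.toString(numShards));
        params.put("router.name", "compositeId");
        params.put("nrtReplicas", "1");
        params.put("createNodeSet", "EMPTY");

        // URL-encode each key and value, then join them into a query string.
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));

        return baseUrl(overseerNode) + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical Overseer node name; substitute the one your cluster reports.
        System.out.println(createUrl("10.0.0.5:8983_solr", "test-collection_1.2", "test-collection", 2));
    }
}
```

The resulting URL can be sent with any HTTP client; the point is only that
the request lands on the 8.x Overseer node rather than on a 9.x node.
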
Also... I wonder if upgrading to 9.0 instead of 9.6 may help.  Not that I
know of any specific incompatibility, but I could imagine changes across
the 9.x line that 8.x can't accept on the Overseer queue, where sender and
receiver must cooperate indirectly via ZK; it is a kind of protocol, in a
sense.  Again, this is theoretical.

Sadly, Solr upgrade compatibility is not something the project has an
automated test for, nor even a manual script for a human to follow.  In the
age of Docker, this shouldn't be hard to build.  It's a gap for sure.  FWIW
we "care" about it... we think about it and more or less insist on it as a
standard / acceptance criterion, but without a test it's "best effort".

On Wed, Oct 16, 2024 at 2:38 PM Patrick Lok
<patrick....@salesforce.com.invalid> wrote:

> Hi Jan,
>
> Thank you so much for responding. Really appreciate it.
>
> I thought that's a problem with Solr 8.5 or older. We have migrated to Solr
> 8.11.3 and removed the use of the useUnsafeOverseerResponse flag. And from
> the other error message ("consider as bad message and poll out from the
> queue") that I'm seeing, it looks like the overseer is actually able to
> deserialize the message, but it's hitting a KeeperException?
>
> Thanks,
> Patrick
>
>
> On Wed, Oct 16, 2024 at 12:41 AM Jan Høydahl <jan....@cominvent.com>
> wrote:
>
> > Hi
> >
> > I believe that the objects on the Overseer queue are serialized Java
> > objects and so you cannot create collections while in the middle of a
> > major upgrade.
> > I'd pause such cluster events during the rolling upgrade so that the
> > Overseer queues are empty once the overseer node is upgraded.
> >
> > Jan
> >
> > > 16. okt. 2024 kl. 04:31 skrev Patrick Lok <patrick....@salesforce.com.INVALID>:
> > >
> > > Here's the request we are sending over the wire to Solr 9
> > >
> > >  "class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$Create",
> > >  "method":"GET",
> > >  "params.action":"CREATE",
> > >  "params.name":"ftest-collection_1.2",
> > >  "params.collection.configName":"test-collection",
> > >  "params.createNodeSet":"EMPTY",
> > >  "params.numShards":"2",
> > >  "params.router.name":"compositeId",
> > >  "params.nrtReplicas":"1",
> > >  "params.autoAddReplicas":"false"}
> > >
> > >
> > > On Tue, Oct 15, 2024 at 7:20 PM Patrick Lok <patrick....@salesforce.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm new to Solr and I'm tasked to upgrade our Solr 8.11.3 installation
> > >> to Solr 9.6.1.
> > >>
> > >> I'm running into some trouble with the create collection command when
> > >> it's sent to a Solr 9.6.1 node with Solr 8.11.3 running as overseers.
> > >>
> > >> The command in Java is
> > >>  CollectionAdminRequest.createCollection(collectionName, configName,
> > >> numShards, 0)
> > >>    .setAutoAddReplicas(false)
> > >>    .setRouterName("compositeId")
> > >>    .setCreateNodeSet("EMPTY")
> > >>    .setReplicationFactor(1);
> > >>
> > >> And the error that I see on the overseer can be either of the two
> > >> below. I guess it depends on whether the collection has been created
> > >> (but deleted) before or not.
> > >>
> > >> If the collection has been created before but deleted, I'll see in the
> > >> overseer (Solr 8) log
> > >>
> > >> 01:42:43.927 ERROR (OverseerThreadFactory-25-t...:8983_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler      Collection: test-collection_1.2 operation: create failed
> > >> org.apache.solr.common.SolrException: Could not fully create collection: test-collection_1.2
> > >>        at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:218) ~[?:?]
> > >>        at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:271) ~[?:?]
> > >>        at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:524) ~[?:?]
> > >>        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
> > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
> > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
> > >>        at java.lang.Thread.run(Thread.java:829) ~[?:?]
> > >>
> > >>
> > >>
> > >>
> > >> But if the collection has never been created before, then I see in the
> > >> overseer log
> > >>
> > >> 01:42:14.439 INFO  (OverseerThreadFactory-25-thread-..._solr) [   ] o.a.s.c.a.c.CreateCollectionCmd      Create collection test1-collection_1.2
> > >> 01:42:14.442 INFO  (OverseerCollectionConfigSetProcessor-...) [   ] o.a.s.c.OverseerTaskQueue      Response ZK path: /overseer/collection-queue-work/qnr-0000707821 doesn't exist. Requestor may have disconnected from ZooKeeper
> > >> 01:42:14.469 ERROR (OverseerStateUpdate-3026498...) [   ] o.a.s.c.Overseer     Exception in Overseer main queue loop
> > >> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /clusterstate.json
> > >>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[?:?]
> > >>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
> > >>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2561) ~[?:?]
> > >>        at org.apache.solr.common.cloud.SolrZkClient.lambda$setData$7(SolrZkClient.java:361) ~[?:?]
> > >>        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:79) ~[?:?]
> > >>        at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:361) ~[?:?]
> > >>        at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:291) ~[?:?]
> > >>        at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:217) ~[?:?]
> > >>        at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:173) ~[?:?]
> > >>        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:341) ~[?:?]
> > >>        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:271) ~[?:?]
> > >>        at java.lang.Thread.run(Thread.java:829) ~[?:?]
> > >> 01:42:14.490 WARN  (OverseerStateUpdate-3026498...) [   ] o.a.s.c.Overseer     Exception when process message = {
> > >>  "replicationFactor":1,
> > >>  "fromApi":"true",
> > >>  "collection.configName":"test1-collection",
> > >>  "router.name":"compositeId",
> > >>  "createNodeSet":"EMPTY",
> > >>  "waitForFinalState":null,
> > >>  "pullReplicas":null,
> > >>  "async":"70e3b8e7-9ee1-468d-96f6-470900c4edbb",
> > >>  "router.field":null,
> > >>  "name":"test1-collection_1.2",
> > >>  "nrtReplicas":1,
> > >>  "numShards":2,
> > >>  "tlogReplicas":null,
> > >>  "alias":null,
> > >>  "operation":"create",
> > >>  "perReplicaState":null}, consider as bad message and poll out from the queue
> > >>
> > >>
> > >> Is there a known incompatibility issue between Solr 9 (data node) and
> > >> Solr 8 (overseer node) with CollectionAdminRequest.createCollection?
> > >> This is what we have been doing for a long time and it works when both
> > >> data and overseer nodes are running Solr 8. Is there a way to get
> > >> around this issue?
> > >>
> > >> Thanks,
> > >> Patrick
> > >>
> > >>
> >
> >
>
