Hi Jan,

Thank you so much for responding. I really appreciate it.

I thought that was only a problem with Solr 8.5 or older. We have migrated to Solr 8.11.3 and removed the use of the useUnsafeOverseerResponse flag. Also, from the other error message I'm seeing ("consider as bad message and poll out from the queue"), it looks like the overseer is actually able to deserialize the message, but it's then hitting a KeeperException?

Thanks,
Patrick

On Wed, Oct 16, 2024 at 12:41 AM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi
>
> I believe that the objects on the Overseer queue are serialized java
> objects and so you cannot create collections while in the middle of a major
> upgrade.
> I'd pause such cluster events during the rolling upgrade so that the
> Overseer queues are empty once the overseer node is upgraded.
>
> Jan
>
> > On 16 Oct 2024, at 04:31, Patrick Lok <patrick....@salesforce.com.INVALID> wrote:
> >
> > Here's the request we are sending over the wire to Solr 9
> >
> > "class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$Create",
> > "method":"GET",
> > "params.action":"CREATE",
> > "params.name":"ftest-collection_1.2",
> > "params.collection.configName":"test-collection",
> > "params.createNodeSet":"EMPTY",
> > "params.numShards":"2",
> > "params.router.name":"compositeId",
> > "params.nrtReplicas":"1",
> > "params.autoAddReplicas":"false"}
> >
> > On Tue, Oct 15, 2024 at 7:20 PM Patrick Lok <patrick....@salesforce.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I'm new to Solr and I'm tasked to upgrade our Solr 8.11.3 installation to
> >> Solr 9.6.1.
> >>
> >> I'm running into some trouble with the create collection command when it's
> >> sent to a Solr 9.6.1 node with Solr 8.11.3 running as overseers.
> >>
> >> The command in Java is
> >> CollectionAdminRequest.createCollection(collectionName, configName, numShards, 0)
> >>     .setAutoAddReplicas(false)
> >>     .setRouterName("compositeId")
> >>     .setCreateNodeSet("EMPTY")
> >>     .setReplicationFactor(1);
> >>
> >> And the error that I see on the overseer can be either of the one below. I
> >> guess it depends on if the collection has been created (but deleted) before
> >> or not.
> >>
> >> If the collection has been created before but deleted. I'll see in the
> >> overseer (Solr 8) log
> >>
> >> 01:42:43.927 ERROR (OverseerThreadFactory-25-t...:8983_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test-collection_1.2 operation: create failed
> >> org.apache.solr.common.SolrException: Could not fully create collection: test-collection_1.2
> >>     at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:218) ~[?:?]
> >>     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:271) ~[?:?]
> >>     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:524) ~[?:?]
> >>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
> >>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> >>
> >> But if the collection has never been created before, then I see in the
> >> overseer log
> >>
> >> 01:42:14.439 INFO (OverseerThreadFactory-25-thread-..._solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test1-collection_1.2
> >> 01:42:14.442 INFO (OverseerCollectionConfigSetProcessor-...) [ ] o.a.s.c.OverseerTaskQueue Response ZK path: /overseer/collection-queue-work/qnr-0000707821 doesn't exist. Requestor may have disconnected from ZooKeeper
> >> 01:42:14.469 ERROR (OverseerStateUpdate-3026498...) [ ] o.a.s.c.Overseer Exception in Overseer main queue loop
> >> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /clusterstate.json
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[?:?]
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
> >>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2561) ~[?:?]
> >>     at org.apache.solr.common.cloud.SolrZkClient.lambda$setData$7(SolrZkClient.java:361) ~[?:?]
> >>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:79) ~[?:?]
> >>     at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:361) ~[?:?]
> >>     at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:291) ~[?:?]
> >>     at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:217) ~[?:?]
> >>     at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:173) ~[?:?]
> >>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:341) ~[?:?]
> >>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:271) ~[?:?]
> >>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> >> 01:42:14.490 WARN (OverseerStateUpdate-3026498...) [ ] o.a.s.c.Overseer Exception when process message = {
> >>   "replicationFactor":1,
> >>   "fromApi":"true",
> >>   "collection.configName":"test1-collection",
> >>   "router.name":"compositeId",
> >>   "createNodeSet":"EMPTY",
> >>   "waitForFinalState":null,
> >>   "pullReplicas":null,
> >>   "async":"70e3b8e7-9ee1-468d-96f6-470900c4edbb",
> >>   "router.field":null,
> >>   "name":"test1-collection_1.2",
> >>   "nrtReplicas":1,
> >>   "numShards":2,
> >>   "tlogReplicas":null,
> >>   "alias":null,
> >>   "operation":"create",
> >>   "perReplicaState":null}, consider as bad message and poll out from the queue
> >>
> >> Is there a known incompatibility issue between Solr 9 (data node) and Solr
> >> 8 (overseer node) with CollectionAdminRequest.createCollection? This is
> >> what we have been doing for a long time and works with both data and
> >> overseer nodes are running Solr 8. Is there a way to get around this issue?
> >>
> >> Thanks,
> >> Patrick
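
For anyone who wants to reproduce this without SolrJ, the request parameters quoted above should map onto a plain Collections API CREATE call. Below is a minimal sketch that only builds the equivalent URL using the JDK. The base URL `http://localhost:8983/solr` and the class and method names are placeholders made up for illustration; they are not from the thread. The parameter names and values are copied from the quoted request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of SolrJ): builds a Collections API URL
// from a parameter map so the request can be replayed with curl or a browser.
public class CreateCollectionUrl {

    // Parameters as they appear in the quoted "over the wire" request.
    public static Map<String, String> quotedParams() {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("action", "CREATE");
        params.put("name", "ftest-collection_1.2");
        params.put("collection.configName", "test-collection");
        params.put("createNodeSet", "EMPTY");
        params.put("numShards", "2");
        params.put("router.name", "compositeId");
        params.put("nrtReplicas", "1");
        params.put("autoAddReplicas", "false");
        return params;
    }

    // Joins URL-encoded key=value pairs onto <baseUrl>/admin/collections.
    public static String buildCreateUrl(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        // Assumed base URL; point it at whichever node should receive the request.
        System.out.println(buildCreateUrl("http://localhost:8983/solr", quotedParams()));
    }
}
```

Sending the printed URL to a Solr 8 node and then to a Solr 9 node might help narrow down whether the failure depends on which version receives the request.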