Hi,

I believe that the objects on the Overseer queue are serialized Java objects, so you cannot create collections in the middle of a major-version upgrade. I'd pause such cluster events during the rolling upgrade so that the Overseer queues are empty when the overseer node is upgraded.
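One way to enforce that pause in client tooling is a gate that refuses CREATE and similar collection-admin calls while the cluster reports mixed major versions. The sketch below is hypothetical and is not a Solr or SolrJ API. It assumes your deployment tooling can supply a map from node name to Solr version, and the node names are placeholders.

```java
import java.util.Map;

/**
 * Hypothetical helper, not part of Solr or SolrJ: decides whether collection-admin
 * operations such as CREATE should be submitted during a rolling upgrade.
 * The node-to-version map is assumed to come from your own deployment tooling.
 */
final class UpgradeGate {

    private UpgradeGate() {}

    /** Returns the major version of a string such as "9.6.1" (here: 9). */
    static int majorVersion(String version) {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }

    /**
     * Allows admin operations only when every node (overseer included) reports the
     * same major version. An empty map is treated as "unknown" and also pauses.
     */
    static boolean adminOpsAllowed(Map<String, String> nodeVersions) {
        Integer expectedMajor = null;
        for (String version : nodeVersions.values()) {
            int major = majorVersion(version);
            if (expectedMajor == null) {
                expectedMajor = major;
            } else if (major != expectedMajor) {
                return false; // mixed majors: a major upgrade is in progress, so pause
            }
        }
        return expectedMajor != null;
    }

    public static void main(String[] args) {
        // Placeholder node names; mid-upgrade the overseer may still be on 8.x.
        Map<String, String> midUpgrade = Map.of(
                "node1:8983_solr", "8.11.3",
                "node2:8983_solr", "9.6.1");
        Map<String, String> finished = Map.of(
                "node1:8983_solr", "9.6.1",
                "node2:8983_solr", "9.6.1");
        System.out.println(adminOpsAllowed(midUpgrade)); // false
        System.out.println(adminOpsAllowed(finished));   // true
    }
}
```

Call it before submitting `CollectionAdminRequest.createCollection(...)` and retry later if it returns false. The check compares only major versions because the advice above concerns major upgrades.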
Jan

> On 16 Oct 2024, at 04:31, Patrick Lok <patrick....@salesforce.com.INVALID> wrote:
>
> Here's the request we are sending over the wire to Solr 9:
>
> {"class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$Create",
> "method":"GET",
> "params.action":"CREATE",
> "params.name":"ftest-collection_1.2",
> "params.collection.configName":"test-collection",
> "params.createNodeSet":"EMPTY",
> "params.numShards":"2",
> "params.router.name":"compositeId",
> "params.nrtReplicas":"1",
> "params.autoAddReplicas":"false"}
>
> On Tue, Oct 15, 2024 at 7:20 PM Patrick Lok <patrick....@salesforce.com>
> wrote:
>
>> Hi,
>>
>> I'm new to Solr, and I've been tasked with upgrading our Solr 8.11.3
>> installation to Solr 9.6.1.
>>
>> I'm running into trouble with the create-collection command when it's
>> sent to a Solr 9.6.1 node while the overseer nodes are still running
>> Solr 8.11.3.
>>
>> The command in Java is:
>>
>>     CollectionAdminRequest.createCollection(collectionName, configName, numShards, 0)
>>         .setAutoAddReplicas(false)
>>         .setRouterName("compositeId")
>>         .setCreateNodeSet("EMPTY")
>>         .setReplicationFactor(1);
>>
>> The error I see on the overseer is one of the two below. I guess it
>> depends on whether the collection has been created (and deleted) before.
>>
>> If the collection was created before and then deleted, I see this in the
>> overseer (Solr 8) log:
>>
>> 01:42:43.927 ERROR (OverseerThreadFactory-25-t...:8983_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test-collection_1.2 operation: create failed
>> org.apache.solr.common.SolrException: Could not fully create collection: test-collection_1.2
>>     at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:218) ~[?:?]
>>     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:271) ~[?:?]
>>     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:524) ~[?:?]
>>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
>>
>> But if the collection has never been created before, I see this in the
>> overseer log:
>>
>> 01:42:14.439 INFO (OverseerThreadFactory-25-thread-..._solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test1-collection_1.2
>> 01:42:14.442 INFO (OverseerCollectionConfigSetProcessor-...) [ ] o.a.s.c.OverseerTaskQueue Response ZK path: /overseer/collection-queue-work/qnr-0000707821 doesn't exist. Requestor may have disconnected from ZooKeeper
>> 01:42:14.469 ERROR (OverseerStateUpdate-3026498...) [ ] o.a.s.c.Overseer Exception in Overseer main queue loop
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /clusterstate.json
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[?:?]
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
>>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2561) ~[?:?]
>>     at org.apache.solr.common.cloud.SolrZkClient.lambda$setData$7(SolrZkClient.java:361) ~[?:?]
>>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:79) ~[?:?]
>>     at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:361) ~[?:?]
>>     at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:291) ~[?:?]
>>     at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:217) ~[?:?]
>>     at org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:173) ~[?:?]
>>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:341) ~[?:?]
>>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:271) ~[?:?]
>>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
>> 01:42:14.490 WARN (OverseerStateUpdate-3026498...) [ ] o.a.s.c.Overseer Exception when process message = {
>>   "replicationFactor":1,
>>   "fromApi":"true",
>>   "collection.configName":"test1-collection",
>>   "router.name":"compositeId",
>>   "createNodeSet":"EMPTY",
>>   "waitForFinalState":null,
>>   "pullReplicas":null,
>>   "async":"70e3b8e7-9ee1-468d-96f6-470900c4edbb",
>>   "router.field":null,
>>   "name":"test1-collection_1.2",
>>   "nrtReplicas":1,
>>   "numShards":2,
>>   "tlogReplicas":null,
>>   "alias":null,
>>   "operation":"create",
>>   "perReplicaState":null}, consider as bad message and poll out from the queue
>>
>> Is there a known incompatibility between a Solr 9 data node and a Solr 8
>> overseer node with CollectionAdminRequest.createCollection? We have been
>> doing this for a long time, and it works when both the data and overseer
>> nodes are running Solr 8. Is there a way to get around this issue?
>>
>> Thanks,
>> Patrick
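The `params.*` entries in the wire dump above are the query parameters of a Collections API CREATE call at `/admin/collections`. To replay the same request against one specific node without SolrJ (for example with curl, to see whether the failure still shows up), a minimal JDK-only sketch can assemble the equivalent GET URL. The class and method names are hypothetical, and `http://localhost:8983/solr` is a placeholder base URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical JDK-only helper that renders a Collections API CREATE call as a GET URL. */
final class CreateRequestUrl {

    private CreateRequestUrl() {}

    static String buildCreateUrl(String solrBaseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        query.add("action=CREATE");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode keys and values; names like "router.name" pass through unchanged.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrBaseUrl + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        // Values from the "params.*" entries in the wire dump, minus the "params." prefix.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "ftest-collection_1.2");
        params.put("collection.configName", "test-collection");
        params.put("createNodeSet", "EMPTY");
        params.put("numShards", "2");
        params.put("router.name", "compositeId");
        params.put("nrtReplicas", "1");
        params.put("autoAddReplicas", "false");
        // Placeholder base URL: point it at the node you want to send the request to.
        System.out.println(buildCreateUrl("http://localhost:8983/solr", params));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, so the printed URL lines up with the dump when the two are compared side by side.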