Hi

I believe that the objects on the Overseer queue are serialized Java objects,
so you cannot create collections in the middle of a major-version upgrade.
I'd pause such cluster events during the rolling upgrade so that the Overseer
queues are empty when the Overseer node is upgraded.
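
A rough sketch of how one might check that the queues have drained before
upgrading the Overseer node. It assumes the Collections API OVERSEERSTATUS
action reports its queue sizes as numeric fields whose names end in
"queue_size" (for example overseer_queue_size); verify that against your
version. The class and method names are made up for illustration, and the
regex is a stand-in for a real JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or SolrJ.
final class OverseerQueueCheck {

    // Assumption: queue sizes appear as "<something>queue_size":<number>.
    private static final Pattern QUEUE_SIZE =
        Pattern.compile("\"(\\w*queue_size)\"\\s*:\\s*(\\d+)");

    /** Returns true only if at least one queue size was reported and all are zero. */
    static boolean queuesEmpty(String statusJson) {
        Matcher m = QUEUE_SIZE.matcher(statusJson);
        boolean sawQueue = false;
        while (m.find()) {
            sawQueue = true;
            if (Long.parseLong(m.group(2)) != 0L) {
                return false; // something is still waiting on a queue
            }
        }
        return sawQueue; // no queue fields at all is treated as "unknown", not "empty"
    }

    /** Fetches the Overseer status as JSON from one node (performs a network call). */
    static String fetchOverseerStatus(String solrBaseUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(solrBaseUrl + "/admin/collections?action=OVERSEERSTATUS&wt=json"))
            .GET()
            .build();
        return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: OverseerQueueCheck <solr-base-url>");
            return;
        }
        String status = fetchOverseerStatus(args[0]);
        System.out.println(queuesEmpty(status)
            ? "Overseer queues look empty"
            : "Overseer queues are not empty (or no queue sizes were reported)");
    }
}
```

Run it against any live node with the Solr base URL (the one ending in /solr)
as the only argument, after the application has stopped sending collection
admin requests. An empty result only reflects the moment of the call, so the
pause on the client side is still what keeps the queues empty.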

Jan

> On 16 Oct 2024, at 04:31, Patrick Lok 
> <patrick....@salesforce.com.INVALID> wrote:
> 
> Here's the request we are sending over the wire to Solr 9:
> 
> 
> "class":"org.apache.solr.client.solrj.request.CollectionAdminRequest$Create",
>  "method":"GET",
>  "params.action":"CREATE",
>  "params.name":"ftest-collection_1.2",
>  "params.collection.configName":"test-collection",
>  "params.createNodeSet":"EMPTY",
>  "params.numShards":"2",
>  "params.router.name":"compositeId",
>  "params.nrtReplicas":"1",
>  "params.autoAddReplicas":"false"}
> 
> 
> On Tue, Oct 15, 2024 at 7:20 PM Patrick Lok <patrick....@salesforce.com>
> wrote:
> 
>> Hi,
>> 
>> I'm new to Solr and I've been tasked with upgrading our Solr 8.11.3
>> installation to Solr 9.6.1.
>> 
>> I'm running into some trouble with the create collection command when it's
>> sent to a Solr 9.6.1 node while Solr 8.11.3 nodes are running as overseers.
>> 
>> The command in Java is
>>  CollectionAdminRequest.createCollection(collectionName, configName,
>> numShards, 0)
>>    .setAutoAddReplicas(false)
>>    .setRouterName("compositeId")
>>    .setCreateNodeSet("EMPTY")
>>    .setReplicationFactor(1);
>> 
>> And the error that I see on the overseer is one of the two below. I
>> guess it depends on whether the collection has been created (and then
>> deleted) before or not.
>> 
>> If the collection has been created before but deleted, I see this in the
>> overseer (Solr 8) log:
>> 
>> 01:42:43.927 ERROR (OverseerThreadFactory-25-t...:8983_solr) [   ]
>> o.a.s.c.a.c.OverseerCollectionMessageHandler      Collection:
>> test-collection_1.2 operation: create failed
>> org.apache.solr.common.SolrException: Could not fully create collection:
>> test-collection_1.2
>>        at
>> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:218)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:271)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:524)
>> ~[?:?]
>>        at
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
>> ~[?:?]
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>> ~[?:?]
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>> ~[?:?]
>>        at java.lang.Thread.run(Thread.java:829) ~[?:?]
>> 
>> 
>> 
>> 
>> But if the collection has never been created before, then I see this in
>> the overseer log:
>> 
>> 01:42:14.439 INFO  (OverseerThreadFactory-25-thread-..._solr) [   ]
>> o.a.s.c.a.c.CreateCollectionCmd      Create collection test1-collection_1.2
>> 01:42:14.442 INFO  (OverseerCollectionConfigSetProcessor-...) [   ]
>> o.a.s.c.OverseerTaskQueue      Response ZK path:
>> /overseer/collection-queue-work/qnr-0000707821 doesn't exist. Requestor may
>> have disconnected from ZooKeeper
>> 01:42:14.469 ERROR (OverseerStateUpdate-3026498...) [   ] o.a.s.c.Overseer
>>     Exception in Overseer main queue loop
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>> NoNode for /clusterstate.json
>>        at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[?:?]
>>        at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
>>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2561)
>> ~[?:?]
>>        at
>> org.apache.solr.common.cloud.SolrZkClient.lambda$setData$7(SolrZkClient.java:361)
>> ~[?:?]
>>        at
>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:79)
>> ~[?:?]
>>        at
>> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:361)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:291)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:217)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:173)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:341)
>> ~[?:?]
>>        at
>> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:271)
>> ~[?:?]
>>        at java.lang.Thread.run(Thread.java:829) ~[?:?]
>> 01:42:14.490 WARN  (OverseerStateUpdate-3026498...) [   ] o.a.s.c.Overseer
>>     Exception when process message = {
>>  "replicationFactor":1,
>>  "fromApi":"true",
>>  "collection.configName":"test1-collection",
>>  "router.name":"compositeId",
>>  "createNodeSet":"EMPTY",
>>  "waitForFinalState":null,
>>  "pullReplicas":null,
>>  "async":"70e3b8e7-9ee1-468d-96f6-470900c4edbb",
>>  "router.field":null,
>>  "name":"test1-collection_1.2",
>>  "nrtReplicas":1,
>>  "numShards":2,
>>  "tlogReplicas":null,
>>  "alias":null,
>>  "operation":"create",
>>  "perReplicaState":null}, consider as bad message and poll out from the
>> queue
>> 
>> 
>> Is there a known incompatibility between Solr 9 (data node) and Solr
>> 8 (overseer node) with CollectionAdminRequest.createCollection? This is
>> what we have been doing for a long time, and it works when both data and
>> overseer nodes are running Solr 8. Is there a way to work around this issue?
>> 
>> Thanks,
>> Patrick
>> 
>> 
