Hi, I run a small 3-node test cluster with Solr Operator and Solr 9.6.1. I have configured the affinity placement plugin as follows:
{
  "plugin": {
    ".placement-plugin": {
      "name": ".placement-plugin",
      "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
      "config": {"minimalFreeDiskGB": 2, "prioritizedFreeDiskGB": 100}
    }
  }
}

There is plenty of free disk and all three pods are healthy. I can create one or a few collections with 3 NRT replicas successfully. The affinity plugin makes sure that each replica lands on a different pod (as opposed to the default, which is round-robin). Also, if one of the pods is down, the plugin throws an error, so the client can retry creating the collection once all three pods are online.

However, after some time, creating another collection fails with the message "Not enough eligible nodes to place 3 replica(s) of type NRT for shard shard1 of collection foo", even though the cluster is healthy, with all three nodes online and listed in "live_nodes". The full stack trace is here: https://gist.github.com/janhoy/a50e48d93be6b849cbf0a6722a89ba21

It looks like the OrderedNodePlacementPlugin somehow believes that two nodes are down or otherwise ineligible. I have to restart/delete one or two pods for it to work again. I first thought restarting the Overseer node would be enough, but the last time I tried, the error message only got worse: "Only able to place 0 replicas". One or two more restarts may make it work again, before it gets stuck again. Debug logging does not reveal much more.

I see a few similar test failures on the builds mailing list:
- BATS test "Affinity placement plugin using sysprop" failed three times in 2023
- PlacementPluginIntegrationTest failed three times in 2023 and once on June 1st

Does anyone have any insight?

Jan
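To make the symptom concrete, here is a toy Python model of the eligibility check. It is not Solr's actual AffinityPlacementFactory code. It assumes, hypothetically, that a node is eligible only if it is live and reports at least minimalFreeDiskGB of free disk, and that a node whose free-disk metric could not be read is treated as ineligible. Under that assumption, three live nodes can still produce "Not enough eligible nodes". The names Node, eligible_nodes, place_replicas and PlacementError are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Mirrors "minimalFreeDiskGB" from the config above
MINIMAL_FREE_DISK_GB = 2


@dataclass
class Node:
    name: str
    live: bool
    # None means the free-disk metric could not be fetched (hypothetical case)
    free_disk_gb: Optional[float]


class PlacementError(Exception):
    pass


def eligible_nodes(nodes: List[Node], min_free_gb: float = MINIMAL_FREE_DISK_GB) -> List[Node]:
    """Keep live nodes that report at least min_free_gb of free disk.

    Assumption for this sketch: a node with no free-disk metric is ineligible,
    even though it appears in live_nodes.
    """
    return [
        n for n in nodes
        if n.live and n.free_disk_gb is not None and n.free_disk_gb >= min_free_gb
    ]


def place_replicas(nodes: List[Node], collection: str, shard: str, n_replicas: int,
                   min_free_gb: float = MINIMAL_FREE_DISK_GB) -> List[str]:
    """Put each replica on a distinct eligible node, or fail like the reported error."""
    candidates = eligible_nodes(nodes, min_free_gb)
    if len(candidates) < n_replicas:
        raise PlacementError(
            f"Not enough eligible nodes to place {n_replicas} replica(s) of type NRT "
            f"for shard {shard} of collection {collection}"
        )
    # One replica per node, taken in input order (no disk-based prioritization here)
    return [n.name for n in candidates[:n_replicas]]
```

With three live nodes that all report free disk, placement succeeds. If two of them are live but their disk metric comes back missing, only one node counts as eligible and placement of 3 replicas fails, which matches "believes that two nodes are down or otherwise ineligible".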