If recovery failed, then that core is dead; it has given up. So if an agent has just started or restarted a node, it will wait until all cores have reached a "stable" or "final" state before it declares the NODE healthy and considers restarting other nodes. If a core (a replica of a shard in a collection) is in the DOWN state, it has just booted and will soon go into RECOVERING. It will stay in RECOVERING until it is either OK or RECOVERY_FAILED. There is no point in waiting in an endless loop for every single core on a node to come up; we just want them to finish initializing and enter a stable state. I guess other logic in solr-operator will take care of deciding how many replicas of a shard are live, and so whether it is safe to take down the next pod/node.
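To make the reasoning above concrete, here is a rough Python sketch of what such an agent could do. It is only an illustration, not how solr-operator or Solr itself implements it: the function names, the retry parameters and the string state names are my own assumptions, and only the health URL comes from this thread.

```python
import time
import urllib.error
import urllib.request

# Transitional states as described in this thread. RECOVERY_FAILED is left out
# on purpose: it is a final state, so there is nothing more to wait for.
UNHEALTHY_STATES = {"DOWN", "RECOVERING"}


def node_is_settled(core_states):
    """True when no core on the node is still booting or recovering."""
    return not any(state in UNHEALTHY_STATES for state in core_states)


def health_status(base_url, timeout=5.0):
    """Return the HTTP status code of the node health endpoint, or None."""
    url = f"{base_url}/api/node/health?requireHealthyCores=true"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # any non-200 answer: cores are not settled yet
    except OSError:
        return None  # node not reachable yet (also covers URLError, timeouts)


def wait_until_settled(get_status, attempts=60, delay=5.0, sleep=time.sleep):
    """Poll get_status() until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if get_status() == 200:
            return True
        sleep(delay)
    return False
```

An agent would then call something like `wait_until_settled(lambda: health_status("http://node:8983"))` after restarting a node, and move on to the next node only when it returns True. The bounded number of attempts is there so the agent does not end up in the endless loop mentioned above.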
Jan

> On 31 Oct 2021, at 16:14, 戴晓彬 <xiaobin_...@foxmail.com> wrote:
>
> I'm a little puzzled, why UNHEALTHY_STATES does not contain
> State.RECOVERY_FAILED
>
>> On 31 Oct 2021, at 22:45, Jan Høydahl <jan....@cominvent.com> wrote:
>>
>> See
>> https://solr.apache.org/guide/8_10/implicit-requesthandlers.html#admin-handlers,
>> you can query each node with
>>
>> http://node:8983/api/node/health?requireHealthyCores=true
>>
>> It will only return HTTP 200 if all active cores on the node are healthy
>> (none starting or recovering).
>>
>> Jan
>>
>>> On 27 Oct 2021, at 17:27, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> when a Solr instance is started I would be sure all the indexes present are
>>> up and running, in other words that the instance is healthy.
>>> The healthy status (aka liveness/readiness) is especially useful when a
>>> Kubernetes SolrCloud cluster has to be restarted for any configuration
>>> management needs and you want to apply your change one node at a time.
>>> AFAIK I can ping only one index at a time, but there is no way out of the
>>> box to test that a bunch of indexes are active (green status).
>>> Have you ever faced the same problem? What do you think?
>>>
>>> Best regards,
>>> Vincenzo
>>>
>>> --
>>> Vincenzo D'Amore