[GitHub] [solr-operator] HoustonPutman commented on issue #504: Probes for readiness and liveness should be different

GitBox Tue, 13 Dec 2022 08:33:39 -0800


HoustonPutman commented on issue #504:
URL: https://github.com/apache/solr-operator/issues/504#issuecomment-1349006624


   > So the liveness probe failing will cause k8s to kill the POD, and the 
readiness probe failing will cause traffic to be temporarily routed to other 
replicas, is that about right?
   
   > Also, if you have 500 cores on a server, and only one is recovering, it 
would be a pity if the pod was flagged as not-ready, since Solr is capable of 
routing traffic to all the other cores. But this is perhaps where the PDB comes 
in...
   
   Yes, but the readiness probe only affects Services that have the 
`PublishNotReadyAddresses` option set to `false`. Our common service (one 
endpoint for all nodes) has this set to false, while the headless service has 
this set to true. Therefore Solr can still route traffic to the example node 
you mentioned, as is necessary for things like recovery (all internal requests 
go to node-specific endpoints, which are managed by the headless service). 
However, users' initial requests will not end up on that node if they are using 
the common service.
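
   A rough sketch of that difference, assuming the standard Kubernetes meaning 
of `publishNotReadyAddresses` (when true, not-ready pods stay in the Service's 
endpoints). The `Pod` type, the pod names, and `published_endpoints` are made 
up for illustration; this is not operator code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str    # hypothetical pod name
    ready: bool  # result of the readiness probe


def published_endpoints(pods: List[Pod], publish_not_ready_addresses: bool) -> List[str]:
    """Return the pod names a Service would expose as endpoints."""
    if publish_not_ready_addresses:
        # Readiness is ignored: every pod stays addressable.
        return [p.name for p in pods]
    # Readiness is honored: not-ready pods drop out of the endpoints.
    return [p.name for p in pods if p.ready]


# solr-1 is the node with a recovering core and a failing readiness probe.
pods = [Pod("solr-0", True), Pod("solr-1", False), Pod("solr-2", True)]

headless = published_endpoints(pods, publish_not_ready_addresses=True)
common = published_endpoints(pods, publish_not_ready_addresses=False)
print("headless:", headless)
print("common:", common)
```

   The headless list still contains the not-ready pod, so node-to-node traffic 
can reach it, while the common list skips it for new user requests.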
   
   I don't have a strong feeling either way on whether this is a good thing 
or a bad thing. But if we use the healthcheck endpoint and only use it to 
check the ZK connection, then that is safer. We definitely don't want to route 
requests to nodes that can't talk to ZK when there are other nodes available. 
(Hopefully Solr will deal with this itself via live_nodes.)
   
   > Also, a rolling restart uses the readiness probe as a sign that it can 
move on to take down the next one? So for that reason we'd like all cores to be 
up.
   
   The readiness probe is also used for this, but luckily for us the 
ManagedUpdate option for Solr really doesn't use it a whole lot. Instead it 
reads the cluster state directly to see when things are healthy enough to move 
on to the next node(s). So the vast majority of people shouldn't be affected 
too much by this aspect. (One note: the readiness probe is used when counting 
the number of "down" pods, not replicas, so the operator will wait until all 
nodes are "ready" before doing the last pod restart, the overseer.)
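
   To illustrate just that last note, here is a minimal sketch of the gating 
condition. The function name, its arguments, and the pod names are 
hypothetical and this is not the operator's actual implementation; it only 
encodes "the overseer goes last, and only once every pod is ready".

```python
from typing import Dict, Set


def may_restart_overseer(pods_ready: Dict[str, bool],
                         outdated: Set[str],
                         overseer: str) -> bool:
    """Decide whether the overseer pod may be restarted now.

    pods_ready: pod name -> readiness probe result
    outdated:   pod names that still need a restart
    overseer:   name of the pod currently hosting the overseer
    """
    # The overseer is restarted last, so it must be the only pod left.
    if outdated != {overseer}:
        return False
    # Wait until every pod reports ready before taking it down.
    return all(pods_ready.values())


state = {"solr-0": True, "solr-1": False, "solr-2": True}
print(may_restart_overseer(state, {"solr-0"}, "solr-0"))
```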
   
   Overall I think it should be fine for the readiness check to use the 
healthcheck handler, just making sure that Jetty and ZK are ok. Maybe we can 
add some filesystem checks there, but I'm not sure how much benefit that will 
give... The liveness check should probably stay the same as it is now until we 
find a better way to check whether Solr should be restarted.
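
   As a sketch of what such a readiness check boils down to: treat the node as 
ready only when the healthcheck handler answers HTTP 200. The path 
`/solr/admin/info/health`, the port, and the helper names are assumptions for 
illustration; in practice the kubelet would do this via an `httpGet` probe 
rather than custom code.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumed location of Solr's healthcheck handler; adjust for your setup.
HEALTH_PATH = "/solr/admin/info/health"


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return None


def is_ready(base_url: str,
             get_status: Callable[[str], Optional[int]] = http_status) -> bool:
    """Ready only if the healthcheck handler answers 200."""
    return get_status(base_url.rstrip("/") + HEALTH_PATH) == 200
```

   For example, `is_ready("http://localhost:8983")` would query a local node 
(hypothetical address). The `get_status` parameter exists only so the check 
can be exercised without a running Solr.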


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

