oops! sent a little too soon ... also wanted to mention that if you're
running Solr 8+, you can use /admin/info/health instead of
/admin/info/system for the probe path (see:
https://issues.apache.org/jira/browse/SOLR-11126), like this:

livenessProbe:
  httpGet:
    path: /admin/info/health
    port: 8983
readinessProbe:
  httpGet:
    path: /admin/info/health
    port: 8983


On Fri, Nov 12, 2021 at 11:11 AM Timothy Potter <thelabd...@apache.org> wrote:
>
> Some things to consider ...
>
> If one out of many Solr cores is down on a pod, I would not want
> Kubelet to restart my Solr pod (if liveness probe fails) or even
> remove it from the load-balancer service (if readiness probe fails)
> because the pod can still serve traffic for the healthy cores.
> Requiring all cores on a pod to be healthy seems like too high of a
> bar for K8s probes.
>
> Also, a busy Solr pod hosting large collections with active update and
> query traffic can take a "long" time to come back online after being
> killed ("long" relative to your typical Go-based microservice that can
> restart in milliseconds, which is what these probes were designed for).
>
> SolrCloud has its own request routing logic based on a very up-to-date
> cluster state that's wired into ZK watches, so Solr can be resilient
> to downed replicas provided there is at least one per shard that is
> healthy.
>
> Moreover, replicas may take time to recover and the last thing you'd
> want is for K8s to restart a pod while a replica is close to
> recovering and re-entering the mix as a healthy replica.
>
> You could maybe use the health-check request with requireHealthyCores=true
> for a startup probe; a rough sketch is below.
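>
> Something like this, as an untested sketch (the path and param are the ones
> from the health-check API; give it a generous failureThreshold since cores
> can take a while to load):
>
> startupProbe:
>   httpGet:
>     path: /api/node/health?requireHealthyCores=true
>     port: 8983
>   periodSeconds: 10
>   failureThreshold: 60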
>
> For me, liveness / readiness probes are a better fit for microservices,
> which are fast to fail and restart and typically run many replicas, so
> pulling one out of the load-balancer on a readiness probe failure is
> usually the right answer. Moreover, a microservice typically does one
> thing, whereas a Solr pod typically hosts multiple cores.
>
> Lastly, the Solr operator allows you to customize the probe endpoints via
> spec.customSolrKubeOptions.podOptions.(liveness|readiness|startup)Probe
> (sketch below). We default them to /admin/info/system for the reasons I
> raised above.
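>
> For example, overriding the liveness probe on a SolrCloud resource could
> look roughly like this (a sketch, not copied from the operator docs; adjust
> apiVersion and the path to your setup, e.g. /solr/admin/info/health if Solr
> is served under the /solr context path):
>
> apiVersion: solr.apache.org/v1beta1
> kind: SolrCloud
> metadata:
>   name: example
> spec:
>   customSolrKubeOptions:
>     podOptions:
>       livenessProbe:
>         httpGet:
>           path: /admin/info/health
>           port: 8983
>         periodSeconds: 10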
>
> Tim
>
> On Fri, Nov 12, 2021 at 9:15 AM Mathieu Marie
> <mma...@salesforce.com.invalid> wrote:
> >
> > Beware that using
> > http://node:8983/api/node/health?requireHealthyCores=true for
> > your liveness probe assumes that ZK is up and running.
> > We all hope that ZK is never down, but if it happens, your Solr
> > liveness probes will start to fail too, and K8s will restart all your Solr
> > pods, adding instability to a cluster that is already in bad shape.
> >
> > We've configured our liveness probe to /solr/admin/info/system too, and we
> > rely on the ZK liveness probe to restart ZK quickly if there is an issue
> > (see the sketch below). Liveness probes should never depend on a subsystem
> > being up, or else all your services will go down one after the other.
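> >
> > (For reference, a typical ZK liveness probe is an exec check against the
> > "ruok" four-letter-word command, roughly like the sketch below; it assumes
> > nc is available in the image and that 4lw.commands.whitelist includes ruok.)
> >
> > livenessProbe:
> >   exec:
> >     command:
> >       - sh
> >       - -c
> >       - 'echo ruok | nc localhost 2181 | grep imok'
> >   periodSeconds: 10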
> >
> > Regards,
> > Mathieu
