Thank you, Vineeth, for creating the PIP. It will be useful for capturing a broker's health in metrics and dashboards, and we can also set up alerts on it.

Generally, the HC/liveness probe in k8s should check the broker's status API rather than the sanity check, because probing the sanity check can bring down the entire broker cluster. I think even the Pulsar Helm chart uses the status endpoint for liveness and periodic health checks, and that should be the preferred way.

That said, it's good to have more monitoring data points in the metrics, and capturing the broker's sanity as a metric would help with monitoring. However, it must be disabled by default, and we should have a configuration option to enable it.
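
To make this concrete, here is a rough sketch (in Python for brevity; it is not Pulsar code) of a gauge that only publishes the result of the most recent health check, as Enrico suggested, and stays off unless explicitly enabled. The class name, the `enabled` flag, and the metric name `hypothetical_broker_health` are all assumptions made for illustration; none of them come from the PIP or the Pulsar codebase.

```python
import threading


class LastHealthCheckGauge:
    """Caches the most recent health-check outcome for a metrics endpoint.

    Hypothetical illustration only; not a Pulsar API.
    """

    METRIC_NAME = "hypothetical_broker_health"  # assumed name, not from the PIP

    def __init__(self, enabled=False):
        # Disabled by default; an operator has to opt in explicitly.
        self.enabled = enabled
        self._lock = threading.Lock()
        self._healthy = None  # None until a first result is recorded

    def record(self, healthy):
        """Store a result produced by an externally triggered health check."""
        if not self.enabled:
            return
        with self._lock:
            self._healthy = bool(healthy)

    def render(self):
        """Return Prometheus text-format lines, or "" if nothing to publish."""
        if not self.enabled:
            return ""
        with self._lock:
            healthy = self._healthy
        if healthy is None:
            return ""
        return (
            f"# TYPE {self.METRIC_NAME} gauge\n"
            f"{self.METRIC_NAME} {1 if healthy else 0}\n"
        )


gauge = LastHealthCheckGauge(enabled=True)
gauge.record(True)   # e.g. called when the existing HC endpoint succeeds
print(gauge.render())
```

Because the gauge never triggers the check itself, it adds no scheduled task inside the broker; it only reflects whatever the last externally triggered HC reported.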

Thanks,
Rajan

On Wed, May 24, 2023 at 1:15 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> Vineeth,
>
> On Wed, May 24, 2023, 21:57 vineeth p <vineethreddypo...@gmail.com> wrote:
>
> > Hello,
> >
> > Broker metrics don't have anything to indicate the health of the broker
> > (to indicate if the broker is active). In Prometheus broker metrics, which
> > are used for monitoring, it will be useful if metrics also show the broker
> > health. This way, Prometheus can automatically scrape the broker state and
> > can be used for monitoring purposes. So we need such a metric to capture
> > broker health.
> >
> > You can review the PIP at https://github.com/apache/pulsar/issues/20389
>
> Are you running on k8s?
> Usually you use the healthcheck for k8s probes and this means that the HC is
> already periodically executed.
> If this is the case we could publish the value of the last HC without
> adding a task internal to the broker that triggers the healthcheck.
> If we use the default scheduler maybe we could run into some weird
> deadlocks, because the HC writes/reads using the local broker.
>
> Enrico
>
> > Regards,
> > Vineeth