Amit,

This is expected behaviour for a counter. To get the total count irrespective of restarts, apply aggregate functions to the counter, for example sum(rate(counter[5m])), where the 5m window is only an illustrative choice. See https://prometheus.io/docs/prometheus/latest/querying/functions/
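To illustrate why a reset does not have to lose the total, here is a minimal Python sketch of reset-aware counting, in the spirit of what rate() and increase() do within one series. It is a simplification and not Prometheus's actual implementation: it ignores timestamps and extrapolation, and the function name and sample values are made up for the example.

```python
def total_increase(samples):
    """Sum the increases in a counter series, treating any drop as a reset."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            if value >= prev:
                # Normal case: the counter moved forward.
                total += value - prev
            else:
                # The value dropped, so assume the process restarted and
                # began counting from zero; the whole new value is increase.
                total += value
        prev = value
    return total


# Hypothetical samples: the counter resets between 9 and 2.
samples = [1, 5, 9, 2, 6]
print(total_increase(samples))  # 14.0
```

If the new leader reports under different labels (for example a different pod name), it shows up as a separate series, which is where the outer sum() comes in.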
Prasanna.

On Tue, Jun 15, 2021 at 8:25 AM Amit Bhatia <bhatia.amit1...@gmail.com> wrote:

> Hi,
>
> We have configured jobmanager HA with flink 1.12.1 on the k8s environment.
> We have implemented a HA solution using Native K8s HA solution (
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink).
> We have used deployment controller for both jobmanager & taskmanager pods.
>
> So whenever a leader jobmanager crashes and the same jobmanager becomes
> leader again then everything works fine but whenever a leader jobmanager
> crashes and some other standby jobmanager becomes leader then metric count
> gets reset and it starts the request count again from 1. Is it the expected
> behaviour ? or is there any specific configuration required so that even if
> the leader jobmanager changes then instead of resetting the metric count it
> continues the count.
>
> Regards,
> Amit