Also! I do have 2 FlinkDeployments deployed with this operator, but they are in different namespaces, and each of the per-namespace metrics reports 2 deployments in its namespace, even though there is only one according to kubectl.
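
A quick way to compare what each `resourcens` label reports is to group the scraped samples by that label. The sketch below is only an illustration, assuming the operator's Prometheus endpoint returns the usual text exposition format; `counts_by_resourcens` is a made-up helper name, not part of the operator or any library.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text format: name{labels} value
# (simplification: assumes label values do not contain a '}' character)
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def counts_by_resourcens(scrape_text, metric):
    """Return {resourcens: value} for every sample of `metric` in the scrape text."""
    counts = defaultdict(float)
    for line in scrape_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        # Skip comments, blank lines, label-less samples, and other metrics.
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        counts[labels.get("resourcens", "")] += float(match.group("value"))
    return dict(counts)
```

Passing it the body returned by the curl command together with the metric name gives one value per `resourcens`, which can then be checked against `kubectl -n <namespace> get flinkdeployments` for each namespace.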

Actually...we just tried to deploy a change (enabling some checkpointing) that caused one of our FlinkDeployments to fail. Now the STABLE_Count for each of the two namespaces reports 1.

# curl -s <pod_ip>:<prom_port> | grep flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="rdf_streaming_updater",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0

It looks like maybe this metric is not reporting per namespace, but a global count.

On Mon, May 22, 2023 at 2:56 PM Andrew Otto <o...@wikimedia.org> wrote:

> Oh, FWIW, I do have operator HA enabled with 2 replicas running, but in my
> examples there, I am curl-ing the leader flink operator pod.
>
> On Mon, May 22, 2023 at 2:47 PM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Hello!
>>
>> I'm doing some grafana+prometheus dashboarding for
>> flink-kubernetes-operator. Reading metrics docs
>> <https://stackoverflow.com/a/61795256>, I see that I have nice per k8s
>> namespace lifecycle current count gauge metrics in Prometheus.
>>
>> Using kubectl, I can see that I have one FlinkDeployment in my namespace:
>>
>> # kubectl -n stream-enrichment-poc get flinkdeployments
>> NAME             JOB STATUS   LIFECYCLE STATE
>> flink-app-main   RUNNING      STABLE
>>
>> But, prometheus is reporting that I have 2 FlinkDeployments in the STABLE
>> state.
>>
>> # curl -s <pod_ip>:<prom_port> | grep flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
>> flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 2.0
>>
>> I'm not sure why I see 2.0 reported.
>> flink_k8soperator_namespace_JmDeploymentStatus_READY_Count reports only
>> one FlinkDeployment.
>>
>> # curl <pod_ip>:<prom_port>/metrics | grep flink_k8soperator_namespace_JmDeploymentStatus_READY_Count
>> flink_k8soperator_namespace_JmDeploymentStatus_READY_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
>>
>> Is it possible that
>> flink_k8soperator_namespace_Lifecycle_State_STABLE_Count is being
>> reported as an incrementing counter instead of a gauge?
>>
>> Thanks
>> -Andrew Otto
>> Wikimedia Foundation