Hi Oliver,

I believe you are almost there. One thing I found that could be improved is in your job YAML: instead of using

    kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
    kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: 9249-9250

you should use

    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: "9249"

Configs with the `kubernetes.operator.` prefix are for the Flink Kubernetes operator itself (you may use them if you want to collect the metrics of the operator). For the job config, we do not need the prefix.

I created a detailed demo <https://github.com/bgeng777/pyflink-learning/tree/main/flink-k8s-operator-monitor> of using Prometheus to monitor jobs started by the Flink Kubernetes operator. Maybe it can be helpful.

Best,
Biao Geng

Oliver Schmied <uncharted...@gmx.at> wrote on Sun, 19 May 2024 at 04:21:

> Dear Apache Flink Community,
>
> I am currently trying to monitor an Apache Flink cluster deployed on
> Kubernetes using Prometheus and Grafana. Despite following the official
> guide
> (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/metrics-logging/)
> on how to set up Prometheus, I have not been able to get Flink-specific
> metrics to appear in Prometheus. I am reaching out to seek your
> assistance, as I've tried many things but nothing worked.
>
> # My setup:
>
> * Kubernetes
> * Flink 1.18 deployed as a FlinkDeployment with this manifest:
>
> ```
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   namespace: default
>   name: flink-cluster
> spec:
>   image: flink:1.18
>   flinkVersion: v1_18
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     # Added
>     kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
>     kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
>     kubernetes.operator.metrics.reporter.prom.port: 9249-9250
>   serviceAccount: flink
>   jobManager:
>     resource:
>       memory: "1048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "1048m"
>       cpu: 1
> ```
>
> * Prometheus operator installed via:
>
> ```
> helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
> helm install prometheus prometheus-community/kube-prometheus-stack
> ```
>
> * Deployed a pod-monitor.yaml:
>
> ```
> apiVersion: monitoring.coreos.com/v1
> kind: PodMonitor
> metadata:
>   name: flink-kubernetes-operator
>   labels:
>     release: prometheus
> spec:
>   selector:
>     matchLabels:
>       app: flink-cluster
>   podMetricsEndpoints:
>   - port: metrics
> ```
>
> # The problem
>
> * I can access Prometheus fine and, judging by the logs of the pod
>   monitor, it seems to collect Flink-specific metrics, but I can't
>   access these Flink metrics.
> * Did I even set up Prometheus correctly in my Flink deployment manifest?
> * I also added the following line to my values.yaml file, but apart from
>   that I changed nothing:
>
> ```
> metrics:
>   port: 9999
> ```
>
> # My questions
>
> * Can anyone see the mistake in my deployment?
> * Or does anyone have a better idea of how to monitor my Flink deployment?
>
> I would be very grateful for your answers. Thank you very much.
>
> Best regards,
> Oliver
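Putting Biao's suggestion together with Oliver's original manifest, the corrected FlinkDeployment might look like the sketch below. Only the metrics lines in `flinkConfiguration` change (the `kubernetes.operator.`-prefixed keys are replaced by job-level `metrics.reporter.prom.*` keys); everything else is taken verbatim from the manifest quoted above.

```
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: default
  name: flink-cluster
spec:
  image: flink:1.18
  flinkVersion: v1_18
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # Job-level metrics config: no "kubernetes.operator." prefix here
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: "9249"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "1048m"
      cpu: 1
  taskManager:
    resource:
      memory: "1048m"
      cpu: 1
```

With this in place, the JobManager and TaskManager pods (rather than the operator) expose Prometheus metrics on port 9249, which is the port a PodMonitor scraping the job pods would need to target.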