Hi Oliver,

I believe you are almost there. One thing that could be improved: in your
job YAML, instead of using:
    kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
    kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: 9249-9250
you should use:
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: "9249"

Configs with the `kubernetes.operator` prefix are for the Flink Kubernetes
operator itself (you may use them if you want to collect the operator's own
metrics). For the job config, we do not need that prefix.

I created a detailed demo
<https://github.com/bgeng777/pyflink-learning/tree/main/flink-k8s-operator-monitor>
of using Prometheus to monitor jobs started by the Flink Kubernetes
operator. Maybe it can be helpful.

Best,
Biao Geng


Oliver Schmied <uncharted...@gmx.at> wrote on Sun, May 19, 2024 at 04:21:

> Dear Apache Flink Community,
>
> I am currently trying to monitor an Apache Flink cluster deployed on
> Kubernetes using Prometheus and Grafana. Despite following the official
> guide (
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/metrics-logging/)
> on how to set up Prometheus, I have not been able to get Flink-specific
> metrics to appear in Prometheus. I am reaching out to seek your assistance,
> as I've tried many things but nothing worked.
>
>
>
> # My setup:
>
> * Kubernetes
>
> * Flink 1.18 deployed as a FlinkDeployment
>
> with this manifest:
>
> ```
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   namespace: default
>   name: flink-cluster
> spec:
>   image: flink:1.18
>   flinkVersion: v1_18
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     #Added
>     kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
>     kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
>     kubernetes.operator.metrics.reporter.prom.port: 9249-9250
>   serviceAccount: flink
>   jobManager:
>     resource:
>       memory: "1048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "1048m"
>       cpu: 1
>
> ```
>
> * Prometheus operator installed via:
>
> helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
> helm install prometheus prometheus-community/kube-prometheus-stack
>
>
> * deployed a pod-monitor.yaml
> ```
> apiVersion: monitoring.coreos.com/v1
> kind: PodMonitor
> metadata:
>   name: flink-kubernetes-operator
>   labels:
>     release: prometheus
> spec:
>   selector:
>     matchLabels:
>       app: flink-cluster
>   podMetricsEndpoints:
>       - port: metrics
>
> ```
>
> # The problem
>
> * I can access Prometheus fine and, judging by the logs of the pod
> monitor, it seems to collect Flink-specific metrics, but I can't access
> these metrics.
> * Did I even set up Prometheus correctly in my Flink deployment manifest?
> * I also added the following line to my values.yaml file, but apart from
> that I change nothing:
> ```
>
> metrics:
>   port: 9999
>
> ```
>
> # My questions
>
> * Can anyone see the mistake in my deployment?
> * Or does anyone have a better idea on how to monitor my flink deployment?
>
>
> I would be very grateful for your answers. Thank you very much.
>
> Best regards,
> Oliver
>
