Dear Apache Flink Community,

I am trying to monitor an Apache Flink cluster deployed on Kubernetes using Prometheus and Grafana. Despite following the official guide (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/metrics-logging/) on how to set up Prometheus, I have not been able to get Flink-specific metrics to appear in Prometheus. I am reaching out for your assistance, as I've tried many things but nothing has worked.

 

# My setup:

* Kubernetes

* Flink 1.18 deployed as a FlinkDeployment with this manifest:

```
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  namespace: default
  name: flink-cluster
spec:
  image: flink:1.18
  flinkVersion: v1_18
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    #Added
    kubernetes.operator.metrics.reporter.prommetrics.reporters: prom
    kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    kubernetes.operator.metrics.reporter.prom.port: 9249-9250
  serviceAccount: flink
  jobManager:
    resource:
      memory: "1048m"
      cpu: 1
  taskManager:
    resource:
      memory: "1048m"
      cpu: 1

```
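
For comparison, the documentation page linked above seems to treat two sets of metrics separately: keys prefixed with `kubernetes.operator.metrics.` configure the operator's own metrics, while the Flink cluster itself reads plain `metrics.reporter.*` keys. A minimal sketch of the latter inside `flinkConfiguration`, assuming the reporter name `prom` and the same port range as in the manifest above; whether this alone resolves the issue is not verified:

```
spec:
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # Assumption: the reporter is named "prom". These keys are read by the
    # JobManager and TaskManager pods, not by the operator.
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: 9249-9250
```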

* Prometheus Operator installed via Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack

 
* A PodMonitor deployed from pod-monitor.yaml:
```
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flink-kubernetes-operator
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: flink-cluster
  podMetricsEndpoints:
      - port: metrics
 
```
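
The PodMonitor above scrapes a container port named `metrics`, so the Flink pods would need to declare a port with that name. A sketch of one way this might be done through the FlinkDeployment `podTemplate`, assuming the Prometheus reporter listens on 9249 and that the pods carry the label `app: flink-cluster`; both are assumptions, not confirmed for this cluster:

```
spec:
  podTemplate:
    spec:
      containers:
        # flink-main-container is the name the operator uses for the Flink container
        - name: flink-main-container
          ports:
            # Assumption: the name must match `port: metrics` in the PodMonitor
            - name: metrics
              containerPort: 9249
              protocol: TCP
```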
 
# The problem
 
* I can access Prometheus without problems and, judging by the logs of the PodMonitor, it seems to collect Flink-specific metrics, but I cannot find these metrics in Prometheus.
* Have I even set up Prometheus correctly in my Flink deployment manifest?
* I also added the following lines to my values.yaml file, but apart from that I changed nothing:
```
metrics:
  port: 9999
```
 
# My questions
 
* Can anyone see the mistake in my deployment?
* Or does anyone have a better idea of how to monitor my Flink deployment?
 
 
I would be very grateful for your answers. Thank you very much.
 
Best regards,
Oliver
 
 
 
 

 
