Hi Oliver,

I am not experienced with StateFun, but its documentation says: 'Along with the standard metric scopes, Stateful Functions supports Function Scope which [is] one level below operator scope.' So, as long as you can collect Flink's metrics via Prometheus, there should ideally be no difference between using StateFun's metrics and using normal Flink metrics. Once you have configured the Prometheus metric reporter following the doc <https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/metrics/>, you can check the collected metrics to see whether any of them relate to StateFun.
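For reference, here is a minimal sketch of what the reporter configuration might look like in your manifest. I have not tested it with StateFun. The reporter name `prom` is an arbitrary choice of mine, and 9249 is the port I believe the reporter uses by default. It also assumes that the Prometheus reporter jar is available in the apache/flink-statefun image, which I have not verified, and that the option names match the Flink version bundled with that image.

```yaml
# Sketch only: lines to append to the flink-conf.yaml entry of the
# flink-config ConfigMap. "prom" is an arbitrary reporter name.
# Assumption: the Prometheus reporter jar is on the image's plugin/lib path.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
# Port on which each JobManager/TaskManager exposes its metrics over HTTP.
metrics.reporter.prom.port: 9249
```

With something like this in place, the master and worker containers would also need to expose port 9249, and your Prometheus instance would need a scrape configuration that targets those pods. How you do that depends on how Prometheus is deployed in your cluster.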
Best,
Biao Geng

On Thu, May 23, 2024 at 21:30, Oliver Schmied <uncharted...@gmx.at> wrote:

> Dear Apache Flink community,
>
> I am setting up an Apache Flink StateFun runtime on Kubernetes, following
> the flink-playground example:
> https://github.com/apache/flink-statefun-playground/tree/main/deployments/k8s
>
> This is the manifest I used for creating the StateFun environment:
>
> ```
> ---
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   namespace: statefun
>   name: flink-config
>   labels:
>     app: statefun
> data:
>   flink-conf.yaml: |+
>     jobmanager.rpc.address: statefun-master
>     taskmanager.numberOfTaskSlots: 1
>     blob.server.port: 6124
>     jobmanager.rpc.port: 6123
>     taskmanager.rpc.port: 6122
>     classloader.parent-first-patterns.additional: org.apache.flink.statefun;org.apache.kafka;com.google.protobuf
>     state.backend: rocksdb
>     state.backend.rocksdb.timer-service.factory: ROCKSDB
>     state.backend.incremental: true
>     parallelism.default: 1
>     s3.access-key: minioadmin
>     s3.secret-key: minioadmin
>     state.checkpoints.dir: s3://checkpoints/subscriptions
>     s3.endpoint: http://minio.statefun.svc.cluster.local:9000
>     s3.path-style-access: true
>     jobmanager.memory.process.size: 1g
>     taskmanager.memory.process.size: 1g
>   log4j-console.properties: |+
>     monitorInterval=30
>     rootLogger.level = INFO
>     rootLogger.appenderRef.console.ref = ConsoleAppender
>     logger.akka.name = akka
>     logger.akka.level = INFO
>     logger.kafka.name= org.apache.kafka
>     logger.kafka.level = INFO
>     logger.hadoop.name = org.apache.hadoop
>     logger.hadoop.level = INFO
>     logger.zookeeper.name = org.apache.zookeeper
>     logger.zookeeper.level = INFO
>     appender.console.name = ConsoleAppender
>     appender.console.type = CONSOLE
>     appender.console.layout.type = PatternLayout
>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>     logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>     logger.netty.level = OFF
> ---
> apiVersion: v1
> kind: Service
> metadata:
>   name: statefun-master-rest
>   namespace: statefun
> spec:
>   type: NodePort
>   ports:
>     - name: rest
>       port: 8081
>       targetPort: 8081
>   selector:
>     app: statefun
>     component: master
> ---
> apiVersion: v1
> kind: Service
> metadata:
>   name: statefun-master
>   namespace: statefun
> spec:
>   type: ClusterIP
>   ports:
>     - name: rpc
>       port: 6123
>     - name: blob
>       port: 6124
>     - name: ui
>       port: 8081
>   selector:
>     app: statefun
>     component: master
> ---
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: statefun-master
>   namespace: statefun
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: statefun
>       component: master
>   template:
>     metadata:
>       labels:
>         app: statefun
>         component: master
>     spec:
>       containers:
>         - name: master
>           image: apache/flink-statefun:3.3.0
>           imagePullPolicy: IfNotPresent
>           env:
>             - name: ROLE
>               value: master
>             - name: MASTER_HOST
>               value: statefun-master
>           ports:
>             - containerPort: 6123
>               name: rpc
>             - containerPort: 6124
>               name: blob
>             - containerPort: 8081
>               name: ui
>           livenessProbe:
>             tcpSocket:
>               port: 6123
>             initialDelaySeconds: 30
>             periodSeconds: 60
>           volumeMounts:
>             - name: flink-config-volume
>               mountPath: /opt/flink/conf
>             - name: module-config-volume
>               mountPath: /opt/statefun/modules/example
>       volumes:
>         - name: flink-config-volume
>           configMap:
>             name: flink-config
>             items:
>               - key: flink-conf.yaml
>                 path: flink-conf.yaml
>               - key: log4j-console.properties
>                 path: log4j-console.properties
>         - name: module-config-volume
>           configMap:
>             name: module-config
>             items:
>               - key: module.yaml
>                 path: module.yaml
> ---
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: statefun-worker
>   namespace: statefun
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: statefun
>       component: worker
>   template:
>     metadata:
>       labels:
>         app: statefun
>         component: worker
>     spec:
>       containers:
>         - name: worker
>           image: apache/flink-statefun:3.3.0
>           imagePullPolicy: IfNotPresent
>           env:
>             - name: ROLE
>               value: worker
>             - name: MASTER_HOST
>               value: statefun-master
>           resources:
>             requests:
>               memory: "1Gi"
>           ports:
>             - containerPort: 6122
>               name: rpc
>             - containerPort: 6124
>               name: blob
>             - containerPort: 8081
>               name: ui
>           livenessProbe:
>             tcpSocket:
>               port: 6122
>             initialDelaySeconds: 30
>             periodSeconds: 60
>           volumeMounts:
>             - name: flink-config-volume
>               mountPath: /opt/flink/conf
>             - name: module-config-volume
>               mountPath: /opt/statefun/modules/example
>       volumes:
>         - name: flink-config-volume
>           configMap:
>             name: flink-config
>             items:
>               - key: flink-conf.yaml
>                 path: flink-conf.yaml
>               - key: log4j-console.properties
>                 path: log4j-console.properties
>         - name: module-config-volume
>           configMap:
>             name: module-config
>             items:
>               - key: module.yaml
>                 path: module.yaml
> ```
>
> Problem:
>
> I could not find any sources that describe how to monitor the Flink
> metrics of the StateFun runtime with Prometheus on Kubernetes. I am
> particularly interested in the StateFun-specific metrics
> (https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/deployment/metrics/).
>
> Could someone please guide me on how to set this up, or share any
> resources that cover this topic?
>
> Any help or suggestions would be greatly appreciated.
>
> Thanks for your time and help.
>
> Best regards,
>
> Oliver