In flink-conf.yaml:

    metrics.reporter.prom.port: 9250-9260

This is based on information provided here:
<https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>

*port - (optional) the port the Prometheus exporter listens on, defaults to 9249 <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>. In order to be able to run several instances of the reporter on one host (e.g. when one TaskManager is colocated with the JobManager) it is advisable to use a port range like 9250-9260.*
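A rough sketch, not taken from the Flink docs: as far as I understand, with a port range each reporter instance on the host binds a free port from that range, so it is not known in advance which process ends up on which port. Under that assumption, the Prometheus target list can be generated from the same range string so that every candidate port is covered. The helper names `expand_port_range` and `scrape_targets` are made up for illustration.

```python
from typing import List


def expand_port_range(spec: str) -> List[int]:
    """Expand a port spec such as "9250-9260" or "9249" into a list of ports."""
    spec = spec.strip()
    if "-" in spec:
        low_text, high_text = spec.split("-", 1)
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError("invalid port range: " + spec)
        # The range is inclusive on both ends, as in "9250-9260".
        return list(range(low, high + 1))
    return [int(spec)]


def scrape_targets(host: str, spec: str) -> List[str]:
    """Build host:port strings suitable for a static_configs targets list."""
    return [f"{host}:{port}" for port in expand_port_range(spec)]


if __name__ == "__main__":
    # Hypothetical usage: cover the whole range configured in flink-conf.yaml.
    print(scrape_targets("localhost", "9250-9260"))
```

Listing the whole range means some targets will show as down in Prometheus (ports no Flink process has bound), but no process is missed.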
Since I am running Flink locally, the JobManager and TaskManager are colocated.

In prometheus.yml:

    - job_name: 'flinkprometheus'
      scrape_interval: 5s
      static_configs:
        - targets: ['localhost:9250', 'localhost:9251']
      metrics_path: /

This is the entire configuration, which I put together from several tutorials and blogs available online.

On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <ches...@apache.org> wrote:

> These are all JobManager metrics; have you configured Prometheus to also
> scrape the TaskManager processes?
>
> On 06/07/2020 18:35, Manish G wrote:
>
> The metrics I see in Prometheus look like this:
>
> # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
> # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
> flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
> # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
> # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
> flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
> flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
> # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
> flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
> # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
> # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
> flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
> # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
> # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
> flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
> # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
> # TYPE flink_jobmanager_job_fullRestarts gauge
> flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>
> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> You've said elsewhere that you do see some metrics in Prometheus; which
>> are those?
>>
>> Why are you configuring the host for the Prometheus reporter? This
>> option is only for the PrometheusPushGatewayReporter.
>>
>> On 06/07/2020 18:01, Manish G wrote:
>>
>> Hi,
>>
>> So I have the following in flink-conf.yaml:
>> //////////////////////////////////////////////////////
>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>> metrics.reporter.prom.host: 127.0.0.1
>> metrics.reporter.prom.port: 9999
>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>> metrics.reporter.slf4j.interval: 30 SECONDS
>> //////////////////////////////////////////////////////
>>
>> And while I can see the custom metrics in the TaskManager logs, the
>> Prometheus dashboard doesn't show them.
>>
>> With regards
>>
>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> You have explicitly configured a reporter list, resulting in the slf4j
>>> reporter being ignored:
>>>
>>> 2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
>>> 2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
>>>
>>> Note that nowadays metrics.reporters is no longer required; the set of
>>> reporters is automatically determined based on the configured properties.
>>> Its only remaining use case is disabling a reporter without having to
>>> remove its entire configuration.
>>> I'd suggest just removing the option, trying again, and reporting back.
>>>
>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>
>>> Please enable debug logging and search for warnings from the metric
>>> groups/registry/reporter.
>>>
>>> If you cannot find anything suspicious, you can also send the full log
>>> to me directly.
>>>
>>> On 06/07/2020 16:29, Manish G wrote:
>>>
>>> The job is an infinite streaming one, so it keeps going. The Flink
>>> configuration is:
>>>
>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>
>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> How long did the job run for, and what is the configured interval?
>>>>
>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for this.
>>>>
>>>> I did the configuration as described at the link (changes in
>>>> flink-conf.yaml, copying the jar into the lib directory), registered the
>>>> Meter with the metric group, and invoked the markEvent() method in the
>>>> target code. But I don't see any related logs.
>>>> I am doing all of this on my local computer.
>>>>
>>>> Is there anything else I need to do?
>>>>
>>>> With regards
>>>> Manish
>>>>
>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> Have you looked at the SLF4J reporter?
>>>>>
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>
>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Is it possible to log Flink metrics in application logs apart from
>>>>> > publishing it to Prometheus?
>>>>> >
>>>>> > With regards