These are all JobManager metrics; have you configured Prometheus to also
scrape the TaskManager processes?
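For illustration, a scrape configuration that covers both processes on one machine might look like the sketch below. It assumes the reporter port in the Flink configuration is a range (for example 9250-9260), so that the JobManager and the TaskManager each bind their own port. The ports and the job name here are assumptions, not values from this thread; the port each process actually bound should be confirmed before listing it as a target.

```yaml
# prometheus.yml (fragment); ports and job name are illustrative assumptions
scrape_configs:
  - job_name: 'flink'
    static_configs:
      # One target per Flink process, e.g. JobManager on 9250 and TaskManager on 9251.
      - targets: ['localhost:9250', 'localhost:9251']
```

With a single fixed port, only one process on the host can expose its metrics endpoint, which would match seeing JobManager metrics only.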
On 06/07/2020 18:35, Manish G wrote:
The metrics I see in Prometheus look like this:
# HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
# TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
# HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
# TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
# HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
# TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
# HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
# TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
# TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler wrote:
You've said elsewhere that you do see some metrics in Prometheus;
which ones are they?
Why are you configuring the host for the Prometheus reporter? This
option is only for the PrometheusPushGatewayReporter.
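A minimal sketch of the reporter section without the host option, assuming the pull-based PrometheusReporter is the one intended. The port range shown is an assumption rather than a value from this thread; a range allows several Flink processes on the same host to each bind a free port.

```yaml
# flink-conf.yaml (fragment)
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# No metrics.reporter.prom.host entry: that option belongs to the push gateway reporter.
# Assumed range, so JobManager and TaskManager on one host can each pick a free port.
metrics.reporter.prom.port: 9250-9260
```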
On 06/07/2020 18:01, Manish G wrote:
Hi,
So I have the following in flink-conf.yaml:
//////////////////////////////////////////////////////
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 9999
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//////////////////////////////////////////////////////
And while I can see the custom metrics in the TaskManager logs, the
Prometheus dashboard doesn't show them.
With regards
On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler wrote:
You have explicitly configured a reporter list, resulting in
the slf4j reporter being ignored:
2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
Note that nowadays metrics.reporters is no longer required; the set
of reporters is determined automatically from the configured
properties. The only remaining use case is disabling a reporter
without having to remove its entire configuration.
I'd suggest just removing the option, trying again, and reporting
back.
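The two ways of getting both reporters active can be sketched as follows. The reporter names prom and slf4j and their settings are taken from the configuration quoted in this thread; this is an illustration, not a tested setup.

```yaml
# flink-conf.yaml (fragment)
# Option 1: leave metrics.reporters out entirely, so every reporter that has
# metrics.reporter.<name>.* properties is picked up.
# Option 2: keep the list, but name every reporter that should run:
# metrics.reporters: prom,slf4j
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9999
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
```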
On 06/07/2020 16:35, Chesnay Schepler wrote:
Please enable debug logging and search for warnings from the
metric groups/registry/reporter.
If you cannot find anything suspicious, you can also send
the full log to me directly.
On 06/07/2020 16:29, Manish G wrote:
The job is an unbounded streaming job, so it keeps running. The Flink
configuration is:
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler wrote:
How long did the job run for, and what is the
configured interval?
On 06/07/2020 15:51, Manish G wrote:
Hi,
Thanks for this.
I did the configuration as described at the link (changes in
flink-conf.yaml, copying the jar into the lib directory), registered
the Meter with the metric group, and invoked the markEvent() method
in the target code. But I don't see any related logs.
I am doing this all on my local computer.
Anything else I need to do?
With regards
Manish
On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler wrote:
Have you looked at the SLF4J reporter?
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
On 06/07/2020 13:49, Manish G wrote:
> Hi,
>
> Is it possible to log Flink metrics in application logs apart from
> publishing them to Prometheus?
>
> With regards