Re: Logging Flink metrics

Chesnay Schepler Mon, 06 Jul 2020 09:51:07 -0700

These are all JobManager metrics; have you configured Prometheus to also scrape the TaskManager processes?
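To check whether any TaskManager metrics reach Prometheus at all, one could count the samples in a scrape by the process that reported them. Each Flink process (JobManager and TaskManager) runs its own reporter endpoint, so Prometheus needs a scrape target for each one. Flink's documentation suggests a port range such as 9250-9260 for metrics.reporter.prom.port when several processes share a host. The sketch below is a hypothetical helper (ScrapeCheck and countByProcess are made-up names). It assumes Flink's default naming, where JobManager metrics start with flink_jobmanager_ and TaskManager metrics, including custom operator metrics, start with flink_taskmanager_.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: counts Flink samples in a Prometheus text-format scrape
 * by the process that reported them. Assumes the default "flink_<scope>_..." naming.
 */
public class ScrapeCheck {

    /** Returns e.g. {jobmanager=7}, or {jobmanager=7, taskmanager=42} once TMs are scraped. */
    public static Map<String, Integer> countByProcess(String exposition) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String rawLine : exposition.split("\n")) {
            String line = rawLine.trim();
            // Skip blank lines and "# HELP" / "# TYPE" comments; only samples are counted.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // The metric name ends at the first '{' (labels) or space (value).
            int end = line.length();
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            if (brace >= 0) end = Math.min(end, brace);
            if (space >= 0) end = Math.min(end, space);
            String name = line.substring(0, end);
            if (name.startsWith("flink_jobmanager_")) {
                counts.merge("jobmanager", 1, Integer::sum);
            } else if (name.startsWith("flink_taskmanager_")) {
                counts.merge("taskmanager", 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge",
            "flink_jobmanager_Status_JVM_CPU_Time{host=\"localhost\",} 8.42E9",
            "flink_jobmanager_Status_JVM_Memory_Heap_Max{host=\"localhost\",} 1.029177344E9");
        Map<String, Integer> counts = countByProcess(sample);
        System.out.println(counts);
        if (!counts.containsKey("taskmanager")) {
            System.out.println("No TaskManager metrics: is Prometheus scraping the TaskManager's reporter port?");
        }
    }
}
```

If only the jobmanager key ever shows up, custom metrics registered inside operators cannot appear either, because the TaskManagers report them.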


On 06/07/2020 18:35, Manish G wrote:

The metrics I see in Prometheus look like this:
# HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
# TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
# HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
# TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
# HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
# TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
# HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
# TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
# TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0

On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <[email protected]> wrote:


    You've said elsewhere that you do see some metrics in Prometheus;
    which ones are those?

    Why are you configuring the host for the Prometheus reporter? This
    option only applies to the PrometheusPushGatewayReporter.

    On 06/07/2020 18:01, Manish G wrote:

    Hi,

    So I have the following in flink-conf.yaml:
    //////////////////////////////////////////////////////
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.host: 127.0.0.1
    metrics.reporter.prom.port: 9999
    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 30 SECONDS
    //////////////////////////////////////////////////////

    While I can see the custom metrics in the TaskManager logs, the
    Prometheus dashboard doesn't show them.

    With regards

    On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler
    <[email protected]> wrote:

        You have explicitly configured a reporter list, resulting in
        the slf4j reporter being ignored:

        2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
        2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).

        Note that metrics.reporters is no longer required nowadays:
        the set of reporters is determined automatically from the
        configured properties. Its only remaining use is to disable a
        reporter without removing its entire configuration.
        I'd suggest just removing the option, trying again, and
        reporting back.
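The selection rule described above can be modelled in a few lines. This is a hypothetical sketch, not Flink's ReporterSetup code. It assumes that every metrics.reporter.<name>.class key defines a reporter, and that an explicit metrics.reporters list, when present, keeps only the names it contains. ReporterSelection and activeReporters are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of reporter selection; not Flink's actual implementation. */
public class ReporterSelection {

    private static final String PREFIX = "metrics.reporter.";
    private static final String CLASS_SUFFIX = ".class";

    /** Returns the names of the reporters that would be started for the given configuration. */
    public static Set<String> activeReporters(Map<String, String> conf) {
        // Collect every <name> that has a "metrics.reporter.<name>.class" entry.
        Set<String> configured = new TreeSet<>();
        for (String key : conf.keySet()) {
            if (key.startsWith(PREFIX) && key.endsWith(CLASS_SUFFIX)
                    && key.length() > PREFIX.length() + CLASS_SUFFIX.length()) {
                configured.add(key.substring(PREFIX.length(), key.length() - CLASS_SUFFIX.length()));
            }
        }
        String explicitList = conf.get("metrics.reporters");
        if (explicitList == null) {
            return configured; // No explicit list: every configured reporter is used.
        }
        // With an explicit list, reporters not named in it are excluded.
        Set<String> listed = new TreeSet<>();
        for (String name : explicitList.split(",")) {
            listed.add(name.trim());
        }
        configured.retainAll(listed);
        return configured;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("metrics.reporters", "prom");
        conf.put("metrics.reporter.prom.class", "org.apache.flink.metrics.prometheus.PrometheusReporter");
        conf.put("metrics.reporter.slf4j.class", "org.apache.flink.metrics.slf4j.Slf4jReporter");
        conf.put("metrics.reporter.slf4j.interval", "30 SECONDS");
        System.out.println(activeReporters(conf)); // [prom]: slf4j is excluded, as in the log above

        conf.remove("metrics.reporters");
        System.out.println(activeReporters(conf)); // [prom, slf4j]
    }
}
```

Under this model, removing the metrics.reporters line is enough to bring the SLF4J reporter back without touching its other settings.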

        On 06/07/2020 16:35, Chesnay Schepler wrote:

        Please enable debug logging and search for warnings from the
        metric groups/registry/reporter.

        If you cannot find anything suspicious, you can also send
        the full log to me directly.

        On 06/07/2020 16:29, Manish G wrote:

        The job is an infinite streaming one, so it keeps running. The
        Flink configuration is:

        metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
        metrics.reporter.slf4j.interval: 30 SECONDS



        On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler
        <[email protected]> wrote:

            How long did the job run for, and what is the
            configured interval?
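The question matters because a scheduled reporter such as the SLF4J reporter only writes to the log once per interval. A minimal sketch of the arithmetic follows. It assumes that the first report fires one full interval after the process starts and that later reports follow every interval, which is my understanding of how Flink's metric registry schedules reporters. ReportTiming and reportsEmitted are hypothetical names, and the time spent writing each report is ignored.

```java
import java.time.Duration;

/** Hypothetical model of how many reports a periodic reporter has emitted after a given uptime. */
public class ReportTiming {

    /**
     * Assumes the first report fires at t = interval and each later one an interval after the
     * previous; the time taken to write a report is ignored.
     */
    public static long reportsEmitted(Duration uptime, Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return uptime.toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(30); // matches "30 SECONDS" in the config
        for (long seconds : new long[] {10, 29, 30, 95}) {
            System.out.printf("uptime %3ds -> %d report(s)%n",
                seconds, reportsEmitted(Duration.ofSeconds(seconds), interval));
        }
    }
}
```

With a 30-second interval, a job stopped after 29 seconds would leave no reporter output in the log at all. A long-running streaming job rules this cause out.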


            On 06/07/2020 15:51, Manish G wrote:

            Hi,

            Thanks for this.

            I did the configuration as described at the link (changes in
            flink-conf.yaml, copying the jar into the lib directory),
            registered the Meter with the metric group, and invoked its
            markEvent() method in the target code. But I don't see any
            related logs. I am doing all of this on my local computer.

            Is there anything else I need to do?

            With regards
            Manish

            On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler
            <[email protected]> wrote:

                Have you looked at the SLF4J reporter?

                
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter

                On 06/07/2020 13:49, Manish G wrote:
                > Hi,
                >
                > Is it possible to log Flink metrics in the application
                > logs in addition to publishing them to Prometheus?
                >
                > With regards
