Ok, got it. I will try to do it manually. Thanks a lot for your inputs and efforts.
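Concretely, if I understand the suggestion below correctly, the manual sequence I plan to try looks roughly like this (the port values are just examples I picked, not anything prescribed):

1. Set a fixed port for the JobManager's reporter in flink-conf.yaml:
       metrics.reporter.prom.port: 9250
2. ./bin/jobmanager.sh start
3. Change the port in flink-conf.yaml before starting the TaskManager:
       metrics.reporter.prom.port: 9251
4. ./bin/taskmanager.sh start

That way each process should bind its own port, and Prometheus can keep scraping localhost:9250 and localhost:9251 as already configured in prometheus.yml.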
With regards

On Mon, Jul 6, 2020 at 10:58 PM Chesnay Schepler <ches...@apache.org> wrote:

> WSL is a bit buggy when it comes to allocating ports; it happily lets 2
> processes create sockets on the same port, except that the latter one
> doesn't do anything.
> Super annoying, and I haven't found a solution to that myself yet.
>
> You'll have to configure the ports explicitly for the JM/TM, which will
> likely entail manually starting the processes and updating the
> configuration in-between, e.g.:
>
> ./bin/jobmanager.sh start
> <update port in config>
> ./bin/taskmanager.sh start
>
> On 06/07/2020 19:16, Manish G wrote:
>
>> Yes.
>>
>> On Mon, Jul 6, 2020 at 10:43 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>> Are you running Flink in WSL by chance?
>>
>> On 06/07/2020 19:06, Manish G wrote:
>>
>> In flink-conf.yaml:
>> *metrics.reporter.prom.port: 9250-9260*
>>
>> This is based on the information provided here
>> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>:
>> *port - (optional) the port the Prometheus exporter listens on, defaults
>> to 9249 <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
>> In order to be able to run several instances of the reporter on one host
>> (e.g. when one TaskManager is colocated with the JobManager) it is
>> advisable to use a port range like 9250-9260.*
>>
>> As I am running Flink locally, both the jobmanager and the taskmanager
>> are colocated.
>>
>> In prometheus.yml:
>>
>> *- job_name: 'flinkprometheus'
>>   scrape_interval: 5s
>>   static_configs:
>>     - targets: ['localhost:9250', 'localhost:9251']
>>   metrics_path: /*
>>
>> This is the whole configuration I have done, based on several tutorials
>> and blogs available online.
>>
>> On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> These are all JobManager metrics; have you configured prometheus to
>>> also scrape the task manager processes?
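(Note to self before I retry: assuming both reporters actually come up, I can check which process is serving which port with, for example,

    curl http://localhost:9250
    curl http://localhost:9251

and look for metric names starting with flink_taskmanager_ on one of them; so far I have only seen flink_jobmanager_ ones, as in the dump below.)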
>>> On 06/07/2020 18:35, Manish G wrote:
>>>
>>> The metrics I see on prometheus are like:
>>>
>>> # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
>>> flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
>>> # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
>>> flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>>> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
>>> flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
>>> # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
>>> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
>>> flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
>>> # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
>>> # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
>>> flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
>>> # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
>>> flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
>>> # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_fullRestarts gauge
>>> flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>>>
>>> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>
>>>> You've said elsewhere that you do see some metrics in prometheus; which
>>>> are those?
>>>>
>>>> Why are you configuring the host for the prometheus reporter? This
>>>> option is only for the PrometheusPushGatewayReporter.
>>>>
>>>> On 06/07/2020 18:01, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> So I have the following in flink-conf.yml:
>>>> //////////////////////////////////////////////////////
>>>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>>>> metrics.reporter.prom.host: 127.0.0.1
>>>> metrics.reporter.prom.port: 9999
>>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>> //////////////////////////////////////////////////////
>>>>
>>>> And while I can see the custom metrics in the TaskManager logs, the
>>>> Prometheus dashboard doesn't show the custom metrics.
>>>>
>>>> With regards
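(Another note for my retry: since the host option apparently only applies to the PrometheusPushGatewayReporter, my understanding is that the reporter section of flink-conf.yaml can be reduced to something like

    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260
    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 30 SECONDS

with no metrics.reporter.prom.host line and no metrics.reporters list, as suggested further down.)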
>>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>
>>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>>> reporter being ignored:
>>>>>
>>>>> 2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
>>>>> 2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
>>>>>
>>>>> Note that nowadays metrics.reporters is no longer required; the set of
>>>>> reporters is automatically determined based on configured properties.
>>>>> The only use-case is disabling a reporter without having to remove the
>>>>> entire configuration.
>>>>> I'd suggest just removing the option, trying again, and reporting back.
>>>>>
>>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>>
>>>>> Please enable debug logging and search for warnings from the metric
>>>>> groups/registry/reporter.
>>>>>
>>>>> If you cannot find anything suspicious, you can also send the full log
>>>>> to me directly.
>>>>>
>>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>>
>>>>> The job is an infinite streaming one, so it keeps going. The Flink
>>>>> configuration is:
>>>>>
>>>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>>
>>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>
>>>>>> How long did the job run for, and what is the configured interval?
>>>>>>
>>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for this.
>>>>>>
>>>>>> I did the configuration as mentioned at the link (changes in
>>>>>> flink-conf.yml, copying the jar into the lib directory), registered
>>>>>> the Meter with the metrics group, and invoked the markEvent() method
>>>>>> in the target code. But I don't see any related logs.
>>>>>> I am doing this all on my local computer.
>>>>>>
>>>>>> Anything else I need to do?
>>>>>>
>>>>>> With regards
>>>>>> Manish
>>>>>>
>>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>
>>>>>>> Have you looked at the SLF4J reporter?
>>>>>>>
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>>
>>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Is it possible to log Flink metrics in the application logs, apart
>>>>>>>> from publishing them to Prometheus?
>>>>>>>>
>>>>>>>> With regards
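(A minimal sketch of what "registered the Meter with the metrics group and invoked markEvent()" looks like in my job, following the pattern from the Flink metrics documentation; the class name FraudEventMapper and the metric name myCustomEvents are made up here for illustration:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Meter;
    import org.apache.flink.metrics.MeterView;

    public class FraudEventMapper extends RichMapFunction<String, String> {

        private transient Meter eventMeter;

        @Override
        public void open(Configuration parameters) {
            // Register a Meter on this operator's metric group; this
            // registration is what makes the metric visible to whichever
            // reporters are configured (Prometheus, SLF4J, ...).
            // "myCustomEvents" is an illustrative name.
            this.eventMeter = getRuntimeContext()
                    .getMetricGroup()
                    .meter("myCustomEvents", new MeterView(60));
        }

        @Override
        public String map(String value) {
            // Mark one event per processed record.
            eventMeter.markEvent();
            return value;
        }
    }

With the SLF4J reporter enabled, the metric should then show up in the TaskManager log at the configured interval, and with the Prometheus reporter it should appear under a flink_taskmanager_job_task_operator_... name.)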