Ok, got it. I will try to do it manually. Thanks a lot for your inputs and efforts.
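Concretely, if I understand the suggestion below correctly, the manual sequence I plan to try looks roughly like this (the port values are just examples I picked, not anything prescribed):

1. Set a fixed port for the JobManager's reporter in flink-conf.yaml:
       metrics.reporter.prom.port: 9250
2. ./bin/jobmanager.sh start
3. Change the port in flink-conf.yaml before starting the TaskManager:
       metrics.reporter.prom.port: 9251
4. ./bin/taskmanager.sh start

That way each process should bind its own port, and Prometheus can keep scraping localhost:9250 and localhost:9251 as already configured in prometheus.yml.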
With regards

On Mon, Jul 6, 2020 at 10:58 PM Chesnay Schepler <ches...@apache.org> wrote:

> WSL is a bit buggy when it comes to allocating ports; it happily lets 2
> processes create sockets on the same port, except that the latter one
> doesn't do anything.
> Super annoying, and I haven't found a solution to that myself yet.
>
> You'll have to configure the ports explicitly for the JM/TM, which will
> likely entail manually starting the processes and updating the
> configuration in-between, e.g.:
>
> ./bin/jobmanager.sh start
> <update port in config>
> ./bin/taskmanager.sh start
>
> On 06/07/2020 19:16, Manish G wrote:
>
>> Yes.
>>
>> On Mon, Jul 6, 2020 at 10:43 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>> Are you running Flink in WSL by chance?
>>
>> On 06/07/2020 19:06, Manish G wrote:
>>
>> In flink-conf.yaml:
>> *metrics.reporter.prom.port: 9250-9260*
>>
>> This is based on the information provided here
>> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter>:
>> *port - (optional) the port the Prometheus exporter listens on, defaults
>> to 9249 <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>.
>> In order to be able to run several instances of the reporter on one host
>> (e.g. when one TaskManager is colocated with the JobManager) it is
>> advisable to use a port range like 9250-9260.*
>>
>> As I am running Flink locally, both the jobmanager and the taskmanager
>> are colocated.
>>
>> In prometheus.yml:
>>
>> *- job_name: 'flinkprometheus'
>>   scrape_interval: 5s
>>   static_configs:
>>     - targets: ['localhost:9250', 'localhost:9251']
>>   metrics_path: /*
>>
>> This is the whole configuration I have done, based on several tutorials
>> and blogs available online.
>>
>> On Mon, Jul 6, 2020 at 10:20 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> These are all JobManager metrics; have you configured prometheus to
>>> also scrape the task manager processes?
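(Note to self before I retry: assuming both reporters actually come up, I can check which process is serving which port with, for example,

    curl http://localhost:9250
    curl http://localhost:9251

and look for metric names starting with flink_taskmanager_ on one of them; so far I have only seen flink_jobmanager_ ones, as in the dump below.)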
>>> On 06/07/2020 18:35, Manish G wrote:
>>>
>>> The metrics I see on prometheus are like:
>>>
>>> # HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
>>> flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
>>> # HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
>>> flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>>> # HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
>>> flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
>>> # HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
>>> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
>>> flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
>>> # HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
>>> # TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
>>> flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
>>> # HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
>>> # TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
>>> flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
>>> # HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
>>> # TYPE flink_jobmanager_job_fullRestarts gauge
>>> flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
>>>
>>> On Mon, Jul 6, 2020 at 9:51 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>
>>>> You've said elsewhere that you do see some metrics in prometheus; which
>>>> are those?
>>>>
>>>> Why are you configuring the host for the prometheus reporter? This
>>>> option is only for the PrometheusPushGatewayReporter.
>>>>
>>>> On 06/07/2020 18:01, Manish G wrote:
>>>>
>>>> Hi,
>>>>
>>>> So I have the following in flink-conf.yml:
>>>> //////////////////////////////////////////////////////
>>>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>>>> metrics.reporter.prom.host: 127.0.0.1
>>>> metrics.reporter.prom.port: 9999
>>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>> //////////////////////////////////////////////////////
>>>>
>>>> And while I can see the custom metrics in the TaskManager logs, the
>>>> Prometheus dashboard doesn't show the custom metrics.
>>>>
>>>> With regards
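(Another note for my retry: since the host option apparently only applies to the PrometheusPushGatewayReporter, my understanding is that the reporter section of flink-conf.yaml can be reduced to something like

    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260
    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 30 SECONDS

with no metrics.reporter.prom.host line and no metrics.reporters list, as suggested further down.)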
>>>> On Mon, Jul 6, 2020 at 8:55 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>
>>>>> You have explicitly configured a reporter list, resulting in the slf4j
>>>>> reporter being ignored:
>>>>>
>>>>> 2020-07-06 13:48:22,191 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
>>>>> 2020-07-06 13:48:23,203 INFO org.apache.flink.runtime.metrics.ReporterSetup - Excluding reporter slf4j, not configured in reporter list (prom).
>>>>>
>>>>> Note that nowadays metrics.reporters is no longer required; the set of
>>>>> reporters is automatically determined based on configured properties.
>>>>> The only use-case is disabling a reporter without having to remove the
>>>>> entire configuration.
>>>>> I'd suggest just removing the option, trying again, and reporting back.
>>>>>
>>>>> On 06/07/2020 16:35, Chesnay Schepler wrote:
>>>>>
>>>>> Please enable debug logging and search for warnings from the metric
>>>>> groups/registry/reporter.
>>>>>
>>>>> If you cannot find anything suspicious, you can also send the full log
>>>>> to me directly.
>>>>>
>>>>> On 06/07/2020 16:29, Manish G wrote:
>>>>>
>>>>> The job is an infinite streaming one, so it keeps going. The Flink
>>>>> configuration is:
>>>>>
>>>>> metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
>>>>> metrics.reporter.slf4j.interval: 30 SECONDS
>>>>>
>>>>> On Mon, Jul 6, 2020 at 7:57 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>
>>>>>> How long did the job run for, and what is the configured interval?
>>>>>>
>>>>>> On 06/07/2020 15:51, Manish G wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for this.
>>>>>>
>>>>>> I did the configuration as mentioned at the link (changes in
>>>>>> flink-conf.yml, copying the jar into the lib directory), registered
>>>>>> the Meter with the metrics group, and invoked the markEvent() method
>>>>>> in the target code. But I don't see any related logs.
>>>>>> I am doing this all on my local computer.
>>>>>>
>>>>>> Anything else I need to do?
>>>>>>
>>>>>> With regards
>>>>>> Manish
>>>>>>
>>>>>> On Mon, Jul 6, 2020 at 5:24 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>
>>>>>>> Have you looked at the SLF4J reporter?
>>>>>>>
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
>>>>>>>
>>>>>>> On 06/07/2020 13:49, Manish G wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Is it possible to log Flink metrics in the application logs, apart
>>>>>>>> from publishing them to Prometheus?
>>>>>>>>
>>>>>>>> With regards
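(A minimal sketch of what "registered the Meter with the metrics group and invoked markEvent()" looks like in my job, following the pattern from the Flink metrics documentation; the class name FraudEventMapper and the metric name myCustomEvents are made up here for illustration:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Meter;
    import org.apache.flink.metrics.MeterView;

    public class FraudEventMapper extends RichMapFunction<String, String> {

        private transient Meter eventMeter;

        @Override
        public void open(Configuration parameters) {
            // Register a Meter on this operator's metric group; this
            // registration is what makes the metric visible to whichever
            // reporters are configured (Prometheus, SLF4J, ...).
            // "myCustomEvents" is an illustrative name.
            this.eventMeter = getRuntimeContext()
                    .getMetricGroup()
                    .meter("myCustomEvents", new MeterView(60));
        }

        @Override
        public String map(String value) {
            // Mark one event per processed record.
            eventMeter.markEvent();
            return value;
        }
    }

With the SLF4J reporter enabled, the metric should then show up in the TaskManager log at the configured interval, and with the Prometheus reporter it should appear under a flink_taskmanager_job_task_operator_... name.)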