Re: Get EOF from PrometheusReporter in JM

Tony Wei Mon, 23 Oct 2017 23:12:50 -0700

Hi Max,

Good to know. Thanks very much.


Best Regards,
Tony Wei

2017-10-24 13:52 GMT+08:00 Maximilian Bode <maximilian.b...@tngtech.com>:

> Hi Tony,
>
> thanks for troubleshooting this. I have added a commit to
> https://github.com/apache/flink/pull/4586 that should enable you to use
> the reporter with 1.3.2 as well.
>
> Best regards,
> Max
>
> Tony Wei <tony19920...@gmail.com>
> 23 September 2017 at 13:11
> Hi Chesnay,
>
> I built another Flink cluster using version 1.4, set the log level to
> DEBUG, and found that the root cause might be this exception:
> *java.lang.NullPointerException:
> Value returned by gauge lastCheckpointExternalPath was null*.
>
> I updated `CheckpointStatsTracker` to ignore the external path when it is
> null, and the exception no longer occurs. The Prometheus reporter now works
> as well.
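
A minimal sketch of one way a gauge can avoid returning null, assuming a
placeholder string is acceptable when no external path exists yet. The
`Gauge` interface below is a local stand-in that only mirrors the shape of
a metrics gauge, and `NO_PATH`, `externalPathGauge`, and the example path
are hypothetical names for illustration. This is not the actual change made
to `CheckpointStatsTracker`.

```java
import java.util.function.Supplier;

final class NullSafeGaugeSketch {

    /** Local stand-in for a metrics gauge (assumption: not Flink's own class). */
    interface Gauge<T> {
        T getValue();
    }

    /** Hypothetical placeholder reported while no external path is known. */
    static final String NO_PATH = "n/a";

    /** Wraps a supplier that may yield null so the gauge never returns null. */
    static Gauge<String> externalPathGauge(Supplier<String> pathSupplier) {
        return () -> {
            String path = pathSupplier.get();
            return path != null ? path : NO_PATH;
        };
    }

    public static void main(String[] args) {
        // Before any externalized checkpoint exists, the supplier yields null.
        Gauge<String> before = externalPathGauge(() -> null);
        // Afterwards it yields a real path (made-up value for illustration).
        Gauge<String> after = externalPathGauge(() -> "hdfs:///checkpoints/chk-1");

        System.out.println(before.getValue());
        System.out.println(after.getValue());
    }
}
```

With a guard like this, a reporter that rejects null gauge values would
receive the placeholder instead of failing while serving the metrics page.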
>
> I have created a Jira issue for it:
> https://issues.apache.org/jira/browse/FLINK-7675
> and I will submit the PR once it passes Travis CI in my repository.
>
> Best Regards,
> Tony Wei
>
>
>
>
> Tony Wei <tony19920...@gmail.com>
> 22 September 2017 at 16:20
> Hi Chesnay,
>
> I haven't tried it on 1.4, so I don't know whether this also occurs there.
> My logging level was already set to INFO, but there was no error or
> warning in the log file either.
>
> Best Regards,
> Tony Wei
>
>
> Chesnay Schepler <ches...@apache.org>
> 22 September 2017 at 16:07
> The Prometheus reporter should work with 1.3.2.
>
> Does this also occur with the reporter that currently exists in 1.4? (to
> rule out new bugs from the PR).
>
> To investigate this further, please set the logging level to WARN and try
> again, as all errors in the metric system are logged on that level.
>
> On 22.09.2017 10:33, Tony Wei wrote:
>
>
> Tony Wei <tony19920...@gmail.com>
> 22 September 2017 at 10:33
> Hi,
>
> I have built the Prometheus reporter package from this PR
> https://github.com/apache/flink/pull/4586, and used it on Flink 1.3.2 to
> record all default metrics and those from `FlinkKafkaConsumer`.
>
> Originally, everything was fine. I could get those metrics for the TM from
> Prometheus, just as I saw them in the Flink Web UI.
> However, when I turned to the JM, Prometheus reported this error: Get
> http://localhost:9249/metrics: EOF.
> I checked the JM log and saw nothing in it. There was no error message,
> and port 9249 was still listening.
>
> To figure out what happened, I created another cluster and found that
> Prometheus could connect to the Flink cluster when no job was running. After
> the JM triggered or completed the first checkpoint, Prometheus started getting
> ERR_EMPTY_RESPONSE from the JM, but not from the TM. There was still no error
> in the log file, and port 9249 was still listening.
>
> I was wondering where the error occurred: in Flink or in the Prometheus
> reporter? Or is it incorrect to use the Prometheus reporter on Flink 1.3.2?
> Thank you.
>
> Best Regards,
> Tony Wei
>
>
