Hi Max,

Good to know. Thanks very much.
Best Regards,
Tony Wei

2017-10-24 13:52 GMT+08:00 Maximilian Bode <maximilian.b...@tngtech.com>:

> Hi Tony,
>
> thanks for troubleshooting this. I have added a commit to
> https://github.com/apache/flink/pull/4586 that should enable you to use
> the reporter with 1.3.2 as well.
>
> Best regards,
> Max
>
> Tony Wei <tony19920...@gmail.com>
> 23 September 2017 at 13:11
> Hi Chesnay,
>
> I built another Flink cluster using version 1.4, set the log level to
> DEBUG, and found that the root cause might be this exception:
> *java.lang.NullPointerException:
> Value returned by gauge lastCheckpointExternalPath was null*.
>
> I updated `CheckpointStatsTracker` to ignore the external path when it is
> null, and the exception did not occur again. The Prometheus reporter works
> as well.
>
> I have created a Jira issue for it
> (https://issues.apache.org/jira/browse/FLINK-7675),
> and I will submit the PR after it passes Travis CI on my repository.
>
> Best Regards,
> Tony Wei
>
> Tony Wei <tony19920...@gmail.com>
> 22 September 2017 at 16:20
> Hi Chesnay,
>
> I didn't try it in 1.4, so I have no idea whether this also occurs in 1.4.
> As for my logging settings, the level was already set to INFO, but there
> wasn't any error or warning in the log file either.
>
> Best Regards,
> Tony Wei
>
> Chesnay Schepler <ches...@apache.org>
> 22 September 2017 at 16:07
> The Prometheus reporter should work with 1.3.2.
>
> Does this also occur with the reporter that currently exists in 1.4? (to
> rule out new bugs from the PR).
>
> To investigate this further, please set the logging level to WARN and try
> again, as all errors in the metric system are logged on that level.
>
> On 22.09.2017 10:33, Tony Wei wrote:
>
> Tony Wei <tony19920...@gmail.com>
> 22 September 2017 at 10:33
> Hi,
>
> I have built the Prometheus reporter package from this PR
> https://github.com/apache/flink/pull/4586, and used it on Flink 1.3.2 to
> record all default metrics and those from `FlinkKafkaConsumer`.
>
> Originally, everything was fine. I could get those metrics for the TM from
> Prometheus just as I saw them in the Flink Web UI.
> However, when I turned to the JM, I found that Prometheus gave this error:
> Get http://localhost:9249/metrics: EOF.
> I checked the log on the JM and saw nothing in it. There was no error
> message and port 9249 was still alive.
>
> To figure out what happened, I created another cluster and found that
> Prometheus could connect to the Flink cluster as long as there was no
> running job. After the JM triggered or completed the first checkpoint,
> Prometheus started getting ERR_EMPTY_RESPONSE from the JM, but not from the
> TM. There was still no error in the log file and port 9249 was still alive.
>
> I was wondering where the error occurred: in Flink or in the Prometheus
> reporter? Or is it incorrect to use the Prometheus reporter on Flink 1.3.2?
> Thank you.
>
> Best Regards,
> Tony Wei
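
The failure mode discussed in this thread (a single gauge returning null and the whole metrics response coming back empty) can be illustrated with a small, language-neutral sketch. This is not Flink's or the reporter's actual code: the `Gauge` alias, `render_strict`, `render_lenient`, and the `exampleCounter` metric are hypothetical names invented for the illustration, and the guard here sits on the rendering side, which is a different place from the `CheckpointStatsTracker` change described above.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a gauge: a zero-argument callable
# that returns the current value, or None if there is none yet.
Gauge = Callable[[], Optional[object]]


def render_strict(gauges: Dict[str, Gauge]) -> str:
    """Render every gauge; a single None value aborts the whole response."""
    lines = []
    for name, gauge in gauges.items():
        value = gauge()
        if value is None:
            # Assumed analogue of the exception quoted in the thread.
            raise ValueError(f"Value returned by gauge {name} was null")
        lines.append(f"{name} {value}")
    return "\n".join(lines)


def render_lenient(gauges: Dict[str, Gauge]) -> str:
    """Render every gauge, skipping those whose value is None."""
    lines = []
    for name, gauge in gauges.items():
        value = gauge()
        if value is None:
            continue  # ignore the missing value instead of failing
        lines.append(f"{name} {value}")
    return "\n".join(lines)


# Hypothetical metric set: one normal gauge and one that has no value yet.
gauges: Dict[str, Gauge] = {
    "exampleCounter": lambda: 1,
    "lastCheckpointExternalPath": lambda: None,
}

print(render_lenient(gauges))
```

With the strict variant, the one gauge without a value prevents any metric from being returned, which is consistent with the symptom of the endpoint answering for the TM but not for the JM once checkpoint metrics exist. The lenient variant still reports the remaining metrics.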