Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

Javier Vegas Mon, 04 Dec 2023 23:20:28 -0800

The reason is simple: I migrated a project to Flink that already had
Prometheus metrics integrated.


Thanks,

Javier

On Tue, Oct 3, 2023 at 15:51, Mason Chen (<mas.chen6...@gmail.com>) wrote:
>
> Hi Javier,
>
> Is there a particular reason why you aren't leveraging Flink's metric API? It
> seems that functionality was internal to the PrometheusReporter
> implementation, and your use case should have continued working if it had
> depended on Flink's metric API.
>
> Best,
> Mason
>
> On Thu, Sep 28, 2023 at 2:51 AM Javier Vegas <jve...@strava.com> wrote:
>>
>> Thanks! I saw the first change but missed the third one, which most
>> probably explains my problem. Most likely the metrics I was sending
>> with the twitter/finagle statsReceiver ended up in the singleton
>> default registry and were exposed by Flink along with all the other
>> Flink metrics, but now that Flink uses its own registry I have no
>> idea where my custom metrics end up.
>>
>>
>> On Wed, Sep 27, 2023 at 4:56, Kenan Kılıçtepe
>> (<kkilict...@gmail.com>) wrote:
>> >
>> > Have you checked the metric changes in 1.17?
>> >
>> > From release notes 1.17:
>> > https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.17/
>> >
>> > Metric Reporters
>> > Only support reporter factories for instantiation
>> > FLINK-24235
>> > Configuring reporters by their class is no longer supported. Reporter 
>> > implementations must provide a MetricReporterFactory, and all 
>> > configurations must be migrated to such a factory.
>> >
>> > UseLogicalIdentifier makes datadog consider metric as custom
>> > FLINK-30383
>> > The Datadog reporter now adds a “flink.” prefix to metric identifiers if 
>> > “useLogicalIdentifier” is enabled. This is required for these metrics to 
>> > be recognized as Flink metrics, not custom ones.
>> >
>> > Use separate Prometheus CollectorRegistries
>> > FLINK-30020
>> > The PrometheusReporters now use a separate CollectorRegistry for each 
>> > reporter instance instead of the singleton default registry. This 
>> > generally shouldn’t impact setups, but it may break code that indirectly 
>> > interacts with the reporter via the singleton instance (e.g., a test 
>> > trying to assert what metrics are reported).
>> >
>> >
>> >
>> > On Wed, Sep 27, 2023 at 11:11 AM Javier Vegas <jve...@strava.com> wrote:
>> >>
>> >> I implemented some custom Prometheus metrics that were working on
>> >> 1.16.2, with my configuration
>> >>
>> >> metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
>> >> metrics.reporter.prom.port: 9999
>> >>
>> >> I could see both Flink metrics and my custom metrics on port 9999 of
>> >> my task managers
>> >>
>> >> After upgrading to 1.17.1, using the same configuration, I can see
>> >> only the Flink metrics on port 9999 of the task managers;
>> >> the custom metrics are getting lost somewhere.
>> >>
>> >> The release notes for 1.17 mention
>> >> https://issues.apache.org/jira/browse/FLINK-24235
>> >> that removes instantiating reporters by name and forces using a
>> >> factory, which I was already doing in 1.16.2. Do I need to do
>> >> anything extra after those changes so my metrics are aggregated with
>> >> the Flink ones?
>> >>
>> >> I am also seeing this error message on application startup (which I
>> >> was already seeing in 1.16.2): "Multiple implementations of the same
>> >> reporter were found in 'lib' and/or 'plugins' directories for
>> >> org.apache.flink.metrics.prometheus.PrometheusReporterFactory. It is
>> >> recommended to remove redundant reporter JARs to resolve used
>> >> versions' ambiguity." Could that also explain the missing metrics?
>> >>
>> >> Thanks,
>> >>
>> >> Javier Vegas
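
The explanation reached in this thread (FLINK-30020 moving each PrometheusReporter
to its own CollectorRegistry instead of the singleton default registry) can be
illustrated with a toy model. The sketch below uses no Flink or Prometheus
classes: `Registry`, `Reporter`, and both metric names are hypothetical stand-ins
chosen for illustration, and the real reporter internals may differ. It only
shows why a metric registered with a shared default registry would stop
appearing once the reporter serves a registry of its own.

```java
import java.util.ArrayList;
import java.util.List;

public class RegistryDemo {

    /** Hypothetical stand-in for a metrics registry (not the Prometheus class). */
    public static final class Registry {
        /** Plays the role of the process-wide singleton default registry. */
        public static final Registry DEFAULT = new Registry();

        private final List<String> names = new ArrayList<>();

        public void register(String metricName) {
            names.add(metricName);
        }

        public List<String> names() {
            return new ArrayList<>(names);
        }
    }

    /** Hypothetical stand-in for a reporter that exposes exactly one registry. */
    public static final class Reporter {
        private final Registry registry;

        public Reporter(Registry registry) {
            this.registry = registry;
            // The reporter adds the framework's own metrics to its registry.
            registry.register("flink_example_metric");
        }

        /** What a scrape of the reporter's port would list in this model. */
        public List<String> scrape() {
            return registry.names();
        }
    }

    public static void main(String[] args) {
        // A library that knows nothing about Flink registers with the default registry.
        Registry.DEFAULT.register("custom_requests_total");

        // Reporter backed by the shared singleton: the custom metric is visible.
        Reporter shared = new Reporter(Registry.DEFAULT);
        System.out.println("shared registry:   " + shared.scrape());

        // Reporter backed by its own registry: the custom metric is not visible.
        Reporter separate = new Reporter(new Registry());
        System.out.println("separate registry: " + separate.scrape());
    }
}
```

In this model the custom metric is still registered, just in a registry that no
reporter serves. That is consistent with Mason's suggestion above: metrics
registered through Flink's metric API would not depend on which registry the
reporter uses internally.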
