[
https://issues.apache.org/jira/browse/FLINK-38704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063514#comment-18063514
]
Mukul Gupta edited comment on FLINK-38704 at 3/6/26 11:44 AM:
--------------------------------------------------------------
This issue is present in Flink 1.19+. Earlier versions were not tested.
The issue occurs because numeric YAML configuration values are stored as
Integer/Long objects, but {{Properties.getProperty()}} only returns Strings.
When {{MetricConfig.getString()}} is called for numeric values like
{{metrics.reporter.prom.port}}, it returns null, causing reporters to fall
back to default values.
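The following is a minimal sketch of that lookup behavior using only {{java.util.Properties}}, not Flink code. The class name {{PropertiesTypeMismatchDemo}}, the helper {{lookup}} and the key {{port}} are hypothetical and exist only for illustration. It assumes the parsed YAML value reaches the {{Properties}} object as a typed Object via {{put()}}.

```java
import java.util.Properties;

public class PropertiesTypeMismatchDemo {

    // Hypothetical helper: stores the value as a typed Object (as a YAML
    // parser might hand it over), then reads it back via the String accessor.
    static String lookup(Object value) {
        Properties props = new Properties();
        props.put("port", value);         // Hashtable.put accepts any Object
        return props.getProperty("port"); // null unless the stored value is a String
    }

    public static void main(String[] args) {
        System.out.println(lookup(9999));        // Integer value: prints "null"
        System.out.println(lookup("9999"));      // quoted value: prints "9999"
        System.out.println(lookup("9000-9100")); // port range: prints "9000-9100"
    }
}
```

The quoted value and the port range are both Strings, which is consistent with only the bare number falling back to the default.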
*Note:* The bug only reproduces when a single number is assigned to the port
(e.g., {{9999}}). It works correctly when a port-range is used (e.g.,
{{9000-9100}}) because ranges are stored as Strings.
*Safety:* This change is safe and non-breaking. {{Properties}} is meant to hold
String values, and {{getProperty()}} ignores anything else, so converting
values to String at insertion time maintains the expected behavior while fixing
the type mismatch issue.
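As a rough illustration of converting at insertion time, the sketch below copies typed values into a {{Properties}} object as Strings. It is not the code from the pull request. The class {{StringifyOnInsertDemo}}, the helper {{toProperties}} and the keys are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class StringifyOnInsertDemo {

    // Hypothetical helper: copies typed config values into Properties,
    // converting each one to its String form on the way in.
    static Properties toProperties(Map<String, Object> typed) {
        Properties props = new Properties();
        for (Map.Entry<String, Object> entry : typed.entrySet()) {
            if (entry.getValue() != null) { // Properties rejects null values
                props.setProperty(entry.getKey(), String.valueOf(entry.getValue()));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> typed = new LinkedHashMap<>();
        typed.put("port", 9999);         // Integer, as parsed from unquoted YAML
        typed.put("range", "9000-9100"); // already a String
        Properties props = toProperties(typed);
        System.out.println(props.getProperty("port"));  // prints "9999"
        System.out.println(props.getProperty("range")); // prints "9000-9100"
    }
}
```

Values that are already Strings pass through unchanged, so existing configurations such as port ranges would read back the same as before.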
Either the code should be changed as proposed in the pull request, or the
documentation should be corrected.
*Workaround:* Quote the port value in YAML configuration:
{{metrics.reporter.prom.port: "9999"}}
> Metrics reporter setup does not load Prometheus with correct configs/port
> -------------------------------------------------------------------------
>
> Key: FLINK-38704
> URL: https://issues.apache.org/jira/browse/FLINK-38704
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 2.0.1, 2.2.0, 2.1.1
> Reporter: Mohsen Rezaei
> Priority: Major
> Labels: pull-request-available
>
> This was working in 1.x releases, but 2.x does not load the correct
> config.
> Runtime Flink configurations loaded:
> {code:java}
> 2025-11-20 04:33:51.737 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.prom.port, 9999
> 2025-11-20 04:33:51.738 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.prom.factory.class, org.apache.flink.metrics.prometheus.PrometheusReporterFactory
> {code}
> But the reporter setup [loads the default
> port|https://github.com/apache/flink/blob/45ab6c816465e717d0eef2ad6672cbb0c1a73a7e/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusReporterFactory.java#L33]
> {code:java}
> 2025-11-20 04:33:55.520 [main] INFO org.apache.flink.metrics.prometheus.PrometheusReporter - Started PrometheusReporter HTTP server on port 9249.
> {code}
> and metrics are served only on port 9249:
> {code:java}
> flink@jm-0:~$ curl localhost:9999/metrics
> curl: (7) Failed to connect to localhost port 9999 after 0 ms: Couldn't connect to server
> flink@jm-0:~$ curl localhost:9249/metrics
> # HELP flink_jobmanager_Status_JVM_GarbageCollector_Copy_TimeMsPerSecond TimeMsPerSecond (scope: jobmanager_Status_JVM_GarbageCollector_Copy)
> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_Copy_TimeMsPerSecond gauge
> flink_jobmanager_Status_JVM_GarbageCollector_Copy_TimeMsPerSecond{host="10_155_60_8",} 0.0
> ...
> {code}
> This potentially affects all reporters loaded via their factory in
> {{ReporterSetup}}.