Hello, we run Flink jobs that report metrics to a Prometheus Pushgateway. After the program has been running for some time and a large amount of data has been reported to the Pushgateway, pushes start failing with a socket timeout exception, and many of the metric reports fail. The exception is as follows:
2023-12-12 04:13:07,812 WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter [] - Failed to push metrics to PushGateway with jobName 00034937_20231211200917_54ede15602bb8704c3a98ec481bea96, groupingKey{}.
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_281]
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_281]
    at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_281]
    at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_281]
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) ~[?:1.8.0_281]
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) ~[?:1.8.0_281]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) ~[?:1.8.0_281]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[?:1.8.0_281]
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[?:1.8.0_281]
    at io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:315) ~[flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at io.prometheus.client.exporter.PushGateway.push(PushGateway.java:138) ~[flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter.report(PrometheusPushGatewayReporter.java:63) [flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:494) [flink-dist_2.11-1.13.5.jar:1.13.5]

Our testing indicates that the timeout is caused by the large amount of data reported to the Pushgateway. After we restarted the Pushgateway server the exception disappeared, but it reappeared several hours later. So I would like to know how to configure Flink or the Pushgateway to avoid this exception.

Best regards,
leilinee