Hi Juha and Chesnay, I do appreciate your prompt responses! I'll also continue to investigate this issue.
Best,
Xingcan

On Wed, Jan 27, 2021, 04:32 Chesnay Schepler <ches...@apache.org> wrote:

> (setting this field is currently not possible from a Flink user
> perspective; it is something I will investigate)
>
> On 1/27/2021 10:30 AM, Chesnay Schepler wrote:
>
> Yes, I could see how the memory issue can occur.
>
> However, it should be limited to buffering 64 requests; this is the
> default limit that okhttp imposes on concurrent calls.
> Maybe lowering this value already does the trick.
>
> On 1/27/2021 5:52 AM, Xingcan Cui wrote:
>
> Hi all,
>
> Recently, I tried to use the Datadog reporter to collect some user-defined
> metrics. Sometimes when we hit traffic peaks (which are also peaks for the
> metrics), the HTTP client throws the following exception:
>
> ```
> [OkHttp https://app.datadoghq.com/...] WARN org.apache.flink.metrics.datadog.DatadogHttpClient - Failed sending request to Datadog
> java.net.SocketTimeoutException: timeout
>     at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:593)
>     at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:601)
>     at okhttp3.internal.http2.Http2Stream.takeResponseHeaders(Http2Stream.java:146)
>     at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:120)
>     at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
>     at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
>     at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> ```
>
> I guess this may be caused by rate limiting on the Datadog server side,
> since a burst of HTTP requests can look like a kind of "attack". The real
> problem is that after these exceptions are thrown, the JVM heap of the
> taskmanager starts to grow and eventually causes an OOM. I'm curious
> whether this is caused by metrics accumulation, i.e., for some reason the
> client can't reconnect to the Datadog server to send the metrics, so the
> metrics data is buffered in memory until it causes the OOM.
>
> I'm running Flink 1.11.2 on EMR-6.2.0 with flink-metrics-datadog-1.11.2.jar.
>
> Thanks,
> Xingcan
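P.S. For anyone following along: the concurrent-call limit Chesnay mentions is okhttp's Dispatcher setting. The sketch below is only an illustration of that okhttp API under the assumption that one could hand the Datadog reporter a customized client; as Chesnay notes, the Flink reporter does not currently expose this, so nothing here corresponds to an existing reporter option.

```
import java.util.concurrent.TimeUnit;

import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

// Minimal sketch: lowering okhttp's async-call limits so fewer unsent
// requests (and their metric payloads) can queue up in memory while the
// Datadog endpoint is timing out.
public class LimitedOkHttpClientSketch {

    public static OkHttpClient build() {
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(16);        // okhttp default is 64 concurrent async calls
        dispatcher.setMaxRequestsPerHost(16); // okhttp default is 5 per host

        return new OkHttpClient.Builder()
                .connectTimeout(3, TimeUnit.SECONDS)
                .writeTimeout(3, TimeUnit.SECONDS)
                .readTimeout(3, TimeUnit.SECONDS)
                .dispatcher(dispatcher)
                .build();
    }

    public static void main(String[] args) {
        // Calls enqueued beyond the dispatcher limits wait in its internal queue,
        // which is the buffering the thread above is discussing.
        OkHttpClient client = build();
        System.out.println("Client built with lowered dispatcher limits: " + client);
    }
}
```

Whether (and how) such a knob could be set from the Flink side is exactly what Chesnay said he would investigate.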