Hi, could this be another symptom of this issue: https://issues.apache.org/jira/browse/FLINK-16611?
I guess you'll have to ask Datadog to check on their end; maybe you are running into a rate limit there?

On Fri, Jun 26, 2020 at 5:42 PM seeksst <seek...@163.com> wrote:

> Original message
> *From:* seeksst<seek...@163.com>
> *To:* Fanbin Bu<fanbin...@coinbase.com>
> *Sent:* Friday, June 26, 2020 23:36
> *Subject:* Re: datadog failed to send report
>
> Hi, I'm sorry for not explaining it clearly; I misread the exception.
>
> log4j.logger.org.apache.flink.metrics.datadog.DatadogHttpClient=ERROR
>
> log4j.logger.org.apache.flink.runtime.metrics does not apply to
> flink.metrics; it only affects flink.runtime.metrics.
>
> If it still does not work, note that there are several logging
> configuration files in the /conf folder.
>
> Modifying the config helps control the log output. If it has no effect,
> maybe log4j.properties is not the file being used.
>
> You can read this article for answers [1]. If you're still not sure, you
> can change all of them, although a more granular configuration is
> recommended.
>
> I'm not familiar with Datadog (I use InfluxDB to collect metrics), but I
> think that if it can collect metrics and the network is not a problem, the
> bottleneck may be in processing the requests. I'm not sure, though.
> A SocketTimeoutException can occur in several situations:
>
> 1. The network is down.
>
> You think the network is OK.
>
> 2. Server processing is slow.
>
> Datadog may be handling many requests and cannot answer quickly.
>
> You can check the CPU usage of the Datadog machine. Sometimes it depends
> on the program, for example if it uses a single thread to handle all
> requests (this is something I don't know about Datadog). If CPU usage is
> high, this may be the reason; if not, you need to learn more about Datadog.
>
> 3. Network transmission is slow.
>
> You need to check the network: whether the link is saturated, or whether
> the machine is physically far away.
>
> You can also look for ways to adjust the timeout.
>
> 4. Your job frequently triggers full GC.
>
> You can check the GC log; this requires editing flink-conf.yaml, with
> something like:
>
> env.java.opts.taskmanager: -Xloggc:<LOG_DIR>/taskmanager-gc.log
>
> Best wishes.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/logging.html
>
> Original message
> *From:* Fanbin Bu<fanbin...@coinbase.com>
> *To:* seeksst<seek...@163.com>
> *Sent:* Friday, June 26, 2020 05:38
> *Subject:* Re: datadog failed to send report
>
> This does not help:
>
> log4j.logger.org.apache.flink.runtime.metrics=ERROR
>
> I believe all machines can telnet to the Datadog port, since other metrics
> are reported correctly.
>
> How do I check the request capacity?
>
> On Tue, Jun 23, 2020 at 11:32 PM seeksst <seek...@163.com> wrote:
>
>> Hi,
>>
>> If you don't care about losing some metrics, you can edit
>> log4j.properties to ignore it.
>>
>> log4j.logger.org.apache.flink.runtime.metrics=ERROR
>>
>> BTW, can all machines telnet to the Datadog port?
>>
>> Does the number of requests exceed Datadog's processing capacity?
>>
>> Original message
>> *From:* Fanbin Bu<fanbin...@coinbase.com>
>> *To:* user<user@flink.apache.org>
>> *Sent:* Wednesday, June 24, 2020 12:05
>> *Subject:* datadog failed to send report
>>
>> Hi,
>>
>> Does anyone have any idea about the following error message? (It flooded
>> my task manager log.)
>> I do have Datadog metrics present, so this probably only happens for
>> some metrics.
>>
>> 2020-06-24 03:27:15,362 WARN org.apache.flink.metrics.datadog.DatadogHttpClient - Failed sending request to Datadog
>> java.net.SocketTimeoutException: timeout
>> at org.apache.flink.shaded.okio.Okio$4.newTimeoutException(Okio.java:227)
>> at org.apache.flink.shaded.okio.AsyncTimeout.exit(AsyncTimeout.java:284)
>> at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:240)
>> at org.apache.flink.shaded.okio.RealBufferedSource.indexOf(RealBufferedSource.java:344)
>> at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:216)
>> at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
>> at org.apache.flink.shaded.okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
>> at org.apache.flink.shaded.okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> at org.apache.flink.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>> at org.apache.flink.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>> at org.apache.flink.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> at org.apache.flink.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>> at org.apache.flink.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
>> at org.apache.flink.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
>> at org.apache.flink.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.net.SocketException: Socket closed
>> at java.net.SocketInputStream.read(SocketInputStream.java:204)
>> at java.net.SocketInputStream.read(SocketInputStream.java:141)
>> at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
>> at sun.security.ssl.InputRecord.read(InputRecord.java:503)
>> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
>> at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
>> at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
>> at org.apache.flink.shaded.okio.Okio$2.read(Okio.java:138)
>> at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:236)
>> ... 23 more
>>
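
A note on the log4j point discussed in this thread: the warning is logged by org.apache.flink.metrics.datadog.DatadogHttpClient, and log4j applies a logger setting by dotted-name hierarchy, so a setting for org.apache.flink.runtime.metrics does not reach it. The snippet below is only a rough Java sketch of that name-matching rule, not log4j's actual implementation; the class name LoggerHierarchy and the method covers are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: mimics the dotted-name rule by which a logger
// setting applies to a logger and its descendants. Not log4j's real code.
class LoggerHierarchy {

    // A configured name covers a logger when both names are equal, or when
    // the logger name starts with the configured name followed by a dot.
    static boolean covers(String configuredName, String loggerName) {
        return loggerName.equals(configuredName)
                || loggerName.startsWith(configuredName + ".");
    }

    public static void main(String[] args) {
        // Logger name taken from the WARN line in the stack trace.
        String datadogLogger = "org.apache.flink.metrics.datadog.DatadogHttpClient";

        // The setting tried first in the thread: a different package branch.
        System.out.println(covers("org.apache.flink.runtime.metrics", datadogLogger));

        // The setting suggested later: exact logger name.
        System.out.println(covers(datadogLogger, datadogLogger));

        // A parent package would also cover it.
        System.out.println(covers("org.apache.flink.metrics", datadogLogger));
    }
}
```

Under this rule only the second and third settings apply to the Datadog client, which is consistent with the observation that log4j.logger.org.apache.flink.runtime.metrics=ERROR did not silence the warning.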