Re: Tuning akka.ask.timeout

2022-01-24 Thread Guowei Ma
Hi, After 1.14.0 I think Flink should work well even at the 1000*1000 scale with a 10s akka.timeout in the deploy stage, so any further feedback after you investigate would be appreciated. BTW, you might look at https://issues.apache.org/jira/browse/FLINK-24295, which might be causing the problem. Best, Gu

Re: Tuning akka.ask.timeout

2022-01-24 Thread Paul Lam
Hi Guowei, Thanks a lot for your reply. I’m using 1.14.0. The timeout happens at job deployment time. A subtask runs for a short period of `akka.ask.timeout` before failing due to the timeout. I noticed that the JobManager has very high CPU usage at that moment, around 2000%. I’m reasoning abo

Re: Tuning akka.ask.timeout

2022-01-24 Thread Paul Lam
Hi Zhilong, Thanks a lot for your very detailed answer! My setup: Flink 1.14.0 on YARN, jdk1.8_u202. The timeout happens at the job deployment stage. I checked the GC logs, and both JM and TM look good, but the CPU usage of the JM can go up to 2000% for a short time (cgroups are not turned on). I’ve se

Re: Tuning akka.ask.timeout

2022-01-20 Thread Guowei Ma
Hi Paul, Could you share some information, such as the Flink version you are using and the memory of the TM and JM? Also, when does the timeout happen: at the beginning of the job or while the job is running? Best, Guowei On Thu, Jan 20, 2022 at 4:45 PM Paul Lam wrote: > Hi, > > I’m tuning a

Tuning akka.ask.timeout

2022-01-20 Thread Paul Lam
Hi, I’m tuning a Flink job with 1000+ parallelism, which frequently fails with Akka TimeOutException (it was fine with 200 parallelism). I see some posts recommend increasing `akka.ask.timeout` to 120s. I’m not familiar with Akka but it looks like a very long time compared to the default 10s