Hi
Since 1.14.0, I think Flink should work well in the deployment stage even at
the 1000*1000 scale with a 10s Akka timeout.
So I'd appreciate any further feedback after you investigate.
BTW, you might want to look at
https://issues.apache.org/jira/browse/FLINK-24295, which could be causing the
problem.
Best,
Gu
Hi Guowei,
Thanks a lot for your reply.
I’m using 1.14.0. The timeout happens at job deployment time. A subtask runs
for a short period, about the length of `akka.ask.timeout`, before it fails due
to the timeout. I noticed that the JobManager has very high CPU usage at that
moment, around 2000%. I’m reasoning abo
Hi Zhilong,
Thanks a lot for your very detailed answer!
My setup: Flink 1.14.0 on YARN, jdk1.8_u202
The timeout happens at the job deployment stage. I checked the GC logs; both the
JM and TM look good, but the JM's CPU usage can go up to 2000% for a short time
(cgroups are not turned on).
I’ve se
Hi, Paul
Could you share some information, such as the Flink version you use and the
memory of the TM and JM?
And when does the timeout happen: at the beginning of the job or while the job
is running?
Best,
Guowei
On Thu, Jan 20, 2022 at 4:45 PM Paul Lam wrote:
> Hi,
>
> I’m tuning a
Hi,
I’m tuning a Flink job with 1000+ parallelism, which frequently fails with an Akka
TimeoutException (it was fine with 200 parallelism).
I’ve seen some posts recommending increasing `akka.ask.timeout` to 120s. I’m not
familiar with Akka, but that looks like a very long time compared to the default
10s
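For reference, the workaround discussed in this thread is a single configuration entry. The fragment below is a minimal sketch, assuming the timeout is raised cluster-wide in `flink-conf.yaml` on Flink 1.14. The 120 s value is the one suggested in the posts mentioned above, not a verified recommendation. If the root cause is something like FLINK-24295, a longer timeout may only hide the slowdown rather than fix it.

```yaml
# flink-conf.yaml (Flink 1.14)
# Timeout for asks (RPC calls) between Flink components; the default is "10 s".
# 120 s is the value from the posts mentioned above, used here only as an example.
akka.ask.timeout: 120 s
```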