Hi,

One possible direction is to check your YARN logs or TM logs to see whether 
the YARN RM killed the TM for some reason (e.g. physical memory usage went 
over the limit). If it did, the JM will try to recover the TM repeatedly, 
according to your restart strategy.
The snippet of JM logs you provided usually does not show the root cause.
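
As a rough illustration of that check, here is a minimal Python sketch that 
scans log lines for hints that a container was killed. The function name 
find_kill_hints and the pattern names are made up for this example, and the 
memory-limit patterns are assumptions: the exact wording of the YARN messages 
differs between Hadoop versions, so adjust them to what your cluster prints.

```python
import re

# Assumed patterns; the YARN wording varies by Hadoop version, so treat
# these as starting points rather than an exhaustive list.
KILL_PATTERNS = {
    "yarn_physical_memory": re.compile(
        r"beyond (the )?'?physical'? memory limit", re.IGNORECASE
    ),
    "yarn_virtual_memory": re.compile(
        r"beyond (the )?'?virtual'? memory limit", re.IGNORECASE
    ),
    # Taken from the JM log line quoted in this thread.
    "sigterm": re.compile(r"RECEIVED SIGNAL 15: SIGTERM"),
}


def find_kill_hints(lines):
    """Return (line_number, pattern_name, text) for every matching line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in KILL_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.rstrip()))
    return hits
```

One way to use it would be to save the aggregated logs to a file first, for 
example with "yarn logs -applicationId <applicationId> > app.log", and then 
call find_kill_hints(open("app.log")) and look at the lines just before each 
hit.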

Best,
Biao Geng

From: SmileSmile <a511955...@163.com>
Date: Monday, July 18, 2022 at 8:46 PM
To: user <user@flink.apache.org>
Subject: flink on yarn job always restart
Hi all,
We have run into a problem: the job has a parallelism of 3000 and contains 
multiple aggregation operations. Whenever it recovers from a checkpoint or 
savepoint, recovery fails and the job restarts repeatedly.
JM error log:
org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
Flink version: 1.14.5
Do you have any good ideas for troubleshooting?



