Hi Jing,
Thank you for your reply. That cluster has been terminated; we will
provide the log if the issue occurs again.
On Wed, Jun 2, 2021 at 11:17 AM JING ZHANG wrote:
Hi Kai,
The reason the job cannot be recovered may not be directly related to
the exception you mentioned in your email.
Could you provide the complete jobmanager.log and taskmanager.log? Maybe
we could find some hints there.
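For reference, if the cluster runs on YARN with log aggregation enabled,
one common way to collect the complete logs for the whole application is
the YARN CLI (the application id below is a placeholder):

    # Fetch the aggregated JobManager and TaskManager logs for the run
    yarn logs -applicationId application_XXX > flink-full.log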
Best regards,
JING ZHANG
Kai Fu wrote on Wed, Jun 2, 2021 at 7:23 AM:
Hi Till,
Thank you for your response. From my observation the recovery process
lasted for ~1 day and the job could not be recovered, so we finally
killed the cluster.
On Tue, Jun 1, 2021 at 9:47 PM Till Rohrmann wrote:
Hi Kai,
The rejection you are seeing should not be serious. The way this can happen
is the following: If Yarn restarts the application master, Flink will try
to recover previously started containers. If this is not possible, or if
Yarn only reports a subset of the previously allocated containers,
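For reference, a minimal sketch of the settings that come into play
here, assuming a flink-conf.yaml based YARN deployment with ZooKeeper HA
(keys and values are illustrative and may differ across Flink versions):

    # Let YARN restart the application master on failure
    yarn.application-attempts: 4
    # HA services are needed so a restarted application master can
    # recover running jobs and their checkpoint metadata
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-host:2181
    high-availability.storageDir: hdfs:///flink/ha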
Hi team,
We encountered an issue during recovery from a checkpoint. The job is
recovering because the downstream Kafka sink was full for a while, which
caused the job to fail and keep trying to recover (the downstream was
full for about 4 hours). The job cannot recover from the checkpoint
successfully even after we
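For context, the retry behavior here is governed by the job's restart
strategy; a minimal fixed-delay sketch in flink-conf.yaml (values are
illustrative, not our actual settings):

    # Retry the job a bounded number of times, pausing between attempts
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 10
    restart-strategy.fixed-delay.delay: 30 s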