Re: Error about "Rejected TaskExecutor registration at the ResourceManger"

2021-06-01 Thread Kai Fu
Hi Jing, Thank you for your reply. That cluster has been terminated; I will provide the logs if it occurs again. On Wed, Jun 2, 2021 at 11:17 AM JING ZHANG wrote: > Hi Kai, > The reason why the job cannot be recovered may not be directly related to > the exception you mentioned in your email. > Would

Re: Error about "Rejected TaskExecutor registration at the ResourceManger"

2021-06-01 Thread JING ZHANG
Hi Kai, The reason why the job cannot be recovered may not be directly related to the exception you mentioned in your email. Would you please provide the complete jobmanager.log and taskmanager.log? Maybe we could find some hints there. Best regards, JING ZHANG On Wed, Jun 2, 2021 at 7:23 AM Kai Fu wrote: > Hi Till,

Re: Error about "Rejected TaskExecutor registration at the ResourceManger"

2021-06-01 Thread Kai Fu
Hi Till, Thank you for your response. Per my observation, the process lasted for ~1 day without recovering, and we finally killed the cluster. On Tue, Jun 1, 2021 at 9:47 PM Till Rohrmann wrote: > Hi Kai, > > The rejection you are seeing should not be serious. The way this can > happen

Re: Error about "Rejected TaskExecutor registration at the ResourceManger"

2021-06-01 Thread Till Rohrmann
Hi Kai, The rejection you are seeing should not be serious. The way this can happen is the following: If Yarn restarts the application master, Flink will try to recover previously started containers. If this is not possible or Yarn only reports a subset of the previously allocated containers,

Error about "Rejected TaskExecutor registration at the ResourceManger"

2021-05-29 Thread Kai Fu
Hi team, We encountered an issue during recovery from checkpoint. The job was recovering because the downstream Kafka sink was full for a while (about 4 hours), so the job failed and kept trying to recover. The job cannot recover from the checkpoint successfully even after we