Re: map concurrent modification exception analysis when checkpoint

2021-08-23 Thread yidan zhao
The issue has been resolved, as I said in the previous email. It was caused by the async function: every record processed by the async function is kept as state in the async operator, and that state is a map type (UserAccessLog). Arvid Heise wrote on Mon, Aug 23, 2021 at 11:26 PM: > I don't see anything suspicious in your

Re: map concurrent modification exception analysis when checkpoint

2021-08-23 Thread Arvid Heise
I don't see anything suspicious in your code. The stack trace is also for a MapSerializer. Do you have another operator where you put a Map into custom state? On Fri, Aug 20, 2021 at 6:43 PM yidan zhao wrote: > But, I do not know why this leads to the job's failure and recovery > since I have set

Re: map concurrent modification exception analysis when checkpoint

2021-08-20 Thread yidan zhao
But I do not know why this leads to the job's failure and recovery, since I have set the number of tolerable failed checkpoints to Integer.MAX_VALUE. Because of the failure, my task manager failed on the task cancel timeout, and about 80% of the task managers went down due to the cancel timeout. yidan zhao wrote on 202

Re: map concurrent modification exception analysis when checkpoint

2021-08-20 Thread yidan zhao
OK, thanks. I have some findings that I would like you to confirm. Here is the code at issue, the async function's implementation. It performs an asynchronous Redis query and fills some data back into the record. In the code [ currentBatch.get(i).getD().put("ipLabel", objects.getResponses().get(i)); ], getD() returns a map attribute in Origin
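
One way to avoid mutating a map that the operator may still be serializing is to copy it before adding the looked-up value. The following is only a minimal sketch in plain Java, not the fix that was actually applied in this thread: the class CopyBeforeEnrich and the helper withIpLabel are hypothetical names, and only the "ipLabel" key comes from the message above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: enrich a copy instead of the map held by the in-flight record.
class CopyBeforeEnrich {

    // Returns a new map with the label added; the original map is left untouched.
    static Map<String, Object> withIpLabel(Map<String, Object> original, Object ipLabel) {
        Map<String, Object> copy = new HashMap<>(original);
        copy.put("ipLabel", ipLabel);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> d = new HashMap<>();
        d.put("ip", "10.0.0.1");

        Map<String, Object> enriched = withIpLabel(d, "office");

        System.out.println(d.containsKey("ipLabel")); // false
        System.out.println(enriched.get("ipLabel"));  // office
    }
}
```

The enriched copy would then be emitted as part of a new output record, so the callback thread never writes to the map that a snapshot might be reading.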

Re: map concurrent modification exception analysis when checkpoint

2021-08-19 Thread Chesnay Schepler
Essentially this exception means that the state was modified while a snapshot was being taken. We usually see this when users hold on to some state value beyond a single call to a user-defined function, particularly from different threads. We may be able to pinpoint the issue if you were to p
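
The failure mode described here can be reproduced outside Flink. The sketch below is an illustration under stated assumptions, not Flink code: the class SnapshotRace and the method snapshotFails are hypothetical, the loop stands in for a serializer walking the map entries, and the write is injected on the same thread so that the interleaving is deterministic rather than dependent on thread timing.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a map is modified while something else is iterating over it.
class SnapshotRace {

    // Walks the entries like a serializer would and runs the given write after
    // the first entry. Returns true if the iteration fails.
    static boolean snapshotFails(Map<String, String> state, Runnable concurrentWrite) {
        try {
            int seen = 0;
            for (Map.Entry<String, String> entry : state.entrySet()) {
                if (seen++ == 0) {
                    concurrentWrite.run(); // simulates the write from another thread
                }
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("uid", "u1");
        state.put("ip", "10.0.0.1");

        // Adding a new key mid-iteration is a structural modification.
        boolean failed = snapshotFails(state, () -> state.put("ipLabel", "office"));
        System.out.println("snapshot failed: " + failed); // snapshot failed: true
    }
}
```

In a real job the write would come from a different thread, so the exception would appear only intermittently, typically while a checkpoint is in progress.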

map concurrent modification exception analysis when checkpoint

2021-08-19 Thread yidan zhao
The Flink web UI shows the exception below. In the task (ual_transform_UserLogBlackUidJudger -> ual_transform_IpLabel), the first operator is a broadcast process function and the second is an async function. I do not know whether the issue is related to this. And the issue did not occur b