Re: Slow flink checkpoint

2018-04-01 Thread makeyang
I have put a lot of effort into this issue and tried to resolve it: 1. Let me describe the current timers' snapshot path first: a) for each key group, invoke InternalTimeServiceManager.snapshotStateForKeyGroup b) InternalTimeServiceManager creates an InternalTimerServiceSerializationProxy to write sn

sharebuffer prune code

2018-04-01 Thread aitozi
Hi, I am running into a CEP bug: it keeps failing to find the previous sharebufferEntry. I think it may be caused by pruning the sharebufferEntry wrongly, but when I read the code, I can't understand this: https://gist.github.com/Aitozi/007210bc7ade01a81f8d0fc4ba5a2c99 why when encountered

Restore from a savepoint is very slow

2018-04-01 Thread Dongwon Kim
Hi, while restoring from the latest checkpoint starts immediately after the job is restarted, restoring from a savepoint takes more than five minutes before the job makes progress. During the blackout, I cannot observe any resource usage across the cluster. After that period of time, I observe that

Re: Restore from a savepoint is very slow

2018-04-01 Thread Dongwon Kim
Attached is a log file from a taskmanager. Please take a look at the log file with the following events in mind: - Around 01:10:47: the job is submitted to the job manager. - Around 01:16:30: suddenly the source starts to read from Kafka and the sink starts to write data to Kafka. Any help would be greatly appreci