Question about Flink's savepoint

Mu Kong Mon, 04 Sep 2017 18:06:35 -0700

Hi all,

I have some questions about an experience I had with a savepoint.
Last night my Flink cluster's memory usage looked weird, so I decided to:


1. create a savepoint for the running job (there was only one job running at
the time)
2. cancel the job from the web UI
3. restart the cluster

When I tried to resume the job from that savepoint, I got a
"Truncate did not truncate to right length. Should be 11757 is 56383."
exception.
A savepoint is also created every day at 4 a.m., so after the job failed to
start from the savepoint I had created before cancelling it, I tried the
4 a.m. savepoint instead, and that seemed to work well.

Then this morning, I noticed data loss for the period between cancelling
the job and resuming it.

I thought that if I ran the job from the savepoint created at 4 a.m., it
would start processing data from 4 a.m. Or am I missing something here?

Also, I didn't set a uid on the addSource() call. Maybe the auto-generated
ID changed when I restarted the cluster, and that is why the recovery
didn't go well?
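
To make the uid question concrete, here is a toy model of how I understand
state gets matched to operators on restore. It is plain Java, not Flink code:
UidSketch, operatorId() and restore() are hypothetical names, and deriving the
auto-generated ID from the operator's position in the job is my simplifying
assumption based on the docs saying generated IDs depend on the program
structure. In the real job the fix would be calling .uid("...") on the
operator returned by addSource().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model only, NOT Flink code: state in a savepoint is looked up by operator ID.
class UidSketch {

    // Returns the explicit uid if one was set; otherwise derives an ID from the
    // operator's position in the topology (assumption: a stand-in for Flink's
    // structure-dependent auto-generated IDs).
    static String operatorId(List<String> topology, int index, String explicitUid) {
        if (explicitUid != null) {
            return explicitUid;
        }
        return "auto:" + String.join(">", topology.subList(0, index + 1));
    }

    // Looks up the state stored under the given operator ID, if any.
    static Optional<Long> restore(Map<String, Long> savepoint, String operatorId) {
        return Optional.ofNullable(savepoint.get(operatorId));
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("source", "map", "sink");
        // Same job after a code change that inserts an operator before "map".
        List<String> after = Arrays.asList("source", "filter", "map", "sink");

        Map<String, Long> savepoint = new HashMap<>();
        savepoint.put(operatorId(before, 0, "my-source"), 42L); // source with explicit uid
        savepoint.put(operatorId(before, 1, null), 7L);         // map with auto ID

        // Unchanged job: the auto ID is the same, so the state is found.
        System.out.println(restore(savepoint, operatorId(before, 1, null)).isPresent());
        // Changed job: the auto ID of "map" differs, so its state is not found.
        System.out.println(restore(savepoint, operatorId(after, 2, null)).isPresent());
        // Explicit uid: the state is found regardless of the change.
        System.out.println(restore(savepoint, operatorId(after, 0, "my-source")).isPresent());
    }
}
```

In this model the ID only changes when the job itself changes, not when the
cluster restarts, which is part of why I am unsure whether the missing uid
explains what I saw.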
