Glad to hear that your job data was not lost!
Cheers,
Till
On Tue, Sep 29, 2020 at 7:28 PM Paul Lam wrote:
Hi Till,
Thanks a lot for the pointer! I tried to restore the job using the
savepoint in a dry run, and it worked!
Guess I've misunderstood the configuration option and was confused by the
non-existent paths that the metadata contains.
Best,
Paul Lam
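
A minimal sketch of the kind of dry-run restore described above, using the
Flink CLI (the savepoint path and job jar are placeholders, not taken from
this thread):

    ./bin/flink run -s <savepointPath> <job-jar>

The -s/--fromSavepoint flag resubmits the job starting from the given
savepoint, which is a quick way to verify that the state can actually be
restored before relying on it.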
Till Rohrmann wrote on Tue, Sep 29, 2020 at 10:30 PM:
Thanks for sharing the logs with me. It looks as if the total size of the
savepoint is 335 kb for a job with a parallelism of 60 and a total of 120
tasks. Hence, the average state size per task is between 2.5 kb and 5 kb.
I think that the state size threshold refers to the size of the per-task
state.
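
As a rough sanity check on those numbers (figures rounded):

    335 kb / 120 tasks    ≈ 2.8 kb per task
    335 kb /  60 subtasks ≈ 5.6 kb per subtask

That is well below the default state.backend.fs.memory-threshold (20 kb in
recent Flink releases, if I remember correctly), which is consistent with
the state ending up inside the savepoint metadata.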
Hi Paul,
could you share with us the logs of the JobManager? They might help to
better understand in which order each operation occurred.
How big are you expecting the size of the state to be? If it is smaller
than state.backend.fs.memory-threshold, then the state data will be stored
in the _metadata file.
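
For reference, the threshold Till mentions is a flink-conf.yaml setting; a
minimal sketch (the value shown is illustrative, not taken from Paul's
cluster):

    state.backend.fs.memory-threshold: 20kb

State handles smaller than this threshold are serialized inline into the
savepoint's _metadata file instead of being written as separate files on the
checkpoint filesystem.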
Hi,
We have a Flink job that was stopped erroneously, leaving no available
checkpoint/savepoint to restore from, and we are looking for some help to
narrow down the problem.
How we ran into this problem:
We stopped the job using the cancel-with-savepoint command (for a
compatibility issue), but the command timed out.
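
For readers not familiar with it, cancel-with-savepoint is the older CLI way
of taking a savepoint while shutting a job down; a rough sketch (job id and
target directory are placeholders):

    ./bin/flink cancel -s <targetDirectory> <jobId>

Newer Flink releases deprecate this in favour of the stop command, which
takes a savepoint and then terminates the job once the savepoint completes:

    ./bin/flink stop -p <targetDirectory> <jobId>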