Hi!
This job is definitely using the old, sync data access.
Where is this limitation mentioned in the docs? It sounds a bit strange that a fundamental behavior of the state backend depends on this. I assumed that without the new async API it would be slower, but the general characteristics of remote storage would remain the same.
Thanks, Gyula
On 4 Apr 2025, at 13:24, Zakelly Lan <zakelly....@gmail.com> wrote:
Hi Gyula,
It seems ForSt is downloading state even for a no-rescale start.
It occurred to me that there is a limitation: ForSt won't store state files on remote storage if the synchronous state APIs are used. So is it a DataStream job using the old state APIs (not State V2), or a SQL job without asynchronous state support (listed in [1])? Would you please check the TaskManager log and see whether 'ForStSync' shows up, which would mean ForSt is running in sync mode with purely local state?
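One quick way to run that check is to scan the TaskManager log for the marker string. The following is only a minimal Python sketch: the helper names are made up for illustration, the log location depends on your deployment, and the only detail taken from this thread is the 'ForStSync' substring.

```python
from pathlib import Path
from typing import List


def find_marker_lines(log_text: str, marker: str = "ForStSync") -> List[str]:
    """Return every log line that contains the marker substring."""
    return [line for line in log_text.splitlines() if marker in line]


def scan_log_file(path: str, marker: str = "ForStSync") -> List[str]:
    """Read a log file (path supplied by the caller) and scan it for the marker."""
    # errors="replace" keeps the scan going if the log has stray non-UTF-8 bytes.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_marker_lines(text, marker)
```

Calling `scan_log_file(...)` with the path of a TaskManager log returns the matching lines; an empty list means the marker never appeared in that file.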
Best, Zakelly
This is the flamegraph during the no-rescale restart. I couldn't attach it to the mailing list.
Hi Gyula,
I assumed it will only download at most 10GB and just start reading from remote and the job should start up "immediately".
It won't start up immediately; instead, it clips the state before running. This clipping process is primarily performed on the remote side. It may involve writing new state files, which could be cached on the local disk, but the cache should not exceed the 10GB limit.
May I ask which checkpoint storage you are using? Would you please try to start the job without a rescale and see whether it starts running immediately? It would also be great if you could provide some logs from the TaskManager during the restore. I suspect that state clipping may involve too much file rewriting, which affects the speed. I'll do a similar experiment.
Best, Zakelly
Hi All!
I am experimenting with the ForSt state backend on 2.0.0 and I noticed the following thing.
If I have a job with a larger state, let's say 500GB, and I now want to start the job with a lower parallelism on a single TaskManager, the job will simply not start, as the ForStIncrementalRestoreOperation tries to download all state locally (there is not enough disk space).
I have these configs:
"state.backend.type": "forst"
"state.backend.forst.cache.size-based-limit": "10GB"
I assumed it will only download at most 10GB and just start reading from remote and the job should start up "immediately".
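To make the mismatch concrete, here is a rough Python sketch of the arithmetic. The 500GB state size and the 10GB cache limit come from the setup above; the free-disk figure is a made-up example, and treating "GB" as 1024^3 bytes is an assumption.

```python
GB = 1024 ** 3  # assumption: "GB" is interpreted as 1024^3 bytes


def fits_on_local_disk(bytes_needed: int, free_bytes: int) -> bool:
    """True if the requested amount of data fits into the free disk space."""
    return bytes_needed <= free_bytes


state_size = 500 * GB   # total state size mentioned above
cache_limit = 10 * GB   # the configured size-based cache limit
free_disk = 100 * GB    # hypothetical free space on the single TaskManager

# A full local download of the state does not fit on the disk ...
full_download_ok = fits_on_local_disk(state_size, free_disk)
# ... whereas a cache bounded by the configured limit would.
cache_only_ok = fits_on_local_disk(cache_limit, free_disk)
```

With these numbers `full_download_ok` is `False` and `cache_only_ok` is `True`, which is why a restore that downloads everything cannot succeed here.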
What am I missing?
Gyula