Hi Gyula,

> I assumed it will only download at most 10GB and just start reading from
> remote and the job should start up "immediately".
It won't start up immediately; instead, it clips the state before running.
This clipping is primarily performed on the remote side. It may involve
writing new state files, which could be cached on the local disk, but that
should not exceed the 10GB limit.

May I ask what checkpoint storage you are using? Could you also try
starting the job without a rescale and see whether it starts running
immediately? And it would be great if you could provide some taskmanager
logs from the restore. I suspect the state clipping may involve too much
file rewriting, which affects the speed. I'll do a similar experiment on my
side (a minimal config sketch for such a test is appended after the quoted
message below).

Best,
Zakelly

On Fri, Apr 4, 2025 at 4:28 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi All!
>
> I am experimenting with the ForSt state backend on 2.0.0 and I noticed
> the following thing.
>
> If I have a job with a larger state, let's say 500GB, and now I want to
> start the job with a lower parallelism on a single TaskManager, the job
> will simply not start as the ForStIncrementalRestoreOperation tries to
> download all states locally (there is not enough disk space).
>
> I have these configs:
>
> "state.backend.type": "forst"
> "state.backend.forst.cache.size-based-limit": "10GB"
>
> I assumed it will only download at most 10GB and just start reading from
> remote and the job should start up "immediately".
>
> What am I missing?
>
> Gyula
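
A minimal sketch of such a local test, under assumptions: only the two
configuration keys quoted above come from this thread; the class name,
the Configuration.fromMap-based setup, and the placeholder job skeleton
are illustrative, and checkpoint storage settings would still have to be
added to match the actual setup.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ForStCacheLimitSketch {

    public static void main(String[] args) throws Exception {
        // The two keys quoted in this thread: enable the ForSt state backend
        // and cap the size-based local disk cache at 10GB.
        Map<String, String> props = new HashMap<>();
        props.put("state.backend.type", "forst");
        props.put("state.backend.forst.cache.size-based-limit", "10GB");

        // Checkpoint storage options (the open question above) would be
        // added to this map as well; they are deliberately omitted here.
        Configuration conf = Configuration.fromMap(props);

        // Apply the configuration when creating the environment; in a
        // cluster deployment the same keys would live in the Flink
        // configuration file instead.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // ... build a stateful pipeline here, restore it from the existing
        // checkpoint/savepoint at a lower parallelism, then:
        // env.execute("forst-cache-limit-test");
    }
}

On a real deployment the same two keys would simply go into the cluster's
Flink configuration rather than being set programmatically.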