Hi Sonam,

Pulling in Till (cc'ed); I believe he should be able to help you here.
Cheers,
Gordon

On Fri, Apr 2, 2021 at 8:18 AM Sonam Mandal <soman...@linkedin.com> wrote:

> Hello,
>
> We are experimenting with task local recovery and I wanted to know whether
> there is a way to validate that some tasks of the job recovered from the
> local state rather than the remote state.
>
> We've currently set this up to have 2 Task Managers with 2 slots each, and
> we run a job with parallelism 4. To simulate failure, we kill one of the
> Task Manager pods (we run on Kubernetes). I want to see if the local state
> of the other Task Manager was used or not. I do understand that the state
> for the killed Task Manager will need to be fetched from the checkpoint.
>
> Also, do you have any suggestions on how to test such failure scenarios in
> a better way?
>
> Thanks,
> Sonam