Hello,

We've been experimenting with task-local recovery on Kubernetes. We mount the same persistent disk into a Task Manager pod across restarts/deletions, so the local state directory survives when the pod is recreated. In this scenario, task-local recovery does not kick in, which is expected based on the documentation.
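For context, here is roughly what the setup looks like. The claim name, mount path, and directory layout below are only illustrative; state.backend.local-recovery and taskmanager.state.local.root-dirs are the standard local-recovery options.

    # flink-conf.yaml (excerpt)
    state.backend.local-recovery: true
    # point local state at the directory on the mounted disk
    taskmanager.state.local.root-dirs: /mnt/local-state

    # Task Manager pod spec (excerpt); volume/claim names are made up for illustration
    spec:
      containers:
        - name: taskmanager
          volumeMounts:
            - name: local-state
              mountPath: /mnt/local-state
      volumes:
        - name: local-state
          persistentVolumeClaim:
            claimName: tm-local-state   # the same claim is re-attached when the pod is recreated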
As an experiment, we commented out the code on the shutdown path that cleans up the task-local directories before the pod goes down or is restarted. Even with the local state still present on disk, remote recovery kicked in. I noticed that the slot IDs changed across the restart, and I'm wondering if that is the main reason the local state wasn't used in this scenario.

Since the shared disk keeps the local state across pod failures, would it make sense to allow retaining the task-local state so that we get faster recovery even when the Task Manager itself dies? In a sense, the storage here is disaggregated from the pods and could still benefit from task-local recovery. Is there any reason this would be a bad idea in general?

Is there a way to preserve the slot IDs across restarts? We set up the Task Manager to pin its resource ID (see the snippet at the end), but that didn't seem to help. My understanding is that the slot ID needs to be reused for task-local recovery to kick in.

Thanks,
Sonam
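P.S. For reference, this is roughly how we pin the resource ID. taskmanager.resource-id is the standard config key; the pod-name wiring via the downward API and FLINK_PROPERTIES (from the official Flink image) is just our approach, and we run the Task Managers as a StatefulSet so the pod names are stable.

    # Task Manager container spec (excerpt)
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: FLINK_PROPERTIES          # appended to flink-conf.yaml by the image entrypoint
        value: |
          taskmanager.resource-id: $(POD_NAME)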