Thank you Yang and Alexander. It happened while testing restore from a k8s cluster backup; most likely the job was upgraded after the "backup" but before the simulated disaster.

Thanks,
Alexey
________________________________
From: Yang Wang <danrtsey...@gmail.com>
Sent: Tuesday, November 16, 2021 8:00:13 PM
To: Alexander Preuß <alexanderpre...@ververica.com>
Cc: Alexey Trenikhun <yen...@msn.com>; Flink User Mail List <user@flink.apache.org>
Subject: Re: Could not retrieve submitted JobGraph from state handle
Hi Alexey,

If you delete the HA data stored in S3 manually, or perhaps you have configured an automatic clean-up rule, then it can happen that the ConfigMap still has the pointers while the concrete data in S3 is missing.

> How to clean the state handle store?

Since the handle is stored in the ConfigMap, I think you could use the following command to do the cleanup manually:

kubectl delete cm --selector='app=<ClusterID>,configmap-type=high-availability'

An easier way is to use a different cluster id.

Best,
Yang

On Tue, Nov 16, 2021 at 10:30 PM Alexander Preuß <alexanderpre...@ververica.com> wrote:

Hi Alexey,

Are you maybe reusing the cluster-id? Also, could you provide some more information on your setup and a more complete stack trace? The ConfigMap contains pointers to the actual files on Azure.

Best,
Alexander

On Tue, Nov 16, 2021 at 6:14 AM Alexey Trenikhun <yen...@msn.com> wrote:

Hello,

We are using Kubernetes HA and Azure Blob storage, and in rare cases I see the following error:

Could not retrieve submitted JobGraph from state handle under jobGraph-00000000000000000000000000000000. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.

The question is: how exactly can I clean the "state handle store"? Delete the fsp-dispatcher-leader ConfigMap? Or some files (which ones) in Azure Blob storage?

Thanks,
Alexey
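Yang's suggestion can be sketched as the following operational sequence. This is a hedged sketch, not a definitive procedure: `my-flink-cluster` is a hypothetical cluster id standing in for your own, and it assumes the selector labels shown in Yang's command, which Flink's Kubernetes HA services attach to the ConfigMaps they create.

```shell
# Inspect the HA ConfigMaps for the cluster before deleting anything,
# to see what the Kubernetes HA services have stored.
# ("my-flink-cluster" is a placeholder; substitute your kubernetes.cluster-id.)
kubectl get cm --selector='app=my-flink-cluster,configmap-type=high-availability'

# Delete them to discard the stale state handle pointers.
# Note: the ConfigMaps hold only pointers; the actual JobGraph/checkpoint
# data lives in the configured high-availability.storageDir
# (Azure Blob storage or S3, depending on your setup).
kubectl delete cm --selector='app=my-flink-cluster,configmap-type=high-availability'

# Alternative (Yang's "easier way"): redeploy under a fresh cluster id so
# the HA ConfigMaps are created under different names, e.g. in flink-conf.yaml:
#   kubernetes.cluster-id: my-flink-cluster-v2
```

Deleting the ConfigMaps only resolves the dangling pointers; if the backing files in blob storage were removed by a retention rule, you would also want to check that rule so the new HA data is not cleaned up the same way.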