Hi, I'm currently running Flink 1.13.2 in Kubernetes session mode (native Kubernetes). When I update the JobManager deployment with `kubectl apply -f flink-jobmanager-deployment.yaml`, a new JobManager pod is created. I expected all the TaskManager pods to re-register with the new JM pod. However, the new JM pod rejected all the existing TaskManagers that were running before the update. It looks like the new JM deployment does not recognize the existing TM pods. Is this expected? If so, how can I configure the deployment to recover the existing TMs?
Thanks,
Sharon

JM logs:

```
2021-10-05 18:00:53,011 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering TaskManager with ResourceID XXXXX-flink-cluster-local-taskmanager-1-1 (akka.tcp://flink@10.244.0.191:6122/user/rpc/taskmanager_0) at ResourceManager
2021-10-05 18:00:53,033 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering TaskManager with ResourceID XXXXX-flink-cluster-local-taskmanager-1-1 (akka.tcp://flink@10.244.0.191:6122/user/rpc/taskmanager_0) at ResourceManager
2021-10-05 18:00:53,046 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker XXXXXX-flink-cluster-local-taskmanager-1-1 is registered.
2021-10-05 18:01:45,835 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Stopping worker XXXXX-flink-cluster-local-taskmanager-1-1.
2021-10-05 18:01:45,835 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Stopping TaskManager pod XXXXXX-flink-cluster-local-taskmanager-1-1.
2021-10-05 18:01:45,837 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Closing TaskExecutor connection XXXXXX-flink-cluster-local-taskmanager-1-1 because: TaskExecutor exceeded the idle timeout.
2021-10-05 18:01:45,877 WARN  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Discard registration from TaskExecutor XXXXX-flink-cluster-local-taskmanager-1-1 at (akka.tcp://flink@10.244.0.191:6122/user/rpc/taskmanager_0) because the framework did not recognize it
```

TM logs:

```
2021-10-05 18:01:45,843 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Close ResourceManager connection 9f664a154b1924918b46d41016324a74.
2021-10-05 18:01:45,844 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@XXXXX-flink-cluster-service:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2021-10-05 18:01:45,856 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Resolved ResourceManager address, beginning registration
2021-10-05 18:01:45,883 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Fatal error occurred in TaskExecutor akka.tcp://flink@10.244.0.191:6122/user/rpc/taskmanager_0.
org.apache.flink.util.FlinkException: The TaskExecutor's registration at the ResourceManager akka.tcp://flink@XXXXX-flink-cluster-service:6123/user/rpc/resourcemanager_* has been rejected: Rejected TaskExecutor registration at the ResourceManger because: The ResourceManager does not recognize this TaskExecutor.
```