Hi All, I have a jobmanager and 3 taskmanager pods running in Kubernetes.
Suddenly one of my taskmanagers logged the following and got restarted.

```
2024-12-25 23:11:27.836 WARN org.apache.pekko.remote.transport.netty.NettyTransport [] - Remote connection to [/10.68.16.22:55200] failed with org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 369295620 - discarded
2024-12-25 23:15:30.537 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2024-12-25 23:15:30.538 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2024-12-25 23:15:30.538 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Shutting down BLOB cache
2024-12-25 23:15:30.538 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
2024-12-25 23:15:30.538 INFO org.apache.flink.runtime.state.TaskExecutorStateChangelogStoragesManager [] - Shutting down TaskExecutorStateChangelogStoragesManager.
2024-12-25 23:15:30.539 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
2024-12-25 23:15:30.539 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Shutting down BLOB cache
```

My jobmanager also restarted:

```
2024-12-25 23:15:30.597 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2024-12-25 23:15:30.598 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StandaloneSessionClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2024-12-25 23:15:30.600 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
2024-12-25 23:24:06.523 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
```

From Grafana, I can see that memory and CPU usage are well within limits.
Grafana also shows that only taskmanager 0 was restarted, but not taskmanager 2&3.

I am not getting any hint as to why the jobmanager and the taskmanager restarted. Any suggestions/help will really be appreciated.

Thanks,
Banu
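
A note on the TooLongFrameException line in the taskmanager log: the reported frame length (369295620) can be turned back into raw bytes to see what the remote side may have sent. The snippet below is only a minimal sketch. It assumes the transport reads a 4-byte big-endian length prefix, and the helper `looks_like_tls_handshake` is a hypothetical name introduced here for illustration, not part of Flink or Pekko.

```python
import struct

# Frame length copied from the WARN line in the taskmanager log.
frame_length = 369295620

# Assumption: the length prefix is a 4-byte big-endian unsigned integer,
# so packing the number recovers the first four bytes that were received.
raw = struct.pack(">I", frame_length)
print(raw.hex(" "))  # 16 03 01 04


def looks_like_tls_handshake(prefix: bytes) -> bool:
    """Hypothetical helper: True if the bytes start like a TLS record header.

    A TLS record starts with a content type byte (0x16 = handshake)
    followed by a two-byte protocol version whose major byte is 0x03.
    """
    return len(prefix) >= 3 and prefix[0] == 0x16 and prefix[1] == 0x03


print(looks_like_tls_handshake(raw))  # True
```

If that assumption holds, the bytes start like a TLS handshake record, which could mean that some TLS client (a probe or scanner, for example) connected to the plaintext RPC port. That would account for the WARN line only; the SIGTERM about four minutes later comes from outside the JVM, so it may be unrelated.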