Hi All,

I have one JobManager pod and 3 TaskManager pods running in Kubernetes.

Suddenly, my TaskManager produced the following logs and was restarted:

```
2024-12-25 23:11:27.836 WARN  org.apache.pekko.remote.transport.netty.NettyTransport       [] - Remote connection to [/10.68.16.22:55200] failed with org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 369295620 - discarded
2024-12-25 23:15:30.537 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2024-12-25 23:15:30.538 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2024-12-25 23:15:30.538 INFO  org.apache.flink.runtime.blob.TransientBlobCache             [] - Shutting down BLOB cache
2024-12-25 23:15:30.538 INFO  org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
2024-12-25 23:15:30.538 INFO  org.apache.flink.runtime.state.TaskExecutorStateChangelogStoragesManager [] - Shutting down TaskExecutorStateChangelogStoragesManager.
2024-12-25 23:15:30.539 INFO  org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
2024-12-25 23:15:30.539 INFO  org.apache.flink.runtime.blob.TransientBlobCache             [] - Shutting down BLOB cache
```

My JobManager was also restarted:

```
2024-12-25 23:15:30.597 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2024-12-25 23:15:30.598 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting StandaloneSessionClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2024-12-25 23:15:30.600 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124
2024-12-25 23:24:06.523 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
```

From Grafana, I can see that memory and CPU usage are well within limits. Grafana also shows that only taskmanager 0 was restarted, not taskmanagers 2 and 3.

I cannot find any hint as to why the JobManager and the TaskManager were restarted.
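One detail in the TaskManager WARN line may be worth checking. Assuming Pekko's classic Netty transport reads each frame's length from a 4-byte big-endian prefix (an assumption about the framing, not confirmed here), the sketch below re-encodes the rejected length 369295620 into those 4 bytes. They come out as `16 03 01 04`, and `0x16 0x03 0x01` is how a TLS handshake record begins (content type 22, record version 0x0301). That would hint, as an unverified hypothesis, that whatever connected from 10.68.16.22 spoke TLS to the remoting port rather than sending a real 352 MiB message. On its own this does not explain the SIGTERMs, which arrived about four minutes later.

```python
import struct

# Values copied from the TaskManager WARN line.
max_frame_size = 10485760      # limit in bytes from "exceeds 10485760"
rejected_length = 369295620    # length the decoder read from the first 4 bytes

print(max_frame_size == 10 * 1024 * 1024)          # True: the limit is 10 MiB
print(round(rejected_length / (1024 * 1024), 1))   # 352.2 (MiB "claimed")

# Assumption: the length prefix is 4 bytes, big-endian.
# Re-encode the value to see which raw bytes were on the wire.
prefix = struct.pack(">I", rejected_length)
print(prefix.hex(" "))  # 16 03 01 04

# 0x16 = 22 is the TLS "handshake" content type; 0x03 0x01 is record version 3.1 (TLS 1.0).
print(prefix[0] == 22 and prefix[1:3] == b"\x03\x01")  # True
```
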

Any suggestions or help would be really appreciated.

Thanks,

Banu
