Hi,

It seems that the node `tef-prod-flink-04/10.11.0.51:37505 [
tef-prod-flink-04:38835-e3ca4d ]` exited unexpectedly. You can check whether
there are any errors in the TaskManager (TM) or Kubernetes (K8s) logs.

Best,
Shammon FY


On Sun, Aug 20, 2023 at 5:42 PM Kenan Kılıçtepe <kkilict...@gmail.com>
wrote:

> Hi,
>
> Nothing interesting on the Kafka side, just some partition delete/create logs.
> Also, I can't understand why all the task managers stop at the same time
> without any error log.
>
> Thanks
> Kenan
>
>
>
> On Sun, Aug 20, 2023 at 10:49 AM liu ron <ron9....@gmail.com> wrote:
>
>> Hi,
>>
>> Maybe you need to check what changed on the Kafka side at that time.
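>>
>> For example, here is a rough sketch (not tested against your cluster,
>> and assuming kafka-clients 3.1+): a small AdminClient program run from
>> one of the Flink hosts can confirm that the brokers are reachable from
>> there and print the current partition leadership and ISR. The broker
>> address and topic name below are only placeholders taken from your logs.
>>
>> import java.util.List;
>> import java.util.Properties;
>>
>> import org.apache.kafka.clients.admin.AdminClient;
>> import org.apache.kafka.clients.admin.AdminClientConfig;
>> import org.apache.kafka.clients.admin.TopicDescription;
>>
>> public class LeaderCheck {
>>     public static void main(String[] args) throws Exception {
>>         Properties props = new Properties();
>>         // Placeholder broker address taken from the logs above.
>>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
>>             "10.11.0.41:9092");
>>         try (AdminClient admin = AdminClient.create(props)) {
>>             // Example topic from the logs; adjust as needed.
>>             String topic = "telemetry.wifi.interface";
>>             TopicDescription desc = admin
>>                 .describeTopics(List.of(topic))
>>                 .allTopicNames().get()
>>                 .get(topic);
>>             desc.partitions().forEach(p ->
>>                 System.out.printf("partition %d leader=%s isr=%s%n",
>>                     p.partition(), p.leader(), p.isr()));
>>         }
>>     }
>> }
>>
>> If the leadership and ISR look stable, the disconnects in your logs are
>> more likely a symptom of the task manager going away than of a change
>> on the Kafka side.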
>>
>> Best,
>> Ron
>>
>> On Sun, Aug 20, 2023 at 08:51, Kenan Kılıçtepe <kkilict...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have 4 task managers running on 4 servers.
>>> They all crash at the same time without any useful error logs.
>>> The only logs I can see are some disconnections from Kafka, for both
>>> consumers and producers.
>>> Any idea or help is appreciated.
>>>
>>> Some logs from all task managers:
>>>
>>> I think server 4 crashes first, and that causes all the other task
>>> managers to crash.
>>>
>>> JobManager:
>>>
>>> 2023-08-18 15:16:46,528 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [AdminClient clientId=47539-enumerator-admin-client]
>>> Node 2 disconnected.
>>> 2023-08-18 15:19:00,303 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [AdminClient
>>> clientId=tf_25464-enumerator-admin-client] Node 4 disconnected.
>>> 2023-08-18 15:19:16,668 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [AdminClient
>>> clientId=cpu_59942-enumerator-admin-client] Node 1 disconnected.
>>> 2023-08-18 15:19:16,764 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [AdminClient
>>> clientId=cpu_55128-enumerator-admin-client] Node 3 disconnected.
>>> 2023-08-18 15:19:27,913 WARN  akka.remote.transport.netty.NettyTransport
>>>                   [] - Remote connection to [/10.11.0.51:42778] failed
>>> with java.io.IOException: Connection reset by peer
>>> 2023-08-18 15:19:27,963 WARN  akka.remote.ReliableDeliverySupervisor
>>>                   [] - Association with remote system
>>> [akka.tcp://flink@tef-prod-flink-04:38835] has failed, address is now
>>> gated for [50] ms. Reason: [Disassociated]
>>> 2023-08-18 15:19:27,967 WARN  akka.remote.ReliableDeliverySupervisor
>>>                   [] - Association with remote system
>>> [akka.tcp://flink-metrics@tef-prod-flink-04:46491] has failed, address
>>> is now gated for [50] ms. Reason: [Disassociated]
>>> 2023-08-18 15:19:29,225 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
>>> RouterReplacementAlgorithm -> kafkaSink_sinkFaultyRouter_windowMode: Writer
>>> -> kafkaSink_sinkFaultyRouter_windowMode: Committer (3/4)
>>> (f6fd65e3fc049bd9021093d8f532bbaf_a47f4a3b960228021159de8de51dbb1f_2_0)
>>> switched from RUNNING to FAILED on
>>> injection-assia-3-pro-cloud-tef-gcp-europe-west1:39011-b24b1d @
>>> injection-assia-3-pro-cloud-tef-gcp-europe-west1 (dataPort=35223).
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> Connection unexpectedly closed by remote task manager 'tef-prod-flink-04/
>>> 10.11.0.51:37505 [ tef-prod-flink-04:38835-e3ca4d ] '. This might
>>> indicate that the remote task manager was lost.
>>>         at
>>> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:134)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at java.lang.Thread.run(Thread.java:829) ~[?:?]
>>>
>>>
>>> 2023-08-18 15:19:29,225 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
>>> GroupAggregate[50] (1/1)
>>> (7afc36d5660c8070d4d1e426428904a3_b2294522dce77da76437a18d2de62f66_0_0)
>>> switched from RUNNING to FAILED on
>>> injection-assia-3-pro-cloud-tef-gcp-europe-west1:39011-b24b1d @
>>> injection-assia-3-pro-cloud-tef-gcp-europe-west1 (dataPort=35223).
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> Connection unexpectedly closed by remote task manager 'tef-prod-flink-04/
>>> 10.11.0.51:37505 [ tef-prod-flink-04:38835-e3ca4d ] '. This might
>>> indicate that the remote task manager was lost.
>>>         at
>>> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:134)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>> ~[flink-dist-1.16.2.jar:1.16.2]
>>>         at java.lang.Thread.run(Thread.java:829) ~[?:?]
>>> 2023-08-18 15:19:29,294 INFO
>>>  org.apache.flink.runtime.jobmaster.JobMaster                 [] - 16 tasks
>>> will be restarted to recover the failed task
>>> f6fd65e3fc049bd9021093d8f532bbaf_a47f4a3b960228021159de8de51dbb1f_2_0.
>>> 2023-08-18 15:19:29,294 INFO
>>>  org.apache.flink.runtime.jobmaster.JobMaster                 [] - 114
>>> tasks will be restarted to recover the failed task
>>> 7afc36d5660c8070d4d1e426428904a3_b2294522dce77da76437a18d2de62f66_0_0.
>>> 2023-08-18 15:19:29,294 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
>>> Health Algorithms V1.2 IntervalMin:180 EmbeddedRocksDBStateBackend x4
>>> (8be8bada21ee7859c596deb49f29bbfd) switched from state RUNNING to
>>> RESTARTING.
>>> 2023-08-18 15:19:29,294 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
>>> Temperature Algorithms V1.2 IntervalMin:180 EmbeddedRocksDBStateBackend x4
>>> (f6b3823cc5b0d9462f09ca7e21a468b2) switched from state RUNNING to
>>> RESTARTING.
>>>
>>>
>>>
>>> Server 1:
>>>
>>>
>>> 2023-08-18 15:18:39,898 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-102] Resetting the last
>>> seen epoch of partition alerts.wifiqos-14 to 0 since the associated topicId
>>> changed from null to hGzbb6Q9R-eApP_OwyFNfw
>>> 2023-08-18 15:18:39,898 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-102] Resetting the last
>>> seen epoch of partition alerts.wifiqos-4 to 0 since the associated topicId
>>> changed from null to hGzbb6Q9R-eApP_OwyFNfw
>>> 2023-08-18 15:18:48,568 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-36] Node 4 disconnected.
>>> 2023-08-18 15:18:49,864 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-66] Node 2 disconnected.
>>> 2023-08-18 15:18:49,912 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-67] Node 2 disconnected.
>>> 2023-08-18 15:18:50,156 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-64] Node 4 disconnected.
>>> 2023-08-18 15:19:01,240 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-105] Node 4 disconnected.
>>> 2023-08-18 15:19:17,252 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-41] Node 1 disconnected.
>>> 2023-08-18 15:19:27,612 ERROR
>>> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue [] -
>>> Encountered error while consuming partitions
>>> org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException:
>>> readAddress(..) failed: Connection reset by peer
>>> 2023-08-18 15:19:28,053 INFO
>>>  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] -
>>> Closed RocksDB State Backend. Cleaning up RocksDB working directory
>>> /home/assia/data2/flink1/job_9f2d6dc13df39c9f5ed1a0fe69e5deae_op_KeyedProcessOperator_0456b63cf4d1645b74bf4c5eed8d03f0__1_4__uuid_bbb68a27-a502-4fa9-b698-2894a4f6c8f0.
>>> 2023-08-18 15:19:28,053 INFO
>>>  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] -
>>> Closed RocksDB State Backend. Cleaning up RocksDB working directory
>>> /home/assia/data1/flink3/job_9f2d6dc13df39c9f5ed1a0fe69e5deae_op_KeyedProcessOperator_b3fea86a6e448bb11ab554a395c1d49f__1_4__uuid_db93d58c-86ad-4985-b6d7-8dfa5a223628.
>>> 2023-08-18 15:19:28,053 INFO
>>>  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] -
>>> Closed RocksDB State Backend. Cleaning up RocksDB working directory
>>> /home/assia/data3/flink1/job_b21089450115e102d5f8502110fd8b41_op_SlicingWindowOperator_28af45bb9be1c3c8edc302ec9b4b2ee1__15_18__uuid_6750135d-1896-4083-9016-56e782cff254.
>>> 2023-08-18 15:19:28,053 INFO
>>>  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] -
>>> Closed RocksDB State Backend. Cleaning up RocksDB working directory
>>> /home/assia/data1/flink2/job_9f2d6dc13df39c9f5ed1a0fe69e5deae_op_KeyedProcessOperator_29b51ba77663f1dcf21f382adc455811__1_4__uuid_04368858-c202-41f2-a4d9-788dfe0f6e3c.
>>> 2023-08-18 15:19:28,053 INFO
>>>  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] -
>>> Closed RocksDB State Backend. Cleaning up RocksDB working directory
>>> /home/assia/data3/flink2/job_b21089450115e102d5f8502110fd8b41_op_SlicingWindowOperator_28af45bb9be1c3c8edc302ec9b4b2ee1__6_18__uuid_62ca2b18-560a-45ae-887e-b3060abe58d3.
>>>
>>> ...
>>> ...
>>> ...
>>>
>>> 2023-08-18 15:19:29,550 WARN  org.apache.flink.runtime.taskmanager.Task
>>>                    [] - WifiStationAndInterfaceQoECalculator ->
>>> Timestamps/Watermarks (6/18)#0
>>> (f003b7bae04aeade2276538c66237dfe_547178fd75c00c015d9d73f02082a7cd_5_0)
>>> switched from RUNNING to FAILED with failure cause:
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> Lost connection to task manager 'tef-prod-flink-04/10.11.0.51:37505 [
>>> tef-prod-flink-04:38835-e3ca4d ] '. This indicates that the remote task
>>> manager was lost.
>>>         at
>>> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:165)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:728)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:821)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>         at java.base/java.lang.Thread.run(Thread.java:829)
>>> Caused by:
>>> org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException:
>>> readAddress(..) failed: Connection reset by peer
>>>
>>> 2023-08-18 15:19:29,550 WARN  org.apache.flink.runtime.taskmanager.Task
>>>                    [] - GroupAggregate[18] -> ConstraintEnforcer[19] ->
>>> TableToDataSteam (6/18)#0
>>> (f003b7bae04aeade2276538c66237dfe_2a967f37b689e0cf1eb34feac9c5b6c3_5_0)
>>> switched from RUNNING to FAILED with failure cause:
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> Lost connection to task manager 'tef-prod-flink-04/10.11.0.51:37505 [
>>> tef-prod-flink-04:38835-e3ca4d ] '. This indicates that the remote task
>>> manager was lost.
>>>         at
>>> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:165)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:728)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:821)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>>>         at
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>         at java.base/java.lang.Thread.run(Thread.java:829)
>>>
>>>
>>>
>>> Server 4:
>>>
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-10 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-15 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-2 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-11 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-3 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-6 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-14 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-5 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> 2023-08-18 15:07:18,225 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-89] Resetting the last
>>> seen epoch of partition alerts.noise-13 to 0 since the associated topicId
>>> changed from null to mjudIQc3Tg688a11hC78Zw
>>> ---
>>> ---
>>>
>>> 2023-08-18 15:08:50,878 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-5] Node 4 disconnected.
>>> 2023-08-18 15:09:05,583 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-22] Node 2 disconnected.
>>> 2023-08-18 15:09:07,602 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-58] Node 1 disconnected.
>>> 2023-08-18 15:09:13,006 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-24] Node 3 disconnected.
>>> 2023-08-18 15:09:13,106 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-24] Node 1 disconnected.
>>> 2023-08-18 15:09:14,359 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-8] Node 3 disconnected.
>>> 2023-08-18 15:09:14,359 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-17] Node 4 disconnected.
>>> 2023-08-18 15:09:14,362 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-16] Node 2 disconnected.
>>> 2023-08-18 15:09:15,998 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-20] Node 3 disconnected.
>>>
>>>
>>>
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-1,
>>> groupId=memalgogroup1-prod] Cancelled in-flight FETCH request with
>>> correlation id 1987537 due to node 3 being disconnected (elapsed time since
>>> creation: 365797ms, elapsed time since send: 365797ms, request timeout:
>>> 30000ms)
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-1,
>>> groupId=memalgogroup1-prod] Disconnecting from node 4 due to request
>>> timeout.
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-1,
>>> groupId=memalgogroup1-prod] Cancelled in-flight FETCH request with
>>> correlation id 1987536 due to node 4 being disconnected (elapsed time since
>>> creation: 365800ms, elapsed time since send: 365800ms, request timeout:
>>> 30000ms)
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-0,
>>> groupId=memalgogroup1-prod] Disconnecting from node 2 due to request
>>> timeout.
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-0,
>>> groupId=memalgogroup1-prod] Cancelled in-flight FETCH request with
>>> correlation id 1985751 due to node 2 being disconnected (elapsed time since
>>> creation: 365801ms, elapsed time since send: 365801ms, request timeout:
>>> 30000ms)
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-0,
>>> groupId=memalgogroup1-prod] Cancelled in-flight METADATA request with
>>> correlation id 1985752 due to node 2 being disconnected (elapsed time since
>>> creation: 0ms, elapsed time since send: 0ms, request timeout: 30000ms)
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-0,
>>> groupId=memalgogroup1-prod] Disconnecting from node 4 due to request
>>> timeout.
>>> 2023-08-18 15:12:51,108 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Consumer clientId=mem_2731-0,
>>> groupId=memalgogroup1-prod] Cancelled in-flight FETCH request with
>>> correlation id 1985750 due to node 4 being disconnected (elapsed time since
>>> creation: 365803ms, elapsed time since send: 365803ms, request timeout:
>>> 30000ms)
>>> 2023-08-18 15:12:51,162 INFO
>>>  org.apache.kafka.clients.FetchSessionHandler                 [] -
>>> [Consumer clientId=mem_2731-0, groupId=memalgogroup1-prod] Error sending
>>> fetch request (sessionId=475536967, epoch=111003) to node 2:
>>> org.apache.kafka.common.errors.DisconnectException: null
>>> 2023-08-18 15:12:51,162 INFO
>>>  org.apache.kafka.clients.FetchSessionHandler                 [] -
>>> [Consumer clientId=mem_2731-1, groupId=memalgogroup1-prod] Error sending
>>> fetch request (sessionId=294196072, epoch=110689) to node 3:
>>> org.apache.kafka.common.errors.DisconnectException: null
>>> 2023-08-18 15:12:51,162 INFO
>>>  org.apache.kafka.clients.FetchSessionHandler                 [] -
>>> [Consumer clientId=mem_2731-0, groupId=memalgogroup1-prod] Error sending
>>> fetch request (sessionId=985795042, epoch=111503) to node 4:
>>> org.apache.kafka.common.errors.DisconnectException: null
>>> 2023-08-18 15:12:51,162 INFO
>>>  org.apache.kafka.clients.FetchSessionHandler                 [] -
>>> [Consumer clientId=mem_2731-1, groupId=memalgogroup1-prod] Error sending
>>> fetch request (sessionId=1034101367, epoch=110920) to node 4:
>>> org.apache.kafka.common.errors.DisconnectException: null
>>> 2023-08-18 15:12:51,282 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-127] Node 1 disconnected.
>>> 2023-08-18 15:12:57,362 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-12] Node 1 disconnected.
>>> 2023-08-18 15:12:57,363 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-12] Node 4 disconnected.
>>> 2023-08-18 15:13:02,690 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-57] Node 2 disconnected.
>>>
>>> ...
>>> ...
>>> ...
>>>
>>> 2023-08-18 15:18:23,910 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-15] Resetting the last
>>> seen epoch of partition messages.invalid-13 to 0 since the associated
>>> topicId changed from null to axH1mwlLQNGmYe95C78Bew
>>> 2023-08-18 15:18:23,910 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-15] Resetting the last
>>> seen epoch of partition messages.invalid-15 to 0 since the associated
>>> topicId changed from null to axH1mwlLQNGmYe95C78Bew
>>> 2023-08-18 15:18:23,910 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-15] Resetting the last
>>> seen epoch of partition messages.invalid-2 to 0 since the associated
>>> topicId changed from null to axH1mwlLQNGmYe95C78Bew
>>> 2023-08-18 15:18:23,910 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-15] Resetting the last
>>> seen epoch of partition messages.invalid-14 to 0 since the associated
>>> topicId changed from null to axH1mwlLQNGmYe95C78Bew
>>> 2023-08-18 15:18:23,910 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-15] Resetting the last
>>> seen epoch of partition messages.invalid-10 to 0 since the associated
>>> topicId changed from null to axH1mwlLQNGmYe95C78Bew
>>> 2023-08-18 15:18:23,910 INFO  org.apache.kafka.clients.Metadata
>>>                    [] - [Producer clientId=producer-15] Resetting the last
>>> seen epoch of partition messages.invalid-9 to 0 since the associated
>>> topicId changed from null to axH1mwlLQNGmYe95C78Bew
>>> 2023-08-18 15:18:28,270 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-173] Node 3 disconnected.
>>> 2023-08-18 15:18:28,429 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-175] Node 4 disconnected.
>>> 2023-08-18 15:18:28,982 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-174] Node 4 disconnected.
>>> 2023-08-18 15:18:36,480 INFO
>>>  org.apache.kafka.clients.consumer.internals.Fetcher          [] -
>>> [Consumer clientId=wifialgogroup1WifiInterface-6,
>>> groupId=wifialgogroup1WifiInterface] Fetch position
>>> FetchPosition{offset=69028579, offsetEpoch=Optional[0],
>>> currentLeader=LeaderAndEpoch{leader=Optional[10.11.0.41:9092 (id: 2
>>> rack: null)], epoch=0}} is out of range for partition
>>> telemetry.wifi.interface-13, resetting offset
>>> 2023-08-18 15:18:36,481 INFO
>>>  org.apache.kafka.clients.consumer.internals.SubscriptionState [] -
>>> [Consumer clientId=wifialgogroup1WifiInterface-6,
>>> groupId=wifialgogroup1WifiInterface] Resetting offset for partition
>>> telemetry.wifi.interface-13 to position FetchPosition{offset=71593021,
>>> offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[
>>> 10.11.0.41:9092 (id: 2 rack: null)], epoch=0}}.
>>> 2023-08-18 15:18:36,546 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-15] Node 3 disconnected.
>>> 2023-08-18 15:18:36,547 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-15] Node 1 disconnected.
>>> 2023-08-18 15:18:37,170 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-13] Node 4 disconnected.
>>> 2023-08-18 15:18:44,718 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-14] Node 2 disconnected.
>>> 2023-08-18 15:18:46,521 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-59] Node 3 disconnected.
>>> 2023-08-18 15:18:48,530 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-47] Node 4 disconnected.
>>> 2023-08-18 15:18:50,018 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-61] Node 4 disconnected.
>>> 2023-08-18 15:18:50,219 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-130] Node 3 disconnected.
>>> 2023-08-18 15:18:55,206 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-5] Node 2 disconnected.
>>> 2023-08-18 15:19:06,115 INFO  org.apache.kafka.clients.NetworkClient
>>>                   [] - [Producer clientId=producer-22] Node 4 disconnected.
>>>
>>>
>>> And the system crashes here.
>>>
>>>
>>> TaskManager2:
>>>
>>>
