Hello,
we have Kafka v3.5.1 running in KRaft mode across two datacenters (connected
via 10 Gb/s dark fiber):
DC 1 : 3 nodes Controller / Broker
DC 2 : 2 nodes Controller / Broker
DC 2 : 1 node Broker
At exactly the same time, 21:01:00 (CEST), the cluster becomes unstable and no
producer / consumer can access it.
Every node has:
* its own node ID
* a rack ID

grep -E '(id|rack)' /etc/kafka/server.properties
broker.rack=0
node.id=1

broker.rack=0 -> DC1, broker.rack=1 -> DC2
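A quick way to see the current quorum leader and how far each controller voter
lags behind is kafka-metadata-quorum.sh; just a sketch, the bootstrap host is
an example and the SASL/SSL client options (--command-config) are omitted:

# current leader, epoch and voter set of the __cluster_metadata quorum
kafka-metadata-quorum.sh --bootstrap-server qh-a08-kafka-01.example.com:9092 describe --status
# log end offset, lag and last-fetch time per voter / observer
kafka-metadata-quorum.sh --bootstrap-server qh-a08-kafka-01.example.com:9092 describe --replication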
We have exactly the same setup on our test system, and it runs without any
issues. The only differences are the missing dark fiber and different
hostnames / certs; the rest is identical, because we use Puppet for config
management.
The logs look like this:
DC 1, Node 1:
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,135] INFO [RaftManager id=1] Completed transition to
Unattached(epoch=1494, voters=[1, 2, 3, 4, 5], electionTimeoutMs=1638)
from FollowerState(fetchTimeoutMs=2000, epoch=1493, leaderId=5,
voters=[1, 2, 3, 4, 5],
highWatermark=Optional[LogOffsetMetadata(offset=20072183, me>
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,137] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1494,
candidateId=2, lastOffsetEpoch=1493, lastOffset=20072146)])]) w>
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,137] INFO [QuorumController id=1] In the new epoch 1494, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,257] INFO [RaftManager id=1] Completed transition to
Unattached(epoch=1495, voters=[1, 2, 3, 4, 5], electionTimeoutMs=1511)
from Unattached(epoch=1494, voters=[1, 2, 3, 4, 5],
electionTimeoutMs=1638) (org.apache.kafka.raft.QuorumState)
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,258] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1495,
candidateId=2, lastOffsetEpoch=1493, lastOffset=20072146)])]) w>
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,258] INFO [QuorumController id=1] In the new epoch 1495, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,378] INFO [RaftManager id=1] Completed transition to
Unattached(epoch=1496, voters=[1, 2, 3, 4, 5], electionTimeoutMs=1391)
from Unattached(epoch=1495, voters=[1, 2, 3, 4, 5],
electionTimeoutMs=1511) (org.apache.kafka.raft.QuorumState)
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,378] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1496,
candidateId=2, lastOffsetEpoch=1493, lastOffset=20072146)])]) w>
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,378] INFO [QuorumController id=1] In the new epoch 1496, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,902] INFO [RaftManager id=1] Completed transition to
Unattached(epoch=1497, voters=[1, 2, 3, 4, 5], electionTimeoutMs=870)
from Unattached(epoch=1496, voters=[1, 2, 3, 4, 5],
electionTimeoutMs=1391) (org.apache.kafka.raft.QuorumState)
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,902] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1497,
candidateId=2, lastOffsetEpoch=1493, lastOffset=20072146)])]) w>
Jan 28 21:01:05 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:05,902] INFO [QuorumController id=1] In the new epoch 1497, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,198] INFO [BrokerToControllerChannelManager id=1
name=heartbeat] Client requested disconnect from node 5
(org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,349] INFO [RaftManager id=1] Completed transition to
Unattached(epoch=1498, voters=[1, 2, 3, 4, 5], electionTimeoutMs=422)
from Unattached(epoch=1497, voters=[1, 2, 3, 4, 5],
electionTimeoutMs=870) (org.apache.kafka.raft.QuorumState)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,350] INFO [QuorumController id=1] In the new epoch 1498, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,357] INFO [RaftManager id=1] Completed transition to
Voted(epoch=1498, votedId=3, voters=[1, 2, 3, 4, 5],
electionTimeoutMs=1456) from Unattached(epoch=1498, voters=[1, 2, 3, 4,
5], electionTimeoutMs=422) (org.apache.kafka.raft.QuorumState)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,357] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1498,
candidateId=3, lastOffsetEpoch=1493, lastOffset=20072184)])]) w>
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,388] INFO [RaftManager id=1] Completed transition to
FollowerState(fetchTimeoutMs=2000, epoch=1498, leaderId=3, voters=[1, 2,
3, 4, 5], highWatermark=Optional[LogOffsetMetadata(offset=20072183,
metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from
Voted(epoch=1>
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,389] INFO [QuorumController id=1] In the new epoch 1498, the
leader is 3. (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,401] INFO [broker-1-to-controller-heartbeat-channel-manager]:
Recorded new controller, from now on will use node
qh-a08-kafka-03.example.com:9093 (id: 3 rack: null)
(kafka.server.BrokerToControllerRequestThread)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,428] INFO [BrokerToControllerChannelManager id=1
name=heartbeat] Client requested disconnect from node 3
(org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,428] INFO [broker-1-to-controller-heartbeat-channel-manager]:
Recorded new controller, from now on will use node
qh-a08-kafka-03.example.com:9093 (id: 3 rack: null)
(kafka.server.BrokerToControllerRequestThread)
Jan 28 21:01:06 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:06,479] INFO [broker-1-to-controller-heartbeat-channel-manager]:
Recorded new controller, from now on will use node
qh-a08-kafka-03.example.com:9093 (id: 3 rack: null)
(kafka.server.BrokerToControllerRequestThread)
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,479] INFO [RaftManager id=1] Become candidate due to fetch
timeout (org.apache.kafka.raft.KafkaRaftClient)
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,486] INFO [RaftManager id=1] Completed transition to
CandidateState(localId=1, epoch=1499, retries=1, voteStates={1=GRANTED,
2=UNRECORDED, 3=UNRECORDED, 4=UNRECORDED, 5=UNRECORDED},
highWatermark=Optional[LogOffsetMetadata(offset=20072204,
metadata=Optional.empty)], elect>
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,487] INFO [QuorumController id=1] In the new epoch 1499, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,488] INFO [RaftManager id=1] Disconnecting from node 3 due to
request timeout. (org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,489] INFO [RaftManager id=1] Cancelled in-flight FETCH request
with correlation id 428513 due to node 3 being disconnected (elapsed
time since creation: 2008ms, elapsed time since send: 2007ms, request
timeout: 2000ms) (org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,522] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1499,
candidateId=4, lastOffsetEpoch=1498, lastOffset=20072204)])]) w>
Jan 28 21:01:18 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:18,535] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1499,
candidateId=5, lastOffsetEpoch=1498, lastOffset=20072204)])]) w>
Jan 28 21:01:19 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:19,734] INFO [RaftManager id=1] Completed transition to
Unattached(epoch=1500, voters=[1, 2, 3, 4, 5], electionTimeoutMs=445)
from CandidateState(localId=1, epoch=1499, retries=1,
voteStates={1=GRANTED, 2=UNRECORDED, 3=UNRECORDED, 4=REJECTED,
5=REJECTED}, highWatermark=Optio>
Jan 28 21:01:19 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:19,734] INFO [QuorumController id=1] In the new epoch 1500, the
leader is (none). (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:19 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:19,739] INFO [RaftManager id=1] Completed transition to
Voted(epoch=1500, votedId=5, voters=[1, 2, 3, 4, 5],
electionTimeoutMs=1624) from Unattached(epoch=1500, voters=[1, 2, 3, 4,
5], electionTimeoutMs=445) (org.apache.kafka.raft.QuorumState)
Jan 28 21:01:19 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:19,739] INFO [RaftManager id=1] Vote request
VoteRequestData(clusterId='Rnpnd4EcRBeWo8vUrWlOIQ',
topics=[TopicData(topicName='__cluster_metadata',
partitions=[PartitionData(partitionIndex=0, candidateEpoch=1500,
candidateId=5, lastOffsetEpoch=1498, lastOffset=20072204)])]) w>
Jan 28 21:01:19 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:19,773] INFO [RaftManager id=1] Completed transition to
FollowerState(fetchTimeoutMs=2000, epoch=1500, leaderId=5, voters=[1, 2,
3, 4, 5], highWatermark=Optional[LogOffsetMetadata(offset=20072204,
metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from
Voted(epoch=1>
Jan 28 21:01:19 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:19,774] INFO [QuorumController id=1] In the new epoch 1500, the
leader is 5. (org.apache.kafka.controller.QuorumController)
Jan 28 21:01:20 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:20,781] INFO [RaftManager id=1] Disconnecting from node 2 due to
request timeout. (org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:20 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:20,781] INFO [RaftManager id=1] Cancelled in-flight VOTE request
with correlation id 428514 due to node 2 being disconnected (elapsed
time since creation: 2294ms, elapsed time since send: 2254ms, request
timeout: 2000ms) (org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:20 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:20,783] INFO [RaftManager id=1] Disconnecting from node 3 due to
request timeout. (org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:20 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:20,783] INFO [RaftManager id=1] Cancelled in-flight VOTE request
with correlation id 428521 due to node 3 being disconnected (elapsed
time since creation: 2238ms, elapsed time since send: 2215ms, request
timeout: 2000ms) (org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:21 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:21,017] INFO [BrokerToControllerChannelManager id=1
name=heartbeat] Disconnecting from node 3 due to request timeout.
(org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:21 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:21,017] INFO [BrokerToControllerChannelManager id=1
name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with
correlation id 107067 due to node 3 being disconnected (elapsed time
since creation: 4501ms, elapsed time since send: 4501ms, request
timeout: 4500ms) (org.a>
Jan 28 21:01:21 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:21,017] INFO [broker-1-to-controller-heartbeat-channel-manager]:
Recorded new controller, from now on will use node
fc-r01-kafka-02.example.com:9093 (id: 5 rack: null)
(kafka.server.BrokerToControllerRequestThread)
Jan 28 21:01:21 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:21,017] INFO [BrokerLifecycleManager id=1] Unable to send a
heartbeat because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
Jan 28 21:01:28 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:28,810] INFO [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0]
Partition company21_pc21_transaction-2 has an older epoch (89) than the
current leader. Will await the new LeaderAndIsr state before resuming
fetching. (kafka.server.ReplicaFetcherThread)
Jan 28 21:01:28 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:28,811] WARN [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0]
Partition company21_pc21_transaction-2 marked as failed
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:28 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:28,842] INFO [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0]
Partition kafka_proxy_test2-0 has an older epoch (86) than the current
leader. Will await the new LeaderAndIsr state before resuming fetching.
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:28 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:28,842] WARN [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0]
Partition kafka_proxy_test2-0 marked as failed
(kafka.server.ReplicaFetcherThread)
...
Jan 28 21:01:28 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:28,881] INFO [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0]
Partition chargebacks-3 has an older epoch (86) than the current leader.
Will await the new LeaderAndIsr state before resuming fetching.
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:28 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:28,881] WARN [ReplicaFetcher replicaId=1, leaderId=5, fetcherId=0]
Partition chargebacks-3 marked as failed (kafka.server.ReplicaFetcherThread)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,300] INFO [ReplicaFetcherManager on broker 1] Removed fetcher
for partitions Set(mm2-offsets.FC-R02.internal-24,
__consumer_offsets-48, __consumer_offsets-13, kafka_proxy_test1-0,
mm2-configs.FC-R02.internal-0, __consumer_offsets-20,
mm2-status.FC-R02.internal-1, __consum>
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,371] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=0]
Partition company21_pc21_transaction-9 has an older epoch (89) than the
current leader. Will await the new LeaderAndIsr state before resuming
fetching. (kafka.server.ReplicaFetcherThread)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,372] WARN [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=0]
Partition company21_pc21_transaction-9 marked as failed
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,439] INFO [ReplicaFetcherManager on broker 1] Removed fetcher
for partitions Set(blacklist_transactions-9, kafka_proxy_test2-0,
mm2-offsets.FC-R02.internal-18, mm2-offsets.FC-R02.internal-23,
company21_pc21_transaction-2, chargebacks-1,
company21_pc21_transaction-0, p>
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,443] INFO [ReplicaFetcherManager on broker 1] Added fetcher to
broker 2 for partitions HashMap(blacklist_transactions-9 ->
InitialFetchState(Some(RxifGFHPQsGMWP5Sq_rSFg),BrokerEndPoint(id=2,
host=qh-a08-kafka-02.example.com:9092),100,1748933),
mm2-offsets.FC-R02.internal-2>
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,444] INFO [ReplicaFetcherManager on broker 1] Added fetcher to
broker 5 for partitions HashMap(company21_pc21_transaction-2 ->
InitialFetchState(Some(oruOHN6SSLOsuxgB4YGuyw),BrokerEndPoint(id=5,
host=fc-r01-kafka-02.example.com:9092),90,6704572), kafka_proxy_test2-0
-> I>
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,444] INFO [ReplicaFetcherManager on broker 1] Added fetcher to
broker 4 for partitions HashMap(chargebacks-1 ->
InitialFetchState(Some(Df7E7Y3-TxKjd5QIBB2mgg),BrokerEndPoint(id=4,
host=fc-r01-kafka-01.example.com:9092),73,0),
company21_pc21_transaction-0 -> InitialFetchS>
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,446] INFO [ReplicaFetcherThread-0-3]: Shutting down
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,447] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0]
Client requested connection close from node 3
(org.apache.kafka.clients.NetworkClient)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,448] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0]
Cancelled in-flight FETCH request with correlation id 191052 due to node
3 being disconnected (elapsed time since creation: 5306ms, elapsed time
since send: 5306ms, request timeout: 30000ms) (org.apache.kafka>
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,448] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0]
Error sending fetch request (sessionId=359115694, epoch=191052) to node
3: (org.apache.kafka.clients.FetchSessionHandler)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: java.io.IOException:
Client was shutdown before response was read
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:108)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:113)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
scala.Option.foreach(Option.scala:437)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: at
org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:127)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,450] INFO [ReplicaFetcherThread-0-3]: Stopped
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,450] INFO [ReplicaFetcherThread-0-3]: Shutdown completed
(kafka.server.ReplicaFetcherThread)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,454] INFO [GroupCoordinator 1]: Elected as the group
coordinator for partition 48 in epoch 50
(kafka.coordinator.group.GroupCoordinator)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,454] INFO [GroupMetadataManager brokerId=1] Scheduling loading
of offsets and group metadata from __consumer_offsets-48 for epoch 50
(kafka.coordinator.group.GroupMetadataManager)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,455] INFO [GroupCoordinator 1]: Elected as the group
coordinator for partition 13 in epoch 50
(kafka.coordinator.group.GroupCoordinator)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,455] INFO [GroupMetadataManager brokerId=1] Scheduling loading
of offsets and group metadata from __consumer_offsets-13 for epoch 50
(kafka.coordinator.group.GroupMetadataManager)
Jan 28 21:01:29 qh-a08-kafka-01 kafka[1936210]: [2024-01-28
21:01:29,455] INFO [GroupCoordinator 1]: Elected as the group
coordinator for partition 30 in epoch 63
(kafka.coordinator.group.GroupCoordinator)
...
After a while, ~10-20 seconds later, everything is fine again.
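Since it hits at an exact wall-clock time, one thing we still have to rule out
is anything scheduled around 21:01 on the hosts or on the network in between;
a rough sketch of that check (the paths are just the usual cron defaults):

# systemd timers firing around that time
systemctl list-timers --all | grep '21:0'
# cron entries with minute=1, hour=21
grep -rE '^[[:space:]]*1[[:space:]]+21[[:space:]]' /etc/crontab /etc/cron.d/ /var/spool/cron/ 2>/dev/null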
Before we switched to this new cluster, we had MirrorMaker 2 (mm2) configured
to sync everything from the old 3.5.1 cluster (still on ZooKeeper) to the new
KRaft-enabled one.
The complete config for the combined broker / controller nodes is:
===============================
advertised.listeners=INTERNAL://qh-a08-kafka-01.example.com:9092,CLIENT://:9095,EXTERNAL://qh-a08-kafka-01.example.com:63796
allow.everyone.if.no.acl.found=true
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
auto.create.topics.enable=false
broker.rack=0
controller.listener.names=CONTROLLER
controller.quorum.voters=1...@qh-a08-kafka-01.example.com:9093,2...@qh-a08-kafka-02.example.com:9093,3...@qh-a08-kafka-03.example.com:9093,4...@fc-r01-kafka-01.example.com:9093,5...@fc-r01-kafka-02.example.com:9093
default.replication.factor=3
early.start.listeners=CONTROLLER
inter.broker.listener.name=INTERNAL
listener.name.controller.ssl.client.auth=required
listener.security.protocol.map=INTERNAL:SASL_SSL,CLIENT:SASL_SSL,CONTROLLER:SSL,EXTERNAL:SASL_SSL
listeners=INTERNAL://:9092,CLIENT://:9095,CONTROLLER://:9093,EXTERNAL://:9094
log.cleanup.policy=delete
log.dirs=/data/kafka/
log.retention.check.interval.ms=300000
log.retention.hours=24
log.segment.bytes=1073741824
min.insync.replicas=2
node.id=1
num.io.threads=8
num.network.threads=3
num.partitions=4
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
process.roles=broker,controller
sasl.enabled.mechanisms=PLAIN,SASL_SSL
sasl.mechanism.inter.broker.protocol=PLAIN
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
ssl.cipher.suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
ssl.enabled.protocols=TLSv1.2
ssl.key.password=KnezAhKPNKn-53f.99unuuCp,EwfXq
ssl.keystore.location=/etc/ssl/private/kafka_example_chain.crt
ssl.keystore.type=PEM
ssl.truststore.type=PEM
ssl.truststore.location=/etc/ssl/private/kafka_example_chain.crt
super.users=User:CN=*.example.com
transaction.state.log.min.isr=2
transaction.state.log.replication.factor=2
============================================
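We do not set any of the controller.quorum.* timeouts, so the Raft layer
should be running on its defaults; as far as I know these are the following,
which would match the fetchTimeoutMs=2000 and "request timeout: 2000ms" seen
in the logs above (listed only for reference, they are not in our
server.properties):

# assumed defaults, not explicitly configured on our side
controller.quorum.fetch.timeout.ms=2000
controller.quorum.election.timeout.ms=1000
controller.quorum.request.timeout.ms=2000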
The config for the broker-only node looks like this:
============================================
advertised.listeners=INTERNAL://fc-r01-kafka-03.example.com:9092,CLIENT://:9095,EXTERNAL://fc-r01-kafka-03.example.com:63796
allow.everyone.if.no.acl.found=true
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
auto.create.topics.enable=false
broker.rack=1
controller.listener.names=CONTROLLER
controller.quorum.voters=1...@qh-a08-kafka-01.example.com:9093,2...@qh-a08-kafka-02.example.com:9093,3...@qh-a08-kafka-03.example.com:9093,4...@fc-r01-kafka-01.example.com:9093,5...@fc-r01-kafka-02.example.com:9093
default.replication.factor=3
inter.broker.listener.name=INTERNAL
listener.name.controller.ssl.client.auth=required
listener.security.protocol.map=INTERNAL:SASL_SSL,CLIENT:SASL_SSL,CONTROLLER:SSL,EXTERNAL:SASL_SSL
listeners=INTERNAL://:9092,CLIENT://:9095,EXTERNAL://:9094
log.cleanup.policy=delete
log.dirs=/data/kafka/
log.retention.check.interval.ms=300000
log.retention.hours=24
log.segment.bytes=1073741824
min.insync.replicas=2
node.id=6
num.io.threads=8
num.network.threads=3
num.partitions=4
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
process.roles=broker
sasl.enabled.mechanisms=PLAIN,SASL_SSL
sasl.mechanism.inter.broker.protocol=PLAIN
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
ssl.cipher.suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
ssl.enabled.protocols=TLSv1.2
ssl.key.password=KnezAhKPNKn-53f.99unuuCp,EwfXq
ssl.keystore.location=/etc/ssl/private/kafka_example_chain.crt
ssl.keystore.type=PEM
ssl.truststore.type=PEM
super.users=User:CN=*.example.com
transaction.state.log.min.isr=2
transaction.state.log.replication.factor=2
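To rule out the TLS / network layer between the two DCs during the incident
window, a probe of the CONTROLLER listener (9093) across the dark fiber should
already show handshake or latency problems; a sketch, reusing the PEM chain
from the config above (note that client auth is required on that listener, so
a certificate-less probe may be rejected after the handshake):

# TLS handshake against a DC2 controller from a DC1 node
openssl s_client -connect fc-r01-kafka-02.example.com:9093 -CAfile /etc/ssl/private/kafka_example_chain.crt </dev/null
# raw reachability / latency across the dark fiber
ping -c 5 fc-r01-kafka-02.example.com
nc -vz fc-r01-kafka-02.example.com 9093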
Any suggestions on what the issue could be?
cu denny