Hi all!
We are running Kafka in a 3-node setup, with both Kafka and ZooKeeper on each node.
The topics have 1 partition and 2 replicas, for example:
Topic:someTopic PartitionCount:1 ReplicationFactor:2 Configs:retention.ms=600000
Topic: someTopic Partition: 0 Leader: 2 Replicas: 2,0 Isr: 2,0
We use the following settings:
Consumer settings:
fetch.min.bytes=1
enable.auto.commit=true
max.partition.fetch.bytes=1073741824
Producer settings:
metadata.fetch.timeout.ms=1000
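For reference, here are the same settings expressed as java.util.Properties, the form in which they would typically be handed to a KafkaConsumer or KafkaProducer constructor. This is only a sketch: the class name ClientSettings is hypothetical, and settings a real client also needs (bootstrap.servers, group.id, serializers/deserializers) are left out because they are not part of the list above.

```java
import java.util.Properties;

public class ClientSettings {
    // Consumer settings as listed above.
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("fetch.min.bytes", "1");
        p.setProperty("enable.auto.commit", "true");
        // 1073741824 bytes = 1024 * 1024 * 1024 = 1 GiB per partition per fetch
        p.setProperty("max.partition.fetch.bytes", String.valueOf(1024L * 1024 * 1024));
        return p;
    }

    // Producer settings as listed above.
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("metadata.fetch.timeout.ms", "1000");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("consumer: " + consumerProps());
        System.out.println("producer: " + producerProps());
    }
}
```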
If we stop Kafka and ZooKeeper on one node with 'kill -9', Kafka detects that
the leader is missing within seconds and switches leadership to the other
replica, and consumers continue to receive messages.
If, on the other hand, we bring down the network on the same node with
'ifdown eth0' (which breaks the connections to both Kafka and ZooKeeper on
that node), Kafka seems to have trouble detecting that the broker is gone, and
it takes up to 2 minutes before any more messages can be consumed on the
affected topics.
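One way to put a number on the "up to 2 minutes" stall is to record the time each message is received on the consumer side and look for the longest gap between consecutive messages. A minimal sketch, assuming the producer sends at a steady rate and the receive times are collected in ascending order; GapFinder and longestGapMs are hypothetical names, and the sample timestamps are made up for illustration, not measurements from this cluster.

```java
import java.util.List;

public class GapFinder {
    // Returns the largest gap, in milliseconds, between consecutive
    // receive times (epoch millis, assumed sorted ascending).
    static long longestGapMs(List<Long> receiveTimesMs) {
        long longest = 0;
        for (int i = 1; i < receiveTimesMs.size(); i++) {
            long gap = receiveTimesMs.get(i) - receiveTimesMs.get(i - 1);
            if (gap > longest) {
                longest = gap;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        // Hypothetical receive times: steady traffic, then a 2-minute stall.
        List<Long> times = List.of(0L, 1_000L, 2_000L, 122_000L, 123_000L);
        System.out.println(longestGapMs(times) + " ms"); // prints "120000 ms"
    }
}
```

Comparing this figure between the 'kill -9' and 'ifdown eth0' cases gives a concrete number for each failover instead of an estimate.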
The following is logged on the consumer:
[2017-05-04 15:44:26,916] WARN Auto offset commit failed for group
console-consumer-75510: Commit offsets failed with retriable exception. You
should retry committing offsets.
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
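The warning asks the caller to retry the commit. With enable.auto.commit=true the consumer commits on its own schedule; code that commits explicitly would need its own retry. The sketch below shows a generic retry-with-exponential-backoff pattern against a simulated commit. It is an illustration under assumptions, not the Kafka consumer API: RetryingCommit, withRetry and RetriableError are hypothetical, and RetriableError is a local stand-in rather than Kafka's own exception class.

```java
import java.util.function.Supplier;

public class RetryingCommit {
    // Hypothetical stand-in for a retriable client error; this is NOT
    // Kafka's own org.apache.kafka.common.errors.RetriableException.
    static class RetriableError extends RuntimeException {
        RetriableError(String msg) {
            super(msg);
        }
    }

    // Runs op, retrying on RetriableError up to maxAttempts attempts in total,
    // doubling the sleep between attempts starting at initialBackoffMs.
    static <T> T withRetry(Supplier<T> op, int maxAttempts, long initialBackoffMs) {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RetriableError e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last error
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e; // interrupted while backing off: give up
                }
                backoff *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated commit that fails twice with a retriable error, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RetriableError("commit failed");
            }
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "committed after 3 attempts"
    }
}
```

Bounding the attempts matters here: while the partition leader is unreachable, an unbounded retry loop would simply block for the whole failover window.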
And on the producer:
May 04 15:44:18: 15:44:18.420 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.
May 04 15:44:18: 15:44:18.435 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.
May 04 15:44:18: 15:44:18.440 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.
May 04 15:44:18: 15:44:18.442 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.
May 04 15:44:18: 15:44:18.444 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch
containing 31 record(s) expired due to timeout while requesting metadata from
brokers for someTopic-0
May 04 15:44:18: 15:44:18.446 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch
containing 31 record(s) expired due to timeout while requesting metadata from
brokers for someTopic-0
May 04 15:44:18: 15:44:18.448 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch
containing 31 record(s) expired due to timeout while requesting metadata from
brokers for someTopic-0
May 04 15:44:18: 15:44:18.449 [kafka-producer-network-thread | producer-2]
ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
... and these keep being printed for a while.
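To summarize a longer producer log like this one (NetworkException at first, then TimeoutException for expired batches), a short sketch that tallies the Kafka error class names found in the text. LogTally is a hypothetical helper, and the sample lines in main are copied from the excerpt above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogTally {
    // Matches fully qualified Kafka error class names such as
    // org.apache.kafka.common.errors.NetworkException, capturing the simple name.
    private static final Pattern ERROR =
        Pattern.compile("org\\.apache\\.kafka\\.common\\.errors\\.(\\w+)");

    // Counts how often each error class appears in the given log text,
    // sorted by class name.
    static Map<String, Integer> tally(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = ERROR.matcher(log);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server",
            "May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server",
            "May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch");
        System.out.println(tally(log)); // prints {NetworkException=2, TimeoutException=1}
    }
}
```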