Hi guys,

When a broker goes down for a restart, both running consumers and newly started ones begin to fail with:

    [Consumer clientId=consumer-42169, groupId=] Connection to node 67108872 could not be established. Broker may not be available.
    [Consumer clientId=consumer-46213, groupId=] Connection to node -4 could not be established. Broker may not be available.

This continues until the broker is up and running again.

After the broker comes back up, the producer is not able to publish messages:

    Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.

Recreating the producer helps in that case.

Moreover, if I restart only one broker, I see everything described above, and then the cluster comes back to a normal state. If I do a rolling restart, the Kafka consumer is not able to recover from it ('...Failed to update metadata...') and I have to completely restart the machines running the consumers.

Along the way, I continuously see:

    INFO Created a new error FetchContext for session id 2070234568: no such session ID found.
    INFO [ReplicaFetcher replicaId=67108868, leaderId=67108871, fetcherId=2] Node 67108871 was unable to process the fetch request with (sessionId=1942914170, epoch=8716): FETCH_SESSION_ID_NOT_FOUND. (org.apache.kafka.clients.FetchSessionHandler)

Could you please clarify what is wrong here, and how I can improve the cluster's behaviour when restarting a single broker?

I noticed this behaviour on version 1.1.1; the previously running 0.10.1.1 switched leadership just fine and metadata was updated in time. The cluster has 6 nodes, with a replication factor of 3, min.insync.replicas=2, and acks=all.

--
Thanks,
Andrey