Hi, I'm new to Kafka and am looking to set up a 3-node cluster in our data centers to handle local log traffic. Currently, our traffic is sent via UDP, so we can tolerate some message loss. What I can't tolerate is a complete loss of the Kafka cluster, because we would quickly fill up our UDP buffers and start dropping messages en masse.
After testing some configurations in AWS, I believe a 3-node cluster (4 CPUs, 16 GB each) is sufficient to handle our peak messaging rate. I was planning on running both Kafka and Zookeeper instances on these nodes. I am concerned, however, that my setup can only tolerate a single broker outage. I experimented a bit and found that if I managed Zookeeper separately (for example, in a container setup) and set parameters such as min.insync.replicas and unclean leader election, I was able to keep Kafka running and processing with enough throughput to keep our system afloat.

My questions are:
- Although this can be done, should I?
- With a setup like this, is it ever possible for Kafka to develop a split-brain situation, or is that impossible because Zookeeper is running and must maintain quorum?
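
To make the arithmetic behind my concern concrete, here is a minimal sketch of how I understand the failure tolerance. The function names are my own (hypothetical), and it assumes a majority-quorum Zookeeper ensemble, producers using acks=all, and each partition's replicas spread across distinct brokers. It is a back-of-the-envelope model, not a statement of how Kafka behaves in every case.

```python
def zk_tolerated_failures(ensemble_size: int) -> int:
    """Zookeeper stays available only while a strict majority of the
    ensemble is up, so it tolerates floor((n - 1) / 2) node failures."""
    return (ensemble_size - 1) // 2


def kafka_tolerated_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replica failures a partition can absorb while still accepting
    acks=all writes (assumption: all replicas start out in sync)."""
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    # Co-located 3-node Zookeeper ensemble: one node may fail.
    print("zookeeper, 3 nodes:", zk_tolerated_failures(3))
    # Replication factor 3 with min.insync.replicas=2: one broker may fail.
    print("kafka, rf=3, min.isr=2:", kafka_tolerated_failures(3, 2))
    # Replication factor 3 with min.insync.replicas=1: two brokers may fail.
    print("kafka, rf=3, min.isr=1:", kafka_tolerated_failures(3, 1))
```

Under these assumptions, the co-located Zookeeper ensemble is the tighter limit (one node), which is why I tried managing Zookeeper separately and lowering min.insync.replicas.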