Hi Andrew,

> We could use LVS or some other load balancer/proxy for the Kafka
> connections, and automatically switch between clusters based on
> availability. But, what would this do to live producers and their
> metadata? Would they be able to handle a total switch of cluster
> metadata?
This should be fine - if your remote DC Kafka broker goes down, the
producer should re-issue metadata requests through the load balancer,
which (based on my understanding of your topology) would then reach the
main DC's Kafka cluster. The producer would then establish connections
to the main DC's brokers for subsequent sends. (I recall from earlier in
the thread that you are using librdkafka - it should do something
similar.)

I'm a bit unclear on your setup - by non-HA broker do you mean non-HA by
virtue of it being a single broker with no replication? You would still
need to register it in a ZooKeeper cluster, right?

Also, where will the events ultimately be consumed? I'm assuming in the
main DC - in which case you would need to ship your Kafka logs from the
remote DC to the main DC anyway, correct?

Joel

On Wed, Nov 27, 2013 at 12:47:01PM -0500, Andrew Otto wrote:
> Wikimedia is close to using Kafka to collect webrequest access logs from
> multiple data centers. I know that MirrorMaker is the recommended way to
> do cross-DC Kafka, but this is a lot of overhead for our remote DCs. To
> set up a highly available Kafka cluster, we would need to add a few more
> nodes in each DC (brokers and ZooKeepers). Our remote DCs are used mainly
> for frontend web caching, and we'd like to keep them that way. We don't
> want to add multiple nodes to each DC just for log delivery.
>
> We are attempting to produce messages from the remote DCs directly to our
> main DC's Kafka cluster, but we are worried about data loss during
> periods of high latency or link packet loss (we actually had this problem
> last weekend). Most of the time this works, but it isn't reliable.
>
> Would it be possible to somehow set up a single non-HA Kafka broker in
> our remote DC and produce to that, but then fail over to cross-DC
> production to our main DC Kafka cluster?
>
> We could use LVS or some other load balancer/proxy for the Kafka
> connections, and automatically switch between clusters based on
> availability. But, what would this do to live producers and their
> metadata? Would they be able to handle a total switch of cluster
> metadata?
>
> Thanks!
> -Andrew Otto
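To make the metadata-refresh flow concrete, here is a toy model of the
failover behavior discussed above: a producer that, when all of its
cached brokers fail, re-fetches cluster metadata through a load-balancer
address and retries against whichever cluster the LB currently fronts.
This is only an illustration of the flow, not real Kafka or librdkafka
client code; the names fetch_metadata and send_to_broker are
hypothetical stand-ins for the LB-fronted metadata request and the wire
send.

```python
# Toy model of producer failover via a load balancer (LVS-style VIP).
# fetch_metadata() stands in for a metadata request issued through the
# LB; send_to_broker(broker, msg) stands in for a produce request and
# raises ConnectionError on failure. Both are hypothetical names.

def make_producer(fetch_metadata, send_to_broker, max_retries=3):
    # Bootstrap: learn the current cluster's brokers through the LB.
    brokers = fetch_metadata()

    def send(msg):
        nonlocal brokers
        for _ in range(max_retries):
            for b in brokers:
                try:
                    send_to_broker(b, msg)
                    return b  # report which broker accepted the message
                except ConnectionError:
                    continue
            # All cached brokers failed: refresh metadata through the
            # LB, which may now point at the main DC's cluster.
            brokers = fetch_metadata()
        raise RuntimeError("send failed after metadata refreshes")

    return send
```

The key point the sketch shows is that the producer never hard-codes a
cluster: after the remote DC's single broker dies, the next metadata
refresh through the LB hands back the main DC's broker list, and
subsequent sends go there without any producer-side reconfiguration.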