It seems the producer has been designed to be initialized and pointed at one Kafka cluster only.
Is it not possible to change the Kafka cluster (i.e. use a new value for topic metadata and force a re-initialization) of an already-initialized producer? If I want the producer to start sending to region #2 (Kafka cluster #2) upon connection failure to region #1, it seems I should (based on my understanding of how Kafka works so far):

- create two producers in "sync" mode, because "async" mode batches messages and, upon failure, the pending messages cannot be sent. This is because a producer cannot be re-directed to a different Kafka cluster, and there does not appear to be a way to copy the pending batched messages to a new producer.

- batch pending messages in the parent producer application, alternating between producer1 and producer2 as needed (each in sync mode), using request.required.acks = 0 (I assume connection failures still throw exceptions) and send(messages: KeyedMessage[K,V]*), in hopes of achieving async-like performance. A rough sketch follows below.
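Something like this is what I have in mind -- a minimal sketch against the 0.8 Scala producer API. The broker lists, host names, and the FailoverProducer wrapper are placeholders of my own, not anything from the Kafka docs:

    import java.util.Properties
    import kafka.common.FailedToSendMessageException
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object FailoverProducer {
      private def mkProducer(brokers: String): Producer[String, String] = {
        val props = new Properties()
        props.put("metadata.broker.list", brokers)   // region-specific broker list
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        props.put("producer.type", "sync")           // sync, so failures surface as exceptions
        props.put("request.required.acks", "0")      // per the assumption above
        new Producer[String, String](new ProducerConfig(props))
      }

      private val region1 = mkProducer("r1-broker1:9092,r1-broker2:9092") // placeholder hosts
      private val region2 = mkProducer("r2-broker1:9092,r2-broker2:9092")
      @volatile private var active = region1

      // Send an application-side batch via the varargs send(); if the active
      // region fails, flip to the other region and retry the same batch once.
      def send(batch: Seq[KeyedMessage[String, String]]): Unit = {
        try {
          active.send(batch: _*)
        } catch {
          case _: FailedToSendMessageException =>
            active = if (active eq region1) region2 else region1
            active.send(batch: _*)
        }
      }
    }

Does that look like a sane way to get async-like throughput out of two sync producers, or am I missing something?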
Thanks!

On Sun, Jun 23, 2013 at 7:11 AM, Ran RanUser <ranuse...@gmail.com> wrote:

> Offset-preserving mirroring would be a great addition, allowing for
> offsite backups which closely match production. It would be much cleaner
> than running rsync repeatedly.
>
> Regarding the broader discussion of maximizing availability while
> minimizing operational complexity, I've been considering the following
> (please feel free to share your thoughts):
>
> - multi-datacenter is ideal, since a whole range of outage problems can
> occur at rack level or datacenter level (power, network, natural disaster)
>
> - consider avoiding or augmenting replication, since it's intended for
> same-datacenter deployment
>
> - deploy Kafka in two datacenters with identical brokers and replication
> factor 1
>
> - producers will send to one broker; if a connection exception is thrown,
> then send to the other
>
> - consumers will implement SimpleConsumer and read from both broker pairs
>
> If the application layer can deal with producing and consuming from Kafka
> pairs in this way, it seems to me you get multi-region-backed availability
> with fewer instances/moving parts?
>
> Thanks!
>
>
> On Tue, Jun 18, 2013 at 4:22 AM, Jun Rao <jun...@gmail.com> wrote:
>
>> We can look into offset-preserving mirroring in the future. Note that
>> even with this approach, the offsets in the target cluster will be
>> slightly behind those in the source cluster, since the mirroring will
>> be async. So not all offsets will be preserved.
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Sun, Jun 16, 2013 at 3:02 PM, Ran RanUser <ranuse...@gmail.com> wrote:
>>
>> > I've been researching Kafka for our requirements and am trying to
>> > figure out the best way to implement multi-region failover (lowest
>> > complexity).
>> >
>> > One requirement we have is that the offsets of the backup must match
>> > the primary. As I understand it, MirrorMaker does not (currently)
>> > guarantee that the target Kafka instance will have the same log
>> > offsets as the source Kafka instance. Our message processing pipeline
>> > will be strictly relying on topic-broker-partition-offset to avoid
>> > re-processing messages.
>> >
>> > Here's what I'm leaning towards; please share any criticism or
>> > thoughts:
>> >
>> > Assuming:
>> >
>> > - Two regions, Region1 (primary) and Region2 (backup)
>> >
>> > - Region2 must have the same offsets per topic-broker-partition-offset
>> > state
>> >
>> > - A few minutes of lost messages can be tolerated if Region1 is ever
>> > lost.
>> >
>> > - That it would be a mistake to attempt Kafka replication across
>> > regions and maintain a Zookeeper cluster across regions (because they
>> > weren't designed for the higher latency and link-loss issues, and
>> > there could be operational edge-case bugs we won't catch/understand,
>> > etc.)
>> >
>> > - That Region1 has multiple topics, brokers, partitions, replicas and
>> > a Zookeeper cluster. Only Region1 is in use operationally (gets all
>> > producer and consumer traffic).
>> >
>> > - That Region2 has the same configuration but receives no operational
>> > traffic (no producers, no consumers) and gets a periodic rsync from
>> > Region1
>> >
>> > - If Region1 is lost, we will start Kafka in Region2, and it should
>> > start up at the appropriate offsets (from the last rsync backup).
>> > Producers will be instructed to use Region2.
>> >
>> > - Region2 is now the new primary Kafka instance until we decide to
>> > switch back to Region1.
>> >
>> > This is quite simple, and there is more data loss than I'd like, but
>> > the loss would be acceptable for our use case, considering that the
>> > loss of Region1 should be a rare event (if ever).
>> >
>> > Questions:
>> >
>> > 1. Do you see any pitfalls or better ways to proceed? It seems this
>> > Kafka feature request (adding a MirrorMaker mode that maintains
>> > offsets, https://issues.apache.org/jira/browse/KAFKA-658 ) would be a
>> > better solution one day.
>> >
>> > 2. What if the rsync backup is interrupted when Region1 is lost? Is
>> > there the possibility that the second Kafka instance could be left in
>> > an unworkable state? For example, if a .log file is copied but the
>> > corresponding .index is not completed. Can the .index file be
>> > re-created? It appears it can in 0.8.1:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> >
>> > Thank you!
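P.S. For the "consumers will implement SimpleConsumer and read from both
broker pairs" idea quoted above, here is roughly what I picture -- a sketch
against the 0.8 SimpleConsumer Scala API, with placeholder hosts and a
FetchFromEither helper of my own invention (a real version would also
inspect the per-partition error codes in the FetchResponse, handle leader
changes, etc.):

    import kafka.api.FetchRequestBuilder
    import kafka.consumer.SimpleConsumer
    import kafka.message.ByteBufferMessageSet

    object FetchFromEither {
      // One SimpleConsumer per region; hosts are placeholders.
      // Args: host, port, socket timeout (ms), buffer size, client id.
      val region1 = new SimpleConsumer("r1-broker", 9092, 100000, 64 * 1024, "dual-region-reader")
      val region2 = new SimpleConsumer("r2-broker", 9092, 100000, 64 * 1024, "dual-region-reader")

      // Because the backup preserves offsets, the same (topic, partition,
      // offset) triple is valid in either region, so a failed fetch can
      // simply be retried verbatim against the other region.
      def fetch(topic: String, partition: Int, offset: Long): ByteBufferMessageSet = {
        val req = new FetchRequestBuilder()
          .clientId("dual-region-reader")
          .addFetch(topic, partition, offset, 1024 * 1024) // 1 MB max fetch
          .build()
        try {
          region1.fetch(req).messageSet(topic, partition)
        } catch {
          case _: java.io.IOException =>
            region2.fetch(req).messageSet(topic, partition)
        }
      }
    }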