Hey Ben, Thank you for the heads up. I must have misunderstood the docs...my reading was that a commit doesn't complete until all the in-sync replicas (in this case, 1) have acknowledged committing to the log. Would I be incorrect in extrapolating that if an entire data center lost power, a 3+ node cluster would also have the same risk of data loss? It sounds like the old argument about MongoDB's lack of single server durability.
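For anyone following along, here's a sketch of the settings in play as I understand them. The setting names come from the Kafka docs; the values are just illustrative, not a tested deployment:

```properties
# Producer config: wait for all in-sync replicas to acknowledge.
# With a single-node "cluster" the ISR set is just that one broker,
# so acks=all still only guarantees one (page-cache) write.
acks=all

# Broker/topic config: only meaningful with replication.factor >= 2;
# refuses writes unless at least 2 replicas are in sync.
min.insync.replicas=2

# Broker/topic config: force an fsync per message, per Ben's
# suggestion -- trades throughput for single-node durability.
flush.messages=1
```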
Zookeeper in general is kind of a pain as extra clockwork...it's always nice
deployment-wise when distributed software doesn't require dependencies on
3rd party distributed software.

-J

On Sun, Jan 17, 2016 at 2:29 PM, Ben Stopford <b...@confluent.io> wrote:

> Jason
>
> Don’t forget that Kafka relies on redundant replicas for fault tolerance
> rather than disk persistence, so your single instances might lose messages
> straight out of the box if they’re not terminated cleanly. You could set
> flush.messages to 1 though. Don’t forget about Zookeeper either. That has
> to go somewhere.
>
> For what it’s worth I’ve seen one installation move away from this type of
> pattern as it was a little painful to manage. Your mileage may vary though.
> But you’re certainly not alone with wanting to do something like this.
> There is a buffering producer on the roadmap, although it may end up being
> a slightly different thing.
>
> B
>
> On 16 Jan 2016, at 00:12, Jason J. W. Williams <jasonjwwilli...@gmail.com> wrote:
>
> > Hey Luke,
> >
> > Thank you for the reply and encouragement. I'm going to start hacking on
> > a small PoC.
> >
> > -J
> >
> > On Fri, Jan 15, 2016 at 12:01 PM, Luke Steensen
> > <luke.steen...@braintreepayments.com> wrote:
> >
> >> Not an expert, but that sounds like a very reasonable use case for
> >> Kafka. The log.retention.* configs on the edge brokers should cover
> >> your TTL needs.
> >>
> >> On Thu, Jan 14, 2016 at 3:37 PM, Jason J. W. Williams
> >> <jasonjwwilli...@gmail.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> We historically have been a RabbitMQ environment, but we're looking at
> >>> using Kafka for a new project and I'm wondering if the following
> >>> topology/setup would work well in Kafka (for RMQ we'd use federation):
> >>>
> >>> * Multiple remote datacenters, each consisting of a single server
> >>> running an HTTP application that receives client data and generates
> >>> events. Each server would also run a single-node Kafka "cluster". The
> >>> application would write events as messages into the single-node Kafka
> >>> "cluster" running on the same machine.
> >>> * A hub datacenter that the remote data centers are connected to via
> >>> SSL. The hub data center would run a multi-node Kafka cluster (3
> >>> nodes).
> >>> * Use mirrormaker in the hub data center to mirror event messages from
> >>> each of the remote single-node servers into the hub's central Kafka
> >>> cluster, where all of the real consumers are listening.
> >>>
> >>> The problem set is that each of the remote servers is collecting data
> >>> from customers over HTTP and returning responses, but those remote
> >>> servers are also generating events from those customer interactions.
> >>> We want to publish those events into a central hub data center for
> >>> analytics. We want the event messages at the remote servers to queue
> >>> up when their network connections to the hub data center are
> >>> unreliable, and automatically relay queued messages to the hub data
> >>> center when the network comes back...making the event relay system
> >>> tolerant to WAN network faults. We'd also want to set up some kind of
> >>> TTL on queued messages, so if the WAN connection to the hub is down
> >>> for an extended period of time, the messages queued on the remote
> >>> servers don't build up infinitely.
> >>>
> >>> Any thoughts on if this setup is advisable/inadvisable with Kafka (or
> >>> any other thoughts on it) would be greatly appreciated.
> >>>
> >>> -J
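If we do go this route, a rough sketch of the edge-broker side per Luke's retention tip might look like the fragment below. The values and file/topic names are hypothetical, just to show which knobs bound the queue-up during a WAN outage:

```properties
# Edge broker server.properties (remote DC): cap how long events
# survive locally while the WAN link to the hub is down -- the "TTL"
# Luke's log.retention.* suggestion covers.
log.retention.hours=48
# Optional per-partition size cap as a second safety valve:
log.retention.bytes=1073741824

# The hub DC would then run one MirrorMaker instance per remote site,
# consuming over SSL, along the lines of (names hypothetical):
#   bin/kafka-mirror-maker.sh --consumer.config remote-dc1.properties \
#     --producer.config hub.properties --whitelist 'events.*'
```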