Hey Ben,

Thank you for the heads up. I must have misunderstood the docs... My reading
was that a commit doesn't complete until all the in-sync replicas (in this
case, 1) have acknowledged the write to the log. Would I be incorrect in
extrapolating that if an entire data center lost power, a 3+ node cluster
would also have the same risk of data loss? It sounds like the old argument
about MongoDB's lack of single-server durability.


ZooKeeper in general is kind of a pain as extra clockwork... it's always nice
deployment-wise when distributed software doesn't depend on third-party
distributed software.
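For reference, the durability knobs being discussed could be sketched roughly like this (illustrative values only, not a recommendation; trade-offs noted in the comments):

```properties
# Broker/topic-level (e.g. server.properties): fsync to disk after every
# message before acknowledging. Helps single-node durability on power loss,
# at a significant throughput cost.
flush.messages=1

# Multi-node cluster: a write is only committed once at least this many
# in-sync replicas have it. With replication.factor=3 and a whole-DC power
# loss, unflushed data can still be lost on all replicas at once.
min.insync.replicas=2
```

On the producer side, `acks=all` is the setting that makes the producer wait for acknowledgement from all in-sync replicas before considering a send successful; note that if the ISR has shrunk to 1, "all" means just that one replica.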

-J

On Sun, Jan 17, 2016 at 2:29 PM, Ben Stopford <b...@confluent.io> wrote:

> Jason
>
> Don’t forget that Kafka relies on redundant replicas for fault tolerance
> rather than disk persistence, so your single instances might lose messages
> straight out of the box if they’re not terminated cleanly. You could set
> flush.messages to 1 though. Don’t forget about Zookeeper either. That has
> to go somewhere.
>
> For what it’s worth I’ve seen one installation move away from this type of
> pattern as it was a little painful to manage. Your mileage may vary though.
> But you’re certainly not alone with wanting to do something like this.
> There is a buffering producer on the roadmap, although it may end up being
> a slightly different thing.
>
> B
>
>
> > On 16 Jan 2016, at 00:12, Jason J. W. Williams <
> jasonjwwilli...@gmail.com> wrote:
> >
> > Hey Luke,
> >
> > Thank you for the reply and encouragement. I'm going to start hacking on
> a
> > small PoC.
> >
> > -J
> >
> > On Fri, Jan 15, 2016 at 12:01 PM, Luke Steensen <
> > luke.steen...@braintreepayments.com> wrote:
> >
> >> Not an expert, but that sounds like a very reasonable use case for
> Kafka.
> >> The log.retention.* configs on the edge brokers should cover your TTL
> >> needs.
> >>
> >>
> >> On Thu, Jan 14, 2016 at 3:37 PM, Jason J. W. Williams <
> >> jasonjwwilli...@gmail.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> We historically have been a RabbitMQ environment, but we're looking at
> >>> using Kafka for a new project and I'm wondering if the following
> >>> topology/setup would work well in Kafka (for RMQ we'd use federation):
> >>>
> >>> * Multiple remote datacenters consisting each of a single server
> running
> >> an
> >>> HTTP application that receives client data and generates events. Each
> >>> server would also run single-node Kafka "cluster". The application
> would
> >>> write events as messages into the single-node Kafka "cluster" running
> on
> >>> the same machine.
> >>> * A hub datacenter that the remote data centers are connected to via
> SSL.
> >>> The hub data center would run a multi-node Kafka cluster (3 nodes).
> >>> * Use mirrormaker in the hub data center to mirror event messages from
> >> each
> >>> of the remote single-node servers into the hub's central Kafka cluster,
> >>> where all of the real consumers are listening.
> >>>
> >>> The problem set is each of the remote servers is collecting data from
> >>> customers over HTTP and returning responses, but those remote servers
> are
> >>> also generating events from those customer interactions. We want to
> >> publish
> >>> those events into a central hub data center for analytics. We want the
> >>> event messages at the remote servers to queue up when their network
> >>> connections to the hub data center is unreliable, and automatically
> relay
> >>> queued messages to the hub data center when the network comes
> >> back...making
> >>> the event relay system tolerant to WAN network faults. We'd also want
> to
> >>> set up some kind of TTL on queued messages, so if the WAN connection to
> >> the
> >>> hub is down for an extended period of time, the messages queued on the
> >>> remote servers don't build up infinitely.
> >>>
> >>> Any thoughts on if this setup is advisable/inadvisable with Kafka (or
> any
> >>> other thoughts on it) would be greatly appreciated.
> >>>
> >>> -J
> >>>
> >>
>
>