If you want zero data loss, you should also look into the min.insync.replicas setting in 0.8.2.1, as it guarantees that data acknowledged with ACK=-1 exists on multiple replicas (for example, on brokers in multiple racks).
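A toy Python model of how min.insync.replicas interacts with ACK=-1 may make the trade-off concrete. It models one partition replicated on b1, b2 and b3 (b1 leader). b1 and b2 die, b3 keeps accepting writes on its own for a while, then b3 dies and b1 comes back first. This is a sketch under simplifying assumptions, not Kafka's implementation: every in-sync replica receives each write instantly, and b1's return after all replicas died is treated as an unclean leader election. `ToyPartition`, `NotEnoughReplicas` and `run_scenario` are hypothetical names, not Kafka APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


class NotEnoughReplicas(Exception):
    """Hypothetical stand-in for the broker rejecting an ACK=-1 write."""


@dataclass
class ToyPartition:
    """One partition, replication factor 3, brokers b1..b3 (b1 starts as leader).

    Assumptions (not Kafka's real code): every in-sync replica gets each write
    immediately, and a restarted broker only rejoins the ISR if it wins an
    unclean election.
    """
    min_isr: int
    logs: Dict[str, List[str]] = field(
        default_factory=lambda: {"b1": [], "b2": [], "b3": []})
    isr: Set[str] = field(default_factory=lambda: {"b1", "b2", "b3"})
    leader: Optional[str] = "b1"

    def produce(self, msg: str) -> None:
        """ACK=-1: acknowledge only once every ISR member has the message."""
        if len(self.isr) < self.min_isr:
            raise NotEnoughReplicas(
                f"ISR size {len(self.isr)} < min.insync.replicas={self.min_isr}")
        for broker in self.isr:
            self.logs[broker].append(msg)

    def kill(self, broker: str) -> None:
        self.isr.discard(broker)
        if broker == self.leader:
            # Clean election: any surviving in-sync replica can take over.
            self.leader = min(self.isr) if self.isr else None

    def restart(self, broker: str) -> None:
        if self.leader is None:
            # Unclean leader election: the returning broker leads with whatever
            # log it has; other replicas would later truncate to match it.
            self.leader = broker
            self.isr = {broker}


def run_scenario(min_isr: int) -> Tuple[List[str], List[str], List[str]]:
    """Replays the b1/b2/b3 failure sequence; returns (acked, rejected, lost)."""
    p = ToyPartition(min_isr=min_isr)
    acked: List[str] = []
    rejected: List[str] = []

    def send(msg: str) -> None:
        try:
            p.produce(msg)
            acked.append(msg)
        except NotEnoughReplicas:
            rejected.append(msg)

    send("m0")              # all three replicas in sync
    p.kill("b1")
    p.kill("b2")            # b3 is now leader and the only ISR member
    for i in range(10):     # e.g. one write per minute for 10 minutes
        send(f"m{i + 1}")
    p.kill("b3")            # partition goes offline
    p.restart("b1")         # b1 returns first and wins an unclean election
    lost = [m for m in acked if m not in p.logs[p.leader]]
    return acked, rejected, lost


if __name__ == "__main__":
    for min_isr in (1, 2):
        acked, rejected, lost = run_scenario(min_isr)
        print(f"min.insync.replicas={min_isr}: acked={len(acked)} "
              f"rejected={len(rejected)} lost_after_ack={len(lost)}")
```

Running it prints `min.insync.replicas=1: acked=11 rejected=0 lost_after_ack=10` and `min.insync.replicas=2: acked=1 rejected=10 lost_after_ack=0`. With the setting at 1, the ten writes that only b3 held were acknowledged and then lost; with 2, those writes fail at the producer, which can retry or alert instead of silently losing data.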
If you don't set min.insync.replicas, the following scenario is possible. Say you have 1 topic with 1 partition and replication factor 3 on brokers b1, b2 and b3 (b1 is the leader; b2 and b3 are replicas), and you are producing with ACK=-1. b1 and b2 die, and b3 becomes the leader. So far all is well. 10 minutes go by and b3 dies; 1 minute later b1 comes back online and becomes the leader. Essentially all of the data written while b3 was the only replica is now truncated, even though the upstream producer thought it was saved.

With min.insync.replicas set, a producer using ACK=-1 instead gets a failure when there are not enough in-sync replicas to meet the data loss guarantee:

min.insync.replicas=2 (or 3, depending on the data)

Also take a look at https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker; it might be helpful for what you are looking for.

~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 1, 2015 at 7:43 AM, Joong Lee <jo...@me.com> wrote:
> It is based on our understanding from reading the documents.
>
> We aren't concerned about data duplication, as that is going to be handled by
> Elasticsearch.
>
> > On May 1, 2015, at 12:15 AM, Daniel Compton <daniel.compton.li...@gmail.com> wrote:
> >
> > When we evaluated MirrorMaker last year we didn't find any risk of data
> > loss, only duplicate messages in the case of a network partition.
> >
> > Did you discover data loss in your tests, or were you just looking at the
> > docs?
> > On Fri, 1 May 2015 at 4:31 pm Jiangjie Qin <j...@linkedin.com.invalid> wrote:
> >
> >> Which MirrorMaker version did you look at? The MirrorMaker in trunk
> >> should not have data loss if you just use the default setting.
> >>
> >>> On 4/30/15, 7:53 PM, "Joong Lee" <jo...@me.com> wrote:
> >>>
> >>> Hi,
> >>> We are exploring Kafka to keep two data centers (primary and DR) running
> >>> hosts of Elasticsearch nodes in sync. One key requirement is that we
> >>> can't lose any data. We POC'd use of MirrorMaker and felt it may not meet
> >>> our data loss requirement.
> >>>
> >>> I would like to ask the community whether we should look for another
> >>> solution or whether Kafka would be the right solution considering the
> >>> zero data loss requirement.
> >>>
> >>> Thanks
> >>
> >>
>