Thank you all! Blake, regarding your comments:
It'll require having a HA cluster running in another region, of course. One
other caveat is that it doesn't preserve the offsets of the records
-> I believe I can't afford to keep another cluster running, for cost reasons.
Can you elaborate on the offset part? If the offsets are not preserved, how
does the backup cluster know where to start processing each topic?

For example, you could use a Kafka Connect s3 sink. You'd have to write some
disaster-recovery code to restore lost data from s3 into Kafka.
-> The same question applies here: does S3 also store the offset for each
topic as it is modified in Kafka? If not, then when the backup is restored
into the Kafka cluster, how will it know where to start processing each
topic from?
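To make my question concrete, here is a rough sketch of what I imagine the
last step of a restore would look like, assuming the original offsets really
are gone. The broker address (dr-kafka:9092), topic (orders) and consumer
group (orders-processor) are just placeholders I made up, not something from
the blog post: after re-producing the backed-up records into the recovery
cluster, each consumer group's offsets would have to be reset explicitly, for
example to the earliest offsets of the restored partitions, before the
consumers are started.

import java.util.*;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetGroupAfterRestore {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        // Placeholder address of the recovery cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "dr-kafka:9092");

        try (Admin admin = Admin.create(props)) {
            // Partitions of the (placeholder) topic just re-populated from the backup.
            List<TopicPartition> partitions = Arrays.asList(
                    new TopicPartition("orders", 0),
                    new TopicPartition("orders", 1));

            // Look up the earliest offset of each restored partition; these are new
            // offsets assigned by the recovery cluster, not the original ones.
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            partitions.forEach(tp -> request.put(tp, OffsetSpec.earliest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> earliest =
                    admin.listOffsets(request).all().get();

            // Point the (currently inactive) consumer group at those offsets so it
            // reprocesses everything that was restored.
            Map<TopicPartition, OffsetAndMetadata> newOffsets = new HashMap<>();
            earliest.forEach((tp, info) -> newOffsets.put(tp, new OffsetAndMetadata(info.offset())));
            admin.alterConsumerGroupOffsets("orders-processor", newOffsets).all().get();
        }
    }
}

As far as I understand, alterConsumerGroupOffsets only succeeds while the
group has no active members, and the same reset could be done with
kafka-consumer-groups.sh --reset-offsets --to-earliest. Is this roughly the
kind of disaster-recovery code you meant, or is there a way to carry the
original offsets over?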
On Sat, Mar 6, 2021 at 4:44 PM Himanshu Shukla <himanshushukla...@gmail.com>
wrote:

> Hi Pushkar,
>
> you could also look at the available Kafka-connect plugins. It provides
> many connectors which could be leveraged to move the data in/out from
> Kafka.
>
> On Sat, Mar 6, 2021 at 10:18 AM Blake Miller <blak3mil...@gmail.com> wrote:
>
> > MirrorMaker is one reasonable way to do this, certainly it can
> > replicate to another region, with most of the latency being the
> > unavoidable kind, if you give it enough resources.
> >
> > It'll require having a HA cluster running in another region, of
> > course. One other caveat is that it doesn't preserve the offsets of
> > the records. That's probably okay for your use-case, but you should
> > be aware of it.
> >
> > Since what you want is a backup, there are many ways to do that which
> > might be cheaper than another Kafka cluster.
> >
> > For example, you could use a Kafka Connect s3 sink. You'd have to
> > write some disaster-recovery code to restore lost data from s3 into
> > Kafka.
> >
> > https://www.confluent.io/blog/apache-kafka-to-amazon-s3-exactly-once/
> >
> > There are many other sinks available, but s3 might be a reasonable
> > choice for backup. It's inexpensive and reliable.
> >
> > On Fri, Mar 5, 2021, 2:48 AM Pushkar Deole <pdeole2...@gmail.com> wrote:
> >
> > > Yes.. so the requirement for me is to have data backed up or
> > > replicated in a different 'region' to cater for disaster scenarios
> > > and recover from them
> > >
> > > On Fri, Mar 5, 2021 at 3:01 PM Ran Lupovich <ranlupov...@gmail.com>
> > > wrote:
> > >
> > > > I guess that in case of avoiding data lose you would need to use 3
> > > > replica in different rack/sites awareness to avoid data lose,
> > > > Confluent's Replicator or MirrorMaker are for copying data from
> > > > one cluster to another usually in different dc / regions, If I am
> > > > not mistaken
> > > >
> > > > On Fri, Mar 5, 2021, 11:21 Pushkar Deole <pdeole2...@gmail.com> wrote:
> > > >
> > > > > Thanks Luke... is the mirror maker asynchronous? What will be
> > > > > typical lag between the replicated cluster and running cluster
> > > > > and in case of disaster, what are the chances of data loss?
> > > > >
> > > > > On Fri, Mar 5, 2021 at 11:37 AM Luke Chen <show...@gmail.com> wrote:
> > > > >
> > > > > > Hi Pushkar,
> > > > > > MirrorMaker is what you're looking for.
> > > > > > ref:
> > > > > > https://kafka.apache.org/documentation/#georeplication-mirrormaker
> > > > > >
> > > > > > Thanks.
> > > > > > Luke
> > > > > >
> > > > > > On Fri, Mar 5, 2021 at 1:50 PM Pushkar Deole <pdeole2...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I was looking for some options to backup a running kafka
> > > > > > > cluster, for disaster recovery requirements. Can someone
> > > > > > > provide what are the available options to backup and restore
> > > > > > > a running cluster in case the entire cluster goes down?
> > > > > > >
> > > > > > > Thanks..
>
> --
> Regards,
> Himanshu Shukla