> the cold storage is eventually consistent with Kafka

Correct, and yes, this arrangement is not 100% robust, but no system is. At least in my experience, this DR plan has been effective at rebuilding and fixing data stores downstream of Kafka, which I submit is a more common issue than total loss of ISRs.

And note that Kafka's internal replication is already a mechanism to deal with HW failure within Kafka itself.
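To make the setup concrete, the ETL side of this can be as simple as a streaming copy of each topic into HDFS. Roughly something like the following if you use Spark structured streaming -- just a sketch to show the shape of it, not our actual job; the topic name, paths, and string-typed key/value are made up:

# Sketch: mirror a Kafka topic into HDFS as date-partitioned Parquet.
# Illustrative only; requires the spark-sql-kafka package for the Kafka source.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs-etl").getOrCreate()

raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "events")
            .option("startingOffsets", "earliest")
            .load())

# Keep key, value and timestamp so the logs can later be compacted by key
# and replayed in timestamp order.
events = (raw.selectExpr("CAST(key AS STRING) AS key",
                         "CAST(value AS STRING) AS value",
                         "timestamp")
             .withColumn("dt", F.to_date("timestamp")))

(events.writeStream
       .format("parquet")
       .option("path", "hdfs:///data/events/raw")
       .option("checkpointLocation", "hdfs:///data/events/_checkpoints")
       .partitionBy("dt")
       .trigger(processingTime="1 minute")
       .start()
       .awaitTermination())

Anything that preserves key, value, and timestamp per record fills the same role here (Kafka Connect's HDFS sink, Gobblin, etc.).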
> concerning the likely disaster scenario of bad messages.... This could result in an enormous amount of bad messages and bad state in cold storage.

Yep. To deal with this, I've set up a flow that performs log compaction on the ETL'd logs in HDFS, such that each individual event is only retained for a limited number of days, while the compacted key-value pairs are stored indefinitely. With this in place, DR involves re-ingesting the compacted logs, not the entire event stream (sketched below). And even though the re-ingested data gets appended to the end of the logs, it is then compacted in turn, so there is no net difference in storage space. I've built a system that has survived this operation dozens of times. Admittedly this is a lot of moving parts, but this has been my approach.

> Does the statement “Don't rely on unlimited retention in Kafka” imply some flaw in the implementation...?

This reflects my opinion and experience, not an innate problem with Kafka. My opinion is based on the fact that Kafka is not directly indexable, searchable, or mutable, which makes it hard to trust as a source-of-truth or system-of-record on its own. Contrast that with e.g. HDFS + Spark. Generally, you'll have other data sources upstream and downstream of Kafka, and these are often more viable for long-term storage.

In a post-GDPR world, Kafka for long-term storage makes even less sense. Consider what happens if a user asks you to delete their data from your servers: how do you find and delete those individual records in an append-only log? Instead, you can keep data in Kafka for a limited time, ETL it to some other data store, and then purge data from there as needed.

Again, this is just how I've set things up. I'm also interested to hear other solutions.
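To make the compaction flow concrete: it is essentially "latest value per key wins" over the ETL'd logs, judged by timestamp rather than by position in the log. Something like the following, continuing the made-up paths and columns from the sketch above (a simplified illustration, not a production job; snapshot swapping, schemas, and tombstone handling are elided):

# Sketch: compact the ETL'd logs in HDFS -- keep only the latest record
# per key, judged by timestamp (not by position in the log).
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdfs-log-compaction").getOrCreate()

raw = spark.read.parquet("hdfs:///data/events/raw")              # limited retention
snapshot = spark.read.parquet("hdfs:///data/events/compacted")   # kept indefinitely

# Union the previous snapshot with the newly ETL'd events, then keep the
# newest record for each key. Null values act as tombstones, mirroring
# Kafka's own compaction semantics.
w = Window.partitionBy("key").orderBy(F.col("timestamp").desc())

compacted = (raw.unionByName(snapshot)
                .withColumn("rn", F.row_number().over(w))
                .where(F.col("rn") == 1)
                .drop("rn"))

# Write to a fresh directory and swap it in afterwards; overwriting the
# directory you are reading from is not safe.
compacted.write.mode("overwrite").parquet("hdfs:///data/events/compacted_new")

Re-ingestion for DR is then just a keyed producer loop over the compacted snapshot, e.g. with kafka-python -- again a sketch, assuming non-null keys:

# Sketch: re-ingest the compacted key/value pairs back into Kafka after a
# disaster. Unchanged keys mean downstream compaction and timestamp-aware
# consumers treat these as replacements, not new events.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="broker1:9092", acks="all")

for row in compacted.toLocalIterator():  # fine for a sketch; batch for real volumes
    producer.send(
        "events",
        key=row["key"].encode("utf-8"),
        value=row["value"].encode("utf-8") if row["value"] is not None else None,
    )
producer.flush()

Because the keys are unchanged and processors honor timestamps rather than partition order, the re-ingested records land as replacements rather than as new events.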
Ryanne

On Tue, Sep 4, 2018 at 9:10 AM Henning Røigaard-Petersen <h...@edlund.dk> wrote:

> Thank you for your answer Ryanne. It’s always a pleasure to be presented with such unambiguous advice. It really gives you something to work with :).
>
> To any other readers, I am very interested in hearing of other approaches to DR in Kafka.
>
> Ryanne, I agree with your statement as to the relative probability of the different DR scenarios, and I get in principle how your approach would allow us to recover from “bad” messages, but we must of course ensure that we have countermeasures for all the scenarios.
>
> To that end, I have a couple of questions about your approach to DR.
>
> Q1)
> It is my understanding that you produce messages to Kafka partitions using the normal producer API and then subsequently ETL them to some cold storage using one or more consumers, i.e. the cold storage is eventually consistent with Kafka!?
>
> If this is true, isn’t your approach prone to the same loss-of-tail issues as regular multi-cluster replication in case of total ISR loss? That is, we may end up with an inconsistent cold storage, because downstream messages may be backed up before the corresponding upstream messages are backed up?
>
> I guess some ways around this would be to have only one partition (not feasible) or to store state changes directly in other storage and ETL those changes back to Kafka for downstream consumption. However, I believe that is not what you are suggesting.
>
> Q2)
> I am unsure how your approach should work in practice, concerning the likely disaster scenario of bad messages.
>
> Assume a bad message is produced and ETL’ed to the cold storage.
>
> As an isolated message, we could simply wipe the Kafka partition and reproduce all relevant messages, or compact the bad message with a newer version. This all makes sense.
>
> However, more likely it will not be an isolated bad message; rather, a plethora of downstream consumers will process it and in turn produce derived bad messages, which are further processed downstream. This could result in an enormous amount of bad messages and bad state in cold storage.
>
> How would you recover in this case?
>
> It might be possible to iterate through the entirety of the state to detect bad messages, but updating with the correct data seems impossible.
>
> I guess one very crude fallback solution may be to identify the root bad message, and somehow restore the entire system to a previous consistent state. This, however, requires some global message property across the entire system. You mention timestamps, but traditionally these are intrinsically unreliable, especially in a distributed environment, and will most likely lead to loss of messages with timestamps close to the root bad message.
>
> Q3)
> Does the statement “Don't rely on unlimited retention in Kafka” imply some flaw in the implementation, or is it simply a reference to the advice of not using Kafka as Source of Truth due to the DR issues?
>
> Thank you for your time
>
> Henning Røigaard-Petersen
>
> -----Original Message-----
> From: Ryanne Dolan <ryannedo...@gmail.com>
> Sent: 3. september 2018 20:27
> To: users@kafka.apache.org
> Subject: Re: Official Kafka Disaster Recovery is insufficient - Suggestions needed
>
> Sorry to have misspelled your name Henning.
>
> On Mon, Sep 3, 2018, 1:26 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > Hanning,
> >
> > In mission-critical (and indeed GDPR-related) applications, I've ETL'd Kafka to a secondary store, e.g. HDFS, and built tooling around recovering state back into Kafka. I've had situations where data is accidentally or incorrectly ingested into Kafka, causing downstream systems to process bad data. This, in my experience, is astronomically more likely than the other DR scenarios you describe. But my approach is the same:
> >
> > - don't treat Kafka as a source-of-truth. It is hard to fix data in an append-only log, so we can't trust it to always be correct.
> >
> > - ETL Kafka topics to a read-only, append-only, indexable log, e.g. in HDFS, and then build tooling to reingest data from HDFS back into Kafka. That way, in the event of disaster, data can be recovered from cold storage. Don't rely on unlimited retention in Kafka.
> >
> > - build everything around log compaction, keys, and idempotency. People think compaction is just to save space, but it is also the only way to layer new records over existing records in an otherwise append-only environment. I've built pipelines that let me surgically remove or fix records at rest and then submit them back to Kafka. Since these records have the same key, processors will treat them as replacements for earlier records. Additionally, processors should honor timestamps and/or sequence IDs and not the actual order of records in a partition. That way old data can be ingested from HDFS -> Kafka idempotently.
> >
> > Imagine that one record out of millions is bad and you don't notice it for months.
> > You can grab this record from HDFS, modify it, and then submit it back to Kafka. Even though it is in the stream months later than real time, processors will treat it as a replacement for the old bad record, and the entire system will end up exactly as if the record had never been bad. If you can achieve these semantics consistently, DR is straightforward.
> >
> > - don't worry too much about offsets wrt consumer progress. Timestamps are usually more important. If the above is in place, it doesn't matter if you skip some records during failover. Just reprocess a few hours from cold storage and it's like the failure never happened.
> >
> > Ryanne
> >
> > On Mon, Sep 3, 2018, 9:49 AM Henning Røigaard-Petersen <h...@edlund.dk> wrote:
> >
> >> I am looking for advice on how to handle disasters not covered by the official methods of replication, whether intra-cluster replication (via replication factor and producer acks) or multi-cluster replication (using Confluent Replicator).
> >>
> >> We are looking into using Kafka not only as a broker for decoupling, but also as an event store.
> >>
> >> For us, data is mission-critical and has unlimited retention, with the option to compact (required by GDPR).
> >>
> >> We are especially interested in two types of disasters:
> >>
> >> 1. Unintentional or malicious use of the Kafka API to create or compact messages, as well as deletions of topics or logs.
> >>
> >> 2. Loss of tail data from partitions if all ISR fail in the primary cluster. A special case of this is the loss of the tail from the commit topic, which results in loss of consumer progress.
> >>
> >> Ad (1)
> >>
> >> Replication does not guard against disaster (1), as the resulting messages and deletions spread to the replicas.
> >>
> >> A naive solution is to simply secure the cluster and ensure that there are no errors in the code (…), but anyone with an ounce of experience with software knows that stuff goes wrong.
> >>
> >> Ad (2)
> >>
> >> The potential of disaster (2) is much worse. In case of total data center loss, the secondary cluster will lose the tail of every partition not fully replicated. Besides the data loss itself, there are now inconsistencies between related topics and partitions, which breaks the state of the system.
> >>
> >> Granted, the likelihood of total center loss is not great, but there is a reason we have multi-center setups.
> >>
> >> The special case of loss of consumer offsets results in double processing of events once we resume processing from an older offset. While idempotency is a solution, it might not always be possible or desirable.
> >>
> >> Once any of these types of disaster has occurred, there is no means to restore lost data, and even worse, we cannot restore the cluster to a point in time where it is consistent. We could probably look our customers in the eyes and tell them that they lost a day's worth of progress, but we cannot inform them of a state of perpetual inconsistency.
> >>
> >> The only solution we can think of right now is to shut the primary cluster down (ensuring that no new events are produced) and then copy all files to some secure location, i.e. effectively creating a backup, allowing restoration to a consistent point in time.
> >> Though the tail (backup-wise) will be lost in case of disaster, we are ensured a consistent state to restore to.
> >>
> >> As a useful side effect, such backup files can also be used to create environments for testing or other destructive purposes.
> >>
> >> Does anyone have a better idea?
> >>
> >> Venlig hilsen / Best regards
> >>
> >> Henning Røigaard-Petersen
> >> Principal Developer
> >> MSc in Mathematics
> >>
> >> h...@edlund.dk
> >> Direct phone +45 36150272
> >>
> >> Edlund A/S
> >> La Cours Vej 7
> >> DK-2000 Frederiksberg
> >> Tel +45 36150630
> >> www.edlund.dk