I'm also very interested in using Kafka as a persistent, distributed commit log – essentially the write side of a distributed database, with the read side being an array of various query stores (Elasticsearch, Redis, whatever) and stream processing systems.
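The write-side/read-side split described above can be sketched without a broker: an append-only log is replayed from offset 0 to materialize a read-side view, then tailed for new events. A minimal illustration in plain Python (a list stands in for the Kafka topic and a dict for a query store such as Redis; all names here are hypothetical, not a real client API):

```python
# Sketch of the commit-log pattern: the log is the source of truth,
# read-side views are derived by replaying it.

log = []  # append-only commit log of (key, value) events


def append(key, value):
    """Write side: every change is an event appended to the log."""
    log.append((key, value))
    return len(log) - 1  # offset of the new event


def bootstrap_view(from_offset=0):
    """Read side: replay the log from an offset to build a fresh view.

    With indefinite retention, a brand-new read-side store can always
    start from offset 0 and catch up to the present.
    """
    view = {}
    for key, value in log[from_offset:]:
        view[key] = value  # last-write-wins materialization
    return view


append("user:1", "alice")
append("user:2", "bob")
append("user:1", "alice-v2")  # later event supersedes the first

view = bootstrap_view()  # a new query store bootstrapped from scratch
```

A second store (Elasticsearch, say) could be bootstrapped the same way at any later time, which is the ease-of-bootstrap benefit argued for below.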
The benefit of retaining data in Kafka indefinitely is the ease with which it's possible to bootstrap new read-side technologies. I really feel that there should be a standardized Kafka configuration optimized for this case, with long-term durability in mind.

On Tue, Feb 16, 2016 at 10:26 PM Ted Swerve <ted.swe...@gmail.com> wrote:

> I guess I was just drawn in by the elegance of having everything available
> in one well-defined Kafka topic should I start up some new code.
>
> If instead the Kafka topics were on a retention period of, say, 7 days,
> that would involve firing up a topic to load the warehoused data from HDFS
> (or a more traditional load), and then switching over to the live topic?
>
> On Tue, Feb 16, 2016 at 8:32 AM, Ben Stopford <b...@confluent.io> wrote:
>
> > Ted - it depends on your domain. More conservative approaches to
> > long-lived data protect against data corruption, which generally means
> > snapshots and cold storage.
> >
> > > On 15 Feb 2016, at 21:31, Ted Swerve <ted.swe...@gmail.com> wrote:
> > >
> > > Hi Ben, Sharninder,
> > >
> > > Thanks for your responses, I appreciate it.
> > >
> > > Ben - thanks for the tips on settings. A backup could certainly be a
> > > possibility, although if only with similar durability guarantees, I'm
> > > not sure what the purpose would be?
> > >
> > > Sharninder - yes, we would only be using the logs as forward-only
> > > streams - i.e. picking an offset to read from and moving forwards -
> > > and would be setting retention time to essentially infinite.
> > >
> > > Regards,
> > > Ted.
> > >
> > > On Tue, Feb 16, 2016 at 5:05 AM, Sharninder Khera <sharnin...@gmail.com>
> > > wrote:
> > >
> > > > This topic comes up often on this list. Kafka can be used as a
> > > > datastore if that's what your application wants, with the caveat
> > > > that Kafka isn't designed to keep data around forever. There is a
> > > > default retention time after which older data gets deleted. The
> > > > high-level consumer essentially reads data as a stream, and while
> > > > you can do a sort of random access with the low-level consumer,
> > > > it's not ideal.
> > > >
> > > > > On 15-Feb-2016, at 10:26 PM, Ted Swerve <ted.swe...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > Is it viable to use infinite-retention Kafka topics as a master
> > > > > data store? I'm not talking massive volumes of data here, but
> > > > > still potentially extending into tens of terabytes.
> > > > >
> > > > > Are there any drawbacks or pitfalls to such an approach? It seems
> > > > > like a compelling design, but there seem to be mixed messages
> > > > > about its suitability for this kind of role.
> > > > >
> > > > > Regards,
> > > > > Ted
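As a concrete starting point for the "standardized configuration" suggested at the top of the thread, these are the topic-level overrides that switch off time- and size-based deletion. This is a sketch, not a vetted recipe; the durability-related values are illustrative and would need tuning per deployment (paired with `acks=all` on producers):

```properties
# Topic-level overrides for indefinite retention (illustrative values).
retention.ms=-1                        # never delete segments based on age
retention.bytes=-1                     # never delete segments based on size
min.insync.replicas=2                  # with acks=all, writes need 2 replicas
unclean.leader.election.enable=false   # never elect an out-of-sync leader
```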