Re: Kafka as master data store

2016-02-17 Thread Daniel Schierbeck
I'm also very interested in using Kafka as a persistent, distributed commit log – essentially the write side of a distributed database, with the read side being an array of various query stores (Elasticsearch, Redis, whatever) and stream processing systems. The benefit of retaining data in Kafka i

Re: Kafka as master data store

2016-02-17 Thread Damian Guy
Hi Ted - if the data is keyed you can use a key-compacted topic and essentially keep the data 'forever', i.e., you'll always have the latest version of the data for a given key. However, you'd still want to back up the data someplace else, just in case. On 16 February 2016 at 21:25, Ted Swerve wrote
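
To make the "latest version per key" behaviour of a compacted topic concrete, here is a minimal sketch in plain Python. It is a simplified model, not Kafka's implementation: the `compact` function and the sample keys are hypothetical, and details such as the uncompacted active segment and the tombstone retention window are ignored.

```python
from typing import Dict, Iterable, Optional, Tuple


def compact(log: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Simplified model of log compaction: keep only the newest value per key.

    A record whose value is None is treated as a tombstone, which removes
    the key. (Hypothetical helper for illustration only.)
    """
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: drop the key if present
        else:
            latest[key] = value    # newer record replaces the older one
    return latest


# Records in append order, oldest first (sample data is made up).
log = [
    ("user-1", "alice"),
    ("user-2", "bob"),
    ("user-1", "alice-v2"),  # supersedes the first user-1 record
    ("user-2", None),        # tombstone for user-2
]
snapshot = compact(log)
```

A consumer that reads such a topic from the beginning ends up with the state in `snapshot`: one value per surviving key. On the Kafka side this behaviour is selected with the topic setting `cleanup.policy=compact`.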

Re: Kafka as master data store

2016-02-16 Thread Ted Swerve
I guess I was just drawn in by the elegance of having everything available in one well-defined Kafka topic should I start up some new code. If instead the Kafka topics were on a retention period of, say, 7 days, that would involve firing up a topic to load the warehoused data from HDFS (or a more tr

Re: Kafka as master data store

2016-02-15 Thread Ben Stopford
Ted - it depends on your domain. More conservative approaches to long-lived data protect against data corruption, which generally means snapshots and cold storage. > On 15 Feb 2016, at 21:31, Ted Swerve wrote: > > Hi Ben, Sharninder, > > Thanks for your responses, I appreciate it. > > Ben

Re: Kafka as master data store

2016-02-15 Thread Ted Swerve
Hi Ben, Sharninder, Thanks for your responses, I appreciate it. Ben - thanks for the tips on settings. A backup could certainly be a possibility, although if it only offers similar durability guarantees, I'm not sure what the purpose would be. Sharninder - yes, we would only be using the logs as forw

Re: Kafka as master data store

2016-02-15 Thread Sharninder Khera
This topic comes up often on this list. Kafka can be used as a datastore if that’s what your application wants, with the caveat that Kafka isn’t designed to keep data around forever. There is a default retention time after which older data gets deleted. The high-level consumer essentially reads d
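
The retention caveat above can be sketched as follows. This is a simplified model under stated assumptions: the `expired_segments` helper and the sample timestamps are hypothetical, and real brokers apply further rules (for example, the active segment is not deleted). The values it models are real settings: `retention.ms` defaults to 7 days, and `-1` disables time-based deletion.

```python
from typing import List

DAY_MS = 24 * 60 * 60 * 1000
NO_TIME_LIMIT = -1  # retention.ms = -1 disables time-based deletion


def expired_segments(segment_max_ts_ms: List[int], now_ms: int,
                     retention_ms: int) -> List[int]:
    """Return indexes of log segments older than the retention window.

    Simplified model: a segment is eligible for deletion once its newest
    record is older than retention_ms. (Hypothetical helper.)
    """
    if retention_ms == NO_TIME_LIMIT:
        return []
    return [i for i, ts in enumerate(segment_max_ts_ms)
            if now_ms - ts > retention_ms]


# Newest-record timestamps of three segments (made-up sample data).
segments = [1 * DAY_MS, 5 * DAY_MS, 9 * DAY_MS]
now = 10 * DAY_MS

with_default = expired_segments(segments, now, 7 * DAY_MS)        # 7-day retention
with_infinite = expired_segments(segments, now, NO_TIME_LIMIT)    # keep everything
```

With the 7-day window only the oldest segment qualifies for deletion; with `retention.ms=-1` nothing does, which is the "infinite retention" setup the original question asks about.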

Re: Kafka as master data store

2016-02-15 Thread Ben Stopford
Hi Ted, this is an interesting question. Kafka has similar resilience properties to other distributed stores such as Cassandra, which are used as master data stores (obviously without the query functions). You’d need to set unclean.leader.election.enable=false and configure sufficient replicat
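
As a rough illustration of the durability settings mentioned above, the sketch below collects them in plain Python dictionaries. Only `unclean.leader.election.enable=false` comes from the message itself; the replication factor of 3, `min.insync.replicas=2`, and the producer setting `acks=all` are assumptions chosen for the example, and `tolerated_broker_failures` is a hypothetical helper.

```python
from typing import Dict

# Topic-level settings. Only the first entry is taken from the message above;
# the min.insync.replicas value is an assumption for illustration.
topic_config: Dict[str, str] = {
    "unclean.leader.election.enable": "false",  # never elect an out-of-sync replica
    "min.insync.replicas": "2",                 # assumed value
}

# Producer-side setting (assumption): wait for all in-sync replicas to acknowledge.
producer_config: Dict[str, str] = {"acks": "all"}

REPLICATION_FACTOR = 3  # assumed value


def tolerated_broker_failures(replication_factor: int,
                              min_insync_replicas: int) -> int:
    """Replicas that can be lost while acks=all writes are still accepted."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


failures = tolerated_broker_failures(
    REPLICATION_FACTOR, int(topic_config["min.insync.replicas"])
)
```

Under these assumed values a partition keeps accepting acknowledged writes with one replica down. Losing a second replica makes `acks=all` writes fail rather than risk data loss, which is the trade-off that disabling unclean leader election also makes.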

Kafka as master data store

2016-02-15 Thread Ted Swerve
Hello, Is it viable to use infinite-retention Kafka topics as a master data store? I'm not talking massive volumes of data here, but still potentially extending into tens of terabytes. Are there any drawbacks or pitfalls to such an approach? It seems like a compelling design, but there seem to
