I'm also very interested in using Kafka as a persistent, distributed commit
log – essentially the write side of a distributed database, with the read
side being an array of various query stores (Elasticsearch, Redis,
whatever) and stream processing systems.
The benefit of retaining data in Kafka i
Hi Ted - if the data is keyed you can use a key-compacted topic and
essentially keep the data 'forever', i.e., you'll always have the latest
version of the data for a given key. However, you'd still want to back up
the data someplace else, just in case.
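For illustration, here is a toy model of what key compaction retains. It is
plain Python, not Kafka's implementation; on a real topic the behaviour is
enabled with cleanup.policy=compact. The record shape and the compact()
helper are hypothetical names used only for this sketch.

```python
from typing import Dict, Iterable, Optional, Tuple

# (key, value); a value of None models a tombstone (delete marker).
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> Dict[str, str]:
    """Return the latest value per key, as a fully compacted log would expose it."""
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            # A null value marks the key as deleted.
            latest.pop(key, None)
        else:
            # A later record for the same key supersedes the earlier one.
            latest[key] = value
    return latest


log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example"),
    ("user-1", "alice@new.example"),  # newer version of user-1
    ("user-2", None),                 # tombstone for user-2
]
state = compact(log)
```

Note that only the latest version per key survives, so a compacted topic is
not a full history of changes.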
On 16 February 2016 at 21:25, Ted Swerve wrote:
I guess I was just drawn in by the elegance of having everything available
in one well-defined Kafka topic should I start up some new code.
If instead the Kafka topics were on a retention period of say 7 days, that
would involve firing up a topic to load the warehoused data from HDFS (or a
more tr
Ted - it depends on your domain. More conservative approaches to long-lived
data protect against data corruption, which generally means snapshots and cold
storage.
> On 15 Feb 2016, at 21:31, Ted Swerve wrote:
>
> Hi Ben, Sharninder,
>
> Thanks for your responses, I appreciate it.
>
> Ben
Hi Ben, Sharninder,
Thanks for your responses, I appreciate it.
Ben - thanks for the tips on settings. A backup could certainly be a
possibility, although if it only offers similar durability guarantees, I'm not
sure what the purpose would be?
Sharninder - yes, we would only be using the logs as forw
This topic comes up often on this list. Kafka can be used as a datastore if
that’s what your application wants, with the caveat that Kafka isn’t designed to
keep data around forever. There is a default retention time after which older
data gets deleted. The high-level consumer essentially reads d
Hi Ted
This is an interesting question.
Kafka has similar resilience properties to other distributed stores such as
Cassandra, which are used as master data stores (obviously without the query
functions). You’d need to set unclean.leader.election.enable=false and
configure sufficient replicat
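
As a rough sketch of the settings check described above: the helper below
flags broker settings that weaken durability. The config keys are real Kafka
broker settings, but durability_warnings() and its thresholds (replication
factor of at least 3, min.insync.replicas of at least 2) are assumptions made
for this example, not recommendations taken from this thread.

```python
from typing import Dict, List


def durability_warnings(settings: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for settings that weaken durability."""
    warnings: List[str] = []

    # Assumption: a missing value is treated as unsafe, so it must be set explicitly.
    if settings.get("unclean.leader.election.enable") != "false":
        warnings.append("set unclean.leader.election.enable=false")

    # Assumed threshold: three copies of each partition.
    if int(settings.get("default.replication.factor", "1")) < 3:
        warnings.append("raise default.replication.factor to at least 3")

    # Assumed threshold: at least two in-sync replicas must acknowledge a write.
    if int(settings.get("min.insync.replicas", "1")) < 2:
        warnings.append("raise min.insync.replicas to at least 2")

    return warnings


risky = durability_warnings({})
safe = durability_warnings({
    "unclean.leader.election.enable": "false",
    "default.replication.factor": "3",
    "min.insync.replicas": "2",
})
```

With an empty settings dict all three checks fire; the second call returns no
warnings.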
Hello,
Is it viable to use infinite-retention Kafka topics as a master data
store? I'm not talking massive volumes of data here, but still potentially
extending into tens of terabytes.
Are there any drawbacks or pitfalls to such an approach? It seems like a
compelling design, but there seem to