Hi Ted,

This is an interesting question.

Kafka has similar resilience properties to other distributed stores such as Cassandra, which are used as master data stores (though obviously Kafka lacks their query functions). You'd need to set unclean.leader.election.enable=false and configure sufficient replication to get good resiliency.

One objection to doing this would be that the majority of Kafka usage is for transitory data. This is fair, and I've not seen Kafka used as a master data store per se. I have seen it used for reliable messaging, which means not losing data and hence requires similar properties.

Certainly there is nothing I can think of that would suggest Kafka would be any worse than other distributed data stores, but to further mitigate concerns you could use Connect to create a backup in HDFS, on a SAN, etc.

All the best
B

> On 15 Feb 2016, at 08:56, Ted Swerve <ted.swe...@gmail.com> wrote:
>
> Hello,
>
> Is it viable to use infinite-retention Kafka topics as a master data
> store? I'm not talking massive volumes of data here, but still potentially
> extending into tens of terabytes.
>
> Are there any drawbacks or pitfalls to such an approach? It seems like a
> compelling design, but there seem to be mixed messages about its
> suitability for this kind of role.
>
> Regards,
> Ted
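
As a rough illustration of the settings mentioned in the reply (disabling unclean leader election and using sufficient replication), here is a minimal Python sketch that checks a topic's settings before it is treated as a long-lived store. The function name check_topic, the threshold of three replicas, and the retention.ms / retention.bytes values of -1 (intended to switch off time- and size-based deletion) are assumptions added for illustration, not part of the original message, so verify them against the documentation for your Kafka version.

```python
# Illustrative sketch only. Only unclean.leader.election.enable=false comes
# from the reply; the other values are assumptions made for this example.
REQUIRED_TOPIC_CONFIG = {
    "unclean.leader.election.enable": "false",  # from the reply
    "retention.ms": "-1",     # assumption: no time-based deletion
    "retention.bytes": "-1",  # assumption: no size-based deletion
}


def check_topic(replication_factor, topic_config, min_replicas=3):
    """Return a list of problems; an empty list means the topic matches
    this sketch's idea of a durable, infinite-retention topic.

    min_replicas=3 is an assumed threshold, not a Kafka default.
    """
    problems = []
    if replication_factor < min_replicas:
        problems.append(
            f"replication factor {replication_factor} is below {min_replicas}"
        )
    for key, wanted in REQUIRED_TOPIC_CONFIG.items():
        actual = topic_config.get(key)
        if actual != wanted:
            problems.append(f"{key} is {actual!r}, expected {wanted!r}")
    return problems


if __name__ == "__main__":
    # A topic with one replica and no overrides fails every check.
    for problem in check_topic(1, {}):
        print(problem)
```

The check only compares values you pass in; it does not talk to a broker. Fetching the real topic configuration is left out on purpose, since the client library to use is a choice for your environment.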