A couple of things:

- Compacted topics are a useful way to retain meaningful datasets inside the broker without them growing indefinitely, since only the latest value for each key is kept. If you have an update-in-place use case, where the event-sourced approach doesn't buy you much, they will keep the reload time down when you regenerate materialised views.
- When going down the master data store route, a few different problems can get conflated: disaster recovery, historic backups, and regenerating data in non-production environments.
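
To make the compaction point concrete, here is a minimal sketch in Python of what compaction does to a keyed log. It is a simplified model, not Kafka's actual log cleaner: the `compact` function and the sample records are made up for illustration, and it ignores details such as the active segment (which is not compacted) and the period during which delete markers are still retained. On a real topic this behaviour is enabled with the topic-level setting cleanup.policy=compact.

```python
from typing import Hashable, List, Optional, Tuple

# A record is (key, value); a value of None models a delete marker (tombstone).
Record = Tuple[Hashable, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, in offset order.

    Keys whose most recent record is a tombstone are dropped entirely.
    Hypothetical helper for illustration only.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        # A later record for the same key overwrites the earlier one.
        latest[key] = (offset, value)

    # Offsets are unique, so sorting these tuples orders purely by offset.
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


log = [
    ("user-1", "a"),
    ("user-2", "b"),
    ("user-1", "c"),   # supersedes the first user-1 record
    ("user-2", None),  # tombstone: user-2 is deleted
    ("user-3", "d"),
]
compacted = compact(log)
```

The compacted log is bounded by the number of live keys rather than by the number of updates, which is why replaying it to rebuild a view stays cheap.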
B

> On 14 Mar 2016, at 09:56, Jens Rantil <jens.ran...@tink.se> wrote:
>
> This is definitely an interesting use case. However, you need to be aware
> that changing the broker topology won't rebalance the preexisting data from
> the previous brokers. That is, you risk losing data.
>
> Cheers,
> Jens
>
> On Wed, Mar 9, 2016 at 2:10 PM Daniel Schierbeck <da...@zendesk.com.invalid>
> wrote:
>
>> I'm considering an architecture where Kafka acts as the primary datastore,
>> with infinite retention of messages. The messages in this case will be
>> domain events that must not be lost. Different downstream consumers would
>> ingest the events and build up various views on them, e.g. aggregated
>> stats, indexes by various properties, full text search, etc.
>>
>> The important bit is that I'd like to avoid having a separate datastore for
>> long-term archival of events, since:
>>
>> 1) I want to make it easy to spin up new materialized views based on past
>> events, and only having to deal with Kafka is simpler.
>> 2) Instead of having some sort of two-phased import process where I need to
>> first import historical data and then do a switchover to the Kafka topics,
>> I'd rather just start from offset 0 in the Kafka topics.
>> 3) I'd like to be able to use standard tooling where possible, and most
>> tools for ingesting events into e.g. Spark Streaming would be difficult to
>> use unless all the data was in Kafka.
>>
>> I'd like to know if anyone here has tried this use case. Based on the
>> presentations by Jay Kreps and Martin Kleppmann I would expect that someone
>> had actually implemented some of the ideas they've been pushing. I'd also
>> like to know what sort of problems Kafka would pose for long-term storage –
>> would I need special storage nodes, or would replication be sufficient to
>> ensure durability?
>>
>> Daniel Schierbeck
>> Senior Staff Engineer, Zendesk
>>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.