I would like to read an answer to this question as well. This is similar to
the architecture I am planning. Dealing with a secondary data store for old
messages would indeed make things complicated.

Clark Haskins wrote that partition size is limited by the machine's capacity
(I assume disk space):
https://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3ce7b3c4a4-bb72-43f2-8848-9e09d0dcb...@kafka.guru%3E

So in theory one could grow a single partition to terabyte scale. But don’t
take my word for it, as I have not tried it.
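
To get a feel for what "limited by disk space" means in practice, here is a
rough back-of-envelope sketch. The function name and all the numbers are
made up for illustration; they do not come from Clark's mail or from any
real cluster. It assumes nothing is ever deleted (for example, the topic has
retention.ms=-1), ignores compression and index overhead, and counts only
the partitions hosted on one broker.

```python
SECONDS_PER_DAY = 86_400


def days_until_full(disk_bytes, msgs_per_sec, avg_msg_bytes, partitions_on_broker=1):
    """Estimate how many days until a broker's disk fills up.

    Hypothetical helper: assumes infinite retention, a steady write rate
    per partition, and no compression or index overhead.
    """
    bytes_per_day = msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY * partitions_on_broker
    return disk_bytes / bytes_per_day


# Illustrative numbers only: a 4 TB disk, one partition,
# 1000 messages per second at 1000 bytes each.
estimate = days_until_full(
    disk_bytes=4 * 10**12,
    msgs_per_sec=1000,
    avg_msg_bytes=1000,
)
print(f"Disk full after roughly {estimate:.1f} days")
```

With these made-up numbers the disk lasts about a month and a half, so
whether a single partition can be kept forever depends mostly on the write
rate.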

Cheers, Giidox


> On 09 Mar 2016, at 15:10, Daniel Schierbeck <da...@zendesk.com.INVALID> wrote:
> 
> I'm considering an architecture where Kafka acts as the primary datastore,
> with infinite retention of messages. The messages in this case will be
> domain events that must not be lost. Different downstream consumers would
> ingest the events and build up various views on them, e.g. aggregated
> stats, indexes by various properties, full text search, etc.
> 
> The important bit is that I'd like to avoid having a separate datastore for
> long-term archival of events, since:
> 
> 1) I want to make it easy to spin up new materialized views based on past
> events, and only having to deal with Kafka is simpler.
> 2) Instead of having some sort of two-phased import process where I need to
> first import historical data and then do a switchover to the Kafka topics,
> I'd rather just start from offset 0 in the Kafka topics.
> 3) I'd like to be able to use standard tooling where possible, and most
> tools for ingesting events into e.g. Spark Streaming would be difficult to
> use unless all the data was in Kafka.
> 
> I'd like to know if anyone here has tried this use case. Based on the
> presentations by Jay Kreps and Martin Kleppmann I would expect that someone
> had actually implemented some of the ideas they've been pushing. I'd also
> like to know what sort of problems Kafka would pose for long-term storage –
> would I need special storage nodes, or would replication be sufficient to
> ensure durability?
> 
> Daniel Schierbeck
> Senior Staff Engineer, Zendesk
