Partitions being limited by disk size is no different from, e.g., a SQL
store. This setup would not be suitable for extremely high throughput. If
there were eventually a good case for not requiring that an entire
partition fit on a single machine, the log segments could be used as the
unit of distribution.
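For what it's worth, retention can be disabled per topic so the log is kept indefinitely. A minimal sketch of the topic setup being discussed (the topic name, partition count, replication factor, and ZooKeeper address are assumptions for illustration):

```shell
# Create a topic whose log is never deleted: retention.ms=-1 disables
# time-based retention, and retention.bytes=-1 (the default) disables
# size-based retention, so log segments are retained indefinitely.
kafka-topics.sh --zookeeper localhost:2181 \
  --create --topic domain-events \
  --partitions 12 --replication-factor 3 \
  --config retention.ms=-1 \
  --config retention.bytes=-1
```

A new consumer can then rebuild its view by reading the topic from the beginning, e.g. with auto.offset.reset=earliest or by explicitly seeking to offset 0.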

On Mon, Mar 14, 2016 at 9:29 AM Giidox <a...@marmelandia.com> wrote:

> I would like to read an answer to this question as well. This is similar
> to the architecture I am planning. Dealing with a secondary data store for
> old messages would indeed make things complicated.
>
> Clark Haskins wrote that the partition size is limited by the machine's
> capacity (I assume disk space):
> https://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3ce7b3c4a4-bb72-43f2-8848-9e09d0dcb...@kafka.guru%3E
> So in theory one could grow a single partition to terabyte scale. But
> don’t take my word for it, as I have not tried it.
>
> Cheers, Giidox
>
>
>
> > On 09 Mar 2016, at 15:10, Daniel Schierbeck <da...@zendesk.com.INVALID>
> > wrote:
> >
> > I'm considering an architecture where Kafka acts as the primary
> > datastore, with infinite retention of messages. The messages in this
> > case will be domain events that must not be lost. Different downstream
> > consumers would ingest the events and build up various views on them,
> > e.g. aggregated stats, indexes by various properties, full text search,
> > etc.
> >
> > The important bit is that I'd like to avoid having a separate datastore
> > for long-term archival of events, since:
> >
> > 1) I want to make it easy to spin up new materialized views based on
> > past events, and only having to deal with Kafka is simpler.
> > 2) Instead of having some sort of two-phased import process where I
> > need to first import historical data and then do a switchover to the
> > Kafka topics, I'd rather just start from offset 0 in the Kafka topics.
> > 3) I'd like to be able to use standard tooling where possible, and most
> > tools for ingesting events into e.g. Spark Streaming would be difficult
> > to use unless all the data was in Kafka.
> >
> > I'd like to know if anyone here has tried this use case. Based on the
> > presentations by Jay Kreps and Martin Kleppmann I would expect that
> > someone has actually implemented some of the ideas they've been
> > pushing. I'd also like to know what sort of problems Kafka would pose
> > for long-term storage – would I need special storage nodes, or would
> > replication be sufficient to ensure durability?
> >
> > Daniel Schierbeck
> > Senior Staff Engineer, Zendesk
>
>
