Hi Lawrence,

Why do you need so much retention? We've generally found that any use of Kafka that wants really long retention (e.g. for compliance or replay reasons) is better served by consuming from the topic and putting the data on S3 (or some other longer-term storage), keeping only a few days of retention in Kafka itself (LinkedIn use 4 days; the Kafka default is 7). https://github.com/pinterest/secor is a good sample project that does this.
Storing long-term data in Kafka is generally a pretty bad idea, because it's really not designed for it. A big part of that is failure handling: if a broker goes down and another broker has to catch up from the replicas, that could mean transferring terabytes across the network. For example, LinkedIn keep about 25-40GB in a partition for 4 days of retention. Multiply that out to a year (roughly 10GB/day x 365) and that's ~3.6TB on each partition. Considering a single failing broker could host many partitions, things will be extremely problematic.

Thanks

Tom Crayford,
Heroku Kafka

On Mon, May 2, 2016 at 8:42 AM, Lawrence Weikum <lwei...@pandora.com> wrote:
> Using 0.9.0.1.
>
> I'm building a new topic that should keep data for much longer than the
> brokers' default, say at least a year, before deleting messages.
> http://kafka.apache.org/documentation.html says setting the "retention.ms"
> for the topic will adjust the time, but I cannot find what unit of time
> Kafka uses for this. "ms" would suggest "milliseconds", so a year would be
> about 3.154e+10 milliseconds. This seems like an uncomfortably high
> number to give.
>
> Can anyone else confirm that the time unit for "retention.ms" in the topic
> config is milliseconds? Is there also a "retention.minutes" that's just
> undocumented?
>
> Thanks!
>
> Lawrence Weikum
>
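For reference, a quick sketch of computing a year in milliseconds and applying it as a topic-level override. retention.ms is indeed in milliseconds; the ZooKeeper address, topic name, and tool path below are assumptions for illustration, not from this thread:

```shell
# One year expressed in milliseconds (365 days; retention.ms takes ms).
MS_PER_YEAR=$(( 365 * 24 * 60 * 60 * 1000 ))
echo "retention.ms for one year: ${MS_PER_YEAR}"

# Applying it as a per-topic override (commented out; ZooKeeper address
# and topic name "my-topic" are placeholder assumptions):
# bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
#   --entity-type topics --entity-name my-topic \
#   --add-config retention.ms=${MS_PER_YEAR}
```

That works out to 31,536,000,000 ms, i.e. ~3.154e+10, which is the "uncomfortably high" number in question.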