I would like to propose a docs change for retention.bytes.
I see the docs live in this repo: https://github.com/apache/kafka-site
How can I get permission to open a PR or a new issue?
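
To make the behaviour discussed in the thread below concrete, here is a minimal Python sketch of size-based retention as a simplified model, not Kafka's actual code. The function name size_breached_segments is hypothetical, and the sketch assumes that a segment is removed only if the log still holds at least retention.bytes afterwards. It ignores the active segment, the high watermark, and the timing of the periodic retention check.

```python
MB = 1024 * 1024


def size_breached_segments(segment_sizes, retention_bytes):
    """Return the sizes of the oldest segments a retention pass could delete.

    segment_sizes is ordered oldest first. Simplified model (assumption):
    a segment is deletable only if the log would still hold at least
    retention_bytes after removing it.
    """
    if retention_bytes < 0:  # a negative value means no size limit
        return []
    excess = sum(segment_sizes) - retention_bytes
    deletable = []
    for size in segment_sizes:
        if excess - size < 0:
            break  # deleting this segment would drop the log below the bound
        excess -= size
        deletable.append(size)
    return deletable


retention = 1024 * MB  # retention.bytes = 1 GB, segments of up to 512 MB

# Just under 1.5 GB on disk: nothing can be deleted yet.
print(size_breached_segments([512 * MB, 512 * MB, 511 * MB], retention))

# Exactly 1.5 GB on disk: only the oldest segment is deletable.
print(size_breached_segments([512 * MB, 512 * MB, 512 * MB], retention))
```

Under this model the partition can grow to roughly retention.bytes plus one segment before anything is removed, which matches the 1.5 GB figure in the IBM example quoted below.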

On Tue, 25 Feb 2025 at 11:25, Brebner, Paul
<paul.breb...@netapp.com.invalid> wrote:

> Well spotted, I think. I was briefly puzzled by the time-based retention
> behaviour, as segments seemed to live longer than advertised, until I
> realised the setting is a minimum time: deletion is lazy and can occur at
> some (distant?) time in the future (and is async, I think). This was
> particularly noticeable for tiered storage (the only time I've really
> understood how Kafka segments work and looked closely). Paul
>
> From: Matthias J. Sax <mj...@apache.org>
> Date: Tuesday, 25 February 2025 at 1:16 pm
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: Re: Documentation and meaning of configuration 'retention.bytes'
>
> I think you are right. Technically, it's a "minimum" not a "maximum".
>
> The cleanup happens async by the background log-cleaner thread. Segments
> which go beyond the "retention.bytes" config can be removed.
>
> I think it's just a difference between "technically correct" (i.e.,
> engineering / nerd language) and "regular English", i.e., how normal
> people speak.
>
> In regular English one would say, "I limit the size to 1GB", even if 1GB
> is not a strict limit (never larger than 1GB), but technically a lower
> bound.
>
>
> > I would appreciate if you could fix and clarify that in the
> > documentation.
>
>
> Feel free to open a PR for it :)
>
>
>
>
> -Matthias
>
>
> On 2/23/25 10:59 AM, אורי אהרוני wrote:
> > Hi,
> > I ran into something I may have misunderstood, and I would like you to
> > explain it to me or, if possible, change the documentation.
> >
> > The Kafka docs describe the 'retention.bytes' configuration as:
> > This configuration controls the maximum size a partition (which consists
> > of log segments) can grow to before we will discard old log segments to
> > free up space if we are using the "delete" retention policy
> >
> > Unfortunately I didn't fully understand the meaning of this field.
> > I read it as: once the log reaches the 'retention.bytes' size, old
> > segments will be deleted.
> > But as I understand it, that is not what happens: like retention.hours,
> > I believe it is a guarantee of the (minimum) number of bytes that will
> > be kept for a partition.
> >
> > Here is an example of the difference, taken from IBM:
> > A topic with retention.bytes of 1 GB and a log segment size of 512 MB:
> >
> > With one partition, it would reserve about 1.5 GB of storage.
> > In this case, the reserved size is significantly larger than the
> > retention size.
> >
> > In this example, there's a guarantee that our topic size won't be LESS
> > THAN 1 GB.
> > But from the docs I would expect that once the topic reaches 1 GB (or a
> > bit more), old segments will be deleted.
> > In this example I would expect that when it reaches 1 GB, a segment will
> > be automatically deleted, so the partition will be approximately 1 GB and
> > not 1.5 GB as stated.
> >
> > My question is whether I understood the definition of the field correctly.
> > If not, I would be happy if you could explain what I missed.
> > If I'm correct that the definition is not well explained, I would
> > appreciate if you could fix and clarify that in the documentation.
> > Thanks,
> > Ori.
> >
>


-- 
*Ori Aharoni*
