Well spotted, I think. I was briefly puzzled by the time-based retention
behaviour, as segments seemed to live longer than advertised, until I realised
that the retention time is a minimum: deletion is lazy, can happen at some
(distant?) point in the future, and is asynchronous, I think. This was
particularly noticeable with tiered storage (the only time I've really looked
closely at how Kafka segments work and understood it).
Paul
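Paul's observation that the retention time behaves as a minimum can be illustrated with a simplified model. It assumes, based on how the broker behaviour is commonly described rather than on Kafka's source code, that a segment becomes eligible for time-based deletion only once its newest record is older than retention.ms, and that eligibility is only checked periodically (log.retention.check.interval.ms, 5 minutes by default). The function names are hypothetical.

```python
# Hypothetical, simplified model of time-based retention (cleanup.policy=delete).
# Assumptions: a segment is eligible for deletion only once its *newest*
# record is older than retention.ms, and eligibility is only checked every
# log.retention.check.interval.ms. Not modelled: the extra delay before files
# are removed from disk (file.delete.delay.ms) or tiered storage.

HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


def segment_expired(newest_record_ts_ms: int, now_ms: int, retention_ms: int) -> bool:
    """True once every record in the segment is older than retention_ms."""
    return now_ms - newest_record_ts_ms > retention_ms


def worst_case_oldest_record_age_ms(
    oldest_record_ts_ms: int,
    newest_record_ts_ms: int,
    retention_ms: int,
    check_interval_ms: int,
) -> int:
    """Upper bound on the age the segment's oldest record reaches before deletion.

    The whole segment waits for its newest record to pass retention_ms, and the
    periodic check may only notice this up to check_interval_ms later.
    """
    segment_span_ms = newest_record_ts_ms - oldest_record_ts_ms
    return segment_span_ms + retention_ms + check_interval_ms


if __name__ == "__main__":
    # A segment whose records span 6 hours, retention.ms = 24 hours,
    # and a 5-minute check interval.
    age = worst_case_oldest_record_age_ms(0, 6 * HOUR_MS, 24 * HOUR_MS, 5 * MINUTE_MS)
    hours, rest_ms = divmod(age, HOUR_MS)
    print(hours, "h", rest_ms // MINUTE_MS, "min")  # 30 h 5 min
```

In this model, the time span covered by a single segment is what lets the oldest records outlive retention.ms the most; smaller segments (via segment.ms or segment.bytes) tighten the bound.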

From: Matthias J. Sax <mj...@apache.org>
Date: Tuesday, 25 February 2025 at 1:16 pm
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Re: Documentation and meaning of configuration 'retention.bytes'

I think you are right. Technically, it's a "minimum", not a "maximum".

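The cleanup happens asynchronously, in a background thread. Segments beyond
"retention.bytes" can then be removed, but only as whole segments, oldest
first, and only while the partition still holds at least "retention.bytes"
afterwards.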
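The size-based rule can be sketched as a small simulation. This is a simplified model under stated assumptions, not the broker's actual code: one retention check deletes whole segments, oldest first, and stops before the partition would drop below retention.bytes, which is why the setting acts as a floor for what survives rather than a cap. `apply_size_retention` is a hypothetical name.

```python
# Hypothetical, simplified model of one size-based retention check
# (cleanup.policy=delete). Assumption: whole segments are deleted oldest
# first, and a segment is deleted only if the partition still holds at least
# retention.bytes without it.

from typing import List

MB = 1024 * 1024


def apply_size_retention(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Return the sizes of the segments (oldest first) that survive one check.

    A negative retention_bytes means "no size limit", like the -1 default
    of retention.bytes.
    """
    if retention_bytes < 0:
        return list(segment_sizes)
    remaining = list(segment_sizes)
    excess = sum(remaining) - retention_bytes
    # Stop as soon as removing the oldest segment would drop below the limit.
    while remaining and excess - remaining[0] >= 0:
        excess -= remaining.pop(0)
    return remaining


if __name__ == "__main__":
    # Two full 512 MB segments plus a 400 MB active segment, retention.bytes = 1 GB:
    # nothing is deleted, so the partition keeps all 1424 MB.
    print(sum(apply_size_retention([512 * MB, 512 * MB, 400 * MB], 1024 * MB)) // MB)  # 1424
    # Once the newest segment is also full, the oldest one becomes deletable.
    print(sum(apply_size_retention([512 * MB] * 3, 1024 * MB)) // MB)  # 1024
```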

I think it's just a difference between "technically correct" (i.e.,
engineering/nerd language) and "regular English", i.e., how normal
people speak.

In regular English, one would say, "I limit the size to 1GB", even if 1GB
is not a strict limit (never larger than 1GB) but technically a lower
bound.


> I would appreciate it if you could fix and clarify that in the documentation.


Feel free to open a PR for it :)




-Matthias


On 2/23/25 10:59 AM, אורי אהרוני wrote:
> Hi,
> I'm confused about something and would like you to explain it to me
> or, if possible, change the documentation.
>
> The Kafka docs describe the 'retention.bytes' configuration as:
> This configuration controls the maximum size a partition (which consists of
> log segments) can grow to before we will discard old log segments to free
> up space if we are using the "delete" retention policy
>
> Unfortunately, I didn't fully understand the meaning of this field.
> I read it as: once a partition reaches the 'retention.bytes' size,
> old segments will be deleted.
> But as I understand it, that is not what happens: like time-based
> retention (log.retention.hours / retention.ms), I believe it is a guarantee
> of the (minimum) number of bytes that will be kept for a partition.
>
> Here is an example, from IBM, that shows the difference:
> A topic with retention.bytes of 1 GB, and with a log segment size of 512 MB:
>
> With one partition, it would reserve about 1.5 GB of storage.
> In this case, the reserved size is significantly larger than the retention
> size.
>
> In this example, there's a guarantee that our topic size won't be LESS THAN
> 1 GB.
> But from the docs, I would expect that once the topic reaches 1 GB (or a bit
> more), old segments will be deleted.
> In this example, I would expect that when it reaches 1 GB, a segment will be
> deleted automatically, so the partition will stay at approximately 1 GB, not
> 1.5 GB as stated.
>
> My question is whether I understood the definition of the field correctly.
> If not, I would be happy if you could explain what I missed.
> If I'm correct that the definition is not well explained, I would
> appreciate it if you could fix and clarify that in the documentation.
> Thanks,
> Ori.
>
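The IBM figure of about 1.5 GB quoted above can be reproduced with a back-of-the-envelope estimate. The sketch assumes that deletion removes whole segments and never takes a partition below retention.bytes, so a partition grows to roughly retention.bytes + segment.bytes before its oldest segment can go. It ignores data written between retention checks, index files and tiered storage. `approx_peak_log_bytes` is a hypothetical helper, not a Kafka API or an official sizing formula.

```python
# Hypothetical back-of-the-envelope estimate, not an official Kafka formula.
# Assumption: a partition grows to about retention.bytes + segment.bytes
# before its oldest full segment can be deleted without going below
# retention.bytes. Ignores writes between retention checks and index files.

GB = 1024 ** 3
MB = 1024 ** 2


def approx_peak_log_bytes(
    retention_bytes: int,
    segment_bytes: int,
    partitions: int = 1,
    replication_factor: int = 1,
) -> int:
    """Approximate peak log size summed over all partitions and replicas."""
    per_partition = retention_bytes + segment_bytes
    return per_partition * partitions * replication_factor


if __name__ == "__main__":
    # IBM example: retention.bytes = 1 GB, segment size = 512 MB, one partition.
    print(approx_peak_log_bytes(1 * GB, 512 * MB) / GB)  # 1.5
```

Under these assumptions, Ori's expected figure of about 1 GB is roughly the partition size right after a cleanup pass, while 1.5 GB is roughly where it sits just before one.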
