I think you are right. Technically, it's a "minimum" not a "maximum".

The cleanup happens asynchronously in a background log cleanup thread. Segments that go beyond the "retention.bytes" config can be removed.

I think it's just a difference between "technically correct" (i.e., engineering / nerd language) and "regular English", i.e., how normal people speak.

In regular English one would say, "I limit the size to 1GB", even if 1GB is not a strict limit (never larger than 1GB), but technically a lower bound.
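
To make the "lower bound" reading concrete, here is a minimal Python sketch of a single size-based retention pass. It is a simplified model, not the broker's actual code: the function name apply_size_retention is made up for illustration, and the rule it assumes is that the oldest segment is removed only while the partition would still hold at least "retention.bytes" afterwards.

```python
def apply_size_retention(segment_sizes, retention_bytes):
    """Return the segments left after one simplified size-based retention pass.

    segment_sizes is ordered oldest first; all sizes are in bytes.
    Assumes retention_bytes >= 0 (Kafka's -1, "no size limit", is not modeled).
    """
    # How far the partition is above the configured retention size.
    excess = sum(segment_sizes) - retention_bytes
    kept = list(segment_sizes)
    # Drop the oldest segment only if the rest still covers retention_bytes.
    while kept and excess - kept[0] >= 0:
        excess -= kept.pop(0)
    return kept


MB = 1024 * 1024
retention = 1024 * MB  # retention.bytes = 1 GB, segments of 512 MB

# The third segment is almost full: nothing can be deleted yet.
almost_full = apply_size_retention([512 * MB, 512 * MB, 511 * MB], retention)
print(sum(almost_full) // MB)  # 1535

# The third segment is full: the oldest one becomes deletable.
full = apply_size_retention([512 * MB, 512 * MB, 512 * MB], retention)
print(sum(full) // MB)  # 1024
```

Under these assumptions the partition grows to roughly 1.5 GB before anything is deleted, and after the pass it never holds less than 1 GB, which matches the IBM example quoted below.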


I would appreciate if you could fix and clarify that in the documentation.


Feel free to open a PR for it :)




-Matthias


On 2/23/25 10:59 AM, אורי אהרוני wrote:
Hi,
I ran into something I don't understand, and I would like you to explain it
to me or, if possible, change the documentation.

The Kafka docs describe the 'retention.bytes' configuration as:
This configuration controls the maximum size a partition (which consists of
log segments) can grow to before we will discard old log segments to free
up space if we are using the "delete" retention policy

Unfortunately, I didn't fully understand the meaning of this field.
I interpret it as: once a log segment reaches the 'retention.bytes' value,
old segments will be deleted.
But as I understand it, that is not the case: like retention.hours, I
believe it is a guarantee of the (minimum) number of bytes that will be
kept for a partition.

I will give an example of the difference:
An example from IBM:
A topic with retention.bytes of 1 GB, and with a log segment size of 512 MB:

With one partition, it would reserve about 1.5 GB of storage.
In this case, the reserved size is significantly larger than the retention
size.

In this example, there's a guarantee that our topic size won't be LESS THAN
1 GB.
But from the docs I expect that once the topic reaches 1GB (or a bit more),
old segments will be deleted.
In this example I would expect that when it reaches 1 GB, a segment will be
automatically deleted and so the partition will be approximately 1 GB and
not 1.5 GB as said.

My question is whether I understood the definition of the field correctly.
If not, I would be happy if you could explain what I missed.
If I'm correct that the definition is not well explained, I would
appreciate it if you could fix and clarify that in the documentation.
Thanks,
Ori.

