Well spotted, I think. I was briefly puzzled by the time-based retention behaviour, as segments seemed to live longer than advertised, until I realised the setting is a minimum retention time. Deletion is lazy: it can occur at some (distant?) time in the future, and is asynchronous, I think. This was particularly noticeable with tiered storage (the only time I've looked closely and really understood how Kafka segments work).

Paul

From: Matthias J. Sax <mj...@apache.org>
Date: Tuesday, 25 February 2025 at 1:16 pm
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Re: Documentation and meaning of configuration 'retention.bytes'

I think you are right. Technically, it's a "minimum" not a "maximum". The cleanup happens async by the background log-cleaner thread. Segments which go beyond the "retention.bytes" config can be removed.

I think it's just a difference between "technically correct" (ie, engineering / nerd language) and "regular English", ie, how normal people speak. In regular English one would say, "I limit the size to 1GB", even if 1GB is not a strict limit (never larger than 1GB), but technically a lower bound.

> I would appreciate if you could fix and clarify that in the documentation.

Feel free to open a PR for it :)

-Matthias

On 2/23/25 10:59 AM, אורי אהרוני wrote:
> Hi,
> I encountered a misunderstanding and I would like you to explain it to me
> or, if possible, change the documentation.
>
> The Kafka docs describe the 'retention.bytes' configuration as:
> This configuration controls the maximum size a partition (which consists of
> log segments) can grow to before we will discard old log segments to free
> up space if we are using the "delete" retention policy
>
> Unfortunately I didn't fully understand the meaning of this field.
> I interpret that as: once a log segment reaches the 'retention.bytes' value,
> old segments will be deleted.
> But to my understanding that is not the case, because, like
> retention.hours, I believe it is a guarantee of the (minimum) number of bytes
> that will be left for a partition.
>
> I will give an example of the difference.
> An example from IBM:
> A topic with retention.bytes of 1 GB, and with a log segment size of 512 MB:
>
> With one partition, it would reserve about 1.5 GB of storage.
> In this case, the reserved size is significantly larger than the retention
> size.
>
> In this example, there's a guarantee that our topic size won't be LESS THAN
> 1 GB.
> But from the docs I expect that once the topic reaches 1 GB (or a bit more),
> old segments will be deleted.
> In this example I would expect that when it reaches 1 GB, a segment will be
> automatically deleted, so the partition will be approximately 1 GB and
> not 1.5 GB as stated.
>
> My question is whether I understood the definition of the field correctly.
> If not, I would be happy if you could explain what I missed.
> If I'm correct that the definition is not well explained, I would
> appreciate if you could fix and clarify that in the documentation.
> Thanks,
> Ori.
>
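
For anyone who wants to check the arithmetic in the IBM example quoted above, here is a rough sketch of size-based retention as a toy model in Python. It is not Kafka code: the function names (apply_size_retention, simulate) are made up for illustration, and it assumes that the oldest segment is dropped only if the partition would still hold at least retention.bytes afterwards, that the active segment is never dropped, and that a retention pass runs after every write. A real broker only checks periodically (log.retention.check.interval.ms), so the real peak can be higher than what this model shows.

```python
MB = 1024 * 1024


def apply_size_retention(segments, retention_bytes):
    """Return the segments left after a size-based retention pass.

    `segments` holds segment sizes in bytes, oldest first; the last entry
    is the active segment and is never deleted in this model.
    """
    kept = list(segments)
    excess = sum(kept) - retention_bytes
    # Drop the oldest segment only if the log would still hold at least
    # retention_bytes afterwards.
    while len(kept) > 1 and excess - kept[0] >= 0:
        excess -= kept[0]
        kept.pop(0)
    return kept


def simulate(retention_bytes, segment_bytes, total_bytes, step_bytes):
    """Write total_bytes in step_bytes chunks; return (peak, floor) sizes.

    peak  = largest partition size seen just before a retention pass
    floor = smallest partition size seen right after a pass that deleted data
    """
    segments = [0]
    peak, floor = 0, None
    written = 0
    while written < total_bytes:
        if segments[-1] + step_bytes > segment_bytes:
            segments.append(0)  # roll to a new active segment
        segments[-1] += step_bytes
        written += step_bytes
        peak = max(peak, sum(segments))
        kept = apply_size_retention(segments, retention_bytes)
        if len(kept) < len(segments):
            size = sum(kept)
            floor = size if floor is None else min(floor, size)
        segments = kept
    return peak, floor


# Hypothetical workload: 4 GB written in 64 MB chunks, with
# retention.bytes = 1 GB and segment.bytes = 512 MB as in the IBM example.
peak, floor = simulate(1024 * MB, 512 * MB, 4096 * MB, 64 * MB)
print(f"peak: {peak // MB} MB, floor after deletion: {floor // MB} MB")
```

Under these assumptions the partition peaks at 1536 MB (about 1.5 GB) and falls back to 1024 MB after each deletion, so it never shrinks below retention.bytes once it has reached that size. That matches the reading of retention.bytes as a lower bound on retained data rather than a cap on disk usage.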