I would like to propose the docs change for retention.bytes.
I see the docs live in this repo: https://github.com/apache/kafka-site
How can I get permission to open a PR or a new issue?
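To make the IBM example quoted below concrete, here is a minimal sketch of how I understand size-based retention. It is a simplified model, not Kafka's actual code: the function name `apply_size_retention` and the list-of-sizes representation are made up for illustration. It assumes the oldest closed segment is deleted only if the partition would still hold at least retention.bytes afterwards, and that the active (newest) segment is never deleted by this check.

```python
MB = 1024 * 1024


def apply_size_retention(segment_sizes, retention_bytes):
    """Simplified model of size-based retention for one partition.

    segment_sizes: segment sizes in bytes, ordered oldest to newest;
                   the last entry stands for the active segment.
    Returns the sizes of the segments that remain after cleanup.
    """
    remaining = list(segment_sizes)
    # How many bytes the partition currently holds beyond retention.bytes.
    excess = sum(remaining) - retention_bytes
    # Delete the oldest segment only while doing so keeps the partition
    # at or above retention.bytes; never touch the active (last) segment.
    while len(remaining) > 1 and excess - remaining[0] >= 0:
        excess -= remaining.pop(0)
    return remaining


retention = 1024 * MB  # retention.bytes = 1 GB
segment = 512 * MB     # segment size = 512 MB

# Two full segments plus an almost full active one: just under 1.5 GB,
# and nothing is deletable yet.
before_roll = apply_size_retention([segment, segment, segment - 1], retention)

# The active segment has rolled: three full segments plus a new, empty
# active one. Only now can the oldest segment go.
after_roll = apply_size_retention([segment, segment, segment, 0], retention)

print(sum(before_roll) / MB, sum(after_roll) / MB)
```

Under these assumptions the partition grows to almost 1.5 GB before anything is removed, and never drops below 1 GB after a cleanup, which is why "minimum" describes the setting better than "maximum".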
On Tue, Feb 25, 2025 at 11:25, Brebner, Paul <paul.breb...@netapp.com.invalid> wrote:

> Well spotted, I think. I was briefly puzzled by the time retention
> behaviour, as segments seemed to live longer than advertised, until I
> realised it is a minimum time: deletion is lazy, can occur at some
> (distant?) time in the future, and is async, I think. This was
> particularly noticeable for tiered storage (the only time I've really
> understood how Kafka segments work and looked closely).
>
> Paul
>
> From: Matthias J. Sax <mj...@apache.org>
> Date: Tuesday, 25 February 2025 at 1:16 pm
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: Re: Documentation and meaning of configuration 'retention.bytes'
>
> I think you are right. Technically, it's a "minimum" not a "maximum".
>
> The cleanup happens async by the background log-cleaner thread. Segments
> which go beyond the "retention.bytes" config can be removed.
>
> I think it's just a difference between "technically correct" (i.e.,
> engineering / nerd language) and "regular English", i.e., how normal
> people speak.
>
> In regular English one would say, "I limit the size to 1GB", even if 1GB
> is not a strict limit (never larger than 1GB), but technically a lower
> bound.
>
> > I would appreciate if you could fix and clarify that in the
> > documentation.
>
> Feel free to open a PR for it :)
>
> -Matthias
>
> On 2/23/25 10:59 AM, אורי אהרוני wrote:
> > Hi,
> > I encountered a misunderstanding and I would like you to explain it to
> > me or, if possible, change the documentation.
> >
> > The Kafka docs describe the 'retention.bytes' configuration as:
> > This configuration controls the maximum size a partition (which consists
> > of log segments) can grow to before we will discard old log segments to
> > free up space if we are using the "delete" retention policy
> >
> > Unfortunately I didn't fully understand the meaning of this field.
> > I interpret that as: once a log segment reaches the 'retention.bytes'
> > value, old segments will be deleted.
> > But as I understand it, that is not the case: like retention.hours,
> > I believe it is a guarantee of the (minimum) number of bytes that
> > will be kept for a partition.
> >
> > I will give an example of the difference.
> > An example from IBM:
> > A topic with retention.bytes of 1 GB, and with a log segment size of
> > 512 MB:
> >
> > With one partition, it would reserve about 1.5 GB of storage.
> > In this case, the reserved size is significantly larger than the
> > retention size.
> >
> > In this example, there's a guarantee that our topic size won't be LESS
> > THAN 1 GB.
> > But from the docs I expect that once the topic reaches 1 GB (or a bit
> > more), old segments will be deleted.
> > In this example I would expect that when it reaches 1 GB, a segment
> > will be automatically deleted, so the partition will be approximately
> > 1 GB and not 1.5 GB as stated.
> >
> > My question is whether I understood the definition of the field
> > correctly.
> > If not, I would be happy if you could explain what I missed.
> > If I'm correct that the definition is not well explained, I would
> > appreciate if you could fix and clarify that in the documentation.
> > Thanks,
> > Ori.

-- *Ori Aharoni*