Hi Yike,

The current code implementation of the retention policy looks a little
strange to me.

The biggest problem is that we have coupled the backlog quota and the
retention policy together: we cannot retain historical data without setting
a backlog quota. Say I want to retain 10GB of acknowledged messages; then I
have to set a backlog quota. But the backlog quota will block message
publishing or acknowledge messages automatically, which is unacceptable in
some cases.

Personally, I prefer the description of the retention policy in the official
documentation, where it is independent of the backlog quota.

Thanks,
Tao Jiuming

Yike Xiao <km...@live.com> wrote on Saturday, April 13, 2024 at 23:32:

> Hi Jiuming,
>
> Thank you for bringing this up. From a Pulsar admin perspective, the
> current retention policy implementation does not ensure that users can seek
> back to a position within a specific size limit, or it makes them pay extra
> cost to achieve that. For example, to guarantee being able to seek back to
> a position 10GB earlier, users need to set `retention policy = backlog
> quota + 10GB`. However, the backlog quota is typically set quite large to
> allow for significant data accumulation. Therefore, users must bear the
> cost of a large backlog quota (e.g., 100GB) to ensure they can revert to a
> position 10GB earlier, even if the subscription has no backlog.
>
> Regards,
> Yike
> ________________________________
> From: 太上玄元道君 <dao...@apache.org>
> Sent: Thursday, April 11, 2024 18:20
> To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> Subject: [Discuss] Pulsar retention policy
>
> Hi, Pulsar community,
>
> I'm opening this thread to discuss the retention policy for managed
> ledgers.
>
> Currently, the retention policy is defined as a time/size-based policy to
> retain messages in the ledger, but there is a difference between the
> official documentation and the actual code implementation.
>
> The official documentation states that the retention policy is to retain
> the messages that were *acknowledged*.
> For example, if the retention size
> is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
> retain 10GB and delete the rest.
>
> However, the actual code implementation is different. It retains the
> messages that were *written* to the ledger, including *backlog messages*
> and *acknowledged messages*. For instance, if there are 10GB of messages in
> the backlog and 10GB of messages were acknowledged:
> 1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
> of messages in the backlog, and the 10GB of messages that were acknowledged
> will be deleted.
> 2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
> messages in the backlog and the 10GB of messages that were acknowledged.
> 3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
> messages in the backlog, but the 10GB of messages that were acknowledged
> will be deleted.
> 4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
> messages in the backlog and 5GB of the messages that were acknowledged. The
> rest of the acknowledged messages will be deleted.
>
> Since Pulsar was open-sourced, the code implementation has never changed,
> but the meaning of the official documentation has gradually shifted. So I'm
> considering which one is better: the official documentation or the code
> implementation? Does the change in the meaning of the document align more
> with expectations? Does it indicate that users want to retain the messages
> that were acknowledged?
>
> For a long time, users have believed that the retention policy is for
> retaining messages that were acknowledged. If we change the document to
> match the code implementation, will it meet users' expectations?
>
> What should we do? Change the document to match the code implementation or
> change the code implementation to match the document?
>
> Regards,
> Tao Jiuming
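
To make the arithmetic in the thread easier to check, here is a minimal
sketch of the two size-based semantics being compared. It is not Pulsar code:
the function names are hypothetical, sizes are plain GB numbers, and it
assumes (as the examples in the thread do) that the backlog is always kept
and that trimming is exact rather than done per ledger. Time-based retention
is ignored.

```python
def retained_acked_documented(retention_gb, backlog_gb, acked_gb):
    """Documented semantics: the retention size applies to acknowledged
    messages only, independent of the backlog (backlog_gb is unused)."""
    return min(acked_gb, retention_gb)


def retained_acked_implemented(retention_gb, backlog_gb, acked_gb):
    """Semantics described for the code: the retention size covers everything
    written (backlog + acknowledged), so acknowledged messages only get
    whatever is left after the backlog is counted."""
    return max(0, min(acked_gb, retention_gb - backlog_gb))


def retention_needed_for_seek_back(backlog_quota_gb, seek_back_gb):
    """Yike's point: under the implemented semantics, guaranteeing a seek-back
    window means sizing retention as backlog quota + window."""
    return backlog_quota_gb + seek_back_gb


if __name__ == "__main__":
    backlog_gb, acked_gb = 10, 10  # the scenario used in the thread
    for retention_gb in (10, 20, 5, 15):
        doc = retained_acked_documented(retention_gb, backlog_gb, acked_gb)
        impl = retained_acked_implemented(retention_gb, backlog_gb, acked_gb)
        print(f"retention={retention_gb}GB: "
              f"documented keeps {doc}GB acked, implemented keeps {impl}GB acked")
```

Under these assumptions the two models agree only when the backlog is empty,
which is the coupling between backlog and retention that the thread is
questioning.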