https://github.com/apache/pulsar/issues/22473#issuecomment-2426787328
I’ve drawn a diagram of the current backlog quota and retention policy.
Please correct me if it's not accurate.

Thanks,
Baodi Shi

Baodi Shi <ba...@apache.org> wrote on Mon, Oct 21, 2024 at 20:00:
>
> Hi, thanks for bringing up this issue.
>
> > Personally, I prefer the description of the retention policy in the
> > official document, it's independent.
>
> Me too, I'm inclined toward the description in the official document.
>
> If we want to align the implementation with the official documentation,
> would it be easy to refactor?
>
> On 2024/04/15 10:31:54 太上玄元道君 wrote:
> > Hi Yike,
> >
> > The current code implementation of the retention policy looks a little
> > strange to me.
> >
> > The biggest problem is that we have coupled the backlog quota and the
> > retention policy together. We cannot retain historical data without
> > setting the backlog quota: say, if I want to retain 10GB of acknowledged
> > messages, then I have to set a backlog quota.
> >
> > The backlog quota will block message publishing or acknowledge messages
> > automatically, which is unacceptable in some cases.
> >
> > Personally, I prefer the description of the retention policy in the
> > official document, it's independent.
> >
> > Thanks,
> > Tao Jiuming
> >
> > Yike Xiao <km...@live.com> wrote on Sat, Apr 13, 2024 at 23:32:
> >
> > > Hi Jiuming,
> > >
> > > Thank you for bringing this up. From a Pulsar admin perspective, the
> > > current retention policy implementation does not ensure that users can
> > > seek back to a position within a specific size limit, or they have to
> > > pay an extra cost to achieve that. For example, to guarantee being able
> > > to seek back to a position 10GB earlier, users need to set the
> > > `retention policy = backlog quota + 10GB`. However, the backlog quota
> > > is typically set quite large to allow for significant data
> > > accumulation. Therefore, users must bear the cost of a large backlog
> > > quota (e.g., 100GB) to ensure they can revert to a position 10GB
> > > earlier, even if the subscription has no backlog.
> > >
> > > Regards,
> > > Yike
> > > ________________________________
> > > From: 太上玄元道君 <dao...@apache.org>
> > > Sent: Thursday, April 11, 2024 18:20
> > > To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> > > Subject: [Discuss] Pulsar retention policy
> > >
> > > Hi, Pulsar community,
> > >
> > > I'm opening this thread to discuss the retention policy for managed
> > > ledgers.
> > >
> > > Currently, the retention policy is defined as a time/size-based policy
> > > to retain messages in the ledger, but there is a difference between the
> > > official documentation and the actual code implementation.
> > >
> > > The official documentation states that the retention policy is to
> > > retain the messages that were *acknowledged*. For example, if the
> > > retention size is set to 10GB and there are 20GB of messages
> > > acknowledged, Pulsar will retain 10GB and delete the rest.
> > >
> > > However, the actual code implementation is different. It retains the
> > > messages that were *written* to the ledger, including *backlog
> > > messages* and *acknowledged messages*. For instance, if there are 10GB
> > > of messages in the backlog and 10GB of messages were acknowledged:
> > > 1. If the retention size is set to 10GB, Pulsar will only retain the
> > > 10GB of messages in the backlog, and the 10GB of messages that were
> > > acknowledged will be deleted.
> > > 2. If the retention size is set to 20GB, Pulsar will retain the 10GB
> > > of messages in the backlog and the 10GB of messages that were
> > > acknowledged.
> > > 3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
> > > messages in the backlog, but the 10GB of messages that were
> > > acknowledged will be deleted.
> > > 4. If the retention size is set to 15GB, Pulsar will retain the 10GB
> > > of messages in the backlog and 5GB of the messages that were
> > > acknowledged. The rest of the acknowledged messages will be deleted.
> > >
> > > Since Pulsar was open-sourced, the code implementation has never
> > > changed, but the meaning of the official documentation has gradually
> > > shifted. So I'm just considering which one is better: the official
> > > documentation or the code implementation? Does the change in the
> > > meaning of the document align more with expectations? Does it indicate
> > > that users want to retain the messages that were acknowledged?
> > >
> > > For a long time, users have believed that the Retention Policy is for
> > > retaining messages that were acknowledged. If we change the document
> > > to match the code implementation, will it meet users' expectations?
> > >
> > > What should we do? Change the document to match the code
> > > implementation or change the code implementation to match the
> > > document?
> > >
> > > Regards,
> > > Tao Jiuming
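
For reference, the four cases in the original message (10GB backlog, 10GB acknowledged) all seem to follow one rule: the retention size is counted against everything written, the backlog is always kept, and acknowledged messages only get whatever room is left. Below is a minimal sketch of that arithmetic next to the documented behavior. It is a model of what the thread describes, not Pulsar's actual code; the function names are made up for illustration.

```python
def retained_acked_as_implemented(backlog_gb, acked_gb, retention_gb):
    """Acknowledged data kept when the retention size also counts the backlog.

    Hypothetical helper modelling the behavior described in the thread:
    the backlog is always kept, acknowledged data gets the leftover room.
    """
    room_left = max(0, retention_gb - backlog_gb)
    return min(acked_gb, room_left)


def retained_acked_as_documented(acked_gb, retention_gb):
    """Acknowledged data kept when retention applies to acknowledged data only."""
    return min(acked_gb, retention_gb)


def retention_needed_for_seek_back(backlog_quota_gb, seek_back_gb):
    """Retention size needed to guarantee a seek-back window under the
    implemented behavior (Yike's `retention policy = backlog quota + 10GB`)."""
    return backlog_quota_gb + seek_back_gb


if __name__ == "__main__":
    # The four cases from the original message: 10GB backlog, 10GB acknowledged.
    for retention_gb in (10, 20, 5, 15):
        kept = retained_acked_as_implemented(10, 10, retention_gb)
        print(f"retention={retention_gb}GB: backlog kept=10GB, acknowledged kept={kept}GB")
```

With a 100GB backlog quota, `retention_needed_for_seek_back(100, 10)` gives 110GB, which is the extra cost Yike points out; under the documented behavior a 10GB retention size would be enough.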