LGTM. The pic in https://pulsar.apache.org/docs/4.0.x/cookbooks-retention-expiry/#retention-policies also needs to be changed.

Thanks,
Tao Jiuming

Baodi Shi <ba...@apache.org> wrote on Mon, Oct 21, 2024 at 22:08:

> https://github.com/apache/pulsar/issues/22473#issuecomment-2426787328
>
> I’ve drawn a diagram of the current backlog quota and retention
> policy. Please correct me if it's not accurate.
>
> Thanks,
> Baodi Shi
>
> Baodi Shi <ba...@apache.org> wrote on Mon, Oct 21, 2024 at 20:00:
> >
> > Hi, thanks for bringing up this issue.
> >
> > > Personally, I prefer the description of the retention policy in the
> > > official document; it's independent.
> >
> > Me too, I'm inclined to the description of the official document.
> >
> > If we want to align the implementation with the official documentation,
> > would it be easy to refactor?
> >
> > On 2024/04/15 10:31:54 太上玄元道君 wrote:
> > > Hi Yike,
> > >
> > > The current code implementation of the retention policy looks a little
> > > strange to me.
> > >
> > > The biggest problem is that we have coupled the backlog quota and the
> > > retention policy together: we cannot retain historical data without
> > > setting the backlog quota. Say I want to retain 10GB of acknowledged
> > > messages; then I have to set a backlog quota.
> > >
> > > The backlog quota will block message publishing or acknowledge messages
> > > automatically, which is unacceptable in some cases.
> > >
> > > Personally, I prefer the description of the retention policy in the
> > > official document; it's independent.
> > >
> > > Thanks,
> > > Tao Jiuming
> > >
> > > Yike Xiao <km...@live.com> wrote on Sat, Apr 13, 2024 at 23:32:
> > >
> > > > Hi Jiuming,
> > > >
> > > > Thank you for bringing this up. From a Pulsar admin perspective, the
> > > > current retention policy implementation does not ensure that users
> > > > can seek back to a position within a specific size limit, or they
> > > > have to pay extra cost to achieve that. For example, to guarantee
> > > > being able to seek back to a position 10GB earlier, users need to set
> > > > the `retention policy = backlog quota + 10GB`. However, the backlog
> > > > quota is typically set quite large to allow for significant data
> > > > accumulation. Therefore, users must bear the cost of a large backlog
> > > > quota (e.g., 100GB) to ensure they can revert to a position 10GB
> > > > earlier, even if there is no backlog in the subscription.
> > > >
> > > > Regards,
> > > > Yike
> > > > ________________________________
> > > > From: 太上玄元道君 <dao...@apache.org>
> > > > Sent: Thursday, April 11, 2024 18:20
> > > > To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> > > > Subject: [Discuss] Pulsar retention policy
> > > >
> > > > Hi, Pulsar community,
> > > >
> > > > I'm opening this thread to discuss the retention policy for managed
> > > > ledgers.
> > > >
> > > > Currently, the retention policy is defined as a time/size-based
> > > > policy to retain messages in the ledger, but there is a difference
> > > > between the official documentation and the actual code
> > > > implementation.
> > > >
> > > > The official documentation states that the retention policy is to
> > > > retain the messages that were *acknowledged*. For example, if the
> > > > retention size is set to 10GB and there are 20GB of messages
> > > > acknowledged, Pulsar will retain 10GB and delete the rest.
> > > >
> > > > However, the actual code implementation is different. It retains the
> > > > messages that were *written* to the ledger, including *backlog
> > > > messages* and *acknowledged messages*. For instance, if there are
> > > > 10GB of messages in the backlog and 10GB of messages were
> > > > acknowledged:
> > > > 1. If the retention size is set to 10GB, Pulsar will only retain the
> > > > 10GB of messages in the backlog, and the 10GB of messages that were
> > > > acknowledged will be deleted.
> > > > 2. If the retention size is set to 20GB, Pulsar will retain the 10GB
> > > > of messages in the backlog and the 10GB of messages that were
> > > > acknowledged.
> > > > 3. If the retention size is set to 5GB, Pulsar will retain the 10GB
> > > > of messages in the backlog, but the 10GB of messages that were
> > > > acknowledged will be deleted.
> > > > 4. If the retention size is set to 15GB, Pulsar will retain the 10GB
> > > > of messages in the backlog and 5GB of the messages that were
> > > > acknowledged. The rest of the acknowledged messages will be deleted.
> > > >
> > > > Since Pulsar was open-sourced, the code implementation has never
> > > > changed, but the meaning of the official documentation has gradually
> > > > shifted. So I'm just considering which one is better: the official
> > > > documentation or the code implementation? Does the change in the
> > > > meaning of the document align more with expectations? Does it
> > > > indicate that users want to retain the messages that were
> > > > acknowledged?
> > > >
> > > > For a long time, users have believed that the Retention Policy is for
> > > > retaining messages that were acknowledged. If we change the document
> > > > to match the code implementation, will it meet users' expectations?
> > > >
> > > > What should we do? Change the document to match the code
> > > > implementation or change the code implementation to match the
> > > > document?
> > > >
> > > > Regards,
> > > > Tao Jiuming
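
For readers following the arithmetic in the original message, the two interpretations of the retention size can be sketched as below. This is only a toy model of the four cases listed in the thread, not Pulsar's actual code: the function names and the GB-based parameters are hypothetical, and it assumes, as the thread describes, that backlog messages are never trimmed by the retention size.

```python
def retained_acked_current(retention_gb, backlog_gb, acked_gb):
    """Acknowledged data kept under the behaviour described in the thread.

    Assumption: the retention size counts backlog and acknowledged
    messages together, and the backlog itself is always kept.
    """
    room_left_for_acked = retention_gb - backlog_gb
    return max(0, min(acked_gb, room_left_for_acked))


def retained_acked_documented(retention_gb, acked_gb):
    """Acknowledged data kept under the documentation's wording.

    Assumption: the retention size applies to acknowledged messages only,
    independent of how large the backlog is.
    """
    return min(acked_gb, retention_gb)


if __name__ == "__main__":
    backlog_gb, acked_gb = 10, 10
    # The four retention sizes used as examples in the thread.
    for retention_gb in (10, 20, 5, 15):
        current = retained_acked_current(retention_gb, backlog_gb, acked_gb)
        documented = retained_acked_documented(retention_gb, acked_gb)
        print(f"retention={retention_gb}GB "
              f"current={current}GB documented={documented}GB")
```

Under this model, the point about `retention policy = backlog quota + 10GB` follows directly: with a 100GB backlog, `retained_acked_current(110, 100, 10)` is needed to keep 10GB of acknowledged data, whereas the documented interpretation would need a retention size of only 10GB.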