Hi, thanks for bringing up this issue.

> Personally, I prefer the description of the retention policy in the
> official documentation, where it is independent of the backlog quota.

Me too; I lean toward the description in the official documentation.

If we want to align the implementation with the official documentation, would 
it be easy to refactor?
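
To make the difference concrete, here is a rough sketch of the two semantics
discussed in this thread. It is not the actual managed ledger code; the
function names and the GB-level arithmetic are assumptions made purely for
illustration, based on the scenarios in the original message quoted below.

```python
def retained_as_implemented(backlog_gb, acked_gb, retention_gb):
    """Behaviour described in the thread for the current code (illustrative).

    The retention size is counted against everything written to the ledger
    (backlog + acknowledged). Backlog is never trimmed by retention, so only
    the part of the retention size not consumed by backlog is left for
    acknowledged messages.
    """
    acked_kept = max(0, min(acked_gb, retention_gb - backlog_gb))
    return backlog_gb, acked_kept


def retained_as_documented(backlog_gb, acked_gb, retention_gb):
    """Behaviour described in the official documentation (illustrative).

    The retention size applies to acknowledged messages only, independent
    of how much backlog exists.
    """
    return backlog_gb, min(acked_gb, retention_gb)


if __name__ == "__main__":
    # 10GB of backlog and 10GB of acknowledged messages, as in the thread.
    for retention in (10, 20, 5, 15):
        print(retention,
              retained_as_implemented(10, 10, retention),
              retained_as_documented(10, 10, retention))
```

Each function returns `(backlog kept, acknowledged kept)` in GB. Under the
documented semantics, a 10GB retention size keeps 10GB of acknowledged
messages regardless of backlog, which is the behaviour we are leaning toward.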

On 2024/04/15 10:31:54 太上玄元道君 wrote:
> Hi Yike,
> 
> The current code implementation of the retention policy looks a little
> strange to me.
> 
> The biggest problem is that we have coupled the backlog quota and the
> retention policy: we cannot retain historical data without setting a
> backlog quota. For example, if I want to retain 10GB of acknowledged
> messages, I have to set a backlog quota.
> 
> The backlog quota will block message publishing or acknowledge messages
> automatically, which is unacceptable in some cases.
> 
> Personally, I prefer the description of the retention policy in the
> official documentation, where it is independent of the backlog quota.
> 
> Thanks,
> Tao Jiuming
> 
> On Sat, Apr 13, 2024 at 23:32, Yike Xiao <km...@live.com> wrote:
> 
> > Hi Jiuming,
> >
> > Thank you for bringing this up. From a Pulsar admin perspective, the
> > current retention policy implementation does not ensure that users can
> > seek back to a position within a specific size limit; they have to pay
> > extra cost to achieve that. For example, to guarantee the ability to seek
> > back to a position 10GB earlier, users need to set `retention policy =
> > backlog quota + 10GB`. However, the backlog quota is typically set quite
> > large to allow for significant data accumulation. Therefore, users must
> > bear the cost of a large backlog quota (e.g., 100GB) to ensure they can
> > revert to a position 10GB earlier, even if the subscription has no
> > backlog.
> >
> > Regards,
> > Yike
> > ________________________________
> > From: 太上玄元道君 <dao...@apache.org>
> > Sent: Thursday, April 11, 2024 18:20
> > To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> > Subject: [Discuss] Pulsar retention policy
> >
> > Hi, Pulsar community,
> >
> > I'm opening this thread to discuss the retention policy for managed
> > ledgers.
> >
> > Currently, the retention policy is defined as a time/size-based policy to
> > retain messages in the ledger, but there is a difference between the
> > official documentation and the actual code implementation.
> >
> > The official documentation states that the retention policy is to retain
> > the messages that were *acknowledged*. For example, if the retention size
> > is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
> > retain 10GB and delete the rest.
> >
> > However, the actual code implementation is different. It retains the
> > messages that were *written* to the ledger, including *backlog messages*
> > and *acknowledged messages*. For instance, if there are 10GB of messages in
> > the backlog and 10GB of messages were acknowledged:
> > 1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
> > of messages in the backlog, and the 10GB of messages that were acknowledged
> > will be deleted.
> > 2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
> > messages in the backlog and the 10GB of messages that were acknowledged.
> > 3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
> > messages in the backlog, but the 10GB of messages that were acknowledged
> > will be deleted.
> > 4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
> > messages in the backlog and the 5GB of messages that were acknowledged. The
> > rest of the acknowledged messages will be deleted.
> >
> > Since Pulsar was open-sourced, the code implementation has never
> > changed, but the meaning of the official documentation has gradually
> > shifted. So I'm wondering which one is better: the official
> > documentation or the code implementation? Does the shift in the
> > documentation's meaning align more with expectations? Does it indicate
> > that users want to retain the messages that were acknowledged?
> >
> > For a long time, users have believed that the Retention Policy is for
> > retaining messages that were acknowledged. If we change the document to
> > match the code implementation, will it meet users' expectations?
> >
> > What should we do? Change the document to match the code implementation or
> > change the code implementation to match the document?
> >
> > Regards,
> > Tao Jiuming
> >
> 
