https://github.com/apache/pulsar/issues/22473#issuecomment-2426787328


I’ve drawn a diagram of the current backlog quota and retention
policy. Please correct me if it's not accurate.

Thanks,
Baodi Shi

On Mon, Oct 21, 2024 at 20:00, Baodi Shi <ba...@apache.org> wrote:
>
> Hi, thanks for bringing up this issue.
>
> > Personally, I prefer the description of the retention policy in the
> > official documentation, where it is independent of the backlog quota.
>
> Me too. I lean toward the description in the official documentation.
>
> If we want to align the implementation with the official documentation,
> would it be easy to refactor?
>
> On 2024/04/15 10:31:54 太上玄元道君 wrote:
> > Hi Yike,
> >
> > The current code implementation of the retention policy looks a little
> > strange to me.
> >
> > The biggest problem is that we have coupled the backlog quota and the
> > retention policy together: we cannot retain historical data without
> > setting a backlog quota. Say I want to retain 10GB of acknowledged
> > messages; then I have to set a backlog quota.
> >
> > When it is exceeded, the backlog quota will block message publishing or
> > automatically acknowledge messages, which is unacceptable in some cases.
> >
> > Personally, I prefer the description of the retention policy in the
> > official documentation, where it is independent of the backlog quota.
> >
> > Thanks,
> > Tao Jiuming
> >
> > On Sat, Apr 13, 2024 at 23:32, Yike Xiao <km...@live.com> wrote:
> >
> > > Hi Jiuming,
> > >
> > > Thank you for bringing this up. From a Pulsar admin perspective, the
> > > current retention policy implementation does not guarantee that users
> > > can seek back to a position within a specific size limit, unless they
> > > pay an extra cost to achieve that. For example, to guarantee being able
> > > to seek back to a position 10GB earlier, users need to set
> > > `retention policy = backlog quota + 10GB`. However, the backlog quota
> > > is typically set quite large to allow for significant data
> > > accumulation. Therefore, users must bear the cost of a large backlog
> > > quota (e.g., 100GB) to ensure they can revert to a position 10GB
> > > earlier, even if there is no backlog in the subscription.
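The sizing arithmetic above can be sketched as a small model, assuming the implemented semantics described further down in this thread (the retention size counts the backlog plus acknowledged data). `retained_acked_gb` and `required_retention_gb` are hypothetical helpers, not Pulsar APIs, and the model ignores that Pulsar trims whole ledgers rather than individual bytes.

```python
# Hypothetical model of the sizing arithmetic (not a Pulsar API).
# Sizes are in GB; deletion is treated as byte-exact, although Pulsar
# actually trims whole ledgers.


def retained_acked_gb(retention_gb: float, backlog_gb: float,
                      acked_gb: float) -> float:
    """Acknowledged data kept when retention counts backlog + acked data."""
    # Retention never trims the unacknowledged backlog, so the backlog uses
    # up the retention budget first; acknowledged data gets what is left.
    return min(acked_gb, max(0, retention_gb - backlog_gb))


def required_retention_gb(backlog_quota_gb: float,
                          seek_back_gb: float) -> float:
    """Retention size that guarantees `seek_back_gb` of acknowledged history."""
    # Worst case: the backlog has grown all the way up to the backlog quota.
    return backlog_quota_gb + seek_back_gb


if __name__ == "__main__":
    retention = required_retention_gb(backlog_quota_gb=100, seek_back_gb=10)
    # Backlog at the quota: only the desired 10GB of history survives.
    print(retained_acked_gb(retention, backlog_gb=100, acked_gb=500))
    # No backlog at all: up to the full retention size of acked data is kept.
    print(retained_acked_gb(retention, backlog_gb=0, acked_gb=500))
```

With the backlog at its 100GB quota, the 110GB setting keeps exactly the 10GB of history that was wanted; with no backlog, the same setting can store 110GB of acknowledged data, which is the extra cost described above.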
> > >
> > > Regards,
> > > Yike
> > > ________________________________
> > > From: 太上玄元道君 <dao...@apache.org>
> > > Sent: Thursday, April 11, 2024 18:20
> > > To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> > > Subject: [Discuss] Pulsar retention policy
> > >
> > > Hi, Pulsar community,
> > >
> > > I'm opening this thread to discuss the retention policy for managed
> > > ledgers.
> > >
> > > Currently, the retention policy is defined as a time/size-based policy to
> > > retain messages in the ledger, but there is a difference between the
> > > official documentation and the actual code implementation.
> > >
> > > The official documentation states that the retention policy is to retain
> > > the messages that were *acknowledged*. For example, if the retention size
> > > is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
> > > retain 10GB and delete the rest.
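For comparison with the implemented behavior described next, the documented semantics reduce to a one-line rule. This is a minimal sketch; `documented_retention` is a hypothetical helper, not part of Pulsar, and it ignores that Pulsar deletes whole ledgers rather than individual messages.

```python
# Hypothetical model of the documented retention semantics (not a Pulsar API).
# Sizes are in GB; Pulsar's ledger-granular trimming is ignored.
from typing import Tuple


def documented_retention(retention_gb: float,
                         acked_gb: float) -> Tuple[float, float]:
    """Return (kept, deleted) acknowledged GB under the documented semantics."""
    # Only acknowledged messages count against the retention size; the
    # unacknowledged backlog is kept regardless and uses none of the budget.
    kept = min(acked_gb, retention_gb)
    return kept, acked_gb - kept


if __name__ == "__main__":
    # The documentation's example: 10GB retention, 20GB acknowledged.
    kept, deleted = documented_retention(retention_gb=10, acked_gb=20)
    print(f"kept {kept}GB, deleted {deleted}GB")
```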
> > >
> > > However, the actual code implementation is different. It retains the
> > > messages that were *written* to the ledger, including *backlog messages*
> > > and *acknowledged messages*. For instance, if there are 10GB of messages
> > > in the backlog and 10GB of messages were acknowledged:
> > > 1. If the retention size is set to 10GB, Pulsar will retain only the 10GB
> > > of messages in the backlog, and the 10GB of messages that were
> > > acknowledged will be deleted.
> > > 2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
> > > messages in the backlog and the 10GB of messages that were acknowledged.
> > > 3. If the retention size is set to 5GB, Pulsar will still retain the 10GB
> > > of messages in the backlog, but the 10GB of messages that were
> > > acknowledged will be deleted.
> > > 4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
> > > messages in the backlog and 5GB of the acknowledged messages. The rest
> > > of the acknowledged messages will be deleted.
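The four cases above follow a single rule: the backlog is always kept but still counts toward the retention size, and acknowledged messages fill whatever space remains. A minimal sketch of that rule, using a hypothetical `implemented_retention` helper that is not a Pulsar API and that ignores ledger-granular trimming:

```python
# Hypothetical model of the implemented retention behavior (not a Pulsar API).
# Sizes are in GB; Pulsar's ledger-granular trimming is ignored.
from typing import NamedTuple


class Retained(NamedTuple):
    backlog_gb: float
    acked_gb: float


def implemented_retention(retention_gb: float, backlog_gb: float,
                          acked_gb: float) -> Retained:
    # Retention never deletes the unacknowledged backlog, yet the backlog
    # still counts toward the retention size, so acknowledged messages are
    # kept only in whatever space the backlog leaves free.
    kept_acked = min(acked_gb, max(0, retention_gb - backlog_gb))
    return Retained(backlog_gb, kept_acked)


if __name__ == "__main__":
    # Cases 1-4 above: 10GB backlog, 10GB acknowledged.
    for size in (10, 20, 5, 15):
        kept = implemented_retention(size, backlog_gb=10, acked_gb=10)
        print(f"retention={size}GB: backlog kept {kept.backlog_gb}GB, "
              f"acked kept {kept.acked_gb}GB")
```

Replacing `retention_gb - backlog_gb` with `retention_gb` in this model yields the documented behavior, which is exactly the difference under discussion.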
> > >
> > > Since Pulsar was open-sourced, the code implementation has never
> > > changed, but the meaning of the official documentation has gradually
> > > shifted. So I'm wondering which one is better: the official
> > > documentation or the code implementation? Does the change in the meaning
> > > of the document align better with expectations? Does it indicate that
> > > users want to retain the messages that were acknowledged?
> > >
> > > For a long time, users have believed that the Retention Policy is for
> > > retaining messages that were acknowledged. If we change the document to
> > > match the code implementation, will it meet users' expectations?
> > >
> > > What should we do? Change the document to match the code implementation or
> > > change the code implementation to match the document?
> > >
> > > Regards,
> > > Tao Jiuming
> > >
> >
