LGTM.

The picture at
https://pulsar.apache.org/docs/4.0.x/cookbooks-retention-expiry/#retention-policies
also needs to be updated.

Thanks,
Tao Jiuming

On Mon, Oct 21, 2024 at 22:08, Baodi Shi <ba...@apache.org> wrote:

> https://github.com/apache/pulsar/issues/22473#issuecomment-2426787328
>
>
> I’ve drawn a diagram of the current backlog quota and retention
> policy. Please correct me if it's not accurate.
>
> Thanks,
> Baodi Shi
>
> On Mon, Oct 21, 2024 at 20:00, Baodi Shi <ba...@apache.org> wrote:
> >
> > Hi, thanks for bringing up this issue.
> >
> > > Personally, I prefer the description of the retention policy in the
> > > official document, it's independent.
> >
> > Me too, I lean toward the description in the official documentation.
> >
> > If we want to align the implementation with the official documentation,
> > would it be easy to refactor?
> >
> > On 2024/04/15 10:31:54 太上玄元道君 wrote:
> > > Hi Yike,
> > >
> > > The current code implementation of the retention policy looks a little
> > > strange to me.
> > >
> > > The biggest problem is that we have coupled the backlog quota and the
> > > retention policy together: we cannot retain historical data without
> > > setting a backlog quota. Say I want to retain 10GB of acknowledged
> > > messages; then I have to set a backlog quota.
> > >
> > > The backlog quota will block message publishing or acknowledge messages
> > > automatically, which is unacceptable in some cases.
> > >
> > > Personally, I prefer the description of the retention policy in the
> > > official document, it's independent.
> > >
> > > Thanks,
> > > Tao Jiuming
> > >
> > > On Sat, Apr 13, 2024 at 23:32, Yike Xiao <km...@live.com> wrote:
> > >
> > > > Hi Jiuming,
> > > >
> > > > Thank you for bringing this up. From a Pulsar admin perspective, the
> > > > current retention policy implementation does not ensure that users
> > > > can seek back to a position within a specific size limit, or they
> > > > have to pay extra cost to achieve that. For example, to guarantee
> > > > being able to seek back to a position 10GB earlier, users need to set
> > > > `retention policy = backlog quota + 10GB`. However, the backlog quota
> > > > is typically set quite large to allow for significant data
> > > > accumulation. Therefore, users must bear the cost of a large backlog
> > > > quota (e.g., 100GB) to ensure they can revert to a position 10GB
> > > > earlier, even if there is no backlog in the subscription.
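
The arithmetic behind this example can be sketched as follows. It assumes
the behavior described in this thread (the retention size covers backlog
plus acknowledged messages) and a worst case where the backlog fills the
whole quota. The function name and parameters are hypothetical, not part of
any Pulsar API.

```python
def retention_needed_gb(seek_back_gb, backlog_quota_gb, coupled=True):
    """Retention size needed to always be able to seek back seek_back_gb."""
    if coupled:
        # Worst case: the backlog is as large as its quota and is counted
        # against the retention size before any acknowledged data.
        return backlog_quota_gb + seek_back_gb
    # Documented semantics: retention covers acknowledged messages only.
    return seek_back_gb


print(retention_needed_gb(10, 100))                 # coupled: 110
print(retention_needed_gb(10, 100, coupled=False))  # independent: 10
```

With a 100GB backlog quota, the coupled model needs 110GB of retention to
guarantee a 10GB seek-back window; the independent model needs only 10GB.
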
> > > >
> > > > Regards,
> > > > Yike
> > > > ________________________________
> > > > From: 太上玄元道君 <dao...@apache.org>
> > > > Sent: Thursday, April 11, 2024 18:20
> > > > To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> > > > Subject: [Discuss] Pulsar retention policy
> > > >
> > > > Hi, Pulsar community,
> > > >
> > > > I'm opening this thread to discuss the retention policy for managed
> > > > ledgers.
> > > >
> > > > Currently, the retention policy is defined as a time/size-based
> > > > policy to retain messages in the ledger, but there is a difference
> > > > between the official documentation and the actual code
> > > > implementation.
> > > >
> > > > The official documentation states that the retention policy is to
> > > > retain the messages that were *acknowledged*. For example, if the
> > > > retention size is set to 10GB and there are 20GB of messages
> > > > acknowledged, Pulsar will retain 10GB and delete the rest.
> > > >
> > > > However, the actual code implementation is different. It retains the
> > > > messages that were *written* to the ledger, including *backlog
> > > > messages* and *acknowledged messages*. For instance, if there are
> > > > 10GB of messages in the backlog and 10GB of messages were
> > > > acknowledged:
> > > > 1. If the retention size is set to 10GB, Pulsar will only retain the
> > > > 10GB of messages in the backlog, and the 10GB of messages that were
> > > > acknowledged will be deleted.
> > > > 2. If the retention size is set to 20GB, Pulsar will retain the 10GB
> > > > of messages in the backlog and the 10GB of messages that were
> > > > acknowledged.
> > > > 3. If the retention size is set to 5GB, Pulsar will retain the 10GB
> > > > of messages in the backlog, but the 10GB of messages that were
> > > > acknowledged will be deleted.
> > > > 4. If the retention size is set to 15GB, Pulsar will retain the 10GB
> > > > of messages in the backlog and 5GB of the messages that were
> > > > acknowledged. The rest of the acknowledged messages will be deleted.
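
A minimal sketch of the two semantics compared above, using the same 10GB
backlog / 10GB acknowledged example. The function names and the size-only
model are assumptions for illustration; this is not Pulsar code, and it
ignores time-based retention and other details of the real implementation.

```python
def acked_retained_per_docs(retention_gb, backlog_gb, acked_gb):
    # Documented semantics: the retention size applies to acknowledged
    # messages only, independent of the backlog.
    return min(acked_gb, retention_gb)


def acked_retained_per_code(retention_gb, backlog_gb, acked_gb):
    # Behavior described in this thread: the backlog is always kept and is
    # counted against the retention size first; acknowledged messages only
    # get whatever is left over.
    return max(0, min(acked_gb, retention_gb - backlog_gb))


backlog_gb, acked_gb = 10, 10
for retention_gb in (10, 20, 5, 15):
    docs = acked_retained_per_docs(retention_gb, backlog_gb, acked_gb)
    code = acked_retained_per_code(retention_gb, backlog_gb, acked_gb)
    print(f"retention={retention_gb}GB  docs keep {docs}GB acked, "
          f"code keeps {code}GB acked")
```

Under this model the four cases above keep 0GB, 10GB, 0GB, and 5GB of
acknowledged messages, whereas the documented semantics would keep 10GB,
10GB, 5GB, and 10GB.
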
> > > >
> > > > Since Pulsar was open-sourced, the code implementation has never
> > > > changed, but the meaning of the official documentation has gradually
> > > > shifted. So I'm considering which one is better: the official
> > > > documentation or the code implementation? Does the change in the
> > > > meaning of the document align more with expectations? Does it
> > > > indicate that users want to retain the messages that were
> > > > acknowledged?
> > > >
> > > > For a long time, users have believed that the Retention Policy is for
> > > > retaining messages that were acknowledged. If we change the document
> > > > to match the code implementation, will it meet users' expectations?
> > > >
> > > > What should we do? Change the document to match the code
> > > > implementation or change the code implementation to match the
> > > > document?
> > > >
> > > > Regards,
> > > > Tao Jiuming
> > > >
> > >
>
