Re: [Discuss] Pulsar retention policy

太上玄元道君 Mon, 15 Apr 2024 03:32:10 -0700

Hi Yike,

The current code implementation of the retention policy looks a little
strange to me.


The biggest problem is that we have coupled the backlog quota and the
retention policy together. We cannot retain historical data without also
setting a backlog quota: if I want to retain 10GB of acknowledged
messages, then I have to set a backlog quota.

When the backlog quota is exceeded, it either blocks message publishing or
acknowledges messages automatically, which is unacceptable in some cases.

Personally, I prefer the description of the retention policy in the
official documentation, where it is independent of the backlog quota.
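
To make the difference concrete, here is a minimal sketch of the two
readings, based only on the four cases in the original message quoted
below. The function names are hypothetical and are not Pulsar APIs. The
model assumes size-based retention only and ignores time-based retention
and the fact that trimming happens at ledger granularity.

```python
def acked_retained_by_code(backlog_gb, acked_gb, retention_gb):
    """Acknowledged data kept when the retention size also counts the backlog.

    This models the behavior attributed to the current code in this thread:
    the backlog is always kept, and acknowledged data only gets whatever
    part of the retention size the backlog has not already used.
    """
    room_left = max(0, retention_gb - backlog_gb)
    return min(acked_gb, room_left)


def acked_retained_by_docs(backlog_gb, acked_gb, retention_gb):
    """Acknowledged data kept when retention applies to acknowledged data only.

    This models the reading of the official documentation: the backlog size
    does not affect how much acknowledged data is retained.
    """
    return min(acked_gb, retention_gb)


def retention_needed_for_seek_back(backlog_quota_gb, seek_back_gb):
    """Retention size needed under the coupled model to guarantee a seek-back.

    In the worst case the backlog fills the whole quota, so the retention
    size has to cover the quota plus the desired seek-back window.
    """
    return backlog_quota_gb + seek_back_gb


if __name__ == "__main__":
    # The four cases from the original message: 10GB backlog, 10GB acknowledged.
    for retention_gb in (10, 20, 5, 15):
        by_code = acked_retained_by_code(10, 10, retention_gb)
        by_docs = acked_retained_by_docs(10, 10, retention_gb)
        print(f"retention={retention_gb}GB code={by_code}GB docs={by_docs}GB")
```

Under the coupled model, guaranteeing a 10GB seek-back window with a 100GB
backlog quota requires retention_needed_for_seek_back(100, 10), that is,
110GB of retention, which is the cost Yike describes.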

Thanks,
Tao Jiuming

Yike Xiao <km...@live.com> wrote on Sat, Apr 13, 2024 at 23:32:

> Hi Jiuming,
>
> Thank you for bringing this up. From a Pulsar admin perspective, the
> current retention policy implementation does not guarantee that users can
> seek back to a position within a specific size limit unless they pay extra
> cost. For example, to guarantee that they can seek back to a position
> 10GB earlier, users need to set `retention policy = backlog quota +
> 10GB`. However, the backlog quota is typically set quite large to allow for
> significant data accumulation. Therefore, users must bear the cost of a
> large backlog quota (e.g., 100GB) to ensure they can seek back to a
> position 10GB earlier, even if the subscription has no backlog.
>
> Regards,
> Yike
> ________________________________
> From: 太上玄元道君 <dao...@apache.org>
> Sent: Thursday, April 11, 2024 18:20
> To: dev@pulsar.apache.org <dev@pulsar.apache.org>
> Subject: [Discuss] Pulsar retention policy
>
> Hi, Pulsar community,
>
> I'm opening this thread to discuss the retention policy for managed
> ledgers.
>
> Currently, the retention policy is defined as a time/size-based policy to
> retain messages in the ledger, but there is a difference between the
> official documentation and the actual code implementation.
>
> The official documentation states that the retention policy is to retain
> the messages that were *acknowledged*. For example, if the retention size
> is set to 10GB and there are 20GB of messages acknowledged, Pulsar will
> retain 10GB and delete the rest.
>
> However, the actual code implementation is different. It retains the
> messages that were *written* to the ledger, including *backlog messages*
> and *acknowledged messages*. For instance, if there are 10GB of messages in
> the backlog and 10GB of messages were acknowledged:
> 1. If the retention size is set to 10GB, Pulsar will only retain the 10GB
> of messages in the backlog, and the 10GB of messages that were acknowledged
> will be deleted.
> 2. If the retention size is set to 20GB, Pulsar will retain the 10GB of
> messages in the backlog and the 10GB of messages that were acknowledged.
> 3. If the retention size is set to 5GB, Pulsar will retain the 10GB of
> messages in the backlog, but the 10GB of messages that were acknowledged
> will be deleted.
> 4. If the retention size is set to 15GB, Pulsar will retain the 10GB of
> messages in the backlog and the 5GB of messages that were acknowledged. The
> rest of the acknowledged messages will be deleted.
>
> Since Pulsar was open-sourced, the code implementation has never
> changed, but the meaning of the official documentation has gradually
> shifted. So I'm wondering which one is better: the official
> documentation or the code implementation? Does the shift in the
> documentation's meaning align better with user expectations? Does it
> indicate that users want to retain the messages that were acknowledged?
>
> For a long time, users have believed that the Retention Policy is for
> retaining messages that were acknowledged. If we change the document to
> match the code implementation, will it meet users' expectations?
>
> What should we do? Change the document to match the code implementation or
> change the code implementation to match the document?
>
> Regards,
> Tao Jiuming
>
