Hi Jiuming,

Thank you for bringing this up. From a Pulsar admin's perspective, the current retention policy implementation does not guarantee that users can seek back to a position within a given size limit unless they pay extra cost to achieve it. For example, to guarantee the ability to seek back to a position 10GB earlier, users need to set `retention policy = backlog quota + 10GB`. However, the backlog quota is typically set quite large to allow for significant data accumulation. As a result, users must bear the cost of a large backlog quota (e.g., 100GB) just to ensure they can seek back to a position 10GB earlier, even when the subscription has no backlog.
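To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the size accounting described in this thread (the retention size covers backlog plus acknowledged data, and the backlog can grow up to its quota). The function names are hypothetical and are not Pulsar APIs.

```python
def required_retention_gb(backlog_quota_gb: float, seek_back_gb: float) -> float:
    """Retention size needed so that seek_back_gb of acknowledged data
    survives even when the backlog fills its whole quota (assumed model)."""
    return backlog_quota_gb + seek_back_gb


def guaranteed_seek_back_gb(retention_gb: float, backlog_quota_gb: float) -> float:
    """Worst-case amount of acknowledged data still retained, i.e. when the
    backlog is as large as the quota allows (assumed model)."""
    return max(0.0, retention_gb - backlog_quota_gb)


if __name__ == "__main__":
    quota_gb, wanted_gb = 100.0, 10.0
    retention_gb = required_retention_gb(quota_gb, wanted_gb)
    print("retention needed (GB):", retention_gb)
    print("guaranteed seek-back (GB):", guaranteed_seek_back_gb(retention_gb, quota_gb))
    # Setting retention to only the desired 10GB guarantees nothing in the worst case.
    print("guaranteed with 10GB retention:", guaranteed_seek_back_gb(wanted_gb, quota_gb))
```

Under this model, a 100GB backlog quota forces a 110GB retention size to guarantee a 10GB seek-back window, which is the extra cost described above.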
Regards,
Yike

________________________________
From: 太上玄元道君 <dao...@apache.org>
Sent: Thursday, April 11, 2024 18:20
To: dev@pulsar.apache.org <dev@pulsar.apache.org>
Subject: [Discuss] Pulsar retention policy

Hi, Pulsar community,

I'm opening this thread to discuss the retention policy for managed ledgers.

Currently, the retention policy is defined as a time/size-based policy for retaining messages in the ledger, but the official documentation and the actual code implementation disagree.

The official documentation states that the retention policy retains the messages that were *acknowledged*. For example, if the retention size is set to 10GB and 20GB of messages have been acknowledged, Pulsar will retain 10GB and delete the rest.

However, the actual code implementation is different. It retains the messages that were *written* to the ledger, including both *backlog messages* and *acknowledged messages*. For instance, if there are 10GB of messages in the backlog and 10GB of messages have been acknowledged:

1. If the retention size is set to 10GB, Pulsar will retain only the 10GB of messages in the backlog, and the 10GB of acknowledged messages will be deleted.
2. If the retention size is set to 20GB, Pulsar will retain the 10GB of messages in the backlog and the 10GB of acknowledged messages.
3. If the retention size is set to 5GB, Pulsar will retain the 10GB of messages in the backlog, but the 10GB of acknowledged messages will be deleted.
4. If the retention size is set to 15GB, Pulsar will retain the 10GB of messages in the backlog and 5GB of the acknowledged messages. The rest of the acknowledged messages will be deleted.

Since Pulsar was open-sourced, the code implementation has never changed, but the meaning of the official documentation has gradually shifted.

So I'm considering which one is better: the official documentation or the code implementation?
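The four scenarios listed above can be reproduced with a small Python sketch. It models only the behavior described in this message (backlog messages are always kept and count against the retention size first). It is not taken from the Pulsar code base, and the function name is hypothetical.

```python
def retained_acked_gb(retention_gb: int, backlog_gb: int, acked_gb: int) -> int:
    """Acknowledged data kept under the behavior described in this thread.

    Assumption: retention never deletes backlog, and the backlog consumes
    the retention size before any acknowledged data is kept.
    """
    budget_left = max(0, retention_gb - backlog_gb)
    return min(acked_gb, budget_left)


if __name__ == "__main__":
    backlog_gb, acked_gb = 10, 10
    for retention_gb in (10, 20, 5, 15):
        kept = retained_acked_gb(retention_gb, backlog_gb, acked_gb)
        print(f"retention={retention_gb}GB: backlog kept={backlog_gb}GB, "
              f"acked kept={kept}GB, acked deleted={acked_gb - kept}GB")
```

By contrast, the documented behavior would keep `min(acked_gb, retention_gb)` of acknowledged data regardless of backlog size, which is where the two interpretations diverge.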
Does the shift in the documentation's meaning align more closely with expectations? Does it indicate that users want to retain the messages that were acknowledged?

For a long time, users have believed that the retention policy is for retaining messages that were acknowledged. If we change the documentation to match the code implementation, will it still meet users' expectations?

What should we do? Change the documentation to match the code implementation, or change the code implementation to match the documentation?

Regards,
Tao Jiuming