Hi all, I have drafted PIP-219: Support full scan and trim ledger
PIP link: https://github.com/apache/pulsar/issues/18128

Here's a copy of the contents of the GH issue for your reference:

### Motivation

The broker uses the `Trimledgers` thread to clean up outdated ledgers. During cleanup, each broker traverses the topic metadata held in memory to find ledgers that have reached the retention or TTL threshold. This approach has a problem: when a topic has no producers and no consumers, the broker removes the topic's metadata from memory, so the ledgers of such topics are never deleted. We therefore need a way to scan and clean up all outdated ledgers.

### Goal

A full scan causes a large number of requests to ZooKeeper. Therefore, the existing cleanup mode will be retained, and a full scan mode will be added alongside it.

### API Changes

1. Add a new scheduling thread pool.
2. Add the following configuration items:

    // Full scan interval. Full scan is enabled only when the value > 0.
    fullScanTrimLedgerInterval=0
    // Maximum number of metadata requests per second during scanning
    fullScanMaximumMetadataConcurrencyPerSecond=200

### Implementation

1. Only the leader broker performs the full scan.
2. The leader broker traverses the `managedLedger` metadata of each namespace in the metadata store. Because ledger metadata contains the creation time, a ledger should be deleted once the time elapsed since its creation exceeds the retention time plus the TTL time. Only the ledger metadata is parsed; topics are not loaded into memory. The metadata request rate is limited with a semaphore.
3. When a topic meets these conditions, the leader broker loads the topic and invokes its `TrimLedger` method. After cleanup is done, the leader closes the topic to release memory.
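
To make the scan logic in the Implementation section easier to discuss, here is a minimal Java sketch of steps 2 and 3. It is not the proposed patch and does not use any real Pulsar or metadata-store API: `FullScanSketch`, `LedgerMeta`, `isExpired`, `findTopicsToTrim`, and the `metadataReader` function are all hypothetical names introduced for illustration. It also assumes the limit is modeled as a bound on in-flight metadata reads, which is only one possible reading of `fullScanMaximumMetadataConcurrencyPerSecond`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Illustrative sketch only; every name here is hypothetical, not a Pulsar API.
final class FullScanSketch {

    /** Hypothetical view of one ledger's metadata: its id and creation time. */
    static final class LedgerMeta {
        final long ledgerId;
        final long createdAtMillis;

        LedgerMeta(long ledgerId, long createdAtMillis) {
            this.ledgerId = ledgerId;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /** A ledger is outdated once its age exceeds retention time plus TTL time. */
    static boolean isExpired(LedgerMeta ledger, long nowMillis,
                             long retentionMillis, long ttlMillis) {
        return nowMillis - ledger.createdAtMillis > retentionMillis + ttlMillis;
    }

    /**
     * Returns the topics that have at least one outdated ledger. Only ledger
     * metadata is read; no topic is loaded. The semaphore bounds in-flight
     * metadata reads. This loop is sequential, so the bound only becomes
     * meaningful once reads are issued asynchronously.
     */
    static List<String> findTopicsToTrim(
            List<String> topics,
            Function<String, List<LedgerMeta>> metadataReader, // stand-in for the metadata store
            int maxInFlightReads,
            long nowMillis, long retentionMillis, long ttlMillis) {
        Semaphore permits = new Semaphore(maxInFlightReads);
        List<String> topicsToTrim = new ArrayList<>();
        for (String topic : topics) {
            permits.acquireUninterruptibly();
            List<LedgerMeta> ledgers;
            try {
                ledgers = metadataReader.apply(topic);
            } finally {
                permits.release();
            }
            for (LedgerMeta ledger : ledgers) {
                if (isExpired(ledger, nowMillis, retentionMillis, ttlMillis)) {
                    topicsToTrim.add(topic); // step 3 would load, trim, and close this topic
                    break;
                }
            }
        }
        return topicsToTrim;
    }
}
```

In this sketch, the leader broker would call `findTopicsToTrim` once per scan interval and then load, trim, and close each returned topic. A true per-second limit would need a rate limiter or periodically refilled permits rather than a plain semaphore.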