Hi all,

I have drafted PIP-219: Support full scan and trim ledger

PIP link:
https://github.com/apache/pulsar/issues/18128

Here's a copy of the contents of the GH issue for your reference:

### Motivation

The Broker uses the `Trimledgers` thread to clean up outdated ledgers.
During cleanup, each Broker traverses the topic metadata in memory to find
the ledgers that have reached the retention or TTL threshold.
However, this approach has a problem. When a topic has no producer and no
consumer, the Broker removes the topic's metadata from memory. As a result,
the ledgers of such topics can never be deleted.
Therefore, we need a way to scan and clean up all outdated ledgers.

### Goal

A full scan causes a large number of requests to ZooKeeper. Therefore, the
existing cleanup mode will be retained, and a full scan mode will be added
alongside it.


### API Changes

1. Add a new scheduling thread pool

2. Add the following configuration items:
// Full scan interval. Full scan is enabled only when the value is > 0.
fullScanTrimLedgerInterval=0
// Maximum number of Metadata requests per second during scanning
fullScanMaximumMetadataConcurrencyPerSecond=200
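
To illustrate how the "enabled only when the value > 0" rule could interact
with the new scheduling thread pool, here is a minimal sketch. The class and
method names are hypothetical, and the unit of `fullScanTrimLedgerInterval`
is not stated in the proposal; seconds are assumed here purely for
illustration.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the Pulsar code base.
class FullScanScheduler {

    /**
     * Schedules the full scan task, or returns null when the feature is
     * disabled (interval <= 0). The interval unit (seconds) is an assumption.
     */
    static ScheduledFuture<?> scheduleIfEnabled(ScheduledExecutorService pool,
                                                long intervalSeconds,
                                                Runnable fullScanTask) {
        if (intervalSeconds <= 0) {
            return null; // fullScanTrimLedgerInterval=0 keeps full scan off
        }
        // Fixed delay (not fixed rate) so that a slow scan never overlaps
        // with the next one.
        return pool.scheduleWithFixedDelay(
                fullScanTask, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}
```

With the default value of 0 nothing is scheduled, so the existing cleanup
mode remains the only one active.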

### Implementation

1. Only the Leader Broker performs the full scan.
2. The Leader Broker traverses the `managedLedger` metadata of each
namespace from the meta store. Ledger metadata contains the creation time;
if the time elapsed since creation exceeds the retention time + TTL time,
the Ledger should be deleted.
Only the Ledger metadata is parsed, instead of loading all topics into
memory.
The metadata request rate is limited using a semaphore.

3. When a topic meets these conditions, the Leader Broker loads the topic
and invokes its `TrimLedger` method. After cleanup is done, the Leader
Broker closes the topic to release memory.
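
The expiry check and the semaphore-based throttle from step 2 could look
roughly like the sketch below. All names are hypothetical, and the expiry
condition reflects one reading of the proposal (ledger age compared with
retention + TTL). The refill-once-per-second approach is just one simple way
to turn a semaphore into a per-second limit; the proposal does not specify
the mechanism.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not the actual PIP-219 implementation.
class FullScanSketch {

    /**
     * Returns true when the ledger's age exceeds retention + TTL.
     * All arguments are in milliseconds.
     */
    static boolean isTrimCandidate(long createdAtMs, long nowMs,
                                   long retentionMs, long ttlMs) {
        return nowMs - createdAtMs > retentionMs + ttlMs;
    }

    /** Caps the number of metadata requests issued in each one-second window. */
    static final class MetadataThrottle {
        private final int permitsPerSecond;
        private final Semaphore permits;

        MetadataThrottle(int permitsPerSecond) {
            this.permitsPerSecond = permitsPerSecond;
            this.permits = new Semaphore(permitsPerSecond);
        }

        /** Blocks until a request slot is available in the current window. */
        void acquire() throws InterruptedException {
            permits.acquire();
        }

        /** Non-blocking variant: false means the budget is exhausted. */
        boolean tryAcquire() {
            return permits.tryAcquire();
        }

        /**
         * Intended to be called once per second by a scheduler. Drops any
         * unused permits and restores the full budget. Drain and release are
         * two separate steps, so the limit is approximate under contention.
         */
        void refill() {
            permits.drainPermits();
            permits.release(permitsPerSecond);
        }
    }
}
```

In this sketch the scanner would call `acquire()` before each meta store
read, with `permitsPerSecond` taken from
`fullScanMaximumMetadataConcurrencyPerSecond`.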
