Heesung, there are users out there who delete 2M topics in a day. The stress we put on ZK today due to the sheer number of topics is already more than enough, so I don't think we should rely on ZK to also manage the deletion of those ledgers if that means adding more nodes and more reads and writes to it.

I totally agree that we need to know whether there is an in-flight request, and I think this can be achieved using a special topic. I mean, why can't we just mark in ZK itself that the ledger is marked for deletion, have the workflow go through the different stages until all the needed data is deleted, and only then delete the ledger ID from ZK?
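To make that last idea concrete, here is a minimal sketch of what I mean by marking the ledger in ZK and walking it through the stages. The MetadataClient interface, the paths, and the stage names below are purely illustrative, not an existing Pulsar or BookKeeper API:

```java
import java.util.concurrent.CompletableFuture;

// Purely illustrative metadata-store client; not Pulsar's real MetadataStore API.
interface MetadataClient {
    CompletableFuture<Void> put(String path, byte[] value);
    CompletableFuture<Void> delete(String path);
}

// Stages the deletion workflow moves through before the ledger ID is removed.
enum DeletionStage { MARKED, DATA_DELETED }

class LedgerDeletionWorkflow {
    private final MetadataClient store;

    LedgerDeletionWorkflow(MetadataClient store) {
        this.store = store;
    }

    CompletableFuture<Void> deleteLedger(long ledgerId) {
        String markerPath = "/ledgers/" + ledgerId + "/deletion-stage";
        // Stage 1: mark the ledger for deletion next to the existing ledger node,
        // so a broker that crashes mid-way can detect the in-flight request and resume.
        return store.put(markerPath, DeletionStage.MARKED.name().getBytes())
                // Stage 2: delete the actual ledger data (BookKeeper entries, offloaded data).
                .thenCompose(v -> deleteLedgerData(ledgerId))
                .thenCompose(v -> store.put(markerPath,
                        DeletionStage.DATA_DELETED.name().getBytes()))
                // Stage 3: only once all data is gone, remove the marker and the ledger ID itself.
                .thenCompose(v -> store.delete(markerPath))
                .thenCompose(v -> store.delete("/ledgers/" + ledgerId));
    }

    private CompletableFuture<Void> deleteLedgerData(long ledgerId) {
        // Placeholder for the real BookKeeper / tiered-storage delete calls.
        return CompletableFuture.completedFuture(null);
    }
}
```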

On Tue, Jan 31, 2023 at 9:06 PM Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote:

> On Tue, Jan 31, 2023 at 6:43 AM Yan Zhao <horizo...@apache.org> wrote:
>
> > > - Have we considered a metadata store to persist and dedup deletion
> > > requests instead of the system topic? Why is the system topic the better
> > > choice than a metadata store for this problem?
> >
> > If we use the metadata store to store the middle-step ledger, we need to
> > operate on the metadata store after every deletion.
> >
> > And we need a trigger to trigger the deletion. A broker may have lots of
> > topics, so there is also a lot of ledger deletion. Using the metadata store
> > to store it may become a bottleneck.
> > Using pub/sub is easy to implement, and it is a good trigger to trigger
> > deletion.
>
> We can group multiple resource deletions into a single record in the
> metadata store. Also, we can use the metadata store watcher to trigger the
> deletion.
>
> I can see that a similar transactional operation (using the metadata store)
> can be done like the following.
>
> Alternatively,
> 1. A broker receives a resource (ledger) deletion request from a client.
> 2. If the target resource is available, the broker persists a transaction
> lock (/transactions/broker-id/delete_ledger/ledger_id) into the metadata
> store (state: pending, createdAt: now).
> 2.1 If there is no target resource, error out (ResourceDoesNotExistException).
> 2.2 If the lock already exists, error out (OperationInProgressException).
> 3. The broker returns success to the client.
> 4. The transaction watcher (metadata store listener) on the same broker-id
> is notified.
> 5. The transaction watcher runs the deletion process with an x-min timeout.
> 5.1 The transaction watcher updates the lock state (state: running,
> startedAt: now).
> 5.2 Run steps 1 ... n (periodically update the lock state and
> updatedAt: now every x secs).
> 5.3 Delete the lock.
> 6. The orphan transaction monitor reruns any orphan jobs by retrying step 5.
> (If the watcher fails in the middle of step 5, the lock state will be
> orphan: state is running and startedAt is older than x min.)
> 7. The leader monitor (on the leader broker) manages orphan jobs if brokers
> are gone or unavailable.
>
> We can have multiple types of transaction locks (or a generic lock) depending
> on the operation types. This will reduce the number of locks to
> create/update if there are multiple target resources to operate on for a
> single transaction.
>
> - Single-ledger deletion: /transactions/broker-id/delete_ledger/ledger_id
> - Multi-ledger deletion: /transactions/broker-id/delete_ledgers/ledgers :
> {ledger_ids[a,b,c,d], last_deleted_ledger_index:3}
> // last_deleted_ledger_index could be periodically updated every minute.
> // This can help to resume the deletion when retrying.
> - Topic deletion: /transactions/broker-id/delete_topic/topic_name
>
> > > - How does Pulsar deduplicate deletion requests (error out to users) while
> > > the deletion request is running?
> >
> > The user can only invoke `truncateTopic`; it's not for a particular
> > ledger. The note: "The truncate operation will move all cursors to the end
> > of the topic and delete all inactive ledgers."
> > It's just a trigger for the user.
>
> What if the admin concurrently requests `truncateTopic` many times for the
> same topic while one truncation job is running? How does Pulsar currently
> deduplicate these requests? And how does this proposal handle this
> situation?
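For concreteness, the transaction lock and orphan check described in the quoted steps above could be sketched roughly like this. The field names, paths, and the concrete timeout value are assumptions for illustration, not part of the proposal:

```java
import java.time.Duration;
import java.time.Instant;

// Rough sketch of the lock record persisted at
// /transactions/<broker-id>/delete_ledger/<ledger-id> (all names illustrative).
class DeletionLock {
    enum State { PENDING, RUNNING }

    State state = State.PENDING;       // step 2: created as pending
    Instant createdAt = Instant.now();
    Instant startedAt;                 // step 5.1: set when the watcher claims the lock
    Instant updatedAt;                 // step 5.2: heartbeat while the deletion runs
    long[] ledgerIds;                  // multi-ledger variant
    int lastDeletedLedgerIndex = -1;   // lets a retry resume where the last run stopped
}

class OrphanTransactionMonitor {
    // The "x min" timeout from step 5; the concrete value here is an assumption.
    private static final Duration TIMEOUT = Duration.ofMinutes(10);

    // Step 6: a lock is orphaned if it was claimed (running) but its owner
    // stopped making progress longer ago than the timeout.
    boolean isOrphan(DeletionLock lock) {
        Instant lastProgress = lock.updatedAt != null ? lock.updatedAt : lock.startedAt;
        return lock.state == DeletionLock.State.RUNNING
                && lastProgress != null
                && lastProgress.isBefore(Instant.now().minus(TIMEOUT));
    }
}
```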
>
> > > - How do users track the async deletion flow status? (Do we expose any
> > > describeDeletion API to show the deletion status?)
> >
> > Why do we need to track the async deletion flow status? The ledger deletion
> > is transparent to the Pulsar client. In the broker, deleting a ledger will
> > print the log `delete ledger xx successfully`.
> > If the deletion failed, it prints the log `delete ledger xxx failed`.
>
> IMHO, relying on logs to check the system state is not a good practice.
> Generally, every async user/admin API (long-running async workflow API)
> needs a corresponding describe* API to return the current running state.
>
>
> Regards,
> Heesung
>
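To illustrate the describe* point quoted above: such an API could return a status object along these lines. The class and field names here are hypothetical, not an existing Pulsar admin API:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical shape of a describeDeletion-style response; the class and field
// names are illustrative only and not an existing Pulsar admin API.
class DeletionStatus {
    enum State { PENDING, RUNNING, FAILED, COMPLETED }

    String topicName;
    State state;
    Instant startedAt;
    Instant lastUpdatedAt;
    List<Long> deletedLedgerIds;    // progress so far
    List<Long> remainingLedgerIds;  // still queued for deletion
    String lastError;               // set when state == FAILED
}
```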