Hello everyone,

I have been using BookKeeper as part of Pulsar clusters for a while and noticed that the process of decommissioning <https://bookkeeper.apache.org/docs/latest/admin/decomission/> a bookie (or the recover command <https://bookkeeper.apache.org/docs/latest/reference/cli/>) is not very operator-friendly and has some limitations:

1. The bookie to be decommissioned has to be stopped first, so the ledgers on it become unavailable immediately.
   - There is a higher risk of data loss during the process, as some ledgers will be under-replicated for a while.
   - The load on the remaining nodes may increase immediately, because they have more read requests to serve, including the reads needed to recover the under-replicated ledgers.
   - The process does not work for ledgers with an ensemble size of 1.
2. Decommissions have to be performed one after another, in sequence, to avoid data loss.
   - Some ledgers might be re-replicated multiple times when removing multiple bookies from the cluster.
3. The re-replication is performed on the node executing the decommission command.
   - It might be more efficient and safer to leverage the auto-recovery system and benefit from the improvements made to it, e.g., auto-scaling, replication throttling, etc.
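For context, a rough sketch of the current process as I understand it from the docs linked above (the bookie ID and port are placeholders, and exact script names may differ between releases):

```shell
# 1. Stop the bookie to be decommissioned; the ledgers stored on it
#    become unavailable from this point on.
bin/bookkeeper-daemon.sh stop bookie

# 2. Run the decommission command from an admin node; the re-replication
#    work runs on this node rather than on the auto-recovery workers.
bin/bookkeeper shell decommissionbookie -bookieid <bookie-host>:3181

# 3. Only after this command completes, repeat the same steps for the
#    next bookie -- removing several at once risks data loss and
#    duplicated re-replication work.
```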
I think it would be better to have an improved process that could re-replicate the ledgers from the bookies to be removed, in parallel, while those bookies are still active in the cluster. That would make it much easier and safer to operate a large-scale BookKeeper cluster.

I found a BookKeeper proposal draft named "BP-4 - BookKeeper Lifecycle Management <https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-4+-+BookKeeper+Lifecycle+Management>" that tried to address this issue, but it has not been accepted or implemented yet:

- Add a new bookie state called `draining`. It is similar to the `readonly` state in that the bookie can still serve read requests but no new ledgers can be allocated onto it, while the auditor treats it as 'lost' and generates re-replication tasks for all ledgers on it.
- Once all ledgers on the `draining` bookie are fully replicated, the bookie is safe to remove from the cluster.
- REST APIs should be added:
  - to update the bookie state dynamically;
  - to query whether all ledgers on the bookie have been drained.

I'm not sure whether this proposal or similar issues have been discussed before. It seems to me that not much code change would be needed, while the benefits would be significant. Any comments or suggestions are welcome, and I could spend some time working on it if it is viable. Thanks!

Best regards,
Yang Yang