Hi Bookkeeper Community, 

This is a BP discussion on supporting non-stop bookie data migration and 
bookie offline.
The issue can be 
found at: https://github.com/apache/bookkeeper/issues/3456 


I have copied the content here for convenience; any suggestions are welcome and 
appreciated.




### Motivation
The current bookie offline steps are:
1. Log on to the bookie node and check whether there are under-replicated 
ledgers. If there are, the decommission command will force them to be replicated: 
bin/bookkeeper shell listunderreplicated
2. Stop the bookie: bin/bookkeeper-daemon.sh stop bookie
3. Run the decommission command. If you have logged onto the node you wish to 
decommission, you don't need to provide -bookieid. If you are running the 
decommission command for the target bookie node from another bookie node, you 
should pass the target bookie id via -bookieid: 
bin/bookkeeper shell decommissionbookie or bin/bookkeeper shell 
decommissionbookie -bookieid <target bookieid>
4. Validate that there are no ledgers on the decommissioned bookie: bin/bookkeeper 
shell listledgers -bookieid <target bookieid>


With the current bookie offline solution, we need to stop the bookie first, 
execute the decommission command, and wait for the ledger migration on the 
bookie to complete.


This makes it very time-consuming to offline a bookie node. When we need to 
offline many bookie nodes, the total time taken by this solution is not acceptable.


Therefore, we need a solution that can migrate data without stopping bookie, so 
that bookie nodes can be offlined in batches.


### Proposal
To solve this problem, we propose a solution that replicates data 
without stopping the bookie.
The process is as follows:
1. Submit the bookie nodes to be offlined;
2. Traverse each ledger on the offline bookie, and persist these ledgers and 
the corresponding offline bookie nodes to the ZooKeeper directory: 
ledgers/offline_ledgers/ledgerId;
3. Get the next ledger to be offlined;
4. Traverse all fragments of the ledger, and filter out the fragments that have 
a copy on the offline bookie;
5. Copy the data for each such fragment;
6. When a ledger's fragments have been copied, delete the corresponding 
ledgers/offline_ledgers/ledgerId node;
7. When all ledgerId nodes under ledgers/offline_ledgers have been deleted, the 
data migration is complete, and the bookies can be stopped in batches and taken 
offline.
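The workflow above can be sketched as a small simulation. This is not BookKeeper code: the ZooKeeper directory ledgers/offline_ledgers is modeled as an in-memory map, and all names (OfflineFlowSketch, submit, migrateAll, safeToStop) are hypothetical illustrations of the proposed steps:

```java
import java.util.*;

// Minimal sketch of the proposed offline flow. The ZooKeeper directory
// ledgers/offline_ledgers is modeled as an in-memory map; all class and
// method names here are hypothetical, not BookKeeper APIs.
public class OfflineFlowSketch {
    // ledgerId -> offline bookies that still hold copies of that ledger
    // (stands in for the znodes ledgers/offline_ledgers/<ledgerId>)
    static Map<Long, Set<String>> offlineLedgers = new HashMap<>();

    // Steps 1-2: submit bookies to offline and persist the affected ledgers.
    static void submit(Map<Long, List<String>> ledgerEnsembles, Set<String> offlineBookies) {
        for (Map.Entry<Long, List<String>> e : ledgerEnsembles.entrySet()) {
            Set<String> hit = new HashSet<>(e.getValue());
            hit.retainAll(offlineBookies);
            if (!hit.isEmpty()) {
                // "put ledgers/offline_ledgers/<ledgerId> <offline bookies>"
                offlineLedgers.put(e.getKey(), hit);
            }
        }
    }

    // Steps 3-6: replicate each affected ledger, then delete its znode.
    static void migrateAll() {
        for (Long ledgerId : new ArrayList<>(offlineLedgers.keySet())) {
            // (fragment filtering and data copy would happen here)
            offlineLedgers.remove(ledgerId); // delete ledgers/offline_ledgers/<ledgerId>
        }
    }

    // Step 7: bookies may be stopped once the directory is empty.
    static boolean safeToStop() {
        return offlineLedgers.isEmpty();
    }

    public static void main(String[] args) {
        submit(Map.of(1L, List.of("b1", "b2"), 2L, List.of("b2", "b3")),
               Set.of("b3"));
        System.out.println(offlineLedgers.keySet()); // only ledger 2 touches b3
        migrateAll();
        System.out.println(safeToStop());
    }
}
```

The key design point is that the znodes double as a progress marker: an operator (or a batch script) only needs to watch for ledgers/offline_ledgers becoming empty before stopping the bookies.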


To achieve our goal, we need to implement two things:
1. A command to submit the bookies to be offlined and their corresponding 
ledgers, for example:
bin/bookkeeper shell decommissionbookie -offline_bookieids 
bookieId1,bookieId2,bookieId3,...bookieIdN
   This command will write all ledgers on the offline bookie nodes to the 
ZooKeeper directory, for example: put ledgers/offline_ledgers/ledgerId 
bookId1,bookId2,...bookIdn;
2. A ReassignLedgerWorker class to perform the actual ledger 
replication:
   this class will obtain a ledger from the ZooKeeper directory 
ledgers/offline_ledgers for replication.
   It will first filter out all the fragments of the ledger that contain the 
offline bookieId, then copy those fragments;
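The fragment-filtering step of the proposed ReassignLedgerWorker could look roughly like the sketch below. This is an illustration only, under the assumption that a fragment can be represented as an entry range plus its bookie ensemble; the Fragment record and fragmentsToCopy method are hypothetical names, not BookKeeper APIs:

```java
import java.util.*;

// Hypothetical sketch of the proposed ReassignLedgerWorker filtering step,
// not actual BookKeeper code: given a ledger's fragments (each fragment =
// the ensemble of bookies holding a range of entries), keep only the
// fragments that place a copy on an offline bookie and therefore need
// re-replication.
public class ReassignLedgerWorkerSketch {

    // A fragment: the first entry id of the range and the bookies holding it.
    record Fragment(long firstEntryId, List<String> ensemble) {}

    // Filter step: keep fragments whose ensemble intersects the offline set.
    static List<Fragment> fragmentsToCopy(List<Fragment> fragments,
                                          Set<String> offlineBookies) {
        List<Fragment> result = new ArrayList<>();
        for (Fragment f : fragments) {
            if (!Collections.disjoint(f.ensemble(), offlineBookies)) {
                result.add(f); // this fragment has a copy on an offline bookie
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = List.of(
            new Fragment(0, List.of("b1", "b2")),
            new Fragment(100, List.of("b2", "b3")),
            new Fragment(200, List.of("b1", "b4")));
        // Only the first two fragments place a copy on offline bookie b2.
        System.out.println(fragmentsToCopy(fragments, Set.of("b2")).size());
    }
}
```

After filtering, the worker would copy each selected fragment to a healthy bookie and then delete the ledger's znode under ledgers/offline_ledgers, as described in the process above.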
