Reply to Yong: Q: Does the migration process run in the recovery service, or does it run standalone? A: The migration tool runs in the recovery service.
In addition, some changes have been made to the migration tool. For more details, see the issue and PR:
issue: https://github.com/apache/bookkeeper/issues/3456
PR: https://github.com/apache/bookkeeper/pull/3457

The relevant descriptions are as follows:

I. Motivation
Currently, BookKeeper only supports data recovery, not data migration. We have a scenario where 125 bookie nodes need to be taken offline, and we found the existing offline procedure very time-consuming. The offline steps are:
1. Set the bookie nodes to be taken offline to readOnly;
2. Wait for the Pulsar data on those bookie nodes to expire and be deleted;
3. Even after most of the data on these nodes has expired and been cleaned up, some data still cannot be expired and deleted;
4. Stop a bookie, then use the decommission command to migrate the data that has not expired to other nodes: bin/bookkeeper shell decommissionbookie -bookieid xx
5. When the data on one bookie node has been migrated, continue with the next bookie node.
Step 4 is very time-consuming: migrating the data of a single bookie takes about 1 hour, and we have 125 bookie nodes to take offline. Step 2 is also very time-consuming, depending on the Pulsar retention time, usually more than ten hours.

II. Proposal
To solve this problem, we developed a data migration tool. With this tool, our offline steps become:
1. Execute the data migration command: bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --readOnly true
2. When the data migration is completed, stop all bookie nodes to be taken offline.
In addition, this command can migrate only some of the replicas on the bookie nodes to other nodes, for example: bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --ledgerIds ledger1,ledger2,ledger3 --readOnly false

III. Examples
1. Migrate all ledger data on bookie1 and bookie2 to other bookie nodes: sh bin/bookkeeper shell replicasMigration -bookieIds bookie1,bookie2 -readOnly true
2. Migrate ledger1 and ledger3 on bookie1 and bookie2 to other bookie nodes: sh bin/bookkeeper shell replicasMigration -bookieIds bookie1,bookie2 -ledgerIds ledger1,ledger3 -readOnly false

IV. Application scenarios
1. Taking bookie nodes offline: as mentioned above, with this data migration tool the offline procedure has only two steps, which greatly reduces the time required:
a. Execute the data migration command: bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --readOnly true
b. When the data migration is completed (see the completion-check sketch after this list), stop all bookie nodes to be taken offline.
2. Expanding the cluster to improve the read speed of historical data:
a. When a client consumes historical data from a few days ago, we would like to increase the read speed of that data by adding bookie nodes.
b. However, a newly added bookie node only serves reads and writes of new data and cannot improve the read speed of historical data.
c. With the data migration tool, we can migrate some historical data to the new nodes so that they serve part of the historical reads, improving the read speed of historical data.
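Since step b above hinges on knowing when the migration has finished, here is a minimal operator-side sketch, assuming the znode layout described in the migration steps in the next section (each pending ledger has a child znode under the migration path, and the znode is deleted once that ledger has been migrated). The class name and path constant are illustrative, not part of BookKeeper.

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class MigrationCompletionCheck {

    // Assumed root path of the migration znodes (taken from the steps below).
    private static final String MIGRATION_PATH = "/ledgers/replicasMigration";

    public static boolean migrationFinished(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        try {
            List<String> pending = zk.getChildren(MIGRATION_PATH, false);
            // No remaining children means every submitted ledger has been migrated.
            return pending.isEmpty();
        } catch (KeeperException.NoNodeException e) {
            // The migration path is gone or was never created: nothing is pending.
            return true;
        }
    }
}
```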
V. The data migration steps are as follows
1. Submit the ledger replicas to be migrated to ZooKeeper through the ReplicasMigrationCommand. The path on ZooKeeper is as follows: ledgers/replicasMigration/ledgerId1. The bookie nodes whose replicas are being migrated are written to the migration path, for example: set ledgers/replicasMigration/ledgerId1 bookie1,bookie3
2. Start the replica migration service ReplicasMigrationWorker in AutoRecoveryMain.
3. The ReplicasMigrationWorker service first obtains a migrating ledger, then finds the fragments stored by that ledger on the corresponding bookie nodes, and replicates those fragments with the replicateLedgerFragment method (a simplified sketch of this loop follows the step list).
4. When a ledger's migration task is completed, the corresponding ledger path on ZooKeeper is deleted, for example: delete ledgers/replicasMigration/ledgerId1
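For illustration, here is a highly simplified sketch of what one pass of the worker loop described in steps 1 through 4 might look like. It is not the actual ReplicasMigrationWorker implementation; the znode path follows the examples above, and migrateLedger() is a hypothetical stand-in for the fragment lookup and the replication done through replicateLedgerFragment.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class MigrationWorkerSketch {

    // Assumed root path of the migration znodes (see step 1 above).
    private static final String MIGRATION_PATH = "/ledgers/replicasMigration";

    private final ZooKeeper zk;

    public MigrationWorkerSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    // One pass over the pending migration tasks.
    public void runOnce() throws Exception {
        List<String> ledgerIds = zk.getChildren(MIGRATION_PATH, false);
        for (String ledgerId : ledgerIds) {
            String znode = MIGRATION_PATH + "/" + ledgerId;

            // The znode value lists the bookies whose replicas must be moved,
            // e.g. "bookie1,bookie3" (see the set example in step 1).
            byte[] data = zk.getData(znode, false, null);
            String[] sourceBookies =
                    new String(data, StandardCharsets.UTF_8).split(",");

            // Find the ledger fragments stored on those bookies and replicate them
            // to other nodes; the real worker does this via replicateLedgerFragment.
            migrateLedger(ledgerId, sourceBookies); // hypothetical helper

            // Step 4: remove the migration znode once the ledger has been migrated.
            zk.delete(znode, -1);
        }
    }

    private void migrateLedger(String ledgerId, String[] sourceBookies) {
        // Placeholder: fragment lookup and replication are intentionally omitted.
    }
}
```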