Reply to Yong: Q: Does the migration process run in the recovery service, or does it run standalone? A: The migration tool runs in the recovery service.
In addition, some changes have been made to the migration tool. For more details, see the issue and PR:
issue: https://github.com/apache/bookkeeper/issues/3456
PR: https://github.com/apache/bookkeeper/pull/3457

The relevant descriptions are as follows:

I. Motivation
Currently, BookKeeper only supports data recovery, not data migration. We have a scenario where 125 bookie nodes need to be taken offline, and we found the existing offline procedure very time-consuming. The offline steps are:
1. Set the bookie nodes to be taken offline to readOnly;
2. Wait for the Pulsar data on those bookie nodes to expire and be deleted;
3. Even after most of the data on these nodes has expired and been cleaned up, some data still cannot be expired and deleted;
4. Stop a bookie, then use the decommission command to migrate the data that has not expired to other nodes: bin/bookkeeper shell decommissionbookie -bookieid xx
5. When the data on one bookie node has been migrated, continue with the next bookie node.
Step 4 is very time-consuming: migrating the data of a single bookie takes about 1 hour, and we have 125 bookie nodes to take offline. Step 2 is also very time-consuming, depending on the Pulsar retention time, usually more than ten hours.

II. Proposal
To solve this problem, we developed a data migration tool. With this tool, our offline steps become:
1. Execute the data migration command: bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --readOnly true
2. When the data migration is completed, stop all bookie nodes to be taken offline.
In addition, this command can migrate only some of the replicas on the bookie nodes to other nodes, for example: bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --ledgerIds ledger1,ledger2,ledger3 --readOnly false

III. Examples
1. Migrate all ledger data on bookie1 and bookie2 to other bookie nodes: sh bin/bookkeeper shell replicasMigration -bookieIds bookie1,bookie2 -readOnly true
2. Migrate ledger1 and ledger3 on bookie1 and bookie2 to other bookie nodes: sh bin/bookkeeper shell replicasMigration -bookieIds bookie1,bookie2 -ledgerIds ledger1,ledger3 -readOnly false

IV. Application scenarios
1. Taking bookie nodes offline: as mentioned above, with this data migration tool the offline procedure has only two steps, which greatly reduces the time required:
a. Execute the data migration command: bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --readOnly true
b. When the data migration is completed (see the completion-check sketch after this list), stop all bookie nodes to be taken offline.
2. Expanding the cluster to improve the read speed of historical data:
a. When a client consumes historical data from a few days ago, we would like to increase the read speed of that data by adding bookie nodes.
b. However, a newly added bookie node only serves reads and writes of new data and cannot improve the read speed of historical data.
c. With the data migration tool, we can migrate some historical data to the new nodes so that they serve part of the historical reads, improving the read speed of historical data.
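Since step b above hinges on knowing when the migration has finished, here is a minimal operator-side sketch, assuming the znode layout described in the migration steps in the next section (each pending ledger has a child znode under the migration path, and the znode is deleted once that ledger has been migrated). The class name and path constant are illustrative, not part of BookKeeper.

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class MigrationCompletionCheck {

    // Assumed root path of the migration znodes (taken from the steps below).
    private static final String MIGRATION_PATH = "/ledgers/replicasMigration";

    public static boolean migrationFinished(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        try {
            List<String> pending = zk.getChildren(MIGRATION_PATH, false);
            // No remaining children means every submitted ledger has been migrated.
            return pending.isEmpty();
        } catch (KeeperException.NoNodeException e) {
            // The migration path is gone or was never created: nothing is pending.
            return true;
        }
    }
}
```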
V. The data migration steps are as follows
1. Submit the ledger replicas to be migrated to ZooKeeper through the ReplicasMigrationCommand. The path on ZooKeeper is as follows: ledgers/replicasMigration/ledgerId1. The bookie nodes whose replicas are being migrated are written to the migration path, for example: set ledgers/replicasMigration/ledgerId1 bookie1,bookie3
2. Start the replica migration service ReplicasMigrationWorker in AutoRecoveryMain.
3. The ReplicasMigrationWorker service first obtains a migrating ledger, then finds the fragments stored by that ledger on the corresponding bookie nodes, and replicates those fragments with the replicateLedgerFragment method (a simplified sketch of this loop follows the step list).
4. When a ledger's migration task is completed, the corresponding ledger path on ZooKeeper is deleted, for example: delete ledgers/replicasMigration/ledgerId1
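For illustration, here is a highly simplified sketch of what one pass of the worker loop described in steps 1 through 4 might look like. It is not the actual ReplicasMigrationWorker implementation; the znode path follows the examples above, and migrateLedger() is a hypothetical stand-in for the fragment lookup and the replication done through replicateLedgerFragment.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class MigrationWorkerSketch {

    // Assumed root path of the migration znodes (see step 1 above).
    private static final String MIGRATION_PATH = "/ledgers/replicasMigration";

    private final ZooKeeper zk;

    public MigrationWorkerSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    // One pass over the pending migration tasks.
    public void runOnce() throws Exception {
        List<String> ledgerIds = zk.getChildren(MIGRATION_PATH, false);
        for (String ledgerId : ledgerIds) {
            String znode = MIGRATION_PATH + "/" + ledgerId;

            // The znode value lists the bookies whose replicas must be moved,
            // e.g. "bookie1,bookie3" (see the set example in step 1).
            byte[] data = zk.getData(znode, false, null);
            String[] sourceBookies =
                    new String(data, StandardCharsets.UTF_8).split(",");

            // Find the ledger fragments stored on those bookies and replicate them
            // to other nodes; the real worker does this via replicateLedgerFragment.
            migrateLedger(ledgerId, sourceBookies); // hypothetical helper

            // Step 4: remove the migration znode once the ledger has been migrated.
            zk.delete(znode, -1);
        }
    }

    private void migrateLedger(String ledgerId, String[] sourceBookies) {
        // Placeholder: fragment lookup and replication are intentionally omitted.
    }
}
```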