On Mon, Jul 02, 2018 at 04:46:18PM +0800, Peter Xu wrote:
> On Mon, Jul 02, 2018 at 01:34:45PM +0530, Balamuruhan S wrote:
> > On Wed, Jun 27, 2018 at 09:22:42PM +0800, Peter Xu wrote:
> > > v3:
> > > - keep the recovery logic even for RDMA by dropping the 3rd patch and
> > >   touch up the original 4th patch (current 3rd patch) to suit that [Dave]
> > >
> > > v2:
> > > - break the first patch into several
> > > - fix a QEMUFile leak
> > >
> > > Please review.  Thanks,
> >
> > Hi Peter,
>
> Hi, Balamuruhan,
>
> Glad to know that you are playing with this stuff on ppc.  I think the
> major steps are correct, though...
>
Thank you Peter for correcting my mistake. It works like a charm.
Nice feature!

Tested-by: Balamuruhan S <bal...@linux.vnet.ibm.com>

> >
> > I applied this patchset on top of upstream QEMU to test the postcopy
> > pause/recover feature on PowerPC.
> >
> > I used a qcow2 image shared over NFS between the source and target hosts.
> >
> > source:
> > # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> > -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> > -device virtio-blk-pci,drive=rootdisk -drive \
> > file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> > -monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio \
> > -net user -redir tcp:2000::22
> >
> > To keep the VM under load, I ran stress-ng inside the guest:
> >
> > # stress-ng --cpu 6 --vm 6 --io 6
> >
> > target:
> > # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> > -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> > -device virtio-blk-pci,drive=rootdisk -drive \
> > file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> > -monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio \
> > -net user -redir tcp:2001::22 -incoming tcp:0:4445
> >
> > I enabled postcopy on both source and destination from the QEMU monitor:
> >
> > (qemu) migrate_set_capability postcopy-ram on
> >
> > From the source QEMU monitor:
> > (qemu) migrate -d tcp:10.45.70.203:4445
>
> [1]
>
> > (qemu) info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > decompress-error-check: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> > zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
> > release-ram: off block: off return-path: off pause-before-switchover:
> > off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
> > late-block-activate: off
> > Migration status: active
> > total time: 2331 milliseconds
> > expected downtime: 300 milliseconds
> > setup: 65 milliseconds
> > transferred ram: 38914 kbytes
> > throughput: 273.16 mbps
> > remaining ram: 67063784 kbytes
> > total ram: 67109120 kbytes
> > duplicate: 1627 pages
> > skipped: 0 pages
> > normal: 9706 pages
> > normal bytes: 38824 kbytes
> > dirty sync count: 1
> > page size: 4 kbytes
> > multifd bytes: 0 kbytes
> >
> > I triggered postcopy from the source:
> > (qemu) migrate_start_postcopy
> >
> > After triggering postcopy from the source, I tried to pause the
> > postcopy migration on the target:
> >
> > (qemu) migrate_pause
> >
> > On the target I see this error:
> > error while loading state section id 4(ram)
> > qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> >
> > On the source I see this error:
> > qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> >
> > Later I tried to recover from the target monitor:
> > (qemu) migrate_recover qemu+ssh://10.45.70.203/system
>
> ... here, isn't that URI only meant for libvirt?
>
> Normally I use something similar to [1] above.
>
> > Migrate recovery is triggered already
>
> This means that you had already sent one recovery command
> beforehand.  In the future we'd better allow the recovery command to be
> run more than once (in case the first one was mistyped...).
>
> >
> > but on the source it still remains in the postcopy-paused state:
> > (qemu) info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > decompress-error-check: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> > zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
> > release-ram: off block: off return-path: off pause-before-switchover:
> > off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
> > late-block-activate: off
> > Migration status: postcopy-paused
> > total time: 222841 milliseconds
> > expected downtime: 382991 milliseconds
> > setup: 65 milliseconds
> > transferred ram: 385270 kbytes
> > throughput: 265.06 mbps
> > remaining ram: 8150528 kbytes
> > total ram: 67109120 kbytes
> > duplicate: 14679647 pages
> > skipped: 0 pages
> > normal: 63937 pages
> > normal bytes: 255748 kbytes
> > dirty sync count: 2
> > page size: 4 kbytes
> > multifd bytes: 0 kbytes
> > dirty pages rate: 854740 pages
> > postcopy request count: 374
> >
> > Later I also tried to recover postcopy from the source monitor:
> > (qemu) migrate_recover qemu+ssh://10.45.193.21/system
>
> This command should be run on the destination side only.  The
> "migrate-recover" command on the destination starts a new listening
> port there and waits for the migration to be continued.  After that
> command, we need an extra command on the source to start the recovery:
>
>   (HMP) migrate -r $URI
>
> Here $URI should be the one you specified in the "migrate-recover"
> command on the destination machine.
>
> > Migrate recover can only be run when postcopy is paused.
>
> I can try to fix up this error.  Basically we shouldn't allow this
> command to be run on the source machine.

Sure, :+1:

> >
> > It looks like it is broken; please help me if I missed something
> > in this test.
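For anyone reproducing this later, here is a minimal sketch of the
corrected sequence from Peter's explanation, written as the QMP
equivalents of the HMP commands above.  It only builds the command
objects and does not open a QMP socket.  It assumes the QMP commands
"migrate-set-capabilities", "migrate-start-postcopy", "migrate-pause",
"migrate-recover" and "migrate" with "resume": true (the counterpart of
HMP "migrate -r") as found in QEMU 3.0; the recovery port 4446 and the
helper names qmp() and postcopy_recovery_plan() are hypothetical.

```python
import json

# Destination address from the test above; port 4446 for the
# recovery channel is a hypothetical choice.
DEST_INCOMING_URI = "tcp:10.45.70.203:4445"
RECOVER_URI = "tcp:10.45.70.203:4446"


def qmp(command, **arguments):
    """Build a QMP command object; 'arguments' is omitted when empty."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return msg


def postcopy_recovery_plan(incoming_uri, recover_uri):
    """Return (side, command) pairs in the order they should be issued."""
    postcopy_cap = {
        "capabilities": [{"capability": "postcopy-ram", "state": True}]
    }
    return [
        # Enable postcopy on both sides before starting the migration.
        ("src", qmp("migrate-set-capabilities", **postcopy_cap)),
        ("dst", qmp("migrate-set-capabilities", **postcopy_cap)),
        ("src", qmp("migrate", uri=incoming_uri)),
        ("src", qmp("migrate-start-postcopy")),
        # Simulate a network failure; both sides enter postcopy-paused.
        ("dst", qmp("migrate-pause")),
        # Recovery step 1: the destination listens on a new URI...
        ("dst", qmp("migrate-recover", uri=recover_uri)),
        # ...step 2: the source resumes the migration to that same URI.
        ("src", qmp("migrate", uri=recover_uri, resume=True)),
    ]


if __name__ == "__main__":
    for side, cmd in postcopy_recovery_plan(DEST_INCOMING_URI, RECOVER_URI):
        print(side, json.dumps(cmd))
```

The point of the ordering is that "migrate-recover" is only ever sent to
the destination, and the resuming "migrate" on the source must reuse the
exact URI given to "migrate-recover".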
>
> Btw, I've been writing a unit test for postcopy recovery recently, which
> could be a good reference for the new feature.  Meanwhile, I think I
> should also write up some documentation afterwards.

Fine. I am also working on writing a test scenario in tp-qemu using
Avocado-VT for the postcopy pause/recover and multifd features.

-- 
Bala

>
> Regards,
>
> >
> > Thank you,
> > Bala
> >
> > >
> > > Peter Xu (4):
> > >   migration: delay postcopy paused state
> > >   migration: move income process out of multifd
> > >   migration: unbreak postcopy recovery
> > >   migration: unify incoming processing
> > >
> > >  migration/ram.h       |  2 +-
> > >  migration/exec.c      |  3 ---
> > >  migration/fd.c        |  3 ---
> > >  migration/migration.c | 44 ++++++++++++++++++++++++++++++++++++-------
> > >  migration/ram.c       | 11 +++++------
> > >  migration/savevm.c    |  6 +++---
> > >  migration/socket.c    |  5 -----
> > >  7 files changed, 46 insertions(+), 28 deletions(-)
> > >
> > > -- 
> > > 2.17.1
> > >
> > >
> >
>
> -- 
> Peter Xu
>