On Sat, Jun 29, 2019 at 06:15:21PM +0530, Sukrit Bhatnagar wrote:
> On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
> <dgilb...@redhat.com> wrote:
> >
> > * Yuval Shaia (yuval.sh...@oracle.com) wrote:
> > > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > > Define and register SaveVMHandlers pvrdma_save and
> > > > pvrdma_load for saving and loading the device state,
> > > > which currently includes only the dma, command slot
> > > > and response slot addresses.
> > > >
> > > > Remap the DSR, command slot and response slot upon
> > > > loading the addresses in the pvrdma_load function.
> > > >
> > > > Cc: Marcel Apfelbaum <marcel.apfelb...@gmail.com>
> > > > Cc: Yuval Shaia <yuval.sh...@oracle.com>
> > > > Signed-off-by: Sukrit Bhatnagar <skrtbht...@gmail.com>
> > > > ---
> > > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 56 insertions(+)
> > > >
> > > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > > index adcf79cd63..cd8573173c 100644
> > > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > > @@ -28,6 +28,7 @@
> > > >  #include "sysemu/sysemu.h"
> > > >  #include "monitor/monitor.h"
> > > >  #include "hw/rdma/rdma.h"
> > > > +#include "migration/register.h"
> > > >
> > > >  #include "../rdma_rm.h"
> > > >  #include "../rdma_backend.h"
> > > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > > >      pvrdma_fini(pci_dev);
> > > >  }
> > > >
> > > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > > +{
> > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > +
> > > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > > +}
> > > > +
> > > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > > +{
> > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > > +
> > > > +    // Remap DSR
> > > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > > +                                         sizeof(struct pvrdma_device_shared_region));
> > > > +    if (!dev->dsr_info.dsr) {
> > > > +        rdma_error_report("Failed to map to DSR");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > > +
> > > > +    // Remap cmd slot
> > > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > > +                                         sizeof(union pvrdma_cmd_req));
> > > > +    if (!dev->dsr_info.req) {
> > >
> > > So this is where you hit the error, right?
> > > All looks good to me; I wonder why the first pci_dma_map works fine but
> > > the second fails where the global BounceBuffer is marked as used.
> > >
> > > Anyone?
> > I've got a few guesses:
> > a) My reading of address_space_map is that it gives a bounce buffer
> > pointer and then it has to be freed; so if it's going wrong and using a
> > bounce buffer, then the 1st call would work and only the 2nd would fail.
> In the scenario without any migration, the device is initialized and load_dsr()
> is called when the guest writes to DSR in BAR1 of pvrdma.
>
> Inside load_dsr(), there are similar calls to rdma_pci_dma_map().
>
> The difference is that when address_space_map() is called there,
> the !memory_access_is_direct() condition is FALSE.
> So, there is no use for BounceBuffer.
>
>
> In the migration scenario, when the device at dest calls load and
> subsequently rdma_pci_dma_map(), the !memory_access_is_direct()
> condition is TRUE.
> So, the first rdma_pci_dma_map() will set BounceBuffer from FALSE to
> TRUE and succeed. But the subsequent calls will fail at atomic_xchg().
>
> This BounceBuffer is only freed when address_space_unmap() is called.
>
>
> I am guessing that when the address is translated to one in FlatView,
> the MemoryRegion returned at dest is causing the issue.
> To be honest, at this point I am not sure how FlatView translation works.
>
> I tried adding qemu_log to memory_access_is_direct(), but I guess it is
> called too many times, so the VM stalls before it even starts.
>
> > b) Perhaps the dma address read from the stream is bad, and isn't
> > pointing into RAM properly - and that's why you're getting a bounce
> > buffer.
> I have logged the addresses saved in pvrdma_save(), and the ones
> loaded in pvrdma_load(). They are the same. So, there is no problem in the
> stream itself, I suppose.
>
> > c) Do you have some ordering problems? i.e. is this code happening
> > before some other device has been loaded/initialised? If you're relying
> > on some other state, then perhaps the right thing is to perform these
> > mappings later, at the end of migration, not during the loading itself.
> The device which is saved/loaded before pvrdma is vmxnet3, on which
> the pvrdma depends.
>
> I have included the last few lines of my traces during migration below.
> The pvrdma migration is performed in this last part of migration.
> I had added some debug messages at various places to see which parts
> of the code are touched. The ones I added are without any timestamp.
>
> Source:
> 15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
> 15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
> 15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3, section_id 46 -> 0
> 15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
> 15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
> pvrdma_save: dev->dsr_info.dma is 2032963584
> pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
> 15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma, section_id 47 -> 0
> 15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
> 15942@1561806657.197420:vmstate_save acpi_build, acpi_build
> 15942@1561806657.197441:vmstate_save_state_top acpi_build
> 15942@1561806657.197459:vmstate_n_elems patched: 1
> 15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
> 15942@1561806657.197503:vmstate_subsection_save_top acpi_build
> 15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
> 15942@1561806657.197545:savevm_section_start globalstate, section_id 49
> 15942@1561806657.197568:vmstate_save globalstate, globalstate
> 15942@1561806657.197589:vmstate_save_state_top globalstate
> 15942@1561806657.197606:migrate_global_state_pre_save saved state: running
> 15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
> 15942@1561806657.197645:vmstate_n_elems size: 1
> 15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
> 15942@1561806657.197710:vmstate_n_elems runstate: 1
> 15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
> 15942@1561806657.197822:vmstate_subsection_save_top globalstate
> 15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
> 15942@1561806657.198240:migrate_set_state new state completed
> 15942@1561806657.198269:migration_thread_after_loop
> 15797@1561806657.198329:savevm_state_cleanup
> 15797@1561806657.198377:migrate_fd_cleanup
> 15797@1561806657.199072:qemu_file_fclose
>
> Destination:
> 15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
> 15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
> 15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
> 15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
> 15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
> 15830@1561806657.667263:qemu_loadvm_state_section 4
> 15830@1561806657.667293:qemu_loadvm_state_section_startfull 47(0000:00:10.1/pvrdma) 0 1
> 15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
> pvrdma_load: dev->dsr_info.dma is 2032963584
> address_space_map: is_write is 0
> address_space_map: addr is 2032963584
> address_space_map: Inside !memory_access_is_direct
> 15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
> pvrdma_load: successfully remapped to DSR
> pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> address_space_map: is_write is 0
> address_space_map: addr is 2032660480
> address_space_map: Inside !memory_access_is_direct
> address_space_map: Inside atomic_xchg
> qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
> qemu-system-x86_64: rdma: Failed to map to command slot address
> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:10.1/pvrdma'
> 15830@1561806657.667693:qemu_loadvm_state_post_main -1
> 15830@1561806657.667729:loadvm_state_cleanup
> 15830@1561806657.668568:process_incoming_migration_co_end ret=-1 postcopy-state=0
> qemu-system-x86_64: load of migration failed: Operation not permitted
> 15830@1561806657.668721:migrate_set_state new state failed
> 15830@1561806657.668807:qemu_file_fclose
> qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256
>
> @Marcel, @Yuval: As David has suggested, what if we just read the dma
> addresses in pvrdma_load(), and let the load_dsr() do the mapping?
> In pvrdma_regs_write(), we can check if dev->dsr_info.dma is already set, so
> that its value is not overwritten.
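
(To make that concrete: a rough, untested sketch of the lazy variant, not an
actual patch; how load_dsr() gets re-triggered on the destination is left
open, and the pvrdma_regs_write() guard is only the idea from the paragraph
above.)

static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
{
    PVRDMADev *dev = PVRDMA_DEV(opaque);

    /* Restore the DSR guest address, but do not map anything here */
    dev->dsr_info.dma = qemu_get_be64(f);

    /*
     * Keep the stream in sync with pvrdma_save() above; the cmd/resp slot
     * addresses live inside the guest's DSR and are re-read once
     * load_dsr() has remapped it, so they can simply be consumed here.
     */
    (void)qemu_get_be64(f);
    (void)qemu_get_be64(f);

    return 0;
}

/* and in pvrdma_regs_write(), skip the update if migration already set it:
 *
 *     if (!dev->dsr_info.dma) {
 *         dev->dsr_info.dma = val;
 *     }
 */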

Have you tried that? Did it work? So it is like a lazy load and the first
command will be a bit slower; we can live with that, I guess.
Is there another way to postpone the work so that it is executed right after
migration completes?
I just don't like the fact that we are adding an 'if' statement.
In any case, wrap the "if (dev->dsr_info.dma)" with "unlikely", because as
soon as it is initialized it will always be true.

> >
> > Other notes:
> > d) Use VMSTATE macros and structures rather than open-coding qemu_get
> > calls.
> Yes, I am trying it currently.
> > e) QEMU normally uses /* comments */ rather than //
> I have corrected this mistake in my code. patchew notified me about this
> and the line length issues minutes after I had sent this patch.
>
> > Dave
> >
> > > > +        rdma_error_report("Failed to map to command slot address");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > > +
> > > > +    // Remap rsp slot
> > > btw, this is RFC so we are fine but try to avoid this style of comments.
> > > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > > +                                         sizeof(union pvrdma_cmd_resp));
> > > > +    if (!dev->dsr_info.rsp) {
> > > > +        rdma_error_report("Failed to map to response slot address");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +static SaveVMHandlers savevm_pvrdma = {
> > > > +    .save_state = pvrdma_save,
> > > > +    .load_state = pvrdma_load,
> > > > +};
> > > > +
> > > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > >  {
> > > >      int rc = 0;
> > > > +    DeviceState *s = DEVICE(pdev);
> > > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > > >      Object *memdev_root;
> > > >      bool ram_shared = false;
> > > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > > >
> > > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > > +
> > > >  out:
> > > >      if (rc) {
> > > >          pvrdma_fini(pdev);
> > > > --
> > > > 2.21.0
> > > >
> > > >
> >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
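
(For reference, point (d) with VMSTATE instead of the open-coded
qemu_put/qemu_get calls would look roughly like the sketch below. The field
list, the pvrdma_post_load() hook and the error handling are illustrative
assumptions, not an actual patch; with dc->vmsd set in class_init there is
also no need for register_savevm_live() in pvrdma_realize().)

#include "migration/vmstate.h"

static int pvrdma_post_load(void *opaque, int version_id)
{
    PVRDMADev *dev = PVRDMA_DEV(opaque);

    /* Redo the DSR/cmd/resp slot mappings from the restored guest address */
    load_dsr(dev);

    /* Assumes load_dsr() leaves dsr_info.dsr NULL on mapping failure */
    return dev->dsr_info.dsr ? 0 : -1;
}

static const VMStateDescription vmstate_pvrdma = {
    .name = "pvrdma",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = pvrdma_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
        VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
        VMSTATE_END_OF_LIST()
    }
};

/* and in the class init function, instead of register_savevm_live():
 *     dc->vmsd = &vmstate_pvrdma;
 */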