On 02/14/2013 20:29, Michael R. Hines wrote:
>> Are you still using the tcp for transferring device state? If so you
>> can call the tcp functions from the migration rdma code as a first
>> step but I would prefer it to use RDMA too.
>
> This is the crux of the problem of using RDMA for migration: currently
> all of the QEMU migration control logic and device state goes through
> the QEMUFile implementation. RDMA, however, is by nature a zero-copy
> solution and is incompatible with QEMUFile.
With the patches I sent yesterday, there is no more buffering involved
in migration.c. All data goes straight from arch_init.c to a QEMUFile.
QEMUFile still does some buffering, but this should change with other
patches that Orit is working on.

> Using RDMA for transferring device state is not recommended: setting
> up an RDMA transfer requires registering the memory locations on both
> sides with the RDMA hardware. This is not ideal because it would
> require pinning the memory holding the device state and then issuing
> the RDMA transfer for *each* type of device - which would require
> changing the control path of every type of migrated device in QEMU.

Yes, this would not work well. However, you can (I think) define a
QEMUFileOps implementation for RDMA that:

1) registers the buffer of a QEMUFile with the RDMA hardware;

2) in its get_buffer (receive) and put_buffer (send) callbacks, issues
a synchronous RDMA transfer;

3) unregisters the buffer in the close callback.

(There is a rough sketch of this at the end of this message.)

As a proof of concept, this would also work (though it would make no
sense) for transferring the RAM; in the end, of course, it would be
used only for the device state.

It's not a problem to add more operations to QEMUFileOps or similar.
It's also not a problem to change the way buf is allocated, if you need
it to be page-aligned or something like that. It is much better than
adding migration_use_rdma() everywhere.

Paolo

> Currently the patch we submitted bypasses QEMUFile. It just issues
> the RDMA transfer for the memory that was dirtied and then continues
> along the rest of the migration call path normally.
>
> In an ideal world, we would prefer a hybrid approach, something like:
>
> *Begin Migration Iteration Round:*
> 1. stop VCPU
> 2. start iterative pass over memory
> 3. send control signals (if any) / device state to QEMUFile
> 4. when a dirty memory page is found, do:
>        a) instruct the QEMUFile to block
>        b) issue the RDMA transfer
>        c) instruct the QEMUFile to unblock
> 5. resume VCPU
>
> This would require a "smarter" QEMUFile implementation that
> understands when to block and for how long.
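
To make the QEMUFileOps idea above concrete, here is a rough sketch.
Everything named RDMAFileState or rdma_* is made up for illustration
and does not exist in QEMU; only the ibv_* calls (ibv_reg_mr,
ibv_post_send, ibv_poll_cq, ibv_dereg_mr) are the real libibverbs API.
Connection setup (protection domain, queue pair, exchanging the peer's
address and rkey) is elided, and the callback signatures are assumed to
match the current put_buffer/close prototypes:

    #include <errno.h>
    #include <stdint.h>
    #include <infiniband/verbs.h>

    typedef struct RDMAFileState {
        struct ibv_pd *pd;      /* protection domain (setup elided)   */
        struct ibv_qp *qp;      /* established queue pair             */
        struct ibv_cq *cq;      /* completion queue, polled below     */
        struct ibv_mr *mr;      /* registration of the QEMUFile buf   */
        uint64_t remote_addr;   /* peer buffer address, from setup    */
        uint32_t rkey;          /* peer rkey, from setup              */
    } RDMAFileState;

    /* Step 1: register the QEMUFile buffer once, at open time.  This
     * assumes buf is the (possibly page-aligned) buffer that the
     * QEMUFile later passes to put_buffer/get_buffer. */
    static int rdma_register_buffer(RDMAFileState *s, void *buf,
                                    size_t len)
    {
        s->mr = ibv_reg_mr(s->pd, buf, len,
                           IBV_ACCESS_LOCAL_WRITE |
                           IBV_ACCESS_REMOTE_WRITE);
        return s->mr ? 0 : -ENOMEM;
    }

    /* Poll the completion queue until the posted work request is
     * done, making the transfer synchronous. */
    static int rdma_wait_completion(RDMAFileState *s)
    {
        struct ibv_wc wc;
        int n;

        do {
            n = ibv_poll_cq(s->cq, 1, &wc);
        } while (n == 0);
        return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -EIO : 0;
    }

    /* Step 2: put_buffer callback -- RDMA-write "size" bytes of
     * device state to the peer and wait for completion.  A matching
     * get_buffer would mirror this with an RDMA read or a posted
     * receive, also completed synchronously. */
    static int rdma_put_buffer(void *opaque, const uint8_t *buf,
                               int64_t pos, int size)
    {
        RDMAFileState *s = opaque;
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = size,
            .lkey   = s->mr->lkey,
        };
        struct ibv_send_wr wr = {
            .opcode     = IBV_WR_RDMA_WRITE,
            .send_flags = IBV_SEND_SIGNALED,
            .sg_list    = &sge,
            .num_sge    = 1,
        };
        struct ibv_send_wr *bad_wr;

        wr.wr.rdma.remote_addr = s->remote_addr + pos;
        wr.wr.rdma.rkey        = s->rkey;

        if (ibv_post_send(s->qp, &wr, &bad_wr)) {
            return -EIO;
        }
        return rdma_wait_completion(s) < 0 ? -EIO : size;
    }

    /* Step 3: unregister the buffer when the QEMUFile is closed. */
    static int rdma_close(void *opaque)
    {
        RDMAFileState *s = opaque;
        ibv_dereg_mr(s->mr);
        /* queue pair / connection teardown elided */
        return 0;
    }

These callbacks would then go into a QEMUFileOps structure passed to
qemu_fopen_ops(), the same way the existing socket and fd backends are
wired up, so none of the device-state code would have to know RDMA is
underneath.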