Orit (and Anthony, if you're not busy),
I forgot to respond to this very important comment:
On 02/13/2013 03:46 AM, Orit Wasserman wrote:
Are you still using TCP for transferring device state? If so, you
can call the TCP functions from the migration RDMA code as a first
step, but I would prefer it to use RDMA too.
This is the crux of the problem with using RDMA for migration: currently,
all of QEMU's migration control logic and device state goes through
the QEMUFile implementation. RDMA, however, is by nature a zero-copy
solution and is incompatible with QEMUFile.
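To make the mismatch concrete: QEMUFile is fundamentally a buffered byte
stream (today backed by TCP) that copies every byte of control and
device-state data through a put_buffer-style hook, whereas an RDMA write
is one-sided and copy-free. Very rough sketch below; the byte-stream hook
is made up for illustration and is not the real QEMUFile API, but the
verbs calls are the standard libibverbs ones:

#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>

/* Byte-stream path (illustrative, not QEMU's actual hook): every byte
 * is copied into a buffer / onto the socket before it hits the wire. */
typedef int (*put_buffer_fn)(void *opaque, const uint8_t *buf, size_t size);

/* RDMA path: the page is never copied.  It must already be registered
 * (pinned) with the HCA; the adapter then writes it straight into the
 * destination's registered memory. */
static int rdma_write_page(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *page, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)page,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode     = IBV_WR_RDMA_WRITE,   /* one-sided, zero-copy */
        .sg_list    = &sge,
        .num_sge    = 1,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;

    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}

Nothing in the put_buffer-style interface has a place to hang the
registration, rkey exchange, or completion handling that the second path
needs, which is why the two don't compose naturally.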
Using RDMA to transfer device state is not recommended: setting up an
RDMA transfer requires registering the memory locations on both sides with
the RDMA hardware. This is not ideal because it would require pinning the
memory holding the device state and then issuing an RDMA transfer for
*each* type of device, which in turn would mean changing the control path
of every migrated device type in QEMU.
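For reference, this is what the per-buffer setup looks like with
libibverbs (protection domain and QP setup elided); registration is the
step that pins the pages, and both sides would have to do it for every
device's state blob:

#include <infiniband/verbs.h>
#include <stddef.h>

/* Registering a buffer pins its pages and hands back local/remote keys.
 * Repeating this for each device's state, on both sides, every
 * iteration, is exactly the overhead we want to avoid. */
static struct ibv_mr *register_device_state(struct ibv_pd *pd,
                                            void *state, size_t len)
{
    return ibv_reg_mr(pd, state, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_WRITE);
}

And the rkey/address of each registration still has to be exchanged over
some other channel before a single byte of device state can move.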
Currently, the patch we submitted bypasses QEMUFile. It simply issues
the RDMA transfer for the memory that was dirtied and then continues
along the rest of the migration call path normally.
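In pseudo-C, the current flow is roughly the following (function names
are illustrative, not the exact ones in the patch); rdma_write_page() is
the one-sided write sketched above, and guest RAM is registered once up
front rather than per transfer:

#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* One-sided RDMA write of a single page (see earlier sketch). */
int rdma_write_page(struct ibv_qp *qp, struct ibv_mr *mr, void *page,
                    size_t len, uint64_t remote_addr, uint32_t rkey);

/* Push every dirty page with RDMA; everything else (device state,
 * section headers, control data) keeps going through QEMUFile/TCP. */
static void send_dirty_pages(struct ibv_qp *qp, struct ibv_mr *ram_mr,
                             uint8_t *ram_base, uint64_t remote_base,
                             uint32_t rkey,
                             const unsigned long *dirty_bitmap,
                             size_t nb_pages)
{
    const size_t bits = 8 * sizeof(unsigned long);

    for (size_t i = 0; i < nb_pages; i++) {
        if (!(dirty_bitmap[i / bits] & (1UL << (i % bits)))) {
            continue;
        }
        rdma_write_page(qp, ram_mr,
                        ram_base + i * PAGE_SIZE,
                        PAGE_SIZE,
                        remote_base + i * PAGE_SIZE,
                        rkey);
        /* Completion polling / flow control elided for brevity. */
    }
}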
In an ideal world, we would prefer a hybrid approach, something like:
*Begin Migration Iteration Round:*
1. stop VCPU
2. start iterative pass over memory
3. send control signals (if any) / device state to QEMUFile
4. When a dirty memory page is found, do:
a) Instruct the QEMUFile to block
b) Issue the RDMA transfer
c) Instruct the QEMUFile to unblock
5. resume VCPU
This would require a "smarter" QEMUFile implementation that understands
when to block and for how long.
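To make the idea concrete, here is one possible shape for that (all of
the hook names here are invented for the sake of the example, and
rdma_write_page() is the sketch from earlier in this mail):

#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>

int rdma_write_page(struct ibv_qp *qp, struct ibv_mr *mr, void *page,
                    size_t len, uint64_t remote_addr, uint32_t rkey);

/* Hypothetical "smarter" QEMUFile ops: besides the usual buffered
 * byte-stream output, the transport exposes block/unblock hooks so the
 * migration loop can interleave control data with out-of-band
 * zero-copy RDMA transfers without one overtaking the other. */
typedef struct SmartQEMUFileOps {
    int (*put_buffer)(void *opaque, const uint8_t *buf, size_t size);
    int (*block)(void *opaque);    /* flush buffered bytes, then pause */
    int (*unblock)(void *opaque);  /* resume the ordinary byte stream  */
} SmartQEMUFileOps;

/* Step 4 of the hybrid loop above, roughly: */
static int send_one_dirty_page(SmartQEMUFileOps *ops, void *opaque,
                               struct ibv_qp *qp, struct ibv_mr *mr,
                               void *page, size_t len,
                               uint64_t remote_addr, uint32_t rkey)
{
    int ret = ops->block(opaque);               /* 4a: quiesce QEMUFile */
    if (ret < 0) {
        return ret;
    }
    ret = rdma_write_page(qp, mr, page, len,    /* 4b: zero-copy write  */
                          remote_addr, rkey);
    if (ret < 0) {
        return ret;
    }
    return ops->unblock(opaque);                /* 4c: resume QEMUFile  */
}

How long the block lasts would be driven by RDMA completion events, which
is the part QEMUFile has no notion of today.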
Comments?
- Michael