> This is not reasonable IMHO.
>
> I was okay with sticking a name on a ramblock, but encoding a guest PA
> offset turns this into a supported ABI which I'm not willing to do.
>
> A one line change is one thing, but not a complex new option that
> introduces an ABI only for a proprietary product that's jumping through hoops
> to keep from contributing useful logic to QEMU.

Hi Anthony,

Thanks for getting back to me.

Sticking a name on the ramblock file would suit our product just
fine. Indeed, this is what we had agreed upon at the KVM forum.
However, I submitted a more complex patch in an attempt to expose a
more general & easy to use feature; I was trying to make a more useful
contribution than the simple patch :-)

Perhaps I can assuage your ABI concern and argue the utility of this
patch vs the one-line version. However, if you aren't satisfied,
please let me know and I'll resubmit the one-line version.

On ABI: This patch doesn't add a new ABI. QEMU already has this ABI
due to Xen live migration.

When a Xen domain is booted, a new domain is created with an empty
physmap. Then QEMU is launched. QEMU creates its ramblocks and, via
memory callbacks (xen_add_to_physmap), populates Xen's physmap using
ramblock sizes & offsets.

On incoming migration, the Xen toolstack creates a new domain,
populates its physmap, and copies RAM from the outgoing migration.
When QEMU is launched, it populates its Xen memory model (i.e.,
XenIOState) by reading the domain's existing physmap from xenstore.
When QEMU creates ramblocks, the callbacks in xen-all.c _ignore_ the
new ramblocks because their offsets are already in the physmap. If the
new ramblocks had different sizes & offsets than those from the
outgoing QEMU process, then QEMU's memory model would be inconsistent
with Xen's (i.e., the physmap maintained by the hypervisor and the
XenIOState maintained in userspace). In particular, QEMU would expect
memory at a particular physmap offset that wouldn't have been
populated by the Xen toolstack during live migration.

On utility: Just adding ramblock names to backing file paths makes
post-copy migration & cloning possible, but involves some painful VFS
contortions, which I give a detailed example of below. On the other
hand, these new -mem-path parameters make post-copy migration &
cloning simple by leveraging an existing QMP command, existing
filesystems, and kernel behavior. Put another way, the useful logic
for memory sharing and post-copy live migration already exists in the
kernel and a myriad of filesystems.  A fairly small patch (albeit not
one line) enables that logic in QEMU.

Peter

Detailed example:

Suppose you have a patched QEMU that adds ramblock names to their
backing files and you want to implement memory sharing via cloning.
When clones come up, each of their ramblocks' backing files needs to
contain the same data as the corresponding backing file from the
parent (obviously you want those new backing files to somehow share
pages and COW). The basic idea is to save the parent's ramblock files
and arrange for the clones to open them.

You can see the parent's ramblock files easily enough by looking at
the unlinked ramblock files (e.g., /proc/pid/fd/10 is a symlink to
/tmp/qemu_back_mem.pc.ram.WHFZYw (deleted), /proc/pid/fd/11 is a
symlink to /tmp/qemu_back_mem.vga.vram.WT1yQW (deleted), etc.).
Unfortunately, since they're all mapped MAP_PRIVATE, these symlinks,
when opened, will give all zeros. So you can either implement your own
filesystem that gives you a backdoor to the MAP_PRIVATE pages (fast
but complicated), or you can use qemu's monitor to dump guest RAM
(slow but works).
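
To illustrate the MAP_PRIVATE point, here is a minimal Python sketch.
It is not part of the patch: it assumes Linux, and the temp-file prefix
and function name are made up for the demo. It writes through a private
mapping of an unlinked file and then reads the file back through the
descriptor.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def private_write_reaches_file() -> bool:
    """Return True if a write made through a MAP_PRIVATE mapping is
    visible when the backing file is read via its descriptor."""
    # Hypothetical prefix, chosen only to resemble the paths above.
    fd, path = tempfile.mkstemp(prefix="qemu_back_mem.demo.")
    try:
        os.unlink(path)          # backing file is unlinked, fd stays open
        os.ftruncate(fd, PAGE)   # one zero-filled page
        m = mmap.mmap(fd, PAGE,
                      flags=mmap.MAP_PRIVATE,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
        m[:5] = b"guest"         # dirties a private copy-on-write page only
        in_mapping = bytes(m[:5])
        m.close()
        in_file = os.pread(fd, 5, 0)  # what /proc/pid/fd/N would show
        assert in_mapping == b"guest"
        return in_file == b"guest"
    finally:
        os.close(fd)
```

The mapping sees the written bytes, but the read through the descriptor
still returns zeros, which is why the /proc/pid/fd symlinks are of no
use for recovering guest RAM.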

When a clone runs and creates a new backing file using mkstemp, you
need to arrange for that backing file to somehow contain the same data
as the corresponding file from the parent. There is an obvious
heuristic for determining this correspondence: parse the ramblock name
from the child's file and use the matching file from the parent.
Correctness aside (multiple ramblocks can have the same name, e.g.,
e1000.rom, but this is moot because the _important_ ramblocks, i.e.,
pc.ram and vga.vram, are unique in the emulated system we care about),
implementing this heuristic is a pain. To see the file being
created, you need to implement a custom file system. Moreover, to
share memory with another file that's been opened MAP_PRIVATE, you
have to implement your own VMA operations. Oy!
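
For what it's worth, the name-matching part of the heuristic is the
easy bit. A rough Python sketch, assuming the backing files look like
the paths shown above (a "qemu_back_mem." prefix, the ramblock name,
then a six-character mkstemp suffix); the function names and the regex
are mine, not anything in QEMU:

```python
import os
import re

# Assumed layout: qemu_back_mem.<ramblock name>.<6 random chars>
_BACKING_RE = re.compile(r"^qemu_back_mem\.(?P<name>.+)\.[A-Za-z0-9]{6}$")
_DELETED = " (deleted)"


def ramblock_name(link_target):
    """Extract the ramblock name from a backing-file path, or None."""
    if link_target.endswith(_DELETED):
        link_target = link_target[:-len(_DELETED)]
    m = _BACKING_RE.match(os.path.basename(link_target))
    return m.group("name") if m else None


def match_parent(child_target, parent_targets):
    """Return the parent's file for the child's ramblock, or None when
    there is no match or the name is ambiguous (e.g., e1000.rom twice)."""
    name = ramblock_name(child_target)
    if name is None:
        return None
    matches = [p for p in parent_targets if ramblock_name(p) == name]
    return matches[0] if len(matches) == 1 else None
```

The hard parts described above (intercepting the file creation and
sharing pages with a MAP_PRIVATE mapping) are exactly what this sketch
does not cover.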
