As part of some QEMU migration-related development work I'm doing, I
stumbled upon what appears to be a bug (patch to follow in a separate
email).
In exec-obsolete.h, cpu_physical_memory_set_dirty_flags() seems to assume
that the caller has provided a page-boundary-aligned address.
Some code paths call cpu_physical_memory_set_dirty_flags() with an
address that is not on a page boundary. The subsequent call to
cpu_physical_memory_get_dirty() assumes page-boundary alignment because
it hard-codes a length of TARGET_PAGE_SIZE: the start address is rounded
down to a page boundary while start + length is rounded up, so the check
ends up covering two pages. This causes problems when the target address
lies within a page whose "migration dirty bit" is NOT set, but the
following page's "migration dirty bit" IS set. In this case,
cpu_physical_memory_get_dirty() will report that the target page is
already dirty when it is not.
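For reference, here is roughly what cpu_physical_memory_get_dirty() looks
like in exec-obsolete.h (paraphrased from my tree, so the exact details
may differ slightly). Note how an unaligned start makes the loop visit
two pages:

static inline int cpu_physical_memory_get_dirty(ram_addr_t start,
                                                ram_addr_t length,
                                                int dirty_flags)
{
    int ret = 0;
    ram_addr_t addr, end;

    /* With an unaligned start and length == TARGET_PAGE_SIZE, end is
     * rounded up past the next page boundary while start is rounded
     * down, so the loop also ORs in the following page's flags. */
    end = TARGET_PAGE_ALIGN(start + length);
    start &= TARGET_PAGE_MASK;
    for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
        ret |= cpu_physical_memory_get_dirty_flags(addr) & dirty_flags;
    }
    return ret;
}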
cpu_physical_memory_set_dirty_flags() then skips incrementing
ram_list.dirty_pages but still sets the target page's dirty bits with
the following code: ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |=
dirty_flags; This leaves the counter (ram_list.dirty_pages) smaller than
the actual number of dirty pages in the bitmap, which can cause our
migration "remaining RAM" counter to underflow and can even hang
migration in some cases.
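Again paraphrasing (so treat this as a sketch rather than an exact
quote), the caller side looks roughly like this; the dirty_pages
increment is guarded by the over-inclusive get_dirty() check, while the
bitmap update is not:

static inline void cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
                                                       int dirty_flags)
{
    /* If get_dirty() wrongly reports the page as already migration-dirty
     * (because only the *next* page is dirty), this increment is
     * skipped ... */
    if ((dirty_flags & MIGRATION_DIRTY_FLAG) &&
        !cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
                                       MIGRATION_DIRTY_FLAG)) {
        ram_list.dirty_pages++;
    }
    /* ... but the page's bits are set unconditionally, so dirty_pages
     * ends up smaller than the number of pages actually marked dirty. */
    ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags;
}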
In my development/test environment (a non-x86 platform) I am hitting this
problem fairly frequently. Does anyone know whether
cpu_physical_memory_set_dirty_flags() should be aligning the target
address to a page boundary itself, or is there some reason that would be
a bad idea?
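Just to illustrate what I mean (this is only a sketch, not the patch I'll
be sending), I'm thinking of something along the lines of masking the
address at the top of the function:

    /* Sketch only: align to the start of the containing page so the
     * get_dirty() check and the bitmap update refer to the same page. */
    addr &= TARGET_PAGE_MASK;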
--
-- Jason J. Herne (jjhe...@linux.vnet.ibm.com)