While doing some QEMU migration development work, I stumbled upon what appears to be a bug (patch to follow in a separate email).

exec-obsolete.h: cpu_physical_memory_set_dirty_flags() seems to assume that the caller provides a page-aligned address.

Some code paths call cpu_physical_memory_set_dirty_flags() with an address that is not on a page boundary. The subsequent call to cpu_physical_memory_get_dirty() assumes page-boundary alignment, because it hard-codes a length of TARGET_PAGE_SIZE. With an unaligned address, the range being checked extends into the following page.

This causes problems when the target address lies within a page whose "migration dirty bit" is NOT set, but the following page's "migration dirty bit" is set. In this case, cpu_physical_memory_get_dirty() will claim that the page is already dirty when it is not. cpu_physical_memory_set_dirty_flags() then skips incrementing ram_list.dirty_pages but still updates the target page's dirty bit with the following code:

    ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags;

As a result, the counter (ram_list.dirty_pages) ends up lower than the actual number of dirty bits. This can cause our migration "remaining RAM" counter to underflow, and can even hang migration in some cases.

In my development/test environment (a non-x86 platform) I am hitting this problem fairly frequently. Does anyone know whether cpu_physical_memory_set_dirty_flags() should be aligning the target address to a page boundary itself, or is there some reason this is a bad idea?

--
-- Jason J. Herne (jjhe...@linux.vnet.ibm.com)

