On 03/04/17 13:42, Gerd Hoffmann wrote:
> On Mo, 2017-04-03 at 13:24 +0100, Mark Cave-Ayland wrote:
>> On 03/04/17 13:03, Gerd Hoffmann wrote:
>>
>>> Hi,
>>>
>>>> I checked the branch, is bitmap_copy_and_clear_atomic correct when you
>>>> have partial updates? Maybe you need to handle partial updates of the
>>>> first and last word?
>>>
>>> Should not be a problem. We might clear some more bits, but these are
>>> outside the visible area so they should not cause visible corruption (and if
>>> the visible area changes the display code needs to do a full refresh
>>> anyway).
>>
>> Right. Also is there a reason that
>> cpu_physical_memory_snapshot_and_clear_dirty() uses BITS_PER_LEVEL when
>> calculating the alignment used for the first/last addresses?
>
> The code copy doesn't operate on bits but on bitmap longs, so I can just
> copy the whole long instead of fiddling with bits. So I round down to
> a bitmap long start. On 64-bit hosts those are units of 64 pages.
>
> Actually querying the copied bitmap operates on page granularity of
> course.
>
>> Otherwise you end up expanding the range of your last address
>> considerably beyond the next page alignment.
>
> Yes, it's larger, but as it expands to the start of the long word I
> expect the penalty is almost zero (same cache line) and being able to
> just copy the whole long instead of dealing with individual bits should
> be a win.
>
>> And also in the main page
>> loop why is BITS_PER_LEVEL used? I see that several of the internal
>> bitmap routines appear to use BITS_PER_LONG for compressing into the
>> bitmap which might be more appropriate?
>
> shift by BITS_PER_LEVEL is the same as division by BITS_PER_LONG
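(Aside: a minimal, stand-alone sketch of the relationship Gerd describes
above. This is not the QEMU code itself; TARGET_PAGE_BITS, BITS_PER_LONG
and the ALIGN_* helpers below are assumed stand-ins for the real
definitions. The point is that shifting a page index right by
BITS_PER_LEVEL is the same as dividing it by BITS_PER_LONG, and rounding
the address range to align snaps it to whole bitmap longs, i.e. 64 pages
on a 64-bit host.)

#include <stdio.h>

#define TARGET_PAGE_BITS 12   /* assumed: 4 KiB pages */
#define BITS_PER_LONG    64   /* assumed: 64-bit host */
#define BITS_PER_LEVEL   6    /* log2(BITS_PER_LONG)  */

/* stand-ins for QEMU_ALIGN_DOWN/QEMU_ALIGN_UP (power-of-two align only) */
#define ALIGN_DOWN(x, a) ((x) & ~((unsigned long)(a) - 1))
#define ALIGN_UP(x, a)   ALIGN_DOWN((x) + (a) - 1, (a))

int main(void)
{
    unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
    unsigned long start = 0x12345, length = 0x3000;   /* arbitrary range */

    unsigned long first = ALIGN_DOWN(start, align);
    unsigned long last  = ALIGN_UP(start + length, align);
    unsigned long first_page = first >> TARGET_PAGE_BITS;

    /* shift by BITS_PER_LEVEL == division by BITS_PER_LONG */
    printf("word index: %lu == %lu\n",
           first_page >> BITS_PER_LEVEL, first_page / BITS_PER_LONG);

    /* the rounded range always covers whole bitmap longs (64 pages each) */
    printf("range covers %lu bitmap long(s)\n",
           (last - first) >> (TARGET_PAGE_BITS + BITS_PER_LEVEL));
    return 0;
}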
Right, hopefully I understand this now. The following diff appears to fix
CG3/TCX for me:

diff --git a/exec.c b/exec.c
index 000af18..66bdcf4 100644
--- a/exec.c
+++ b/exec.c
@@ -1071,7 +1071,7 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
      (ram_addr_t start, ram_addr_t length, unsigned client)
 {
     DirtyMemoryBlocks *blocks;
-    unsigned long align = 1 << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
+    unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
     ram_addr_t first = QEMU_ALIGN_DOWN(start, align);
     ram_addr_t last = QEMU_ALIGN_UP(start + length, align);
     DirtyBitmapSnapshot *snap;
@@ -1097,7 +1097,6 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
         assert(QEMU_IS_ALIGNED(offset, (1 << BITS_PER_LEVEL)));
         assert(QEMU_IS_ALIGNED(num, (1 << BITS_PER_LEVEL)));

-        offset >>= BITS_PER_LEVEL;
         bitmap_copy_and_clear_atomic(snap->dirty + dest,
                                      blocks->blocks[idx] + (offset >> BITS_PER_LEVEL),

There were two issues here: without the UL suffix on align I was getting
incorrect first/last addresses, since the high bits of align weren't being
cleared, and offset appeared to be shifted twice. Does this look reasonable
to you?

ATB,

Mark.
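(For reference, a small self-contained C illustration of the two kinds of
problem described above. The shift count and offset value are made up for
the example, and whether the plain-int shift is exactly what went wrong in
this build depends on TARGET_PAGE_BITS, so treat this as a sketch rather
than a diagnosis.)

#include <stdio.h>

/* log2(BITS_PER_LONG) on a 64-bit host: one right shift by BITS_PER_LEVEL
 * turns a page/bit index into a bitmap-long index */
#define BITS_PER_LEVEL 6

int main(void)
{
    /* 1) '1 << n' is evaluated as int; at n == 31 the result overflows
     *    into the sign bit and, on common compilers, picks up set high
     *    bits when widened to unsigned long.  '1UL << n' keeps the whole
     *    computation in unsigned long. */
    int n = 31;
    unsigned long from_int = 1 << n;     /* high bits typically set */
    unsigned long from_ul  = 1UL << n;   /* exactly 0x80000000      */
    printf("1 << %d   -> %#lx\n1UL << %d -> %#lx\n",
           n, from_int, n, from_ul);

    /* 2) shifting 'offset' by BITS_PER_LEVEL twice converts it one level
     *    too far and indexes the wrong bitmap word */
    unsigned long offset = 256;          /* example page offset */
    printf("shifted once: word %lu, shifted twice: word %lu\n",
           offset >> BITS_PER_LEVEL,
           (offset >> BITS_PER_LEVEL) >> BITS_PER_LEVEL);
    return 0;
}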