On Thu, Dec 08, 2022 at 10:39:11PM +0800, Chuang Xu wrote:
> On 2022/12/8 6:08 AM, Peter Xu wrote:
> > On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:
> > > On 2022/12/6 12:28 AM, Peter Xu wrote:
> > > > Chuang,
> > > >
> > > > No worry on the delay; you're faster than when I read yours. :)
> > > >
> > > > On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:
> > > > > > As a start, maybe you can try to poison
> > > > > > address_space_to_flatview() (e.g. by checking the
> > > > > > start_pack_mr_change flag and asserting it is not set) during
> > > > > > this process, to see whether any call stack can even try to
> > > > > > dereference a flatview.
> > > > > >
> > > > > > It's just that I didn't figure out a good way to "prove" its
> > > > > > validity, even if I think this is an interesting idea worth
> > > > > > thinking about to shrink the downtime.
> > > > >
> > > > > Thanks for your suggestions!
> > > > >
> > > > > I used a thread-local variable to identify whether the current
> > > > > thread is the migration thread (the main thread of the target
> > > > > qemu), and I modified the code of qemu_coroutine_switch to make
> > > > > sure the thread-local variable is true only in the
> > > > > process_incoming_migration_co call stack. If the target qemu
> > > > > detects that start_pack_mr_change is set and
> > > > > address_space_to_flatview() is called in a non-migrating thread
> > > > > or non-migrating coroutine, it will crash.
> > > >
> > > > Are you using the thread var just to avoid the assert triggering in
> > > > the migration thread when committing memory changes?
> > > >
> > > > I think _maybe_ another cleaner way to sanity check this is directly
> > > > upon the depth:
> > > >
> > > > static inline FlatView *address_space_to_flatview(AddressSpace *as)
> > > > {
> > > >     /*
> > > >      * Before using any flatview, sanity check we're not during a
> > > >      * memory region transaction or the map can be invalid.  Note
> > > >      * that this can also be called during the commit phase of a
> > > >      * memory transaction, but that should also only happen when the
> > > >      * depth decreases to 0 first.
> > > >      */
> > > >     assert(memory_region_transaction_depth == 0);
> > > >     return qatomic_rcu_read(&as->current_map);
> > > > }
> > > >
> > > > That should also cover the safe cases of memory transaction commits
> > > > during migration.
> > > >
> > > Peter, I tried this way and found that the target qemu will crash.
> > >
> > > Here is the gdb backtrace:
> > >
> > > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > > #1  0x00007ff2929d851a in __GI_abort () at abort.c:118
> > > #2  0x00007ff2929cfe67 in __assert_fail_base (fmt=<optimized out>,
> > >     assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0",
> > >     file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h",
> > >     line=line@entry=766, function=function@entry=0x55a32578d6e0
> > >     <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
> > > #3  0x00007ff2929cff12 in __GI___assert_fail
> > >     (assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0",
> > >     file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h",
> > >     line=line@entry=766, function=function@entry=0x55a32578d6e0
> > >     <__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:101
> > > #4  0x000055a324b2ed5e in address_space_to_flatview (as=0x55a326132580
> > >     <address_space_memory>) at /data00/migration/qemu-5.2.0/include/exec/memory.h:766
> > > #5  0x000055a324e79559 in address_space_to_flatview (as=0x55a326132580
> > >     <address_space_memory>) at ../softmmu/memory.c:811
> > > #6  address_space_get_flatview (as=0x55a326132580 <address_space_memory>)
> > >     at ../softmmu/memory.c:805
> > > #7  0x000055a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000,
> > >     as=<optimized out>, addr=addr@entry=68404985856, len=len@entry=4096,
> > >     is_write=false) at ../softmmu/physmem.c:3307
> > > #8  0x000055a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0)
> > >     at ../hw/virtio/virtio.c:185
> > > #9  0x000055a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=<optimized out>,
> > >     version_id=<optimized out>) at ../hw/virtio/virtio.c:3203
> > > #10 0x000055a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00,
> > >     vmsd=0x55a325fc1a60 <vmstate_virtio_scsi>, opaque=0x55a32985d9a0,
> > >     version_id=1) at ../migration/vmstate.c:143
> > > #11 0x000055a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90)
> > >     at ../migration/savevm.c:913
> > > #12 0x000055a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0,
> > >     f=0x55a329dc0c00) at ../migration/savevm.c:2741
> > > #13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, mis=mis@entry=0x55a3284ef9e0)
> > >     at ../migration/savevm.c:2939
> > > #14 0x000055a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00)
> > >     at ../migration/savevm.c:3021
> > > #15 0x000055a324d14b4e in process_incoming_migration_co (opaque=<optimized out>)
> > >     at ../migration/migration.c:574
> > > #16 0x000055a32501ae3b in coroutine_trampoline (i0=<optimized out>,
> > >     i1=<optimized out>) at ../util/coroutine-ucontext.c:173
> > > #17 0x00007ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > > #18 0x00007ffed80dc2a0 in ?? ()
> > > #19 0x0000000000000000 in ?? ()
> > >
> > > address_space_cache_init() is the only caller of
> > > address_space_to_flatview() I can find in the vmstate_load call stack
> > > so far. Although I think the mr used by address_space_cache_init()
> > > won't be affected by the delay of memory_region_transaction_commit(),
> > > we really need a mechanism to prevent the modified mr from being used.
> > >
> > > Maybe we can build a stale list:
> > >
> > > If a subregion is added, add its parent to the stale list (considering
> > > that the new subregion's priority has uncertain effects on flatviews).
> > > If a subregion is deleted, add the subregion itself to the stale list.
> > > When memory_region_transaction_commit() regenerates flatviews, clear
> > > the stale list.
> > > When address_space_translate_internal() is called, check whether the
> > > mr looked up matches one of the mrs (or their children) in the stale
> > > list. If yes, a crash will be triggered.
> >
> > I'm not sure that'll work, though.  Consider this graph:
> >
> >        A
> >       / \
> >      B   C
> >    (p=1) (p=0)
> >
> > A, B and C are MRs; B and C are subregions of A.  When B's priority is
> > higher (p=1), any access to A will go to B, so far so good.
> >
> > Then, let's assume D comes under C with an even higher priority:
> >
> >        A
> >       / \
> >      B   C
> >    (p=1) (p=0)
> >          |
> >          D
> >        (p=2)
> >
> > Adding C into the stale list won't work, because with the old flatview
> > it'll point to B instead, while B is not in the stale list.  The AS
> > operation will carry on without noticing it's already wrong.
>
> Peter, I think our understanding of priority is different.
>
> In the qemu docs
> (https://qemu.readthedocs.io/en/stable-6.1/devel/memory.html#overlapping-regions-and-priority)
> it says 'Priority values are local to a container, because the priorities
> of two regions are only compared when they are both children of the same
> container.'
>
> And as I read the code, when doing the render_memory_region() operation
> on A, qemu will first insert B's FlatRanges and its children's FlatRanges
> recursively, because B's priority is higher than C's.  After B's
> FlatRanges and its children's FlatRanges are all inserted into the
> flatview, C's FlatRanges and its children's FlatRanges will be inserted
> into the gaps left by B where B and C overlap.
>
> So I think adding D as C's subregion has no effect on B in your second
> case.
> The old FlatRange pointing to B is still effective.  C and C's children
> with lower priority than D will be affected, but we have flagged them as
> stale.
>
> I hope I have no misunderstanding of the flatview construction code. If I
> understand it wrong, please forgive my ignorance.. 😭
No, I think you're right.. thanks, I should read the code/doc first rather
than trusting myself. :)

But still, the whole point is that the parent may not even be visible in
the flatview, so I still don't know how it could work.  My 2nd attempt:

       A
       |
       B
     (p=1)

Adding C with p=2:

       A
      / \
     B   C
   (p=1) (p=2)

IIUC the flatview used to access the offset where A resides should point
to B, and after C is plugged we'll still look up and find B.  Even if A is
in the stale list, B is not?

The other thing I didn't mention is that I don't think the address space
translation is the sole consumer of the flatview.  Some examples:
common_semi_find_bases() walks the flatview without translations, and
memory_region_update_coalesced_range() (which calls
address_space_get_flatview() first) notifies kvm coalesced mmio regions
without translations.  So hooking up address_space_translate_internal()
alone may not be enough either.

> > >
> > > There may be many details to consider in this mechanism. I hope you
> > > can give some suggestions on its feasibility.
> >
> > For this specific case, I'm wildly thinking whether we can just
> > postpone the init of the vring cache until migration completes.
> >
> > One thing to mention from what I read: we'll need to update all the
> > caches in virtio_memory_listener_commit() anyway when the batched
> > commit() happens at migration completion with your approach, so we'll
> > rebuild the vring cache once and for all, which also looks nice if
> > possible.
> >
> > There are some details to consider.  E.g. the commit() happens only
> > when memory_region_update_pending==true.  We may want to make sure the
> > cache is initialized unconditionally, at least.  Not sure whether
> > that's doable, though.
> >
> > Thanks,
>
> Good idea! We can try it in the new patches! And with the delay of
> virtio_init_region_cache(), we can still use the assert in
> address_space_to_flatview().
> However, I think the stale list can be kept as a fallback scheme for
> further discussion in the future, because the stale list may adapt to
> more complex scenarios.

If the assert works, that'll be even better.  I'm actually worried it can
trigger like what you mentioned in the virtio path; I didn't expect that
to come so soon.  So if there's a minimal set of cases and we can fix them
up easily, that'll be great.  Hopefully there aren't too many, or we'll
need to revisit the whole idea.

Thanks,

-- 
Peter Xu