On 20/04/2018 19:55, Peter Maydell wrote: > There seems to be a race between tb_gen_code() and qemu_ram_free(), > which results in an abort() in Edgar's test case that exercises the > xilinx-spips mmio-exec functionality. > > Here's what happens: > (1) memory_region_invalidate_mmio_ptr() is called, and it deletes ^^
This "it" is more precisely memory_region_do_invalidate_mmio_ptr. > the temporary ram MemoryRegion. This doesn't immediately result in > deletion of the RAMBlock (which is done from the RCU thread later) > (2) we try to execute from this physaddr, so tb_gen_code() is called. > This calls get_page_addr_code(). Since the RAMBlock hasn't yet > really been deleted, get_page_addr_code() finds it, and returns a > ram_addr in the about-to-go-away memory... > (3) while the CPU thread is busy doing codegen, the RCU thread runs. > It does a flatview_destroy() which ends up doing an object_unref on the > memory region, whose destructor calls qemu_ram_free(), removing the > RAMBlock from the ram list > (4) when the CPU thread has done its codegen, it calls tb_link_page, > which calls tb_alloc_page, which calls tlb_protect_code(), with the > ram_addr that it got in step 2. We end up in tlb_reset_dirty_range_all(), > which calls qemu_get_ram_block() on that ram_addr. > (5) qemu_get_ram_block() aborts, because it can't find the RAMBlock > corresponding to the ram_addr (because we removed it in step 3). > > How should we resolve this race ? Note that qemu_ram_free() is _also_ RCU-freeing the RAMBlock. If it is not found, it means that codegen is not running within rcu_read_lock()/rcu_read_unlock(). In fact it's not. So one possibility would be to add it there. Theoretically the simplest possibility, but rcu_read_lock() is taken _outside_ tb_lock which may be a problem. The alternative is to add a ref/unref pair of the MemoryRegion whenever get_page_addr_code() uses it. That would delay the MemoryRegion destructor after the end of code generation. A rough sketch would be that get_page_addr_code() would grow a MemoryRegion** argument and would use memory_region_from_host() instead of qemu_ram_addr_from_host(). get_page_addr_code() would either call rcu_read_lock(), or the callers would (see the doc comment for memory_region_from_host(); it always feels good when you can refer to doc comments!) Thanks, Paolo > (PS: rr is really useful for identifying what's actually happening > in multi-thread races like this...) > > thanks > -- PMM >