On 25/08/17 18:53, Paolo Bonzini wrote: > On 25/08/2017 10:31, Alexey Kardashevskiy wrote: >> Otherwise old dispatch holds way too much memory before RCU gets >> a chance to free old dispatches. >> >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >> --- >> >> This is a follow-up to the "Memory use with >100 virtio devices" >> thread. >> >> I assume this is a dirty hack (which fixes the problem though) > > This doesn't work. AddressSpaceDispatch can be accessed outside the big > QEMU lock, which is why it is destroyed via call_rcu. > > The solution is to: 1) share the FlatView structures if they refer to > the same root memory region; 2) have one AddressSpaceDispatch per > FlatView instead of one per AddressSpace, so that FlatView reference > counting takes care of clearing the AddressSpaceDispatch too. Neither > is particularly hard. The second requires getting rid of the > as->dispatch_listener and "hard coding" the creation of the > AddressSpaceDispatch from the FlatView in memory.c.
While I am trying this approach, here is a cheap workaround - diff --git a/vl.c b/vl.c index 8e247cc2a2..4d95bc2a6a 100644 --- a/vl.c +++ b/vl.c @@ -4655,12 +4655,16 @@ int main(int argc, char **argv, char **envp) igd_gfx_passthru(); /* init generic devices */ + memory_region_transaction_begin(); + rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE); if (qemu_opts_foreach(qemu_find_opts("device"), device_init_func, NULL, NULL)) { exit(1); } + memory_region_transaction_commit(); + cpu_synchronize_all_post_init(); rom_reset_order_override(); Just for self-education - how dirty is this hack? This effectively postpones all dispatch trees allocation/disposal till MR transation depth is 0 (which happens in this memory_region_transaction_commit()), for 500 virtio devices it reduces the amount of resident RAM from 45GB to 11GB with 2GB guest, good for speed too - time to build a machine is reduced from 4:00 to 1:20. > > Paolo > >> and I wonder what the proper solution would be. Thanks. >> >> >> What happens here is that every virtio block device creates 2 address >> spaces - for modern config space (called "virtio-pci-cfg-as") and >> for busmaster (common pci thing, called after the device name, >> in my case "virtio-blk-pci"). >> >> Each address_space_init() updates topology for _every_ address space. >> Every topology update (address_space_update_topology()) creates a new >> dispatch tree - AddressSpaceDispatch with nodes (1KB) and >> sections (48KB) and destroys the old one. >> >> However the dispatch destructor is postponed via RCU which does not >> get a chance to execute until the machine is initialized but before >> we get there, memory is not returned to the pool, and this is a lot >> of memory which grows n^2. >> >> >> Interestingly, mem_add() from exec.c is called twice: >> as as->dispatch_listener.region_add() and >> as as->dispatch_listener.region_nop() - I did not understand >> the trick but it does not work if I remove the .region_nop() hook. >> How does it work? :) >> >> --- >> exec.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/exec.c b/exec.c >> index 01ac21e3cd..ea5f3eb209 100644 >> --- a/exec.c >> +++ b/exec.c >> @@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener) >> >> atomic_rcu_set(&as->dispatch, next); >> if (cur) { >> - call_rcu(cur, address_space_dispatch_free, rcu); >> + address_space_dispatch_free(cur); >> } >> } >> >> > -- Alexey