On 24/06/2016 10:11, Peter Lieven wrote:
> On 24.06.2016 at 06:10, Paolo Bonzini wrote:
>>>> If it's 10M nothing.  If there is a 100M regression that is also caused
>>>> by RCU, we have to give up on it for that data structure, or mmap/munmap
>>>> the affected data structures.
>>> If it was only 10MB I would agree. But if I run the VM described earlier
>>> in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
>>> current master. This is with coroutine pool disabled. With the coroutine
>>> pool it can grow to something like 300-350MB.
>>>
>>> Is there an easy way to determine if RCU is the problem? I have the same
>>> symptoms, valgrind doesn't see the allocated memory. Is it possible
>>> to make rcu_call invoke the function directly - maybe with a lock around
>>> it that serializes the calls? Even if it's expensive it might show if we
>>> are searching in the right place.
>> Yes, you can do that.  Just make it call the function without locks, for
>> a quick PoC it will be okay.
>
> Unfortunately, it leads to immediate segfaults because a lot of things seem
> to go horribly wrong ;-)
>
> Do you have any other idea than reverting all the rcu patches for this
> section?

Try freeing under the big QEMU lock:

    bool unlock = false;

    /* Take the lock only if this thread does not already hold it. */
    if (!qemu_mutex_iothread_locked()) {
        unlock = true;
        qemu_mutex_lock_iothread();
    }
    ...
    if (unlock) {
        qemu_mutex_unlock_iothread();
    }

afbe70535ff1a8a7a32910cc15ebecc0ba92e7da should be easy to backport.

Thanks,

Paolo

> I'm also wondering why the RSS is not returned to the kernel. One thing could
> be fragmentation....
>
> Peter
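
For anyone who wants to try the conditional-locking idea outside of QEMU first, here is a minimal standalone sketch. It is only an illustration under assumptions: big_lock, big_lock_locked(), big_lock_acquire(), big_lock_release() and free_under_big_lock() are hypothetical stand-ins built on pthreads, not QEMU functions, and the per-thread flag is just one simple way to answer "does this thread already hold the lock?".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a "big lock"; these are NOT the QEMU API. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool big_lock_held;  /* does this thread own big_lock? */

static bool big_lock_locked(void)
{
    return big_lock_held;
}

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/*
 * Free p while holding the big lock. The lock is taken only if the
 * calling thread does not hold it yet, and it is released only if
 * this function was the one that took it.
 */
static void free_under_big_lock(void *p)
{
    bool unlock = false;

    if (!big_lock_locked()) {
        unlock = true;
        big_lock_acquire();
    }

    assert(big_lock_locked());
    free(p);

    if (unlock) {
        big_lock_release();
    }
}
```

Calling free_under_big_lock() from a thread that does not hold the lock takes and drops it around the free; calling it from a thread that already holds the lock leaves the lock state untouched, which avoids self-deadlock on a non-recursive mutex.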