On 26/09/2023 13:38, Johannes Berg wrote:
> On Tue, 2023-09-26 at 13:16 +0100, Anton Ivanov wrote:
>> For the time being it is mostly negative :)
>
> Oh well :)
>
>> 1. The performance after the mm patch is down, by 30-40% on my
>> standard bench.
>
> For the record, you mean the three-patch series that we're discussing
> in this thread?
Yes. It has no stability issues, either on its own or with the PREEMPT
patch on top.
> Btw, Benjamin realized that MADV_DONTFORK is broken in UML, precisely
> _because_ we fork/copy the whole mm process and then try to fix it up.
> But we can only fix up things that actually have VMAs, and of course
> there are no VMAs with VM_DONTCOPY (set by MADV_DONTFORK) in the new mm
> after fork.
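To make sure I understand the breakage, the expected semantics are
roughly this (untested sketch; on a sane arch the child faults, while on
current UML it presumably doesn't, because the host mapping survives the
fork of the mm process):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	size_t len = getpagesize();
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	p[0] = 42;
	if (madvise(p, len, MADV_DONTFORK)) {
		perror("madvise");
		return 1;
	}

	pid_t pid = fork();
	if (pid == 0) {
		/* the VMA must be gone here, so this should SIGSEGV */
		printf("child read %d - region survived fork\n", p[0]);
		_exit(0);
	}

	int status;
	waitpid(pid, &status, 0);
	printf("child %s\n", WIFSIGNALED(status) ?
	       "faulted as expected" : "did NOT fault");
	return 0;
}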
> To fix this, really we should either
>
> 1. Start from scratch, without copying, which my other patch [1] did.
>
>    [1] https://lore.kernel.org/all/20230922131638.2c57ec713d1c.Id11dff4b349e6a8f0136bb6bb09f6e01a80befbb@changeid/
>
>    But of course that's more expensive, because we now have to
>    page-fault everything in the new process, and page faults are
>    expensive.
>
> 2. Compare the new mm and the old mm, which requires putting it into
>    arch_dup_mmap() like these patches here - where I'm not sure I
>    understand at all why they cause a perf regression - and remove the
>    VMAs that are marked VM_DONTCOPY in the old one.
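For 2., I presume it would look something along these lines (rough
sketch only; um_unmap_host_range() is a made-up stand-in for whatever
we'd actually call into the skas code to drop a host mapping):

/* Walk the parent's VMAs and drop anything marked VM_DONTCOPY from
 * the child's already-forked host mm process, since those ranges no
 * longer have a VMA in the new mm to fix up from.
 */
void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, oldmm, 0);

	for_each_vma(vmi, vma) {
		if (!(vma->vm_flags & VM_DONTCOPY))
			continue;
		/* hypothetical helper, not an existing function */
		um_unmap_host_range(&mm->context.id, vma->vm_start,
				    vma->vm_end - vma->vm_start);
	}
}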
> To be honest I don't really like _either_ of these approaches, nor the
> current "fork the process" approach that UML takes. It's very magic, and
> very much works around how Linux works.
+1
> Remember that basically the mm process contents should match the page
> tables in the VMAs; but this is decidedly not true where fork() is
> involved, because while the VMAs are copied, most of the page tables are
> _not_ copied. Thus, we have a situation where after fork we don't take
> page faults in UML that we would take in a normal system (this part is
> good for performance), and I believe also vice versa, which would then
> perhaps explain the flush_tlb_page() in handle_page_fault(), because
> honestly I don't otherwise have an explanation for it.
>
> I think the better approach for correctness and integration into the
> kernel would be to actually admit that UML is special because page
> faults are so expensive, and
>
>  * start with a fresh mm process every time,
>  * have vma_needs_copy() return true, and
>  * completely fill the mappings according to only the new mm's VMAs,
>    in arch_dup_mmap() or perhaps later.
>
> I don't know how that'd behave wrt. performance - it likely cannot be
> better than with these patches - but at least it'd be more correct, and
> more obviously correct too, because then the actual mappings in the UML
> mm process would actually reflect the PTEs that Linux knows about.
We can try that.
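Presumably along these lines (very rough; start_fresh_mm_process() is
assumed, not an existing function, and it relies on the vma_needs_copy()
change being done separately so there are actually PTEs to install):

/* Don't fork the parent's host process at all: spawn an empty one
 * and populate it purely from the new mm's VMAs, reusing the
 * existing flush path that installs host mappings from page tables.
 */
void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, 0);

	start_fresh_mm_process(&mm->context.id);	/* hypothetical */

	for_each_vma(vmi, vma)
		flush_tlb_range(vma, vma->vm_start, vma->vm_end);
}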
>> 2. The preemption patches work fine on top (all 3 cases). The
>> performance difference stays.
>
> OK.
>
>> 3. We do not have anything of value to add in terms of cond_resched()
>> to the drivers :( Most drivers are fairly simplistic, with no safe
>> points to add it.
>
> Yeah, not surprised by this.
>> 6. Do we still need force_flush_all() in arch_dup_mmap()? Things work
>> with a non-forced TLB flush using flush_tlb_mm(mm) instead.
>
> Maybe not, does it make a difference though?
Nope. Same numbers in both cases.
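For reference, the change I tested is just swapping the flush at the end
of the patched arch_dup_mmap(), i.e. roughly:

void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
	/* ... the fixup the series does after the fork ... */

	/* was: force_flush_all(); */
	flush_tlb_mm(mm);
}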
>> 7. In all cases, UML is doing something silly. The CPU usage while
>> running find -type f -exec cat {} \; > /dev/null, measured from
>> outside, stays around 8-15% with no preemption and with
>> PREEMPT_VOLUNTARY. The UML takes a sabbatical for the remaining 85%
>> instead of actually doing work. PREEMPT is slightly better at ~60%,
>> but still far from 100%. It just keeps going into idle and I cannot
>> understand why.
>
> Is it just waiting for IO?
Nope. Nearly all I see in strace is wait4 and ptrace; the epoll_waits
are few and far between. The bottleneck is mm and vm, not IO :(

> johannes
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um