On 27/09/2023 10:52, Benjamin Berg wrote:
Hi,

On Tue, 2023-09-26 at 14:38 +0200, Johannes Berg wrote:
[SNIP]
1. Start from scratch, without copying, which my other patch [1] did.

I really think we should go ahead with that approach. Then follow up
with optimizations.

+1


[SNIP]

I think the better approach for correctness and integration into the
kernel would be to actually admit that UML is special because page
faults are so expensive, and

  * start with a fresh mm process every time
  * have vma_needs_copy() return true
  * completely fill the mappings according to only the new mm's VMAs
    in arch_dup_mmap() or perhaps later

I don't know how that'd behave wrt. performance, though it likely cannot
be better than with these patches, but at least it'd be more correct,
and more obviously correct too, for starters, because then the actual
mappings in the UML mm process would actually reflect the PTEs that
Linux knows about.

Yes, performance may degrade, but the implementation should be correct
in the first place. Note that even though we looked at it (and e.g.
found that MADV_DONTFORK is incorrect), we have not figured out why the
first approach is slower currently as everything interesting should be
getting unmapped by the force_flush_all.

Once we are there, we can look for optimizations. The fundamental
problem is that page faults (even minor ones) are extremely expensive
for us.
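
To put a number on "one minor fault per first touch", here is a minimal
userspace sketch (not UML code). It touches every page of a fresh
anonymous mapping and reports how many minor faults the process took,
optionally asking the host to prefault the range with MAP_POPULATE. The
helper name count_touch_faults() is made up for illustration, and
MAP_POPULATE is only a loose analogy for filling the mappings up front.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: map npages of anonymous memory, write one byte
 * to each page and return the number of minor faults taken during the
 * writes, or -1 on error. With prefault != 0 the mapping is created
 * with MAP_POPULATE so the host kernel populates it at mmap() time.
 */
long count_touch_faults(size_t npages, int prefault)
{
    long page = sysconf(_SC_PAGESIZE);
    struct rusage before, after;
    volatile unsigned char *p;
    size_t len, i;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    if (page <= 0 || npages == 0)
        return -1;
    if (prefault)
        flags |= MAP_POPULATE;

    len = npages * (size_t)page;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return -1;

#ifdef MADV_NOHUGEPAGE
    /* Best effort: keep huge pages from hiding per-page faults. */
    (void)madvise((void *)p, len, MADV_NOHUGEPAGE);
#endif

    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    /* First write to each page; this is where minor faults occur. */
    for (i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;

    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    munmap((void *)p, len);
    return after.ru_minflt - before.ru_minflt;
}
```

Calling count_touch_faults(n, 0) and count_touch_faults(n, 1) side by
side should show the difference on a regular Linux host, though the
exact counts depend on the host kernel and its configuration.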

Just throwing out ideas on what we could do:
    1. SECCOMP as that reduces the number of context switches.
       (Yes, I know I should resubmit the patchset)

Actually... YES, YES and YES.

I was just looking at all the workarounds which are in place to prevent
guest processes from doing a syscall on the host. If this is prohibited
at a higher level we should get quite a boost as all these PTRACE_PEEKs
will become unnecessary.

    2. Maybe we can disable/cripple page access tracking? If we
       initially mark all pages as accessed by userspace (i.e.
       pte_mkyoung), then we avoid a minor page fault on first access.
       Doing that will mess with page eviction though.
    3. Do DAX (direct_access) for files. i.e. mmap files directly in the
       host kernel rather than through UM.
       With a hostfs like file system, one should be able to add an
       intermediate block device that maps host files to physical pages,
       then do DAX in the FS.
       For disk images, the existing iomem infrastructure should be
       usable; this should work with any DAX-enabled filesystem (ext2,
       ext4, xfs, virtiofs, erofs).

I had some plans to do a ubd gen 2 which uses mmap and/or this. They are
presently way on the backburner. We can do some of that once we push
the new VM changes.
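
For reference, the basic idea of accessing file contents through a
mapping instead of copying them with read() can be sketched in plain
userspace C. This is only an illustration of mmap on a host file; it is
not DAX and not how a ubd rewrite would look. The function name
mmap_roundtrip() and the /tmp path are assumptions for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write msg to a temporary file, map the file
 * read-only and compare the mapped bytes against msg.
 * Returns 0 if the contents match, -1 on error or mismatch.
 */
int mmap_roundtrip(const char *msg)
{
    char path[] = "/tmp/mmap-sketch-XXXXXX";
    size_t len = strlen(msg);
    void *map;
    int fd, rc = -1;

    if (len == 0)
        return -1;              /* mmap() rejects zero-length maps */

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file goes away when fd is closed */

    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* The page cache pages backing the file are mapped directly. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map != MAP_FAILED) {
        rc = memcmp(map, msg, len) == 0 ? 0 : -1;
        munmap(map, len);
    }

    close(fd);
    return rc;
}
```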


Benjamin


2. The preemption patches work fine on top (all 3 cases). The
performance difference stays.

OK.

3. We do not have anything of value to add in terms of cond_resched()
to the drivers :(
Most drivers are fairly simplistic with no safe points to add this.

Yeah, not surprised by this.

6. Do we still need force_flush_all() in the arch_dup_mmap()? This
works with a non-forced TLB flush using flush_tlb_mm(mm);

Maybe not, does it make a difference though?

7. In all cases, UML is doing something silly.
The CPU usage while doing find -type f -exec cat {} > /dev/null,
measured from outside, stays around 8-15% in non-preemptive and
PREEMPT_VOLUNTARY. UML takes a sabbatical for the remaining 85%
instead of actually doing work. PREEMPT is slightly better at 60%,
but still far from 100%. It just keeps going into idle and I cannot
understand why.

Is it just waiting for IO?

johannes

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um




--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
