Here are a few patches which I'm sure will cause a lot of concern, but I think now is the time to have it out and really start optimising these things as far as we can. The radix MMU code has been stable for quite some time, and distros have made releases with the more conservative flushes, barriers, updates, etc.
If we decide not to do any of these things, we can document why not, so it becomes easier to revisit.

With these patches, plus the TLB flush reduction patches posted earlier, plus a few generic mm patches that I haven't posted yet, the fork/exec benchmark from selftests improves by 11%. A test which mprotects 16GB of memory read-only, reads a byte from each page, then mprotects it read/write and updates a byte in each page, then repeats, is more than 2x faster. This is mostly due to reduced TLB flushing, barriers, and atomics from these two patch sets.

Nicholas Piggin (3):
  powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case
  powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags
  powerpc/64s/radix: optimise pte_update

 arch/powerpc/include/asm/book3s/64/radix.h | 37 +++++++++-------------
 arch/powerpc/mm/mmu_context.c              |  6 ++--
 arch/powerpc/mm/tlb-radix.c                | 11 ++++++-
 3 files changed, 29 insertions(+), 25 deletions(-)

-- 
2.17.0