On 20/08/2025 18:18, Edgecombe, Rick P wrote:
> On Wed, 2025-08-20 at 18:01 +0200, Kevin Brodsky wrote:
>> Apologies, Thunderbird helpfully decided to wrap around that table...
>> Here's the unmangled table:
>>
>> +-------------------+----------------------------------+------------------+---------------+
>>> Benchmark | Result Class | Without batching |
>>> With batching |
>> +===================+==================================+==================+===============+
>>> mmtests/kernbench | real time | 0.32% |
>>> 0.35% |
>>> | system time | (R) 4.18% |
>>> (R) 3.18% |
>>> | user time | 0.08% |
>>> 0.20% |
>> +-------------------+----------------------------------+------------------+---------------+
>>> micromm/fork | fork: h:0 | (R) 221.39% |
>>> (R) 3.35% |
>>> | fork: h:1 | (R) 282.89% |
>>> (R) 6.99% |
>> +-------------------+----------------------------------+------------------+---------------+
>>> micromm/munmap | munmap: h:0 | (R) 17.37% |
>>> -0.28% |
>>> | munmap: h:1 | (R) 172.61% |
>>> (R) 8.08% |
>> +-------------------+----------------------------------+------------------+---------------+
>>> micromm/vmalloc | fix_size_alloc_test: p:1, h:0 | (R) 15.54% |
>>> (R) 12.57% |
> Both this and the previous one have the 95% confidence interval. So it saw a
> 16% speed up with direct map modification. Possible?
Positive numbers mean performance degradation ("(R)" actually stands for
regression), so in that case the protection is adding a 16%/13%
overhead. Here this is mainly due to the added pkey register switching
(+ barrier) happening on every call to vmalloc() and vfree(), which has
a large relative impact since only one page is being allocated/freed.
>> |                   | fix_size_alloc_test: p:4, h:0    |       (R) 39.18% |     (R) 9.13% |
>> |                   | fix_size_alloc_test: p:16, h:0   |       (R) 65.81% |         2.97% |
>> |                   | fix_size_alloc_test: p:64, h:0   |       (R) 83.39% |        -0.49% |
>> |                   | fix_size_alloc_test: p:256, h:0  |       (R) 87.85% |    (I) -2.04% |
>> |                   | fix_size_alloc_test: p:16, h:1   |       (R) 51.21% |         3.77% |
>> |                   | fix_size_alloc_test: p:64, h:1   |       (R) 60.02% |         0.99% |
>> |                   | fix_size_alloc_test: p:256, h:1  |       (R) 63.82% |         1.16% |
>> |                   | random_size_alloc_test: p:1, h:0 |       (R) 77.79% |        -0.51% |
>> |                   | vm_map_ram_test: p:1, h:0        |       (R) 30.67% |    (R) 27.09% |
>> +-------------------+----------------------------------+------------------+---------------+
> Hmm, still surprisingly low to me, but ok. It would be good to have x86 and
> arm work the same, but I don't think we have line of sight to x86 currently.
> And I actually never did real benchmarks.
It would certainly be good to get numbers on x86 as well - I'm hoping
that someone with a better understanding of x86 than myself could
implement kpkeys on x86 at some point, so that we can run the same
benchmarks there.
- Kevin