Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

Ingo Molnar Tue, 20 Mar 2018 01:28:01 -0700

* Thomas Gleixner <[email protected]> wrote:

> > Useful also for code that needs AVX-like registers to do things like CRCs.
> 
> x86/crypto/ has a lot of AVX optimized code.


Yeah, that's true, but the crypto code is processing fundamentally bigger
blocks of data, which amortizes the cost of using kernel_fpu_begin()/_end().

kernel_fpu_begin()/_end() is a pretty heavy operation because it does a full
FPU save/restore via the XSAVE[S] and XRSTOR[S] instructions, which can easily
copy a thousand bytes around! So kernel_fpu_begin()/_end() is probably a
non-starter for something small, like a single 256-bit or 512-bit word access.
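
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in
C. It assumes the standard (non-compacted) XSAVE layout for x87 + SSE + AVX
state (512-byte legacy region, 64-byte header, 256 bytes of YMM upper halves)
and uses "bytes of state copied" as a crude stand-in for cost. The function
names are made up for illustration, and real XSAVEOPT/XSAVES can skip
components that are unmodified or in their init state, so treat this as an
upper-bound estimate rather than a measurement:

```c
#include <stddef.h>

/* Standard-format XSAVE area covering x87, SSE and AVX state:
 * 512-byte legacy region + 64-byte XSAVE header + 256 bytes of YMM_Hi128. */
enum { XSAVE_AREA_AVX_BYTES = 512 + 64 + 256 };

/* Bytes moved by one full save plus one full restore (illustrative helper). */
static size_t fpu_roundtrip_bytes(void)
{
    return 2 * (size_t)XSAVE_AREA_AVX_BYTES;
}

/* FPU state bytes copied per byte of useful payload (illustrative helper). */
static double overhead_ratio(size_t payload_bytes)
{
    return (double)fpu_roundtrip_bytes() / (double)payload_bytes;
}
```

Under these assumptions a single 32-byte (256-bit) access drags along 52 bytes
of state traffic per payload byte, while a 4096-byte crypto block sees well
under one byte of state traffic per payload byte.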

But there's actually a new thing in modern kernels: we got rid of (most of)
the lazy FPU save/restore code, and our new x86 FPU model is very "direct",
with no FPU faults taken normally.

So assuming the target driver will only load on modern FPUs I *think* it should 
actually be possible to do something like (pseudocode):

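        # assumes %rsp is 32-byte aligned here: vmovdqa faults on
        # unaligned memory operands (vmovdqu would lift that requirement)
        vmovdqa %ymm0, 32(%rsp)
        vmovdqa %ymm1, 64(%rsp)

        ...
        # use ymm0 and ymm1
        ...

        vmovdqa 64(%rsp), %ymm1
        vmovdqa 32(%rsp), %ymm0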

... without using the heavy XSAVE/XRSTOR instructions.

Note that preemption probably still needs to be disabled and possibly there are 
other details as well, but there should be no 'heavy' FPU operations.
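
As a rough user-space analogue of the sequence above (not kernel code, and not
a tested kernel pattern), the sketch below spills %ymm0 to a stack buffer,
uses it for one 256-bit copy, and restores it, all inside a single inline-asm
statement. The helper name copy256_preserving_ymm0() is hypothetical. It uses
vmovdqu so that no alignment is assumed, and falls back to memcpy() when the
build target is not x86-64 or the CPU/OS does not report usable AVX. The
kernel-side concerns (preemption, interrupts, who owns the FPU state) are
deliberately not modelled here:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy one 256-bit word via %ymm0 while preserving
 * whatever value %ymm0 held before, using a plain register spill instead
 * of a full XSAVE/XRSTOR of the FPU state. */
static void copy256_preserving_ymm0(void *dst, const void *src)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx")) {
        uint8_t spill[32];  /* scratch slot for the old %ymm0 contents */

        __asm__ volatile(
            "vmovdqu %%ymm0, (%[spill])\n\t"  /* save the live register   */
            "vmovdqu (%[src]), %%ymm0\n\t"    /* 256-bit load             */
            "vmovdqu %%ymm0, (%[dst])\n\t"    /* 256-bit store            */
            "vmovdqu (%[spill]), %%ymm0\n\t"  /* restore the old contents */
            :
            : [spill] "r"(spill), [src] "r"(src), [dst] "r"(dst)
            : "memory");
        return;
    }
#endif
    /* Portable fallback when AVX is unavailable. */
    memcpy(dst, src, 32);
}
```

Only 64 bytes of register state are moved per call here (one 32-byte spill and
one 32-byte reload), which is the whole point of avoiding the full save/restore.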

I think this should still preserve all user-space FPU state and shouldn't muck
up any 'weird' user-space FPU state (such as pending exceptions, running
legacy x87 code, NaN registers or weird FPU control word settings) we might
have interrupted either.

But I could be wrong; it should be checked whether this sequence is safe.
In the worst case we might have to save/restore the FPU control and tag
words - but those operations should still be much faster than a full
XSAVE/XRSTOR pair.

So I do think we could do more in this area to improve driver performance, if
the code is correct and if there are actual benchmarks showing real benefits.

Thanks,

        Ingo
