Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

Rik van Riel Fri, 15 Jun 2018 11:40:27 -0700

On Fri, 2018-06-15 at 11:31 -0700, Andy Lutomirski wrote:
> On Fri, Jun 15, 2018 at 6:11 AM Jason A. Donenfeld <ja...@zx2c4.com>
> wrote:
> > 
> > Hi Andy & folks,
> > 
> > Lots of crypto routines look like this:
> > 
> > kernel_fpu_begin();
> > encrypt();
> > kernel_fpu_end();
> > 
> > If you call such a routine twice, you get:
> > 
> > kernel_fpu_begin();
> > encrypt();
> > kernel_fpu_end();
> > kernel_fpu_begin();
> > encrypt();
> > kernel_fpu_end();
> > 
> > In a loop this looks like:
> > 
> > for (thing) {
> >   kernel_fpu_begin();
> >   encrypt(thing);
> >   kernel_fpu_end();
> > }
> > 
> > This is obviously very bad, because begin() and end() are slow, so
> > WireGuard does the obvious:
> > 
> > kernel_fpu_begin();
> > for (thing)
> >   encrypt(thing);
> > kernel_fpu_end();
> > 
> > This is fine and well, and the crypto API I'm working on will
> > enable
> > this to be done in a clear way, but I do wonder if maybe this is
> > not
> > something that should be happening at the level of the caller, but
> > rather in the fpu functions themselves. Namely, what are your
> > thoughts
> > on modifying kernel_fpu_end() so that it doesn't actually restore
> > the
> > state, but just reenables preemption and marks that on the next
> > context switch, the state should be restored? Then, essentially,
> > kernel_fpu_begin() and end() become free after the first usage of
> > kernel_fpu_begin().
> > 
> > Is this something feasible? I know that performance-wise, I'm
> > really
> > gaining a lot from hoisting those functions out of the loops, and
> > API
> > wise, it'd be slightly simpler to implement if I didn't have to all
> > for that hoisting.
> 
> Hi Jason-
> 
> Funny you asked.  This has been discussed a few times, although not
> quite in the form you imagined.  The idea that we've tossed around is
> to restore FPU state on return to user mode.  Roughly, we'd introduce
> a new thread flag TIF_FPU_UNLOADED (name TBD).
> prepare_exit_to_usermode() would notice this flag, copy the fpstate
> to
> fpregs, and clear the flag.  (Or maybe exit_to_usermode_loop() -- No
> one has quite thought it through, but I think it should be outside
> the
> loop.)  We'd update all the FPU accessors to understand the flag.
> We'd probably have a helper like:
> 
> void unload_user_fpstate(void)
> 
> that would do nothing if TIF_FPU_UNLOADED is set.  If
> TIF_FPU_UNLOADED
> is clear, it would move the state to memory and set TIF_FPU_UNLOADED.
> We'd call unload_user_fpstate() during context switch in the prev
> task
> context and in kernel_fpu_begin().
> 
> The one major complication I know of is that PKRU is different from
> all other FPU state in that it needs to be loaded when *kernel* code
> does any user memory access.  So we'd need to make sure that context
> switches load the new task's PKRU eagerly.  Using WRPKRU is easy,
> but,
> unless we do something very clever, actually finding PKRU in the
> in-memory fpstate image may be slightly nontrivial.


IIRC the in-kernel FPU state always has every FPU field
at a constant location.

> I suppose we could eagerly load the new FPU state on context
> switches,
> but I'm not sure I see any point.  We'd still have to load it when we
> switch back to user mode, so it would be slower but not necessarily
> any simpler.
> 
> I'd love to review patches to make this change, but I don't have the
> bandwidth to write them right now.

I can take a look at this, I am already looking at
some context switch code right now, anyway.

I also should still have the TIF_FPU_UNLOADED patches
(or whatever the flag was called) code around somewhere.

-- 
All Rights Reversed.

signature.asc
Description: This is a digitally signed message part

Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

Reply via email to