On Fri, 2018-06-15 at 11:31 -0700, Andy Lutomirski wrote: > On Fri, Jun 15, 2018 at 6:11 AM Jason A. Donenfeld <ja...@zx2c4.com> > wrote: > > > > Hi Andy & folks, > > > > Lots of crypto routines look like this: > > > > kernel_fpu_begin(); > > encrypt(); > > kernel_fpu_end(); > > > > If you call such a routine twice, you get: > > > > kernel_fpu_begin(); > > encrypt(); > > kernel_fpu_end(); > > kernel_fpu_begin(); > > encrypt(); > > kernel_fpu_end(); > > > > In a loop this looks like: > > > > for (thing) { > > kernel_fpu_begin(); > > encrypt(thing); > > kernel_fpu_end(); > > } > > > > This is obviously very bad, because begin() and end() are slow, so > > WireGuard does the obvious: > > > > kernel_fpu_begin(); > > for (thing) > > encrypt(thing); > > kernel_fpu_end(); > > > > This is fine and well, and the crypto API I'm working on will > > enable > > this to be done in a clear way, but I do wonder if maybe this is > > not > > something that should be happening at the level of the caller, but > > rather in the fpu functions themselves. Namely, what are your > > thoughts > > on modifying kernel_fpu_end() so that it doesn't actually restore > > the > > state, but just reenables preemption and marks that on the next > > context switch, the state should be restored? Then, essentially, > > kernel_fpu_begin() and end() become free after the first usage of > > kernel_fpu_begin(). > > > > Is this something feasible? I know that performance-wise, I'm > > really > > gaining a lot from hoisting those functions out of the loops, and > > API > > wise, it'd be slightly simpler to implement if I didn't have to all > > for that hoisting. > > Hi Jason- > > Funny you asked. This has been discussed a few times, although not > quite in the form you imagined. The idea that we've tossed around is > to restore FPU state on return to user mode. Roughly, we'd introduce > a new thread flag TIF_FPU_UNLOADED (name TBD). > prepare_exit_to_usermode() would notice this flag, copy the fpstate > to > fpregs, and clear the flag. (Or maybe exit_to_usermode_loop() -- No > one has quite thought it through, but I think it should be outside > the > loop.) We'd update all the FPU accessors to understand the flag. > We'd probably have a helper like: > > void unload_user_fpstate(void) > > that would do nothing if TIF_FPU_UNLOADED is set. If > TIF_FPU_UNLOADED > is clear, it would move the state to memory and set TIF_FPU_UNLOADED. > We'd call unload_user_fpstate() during context switch in the prev > task > context and in kernel_fpu_begin(). > > The one major complication I know of is that PKRU is different from > all other FPU state in that it needs to be loaded when *kernel* code > does any user memory access. So we'd need to make sure that context > switches load the new task's PKRU eagerly. Using WRPKRU is easy, > but, > unless we do something very clever, actually finding PKRU in the > in-memory fpstate image may be slightly nontrivial.
IIRC the in-kernel FPU state always has every FPU field at a constant location. > I suppose we could eagerly load the new FPU state on context > switches, > but I'm not sure I see any point. We'd still have to load it when we > switch back to user mode, so it would be slower but not necessarily > any simpler. > > I'd love to review patches to make this change, but I don't have the > bandwidth to write them right now. I can take a look at this, I am already looking at some context switch code right now, anyway. I also should still have the TIF_FPU_UNLOADED patches (or whatever the flag was called) code around somewhere. -- All Rights Reversed.
signature.asc
Description: This is a digitally signed message part