Hi Andy & folks,

Lots of crypto routines look like this:

	kernel_fpu_begin();
	encrypt();
	kernel_fpu_end();

If you call such a routine twice, you get:

	kernel_fpu_begin();
	encrypt();
	kernel_fpu_end();
	kernel_fpu_begin();
	encrypt();
	kernel_fpu_end();

In a loop this looks like:

	for (thing) {
		kernel_fpu_begin();
		encrypt(thing);
		kernel_fpu_end();
	}

This is obviously very bad, because begin() and end() are slow, so
WireGuard does the obvious:

	kernel_fpu_begin();
	for (thing)
		encrypt(thing);
	kernel_fpu_end();

This is fine and well, and the crypto API I'm working on will enable
this to be done in a clear way, but I do wonder if maybe this is not
something that should be happening at the level of the caller, but
rather in the fpu functions themselves. Namely, what are your thoughts
on modifying kernel_fpu_end() so that it doesn't actually restore the
state, but just reenables preemption and marks the state as needing to
be restored on the next context switch? Then, essentially,
kernel_fpu_begin() and end() become free after the first usage of
kernel_fpu_begin().

Is this something feasible? I know that performance-wise, I'm really
gaining a lot from hoisting those functions out of the loops, and
API-wise, it'd be slightly simpler to implement if I didn't have to
allow for that hoisting.

Regards,
Jason
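
P.S. To make the idea a bit more concrete, here is a rough userspace model of the bookkeeping I have in mind. It is only a sketch: struct fpu_model and every function below are made-up names for illustration, not real kernel symbols, and the "save" and "restore" steps are just counters standing in for the expensive work. Preemption handling is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: counts the expensive operations. */
struct fpu_model {
	bool kernel_owns_regs;	/* user state already saved, regs usable by kernel */
	unsigned int saves;	/* number of expensive state saves */
	unsigned int restores;	/* number of expensive state restores */
};

/* Eager scheme: every begin saves and every end restores. */
static void eager_begin(struct fpu_model *f)
{
	f->saves++;
	f->kernel_owns_regs = true;
}

static void eager_end(struct fpu_model *f)
{
	f->restores++;
	f->kernel_owns_regs = false;
}

/* Lazy scheme: begin only saves if the state has not been saved yet. */
static void lazy_begin(struct fpu_model *f)
{
	if (!f->kernel_owns_regs) {
		f->saves++;
		f->kernel_owns_regs = true;
	}
}

/* Lazy end does no restore; the pending restore stays flagged. */
static void lazy_end(struct fpu_model *f)
{
	(void)f;
}

/* Stand-in for the next context switch, which performs the deferred restore. */
static void lazy_context_switch(struct fpu_model *f)
{
	if (f->kernel_owns_regs) {
		f->restores++;
		f->kernel_owns_regs = false;
	}
}

/* Total expensive operations for n back-to-back begin/end pairs. */
unsigned int run_eager(unsigned int n)
{
	struct fpu_model f = { false, 0, 0 };

	for (unsigned int i = 0; i < n; i++) {
		eager_begin(&f);
		/* encrypt(thing) would go here */
		eager_end(&f);
	}
	return f.saves + f.restores;
}

unsigned int run_lazy(unsigned int n)
{
	struct fpu_model f = { false, 0, 0 };

	for (unsigned int i = 0; i < n; i++) {
		lazy_begin(&f);
		/* encrypt(thing) would go here */
		lazy_end(&f);
	}
	lazy_context_switch(&f);
	return f.saves + f.restores;
}
```

In this model the eager variant pays for one save and one restore per iteration, while the lazy variant pays for one save up front and one restore at the simulated context switch, regardless of the loop count. Whether the real fpu code can be restructured this way is exactly the question above.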