On 04/30/2016 12:53 AM, Ingo Molnar wrote:
> We can still use the compacted area handling instructions, because presumably
> those are the fastest and are also the most optimized ones? But I wouldn't use
> them to do dynamic allocation: just allocate the maximum possible FPU save area at
> task creation time and never again worry about that detail.
>
> Ok?

Sounds sane to me.

BTW, I hacked up your "fpu performance" to compare XSAVE vs. XSAVES:

> [ 0.048347] x86/fpu: Cost of: XSAVE insn : 127 cycles
> [ 0.049134] x86/fpu: Cost of: XSAVES insn : 113 cycles
> [ 0.048492] x86/fpu: Cost of: XRSTOR insn : 120 cycles
> [ 0.049267] x86/fpu: Cost of: XRSTORS insn : 102 cycles

So I guess we can add that to the list of things that XSAVES is good
for. Granted, the real-world benefit is probably hard to measure
because the cache residency of the XSAVE buffer isn't as good when
_actually_ context switching, but this at least shows a small
theoretical advantage for XSAVES.
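
To put those numbers in relative terms, here is a rough userspace sketch
that just does the arithmetic on the four cycle counts above. The helper
names (cycles_saved, saved_permille, report) are made up for this mail and
are not anything in the kernel tree; it assumes the second instruction of
each pair is no slower than the first.

```c
#include <assert.h>
#include <stdio.h>

/* Cycles saved by 'candidate' relative to 'baseline' (both in cycles). */
int cycles_saved(int baseline, int candidate)
{
	return baseline - candidate;
}

/*
 * Saving expressed in tenths of a percent of the baseline, rounded to
 * nearest.  Integer math only; assumes candidate <= baseline.
 */
int saved_permille(int baseline, int candidate)
{
	assert(baseline > 0);
	return (cycles_saved(baseline, candidate) * 1000 + baseline / 2) / baseline;
}

/* Print one comparison line, e.g. for XSAVE (baseline) vs. XSAVES. */
void report(const char *what, int baseline, int candidate)
{
	int pm = saved_permille(baseline, candidate);

	printf("%s: %d cycles saved (%d.%d%%)\n",
	       what, cycles_saved(baseline, candidate), pm / 10, pm % 10);
}
```

Calling report("XSAVE -> XSAVES", 127, 113) and
report("XRSTOR -> XRSTORS", 120, 102) works out to 14 cycles (11.0%) and
18 cycles (15.0%) respectively, for whatever a warm-cache microbenchmark
is worth.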

