On 04/30/2016 12:53 AM, Ingo Molnar wrote:
> We can still use the compacted area handling instructions, because presumably
> those are the fastest and are also the most optimized ones? But I wouldn't use
> them to do dynamic allocation: just allocate the maximum possible FPU save
> area at task creation time and never again worry about that detail.
> 
> Ok?

Sounds sane to me.

BTW, I hacked up your "fpu performance" to compare XSAVE vs. XSAVES:

> [    0.048347] x86/fpu: Cost of: XSAVE                       insn          :   127 cycles
> [    0.049134] x86/fpu: Cost of: XSAVES                      insn          :   113 cycles
> [    0.048492] x86/fpu: Cost of: XRSTOR                      insn          :   120 cycles
> [    0.049267] x86/fpu: Cost of: XRSTORS                     insn          :   102 cycles

So I guess we can add that to the list of things that XSAVES is good
for.  Granted, the real-world benefit is probably hard to measure
because the cache residency of the XSAVE buffer isn't as good when
_actually_ context switching, but this at least shows a small
theoretical advantage for XSAVES.
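
For a rough sense of scale, the four cycle counts quoted above can be
turned into relative savings. This is only a back-of-the-envelope sketch:
the helper name percent_saved is made up for illustration, and the inputs
are the single-run numbers from the log, not a proper benchmark.

```python
def percent_saved(baseline_cycles, candidate_cycles):
    """Return how many percent fewer cycles the candidate takes than the baseline."""
    return (baseline_cycles - candidate_cycles) / baseline_cycles * 100


# Cycle counts copied from the boot log quoted in this mail.
xsave, xsaves = 127, 113
xrstor, xrstors = 120, 102

save_gain = percent_saved(xsave, xsaves)        # XSAVES vs. XSAVE
restore_gain = percent_saved(xrstor, xrstors)   # XRSTORS vs. XRSTOR
# One save plus one restore, as a crude stand-in for a context switch.
pair_gain = percent_saved(xsave + xrstor, xsaves + xrstors)

print(f"save:    {save_gain:.1f}% fewer cycles")
print(f"restore: {restore_gain:.1f}% fewer cycles")
print(f"pair:    {pair_gain:.1f}% fewer cycles")
```

That works out to roughly 11% on the save side and 15% on the restore
side, which only holds while the buffer is hot in the cache.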
