* Dave Hansen <[email protected]> wrote:
> On 04/30/2016 12:53 AM, Ingo Molnar wrote:
> > We can still use the compacted area handling instructions, because presumably
> > those are the fastest and are also the most optimized ones? But I wouldn't
> > use them to do dynamic allocation: just allocate the maximum possible FPU
> > save area at task creation time and never again worry about that detail.
> >
> > Ok?
>
> Sounds sane to me.
>
> BTW, I hacked up your "fpu performance" measurement code to compare XSAVE vs. XSAVES:
>
> > [ 0.048347] x86/fpu: Cost of: XSAVE   insn : 127 cycles
> > [ 0.049134] x86/fpu: Cost of: XSAVES  insn : 113 cycles
> > [ 0.048492] x86/fpu: Cost of: XRSTOR  insn : 120 cycles
> > [ 0.049267] x86/fpu: Cost of: XRSTORS insn : 102 cycles
>
> So I guess we can add that to the list of things that XSAVES is good for.

Absolutely!
> [...] Granted, the real-world benefit is probably hard to measure because the
> cache residency of the XSAVE buffer isn't as good when _actually_ context
> switching, but this at least shows a small theoretical advantage for XSAVES.

Yeah, and anything that was actually measured is hardly theoretical - it's
simply a best-case microbenchmark figure. But it's still a nice 10+ cycle
improvement overall, which might become bigger in future CPU generations.
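
As a side note, purely for illustration: the kind of serialized-RDTSC,
best-of-many-runs cycle measurement behind numbers like the above might look
roughly like the user-space sketch below. This is a hypothetical example, not
the actual boot-time measurement code, and it only times XSAVE - XSAVES and
XRSTORS are privileged and can only be measured from ring 0. It also assumes
the kernel has set CR4.OSXSAVE, as any modern XSAVE-capable setup does.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* XSAVE area must be 64-byte aligned; 4 KB covers the common states. */
static uint8_t xsave_buf[4096] __attribute__((aligned(64)));

/* CPUID serializes, so earlier instructions cannot leak into the timed region. */
static inline uint64_t rdtsc_serialized(void)
{
	uint32_t lo, hi;

	asm volatile("cpuid\n\t"
		     "rdtsc"
		     : "=a" (lo), "=d" (hi)
		     :
		     : "rbx", "rcx", "memory");
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	/* Request x87/SSE/AVX state (RFBM bits 0-2); XSAVE masks this with XCR0. */
	uint32_t rfbm_lo = 0x7, rfbm_hi = 0;
	uint64_t start, end, best = ~0ULL;
	int i;

	memset(xsave_buf, 0, sizeof(xsave_buf));

	/* Take the best of many runs to approximate the hot-cache cost. */
	for (i = 0; i < 100000; i++) {
		start = rdtsc_serialized();
		asm volatile("xsave %0"
			     : "+m" (xsave_buf)
			     : "a" (rfbm_lo), "d" (rfbm_hi)
			     : "memory");
		end = rdtsc_serialized();
		if (end - start < best)
			best = end - start;
	}

	printf("Cost of: XSAVE insn : %llu cycles (includes timing overhead)\n",
	       (unsigned long long)best);
	return 0;
}
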
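And on the "allocate the maximum possible FPU save area at task creation time"
idea quoted above, a hypothetical sketch of sizing that buffer once from CPUID
leaf 0xD - again illustration only, the kernel has its own cpuid helpers and
allocation paths:

#include <cpuid.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * CPUID.(EAX=0DH, ECX=0):ECX reports the maximum XSAVE area size (in
 * bytes) needed for *all* features the CPU supports, independent of
 * what is currently enabled in XCR0 - i.e. the worst case we would
 * ever need if we size the per-task buffer once, up front.
 */
static unsigned int max_xsave_area_size(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return ecx;
}

int main(void)
{
	unsigned int size = max_xsave_area_size();
	void *buf;

	if (!size) {
		fprintf(stderr, "XSAVE not supported\n");
		return 1;
	}

	/* XSAVE buffers must be 64-byte aligned. */
	if (posix_memalign(&buf, 64, size)) {
		perror("posix_memalign");
		return 1;
	}

	printf("maximum XSAVE area size: %u bytes\n", size);
	free(buf);
	return 0;
}
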
Thanks,
Ingo