Module Name: src Committed By: maxv Date: Sat Nov 4 08:58:30 UTC 2017
Modified Files: src/sys/arch/x86/x86: fpu.c Log Message: Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds. To generate a diff of this commit: cvs rdiff -u -r1.23 -r1.24 src/sys/arch/x86/x86/fpu.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.