On Fri, Feb 20, 2015 at 10:58:15AM -0800, Andy Lutomirski wrote:
> - /* Auto enable eagerfpu for xsaveopt */
> - if (cpu_has_xsaveopt && eagerfpu != DISABLE)
> + /* Auto enable eagerfpu for everyone */
> + if (eagerfpu != DISABLE)
> eagerfpu = ENABLE;
So Mel did run s
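The hunk quoted above changes when eager mode is auto-selected. A minimal Python model of the two conditions may make the difference easier to see. It is only a sketch: the kernel code is C, `Mode`, `auto_select_before` and `auto_select_after` are hypothetical names, the `AUTO` value is taken from later messages in the thread, and leaving the setting unchanged when the condition is false is an assumption, since the hunk does not show that path.

```python
from enum import Enum


class Mode(Enum):
    # ENABLE and DISABLE appear in the quoted hunk; AUTO is mentioned later in the thread.
    AUTO = "auto"
    ENABLE = "enable"
    DISABLE = "disable"


def auto_select_before(eagerfpu: Mode, cpu_has_xsaveopt: bool) -> Mode:
    """Old condition: auto-enable eager mode only on CPUs with xsaveopt."""
    if cpu_has_xsaveopt and eagerfpu != Mode.DISABLE:
        return Mode.ENABLE
    return eagerfpu  # assumption: otherwise the setting is left as it was


def auto_select_after(eagerfpu: Mode) -> Mode:
    """New condition: auto-enable eager mode for everyone unless disabled."""
    if eagerfpu != Mode.DISABLE:
        return Mode.ENABLE
    return eagerfpu  # an explicit DISABLE is still respected


if __name__ == "__main__":
    for has_xsaveopt in (False, True):
        for mode in Mode:
            print(has_xsaveopt, mode.name,
                  auto_select_before(mode, has_xsaveopt).name,
                  auto_select_after(mode).name)
```

In this model the only inputs whose result changes are those without xsaveopt and with the setting not explicitly disabled.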
* Borislav Petkov wrote:
> On Tue, Feb 24, 2015 at 04:07:07PM -0800, Andy Lutomirski wrote:
>
> > I'd prefer a different partial solution: encourage
> > everyone to clear the xstate before making syscalls
> > (using e.g. vzeroall). In fact, maybe user code should
> > aggressively clear newly
* Andy Lutomirski wrote:
> > I'm a big fan of simplifying things, but.
> >
> > SIMD registers were growing in x86, and they are going
> > to grow again, this time four-fold in Intel MIC: from
> > sixteen 256-bit registers to thirty two 512-bit
> > registers.
> >
> > That's 2 kbytes of data. J
On Tue, Feb 24, 2015 at 04:07:07PM -0800, Andy Lutomirski wrote:
> I'd prefer a different partial solution: encourage everyone to clear
> the xstate before making syscalls (using e.g. vzeroall). In fact,
> maybe user code should aggressively clear newly-unused xstate.
We don't trust userspace.
On Tue, Feb 24, 2015 at 11:15 AM, Denys Vlasenko wrote:
> On Fri, Feb 20, 2015 at 7:58 PM, Andy Lutomirski wrote:
>> We have eager and lazy fpu modes, introduced in:
>>
>> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
>> xsave
>>
>> The result is rather messy. There
On Fri, Feb 20, 2015 at 7:58 PM, Andy Lutomirski wrote:
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy. There are two code paths in almost all of the
> FPU code, and only one
On 02/23/2015 09:31 PM, Andy Lutomirski wrote:
> On Mon, Feb 23, 2015 at 6:14 PM, Maciej W. Rozycki wrote:
>> That's an interesting case too, although not necessarily related.
>> If you say that we always save the FP context eagerly for the
>> purp
On Mon, Feb 23, 2015 at 6:14 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Andy Lutomirski wrote:
>
>> >> After a context switch, the instructions from the old task are no
>> >> longer in the pipeline.
>> >
>> > I'd say it's implementation-specific. As I mentioned the i486 aborted
>> > any
On Mon, 23 Feb 2015, Andy Lutomirski wrote:
> >> After a context switch, the instructions from the old task are no
> >> longer in the pipeline.
> >
> > I'd say it's implementation-specific. As I mentioned the i486 aborted
> > any transcendental x87 instruction in progress upon taking an exceptio
On Mon, 23 Feb 2015, Linus Torvalds wrote:
> We have one traditional special case, which actually did something
> like Maciej's nightmare scenario: the completely broken "FPU errors
> over irq13" IBM PC/AT FPU linkage.
>
> But since we don't actually support old i386 machines any more, we
> don't
On Mon, Feb 23, 2015 at 4:56 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Linus Torvalds wrote:
>
>> We have one traditional special case, which actually did something
>> like Maciej's nightmare scenario: the completely broken "FPU errors
>> over irq13" IBM PC/AT FPU linkage.
>>
>> But sinc
On Mon, Feb 23, 2015 at 2:27 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Rik van Riel wrote:
>
>> > I meant something else -- a slow FPU instruction can retire after a
>> > task has been switched where the FP context has been left intact,
>> > i.e. in the lazy FP context switching case, wh
On Mon, 23 Feb 2015, Rik van Riel wrote:
> > I meant something else -- a slow FPU instruction can retire after a
> > task has been switched where the FP context has been left intact,
> > i.e. in the lazy FP context switching case, where only the MMU
> > context and GPRs have been replaced.
>
> I
On Mon, Feb 23, 2015 at 1:21 PM, Rik van Riel wrote:
>
> On 02/23/2015 04:17 PM, Maciej W. Rozycki wrote:
>>>
>>> It seems highly unlikely to me that a slow FPU instruction can
>>> retire *after* a subsequent fxsave, which would need to happen
>>> for this to work.
>>
>> I meant something else --
On 02/23/2015 04:17 PM, Maciej W. Rozycki wrote:
> On Sat, 21 Feb 2015, Andy Lutomirski wrote:
>
>>> Additionally I believe long-executing FPU instructions (i.e.
>>> transcendentals) can take advantage of continuing to execute in
>>> parallel where t
On Sat, 21 Feb 2015, Andy Lutomirski wrote:
> > Additionally I believe long-executing FPU instructions (i.e.
> > transcendentals) can take advantage of continuing to execute in parallel
> > where the context has already been switched rather than stalling an eager
> > FPU context switch until the
On 02/23, Rik van Riel wrote:
>
> On 02/23/2015 10:11 AM, Borislav Petkov wrote:
> > On Mon, Feb 23, 2015 at 03:59:29PM +0100, Oleg Nesterov wrote:
> >> Well, but if we want this change then perhaps we should simply
> >> change the default value?
On Mon, Feb 23, 2015 at 10:51:26AM -0500, Rik van Riel wrote:
> However, we would still need the rest of the kernel code to ...
Yeah, let's wait out first and see what the benchmarks say. Mel started
a bunch of them on a couple of boxes here, we'll have results in the
coming days.
--
Regards/Gru
On 02/23/2015 10:11 AM, Borislav Petkov wrote:
> On Mon, Feb 23, 2015 at 03:59:29PM +0100, Oleg Nesterov wrote:
>> Well, but if we want this change then perhaps we should simply
>> change the default value? This way "AUTO" still can work.
>
> Yeah, su
On 02/23/2015 10:03 AM, Borislav Petkov wrote:
> On Mon, Feb 23, 2015 at 07:51:04AM -0500, Rik van Riel wrote:
>> At that point we either load the FPU context, or we set CR0.TS.
>
> Right, but provided eager doesn't bring any slowdown, we can drop
> t
On Mon, Feb 23, 2015 at 03:59:29PM +0100, Oleg Nesterov wrote:
> Well, but if we want this change then perhaps we should simply change
> the default value? This way "AUTO" still can work.
Yeah, sure, let's do some measurements first, to see whether this is
even worth it.
Btw, Mel pointed me at so
On Mon, Feb 23, 2015 at 07:51:04AM -0500, Rik van Riel wrote:
> At that point we either load the FPU context, or we
> set CR0.TS.
Right, but provided eager doesn't bring any slowdown, we can drop the TS
fiddling altogether and only load FPU context.
--
Regards/Gruss,
Boris.
ECO tip #101: Tr
On 02/20, Andy Lutomirski wrote:
>
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy. There are two code paths in almost all of the
> FPU code, and only one of them (the eager c
On 02/23/2015 12:22 AM, Andy Lutomirski wrote:
> On Sun, Feb 22, 2015 at 5:45 PM, Rik van Riel wrote:
>> One implication of this is that in kernel mode, we can no longer
>> just assume that the user space FPU state is always loaded, and
>> we need
On Sun, Feb 22, 2015 at 5:45 PM, Rik van Riel wrote:
> On 02/22/2015 06:06 AM, Borislav Petkov wrote:
>> On Sat, Feb 21, 2015 at 06:18:01PM -0800, Andy Lutomirski wrote:
>>> That's true. The question is whether there are enough of them,
>>> and
On 02/22/2015 06:06 AM, Borislav Petkov wrote:
> On Sat, Feb 21, 2015 at 06:18:01PM -0800, Andy Lutomirski wrote:
>> That's true. The question is whether there are enough of them,
>> and whether twiddling TS is fast enough, that it's worth it.
>
> Ye
On Sun, Feb 22, 2015 at 01:57:36PM +0100, Ingo Molnar wrote:
> This is also very similar to the ~0.6 secs improvement your
> first set of numbers gave.
Yeah, running without --repeat was simply misleading.
> So now that it appears we have consistent numbers, it would
> be nice to check it on ol
* Borislav Petkov wrote:
> Lazy FPU:
> 219.406449195 seconds time elapsed ( +- 0.17% )
> Eager FPU:
> 218.791122148 seconds time elapsed ( +- 0.13% )
> Timing improvement of 0.6 secs on average
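The "0.6 secs" figure can be reproduced from the two elapsed times quoted above. The short Python sketch below does the subtraction and, as a rough sanity check, converts each "+-" percentage into seconds. Treating those percentages as a simple noise band around each mean is an assumption about how to read them, and the variable names are made up for illustration.

```python
# Elapsed times and +- percentages as quoted in the message above.
lazy_s, lazy_pct = 219.406449195, 0.17
eager_s, eager_pct = 218.791122148, 0.13

# Absolute and relative difference between the two runs.
delta_s = lazy_s - eager_s
relative_gain = delta_s / lazy_s

# Assumption: read each +- percentage as a noise band around the mean.
lazy_band_s = lazy_s * lazy_pct / 100
eager_band_s = eager_s * eager_pct / 100

# True when the two bands touch or overlap.
bands_overlap = (lazy_s - lazy_band_s) <= (eager_s + eager_band_s)

print(f"delta: {delta_s:.3f} s ({relative_gain:.2%} of the lazy run)")
print(f"noise bands: lazy +-{lazy_band_s:.3f} s, eager +-{eager_band_s:.3f} s")
print(f"bands overlap: {bands_overlap}")
```

Comparing the delta against the band widths gives a feel for how much of the difference could be run-to-run noise, which is the concern the later messages about `--repeat` raise.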
On Sun, Feb 22, 2015 at 09:18:40AM +0100, Ingo Molnar wrote:
> - It might make sense to do a 'perf stat --null --repeat'
> measurement as well [without any -e arguments], to make
> sure the rich PMU stats you are gathering are not
> interfering?
Well, the --repeat thing definitely
On Sat, Feb 21, 2015 at 06:18:01PM -0800, Andy Lutomirski wrote:
> That's true. The question is whether there are enough of them, and
> whether twiddling TS is fast enough, that it's worth it.
Yes, and let me make it clear what I'm trying to do here: I want to make
sure that eager FPU handling (b
On Sun, Feb 22, 2015 at 09:18:40AM +0100, Ingo Molnar wrote:
> So am I interpreting the older and your latest numbers
> correctly in stating that the cost observation has flipped
> around 180 degrees: the first measurement showed eager FPU
> to be a win, but now that we can do more precise
> me
* Ingo Molnar wrote:
> - Do you have enough RAM that there's essentially no IO
> in the system worth speaking of? Do you have enough RAM
> to copy a whole kernel tree to /tmp/linux/ and do the
> measurement there, on ramfs?
Doing that will also pin down the page cache: kernel buil
* Borislav Petkov wrote:
> which spit this:
>
> Lazy FPU:
> 219.127929718 seconds time elapsed
> Eager FPU:
> 220.148034331 seconds time elapsed
> so we have a second slowdown and 200K FPU saves more in eager mode.
So am I interpreting the older and your latest numbers
correctly i
On Sat, Feb 21, 2015 at 4:34 PM, Maciej W. Rozycki wrote:
> On Sat, 21 Feb 2015, Borislav Petkov wrote:
>
>> Provided I've not made a mistake, this leads me to think that this
>> simple workload and pretty much everything else uses the FPU through
>> glibc which does the SSE memcpy and so on. Whic
On Sat, 21 Feb 2015, Borislav Petkov wrote:
> Provided I've not made a mistake, this leads me to think that this
> simple workload and pretty much everything else uses the FPU through
> glibc which does the SSE memcpy and so on. Which basically kills the
> whole idea behind lazy FPU as practically
On Sat, Feb 21, 2015 at 08:23:52PM +0100, Ingo Molnar wrote:
> to switch between the modes?
I went all out and did a debugfs file, see patch at the end, which
counts FPU saves. Then I ran this script:
---
#!/bin/bash
D="/sys/kernel/debug/fpu/eager"
echo "Lazy FPU: "
echo 0 > $D
echo -n " FPU s
* Borislav Petkov wrote:
> > I'd sleep a lot better if we had some runtime debug
> > flag to be able to do run-to-run comparisons on the
> > same booted up kernel, or so.
>
> Let me take a look whether we could do some knob... The
> nice thing is, code uses use_eager_fpu() to check stuff
>
On Sat, Feb 21, 2015 at 07:39:52PM +0100, Ingo Molnar wrote:
> So the workload improved by ~600,000 usecs, and there's
> 68,000 less calls, so it saved 8.8 usecs per call. Isn't
I think you mean more calls. The eager measurement has more calls. Let
me do some primitive math:
def =(234.68133120
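For reference, the per-call figure in the quoted text is a plain division of the two rounded numbers given there. A small Python sketch follows, with hypothetical variable names and only the values from the quote:

```python
# Rounded figures from the quoted message.
time_delta_usecs = 600_000   # "~600,000 usecs"
call_delta = 68_000          # "68,000" calls difference

# Time difference attributed to each call in the call-count difference.
usecs_per_call = time_delta_usecs / call_delta

print(f"{usecs_per_call:.1f} usecs per call")
```

As the reply notes, the eager measurement has more calls, not fewer, so this quotient relates the two differences but should not be read as a saving per avoided call.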
* Borislav Petkov wrote:
> On Sat, Feb 21, 2015 at 05:38:40PM +0100, Borislav Petkov wrote:
> > My assumption is that libc uses SSE for memcpy and thus the FPU will
> > be used. (I'll trace FPU-specific PMCs later to confirm).
>
> Ok, so I slapped a trace_printk() at the beginning of fpu_save_i
* Borislav Petkov wrote:
> plain 3.19:
>
> 234.681331200 seconds time elapsed ( +- 0.15% )
>
> eagerfpu=ENABLE
>
> 234.066525648 seconds time elapsed ( +- 0.19% )
hm, a win of more than 600 milli
On Sat, Feb 21, 2015 at 05:38:40PM +0100, Borislav Petkov wrote:
> My assumption is that libc uses SSE for memcpy and thus the FPU will
> be used. (I'll trace FPU-specific PMCs later to confirm).
Ok, so I slapped a trace_printk() at the beginning of fpu_save_init()
and did a kernel build once with
On Sat, Feb 21, 2015 at 10:31:50AM +0100, Ingo Molnar wrote:
> So it would be nice to test this on at least one reasonably old (but
> not uncomfortably old - say 5 years old) system, to get a feel for
> what kind of performance impact it has there.
Yeah, this is exactly what Andy and I were talkin
* Andy Lutomirski wrote:
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy. There are two code paths in
> almost all of the FPU code, and only one of them (the
> eager case
+ Linus.
I'm sure he'll have something to say about it :-)
On Fri, Feb 20, 2015 at 10:58:15AM -0800, Andy Lutomirski wrote:
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy.
We have eager and lazy fpu modes, introduced in:
304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting xsave
The result is rather messy. There are two code paths in almost all of the
FPU code, and only one of them (the eager case) is tested frequently, since
most kernel devel