On Tue, Jul 02, 2013 at 08:39:12AM +0200, Ingo Molnar wrote:
> Yeah - I didn't know your CPU count, -j64 is what I use.
Right, but the -j make jobs argument - whenever it is higher than the
core count - shouldn't matter too much to the workload because all those
threads remain runnable but simply [...]
* Borislav Petkov wrote:
> On Mon, Jul 01, 2013 at 03:35:47PM -0700, Wedson Almeida Filho wrote:
> > On Mon, Jul 1, 2013 at 3:28 PM, Borislav Petkov wrote:
> > >
> > > perf stat --repeat 10 -a --sync --pre 'make -s clean; echo 1 >
> > > /proc/sys/vm/drop_caches' make -s -j64 bzImage
> >
> > How many CPUs do you have in your system? Maybe -j64 vs -jNUM_CPUs
> > affects your measurements as well.
On Mon, Jul 01, 2013 at 03:35:47PM -0700, Wedson Almeida Filho wrote:
> On Mon, Jul 1, 2013 at 3:28 PM, Borislav Petkov wrote:
> >
> > perf stat --repeat 10 -a --sync --pre 'make -s clean; echo 1 >
> > /proc/sys/vm/drop_caches' make -s -j64 bzImage
>
> How many CPUs do you have in your system? Maybe -j64 vs -jNUM_CPUs
> affects your measurements as well.
On Mon, Jul 1, 2013 at 3:28 PM, Borislav Petkov wrote:
>
> perf stat --repeat 10 -a --sync --pre 'make -s clean; echo 1 >
> /proc/sys/vm/drop_caches' make -s -j64 bzImage
How many CPUs do you have in your system? Maybe -j64 vs -jNUM_CPUs
affects your measurements as well.
On Mon, Jul 01, 2013 at 04:48:51PM +0200, Borislav Petkov wrote:
> And yes, this way we don't see the speedup - numbers are almost the
> same. Now on to find out why I see a speedup with my way of running
> the trace.
Ok, I think I know what happens:
When I do:
perf stat --repeat 10 -a --sync [...]
Right... brain not awake yet, sorry, and responding from my phone.
I am not too worried about trading off a bit of additional branch mispredicts if
everything else wins, but Ingo does have a valid question whether we are
measuring the right thing.
The effect is definitely big enough that it would be [...]
On Mon, Jul 01, 2013 at 02:50:46PM +0200, Ingo Molnar wrote:
> > Yep, I didn't run -a since I wanted to trace only the build process.
> > Btw, the build-kernel.sh script looks like this:
> >
> > #!/bin/bash
> >
> > NUM_CPUS=$(cat /proc/cpuinfo | grep processor | wc -l)
> > MAKE_OPTS=-j$(($NUM_CPU
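The quoted script is cut off above. For context, a common way to derive the job count looks like the sketch below — this is not Wedson's actual script (its tail is truncated in the archive); `grep -c` and the `nproc` fallback are my own substitutions for the `cat | grep | wc -l` pipeline:

```shell
#!/bin/bash
# Count online CPUs; grep -c replaces the cat | grep | wc -l pipeline,
# and nproc (coreutils) serves as a fallback if /proc/cpuinfo is absent.
NUM_CPUS=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || nproc)
MAKE_OPTS="-j${NUM_CPUS}"
echo "${MAKE_OPTS}"
```

This feeds directly into the `-j64` vs `-jNUM_CPUS` question raised earlier in the thread: with this computation the job count tracks the machine rather than being hard-coded.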
On Mon, Jul 01, 2013 at 07:45:41AM -0700, H. Peter Anvin wrote:
> Right... brain not awake yet, sorry, and responding from my phone.
>
> I am not too worried about trading off a bit of additional branch mispredicts
> if everything else wins, but Ingo does have a valid question whether we are
> measuring the right thing.
On Mon, Jul 01, 2013 at 07:30:14AM -0700, H. Peter Anvin wrote:
> Unconditional branches don't need prediction. The branch predictor
> is used for conditional branches and in some hardware designs for
> indirect branches. Unconditional direct branches never go through
> the branch predictor simply because the front end can know with 100%
> certainty where the flow of control is going.
Unconditional branches don't need prediction. The branch predictor is used for
conditional branches and in some hardware designs for indirect branches.
Unconditional direct branches never go through the branch predictor simply
because the front end can know with 100% certainty where the flow of control
is going.
* Borislav Petkov wrote:
> On Mon, Jul 01, 2013 at 01:11:22PM +0200, Ingo Molnar wrote:
> > Hm, a 6 seconds win looks _way_ too much - we don't execute that much
> > mutex code, let alone a portion of it.
> >
> > This could perhaps be a bootup-to-bootup cache layout systematic jitter
> > artifact, which isn't captured by stddev observations.
On Mon, Jul 01, 2013 at 01:11:22PM +0200, Ingo Molnar wrote:
> Hm, a 6 seconds win looks _way_ too much - we don't execute that much
> mutex code, let alone a portion of it.
>
> This could perhaps be a bootup-to-bootup cache layout systematic jitter
> artifact, which isn't captured by stddev observations.
* Borislav Petkov wrote:
> On Mon, Jul 01, 2013 at 09:50:46AM +0200, Ingo Molnar wrote:
> > Not sure - the main thing we want to know is whether it gets faster.
> > The _amount_ will depend on things like precise usage patterns,
> > caching, etc. - but rarely does a real workload turn a win like this
> > into a loss.
On Mon, Jul 01, 2013 at 09:50:46AM +0200, Ingo Molnar wrote:
> Not sure - the main thing we want to know is whether it gets faster.
> The _amount_ will depend on things like precise usage patterns,
> caching, etc. - but rarely does a real workload turn a win like this
> into a loss.
Yep, and it do[...]
* Borislav Petkov wrote:
> On Sat, Jun 29, 2013 at 04:56:30PM -0700, Wedson Almeida Filho wrote:
> > On Fri, Jun 28, 2013 at 7:09 AM, Borislav Petkov wrote:
> >
> > > Btw, do we have any perf data showing any improvements from this patch?
> >
> > I wrote a simple test that measures the time it takes to acquire and
> > release an uncontended mutex (i.e., we always take the fast path)
> > 100k times.
On Sat, Jun 29, 2013 at 04:56:30PM -0700, Wedson Almeida Filho wrote:
> On Fri, Jun 28, 2013 at 7:09 AM, Borislav Petkov wrote:
>
> > Btw, do we have any perf data showing any improvements from this patch?
>
> I wrote a simple test that measures the time it takes to acquire and
> release an uncontended mutex (i.e., we always take the fast path)
> 100k times.
On Fri, Jun 28, 2013 at 7:09 AM, Borislav Petkov wrote:
> Btw, do we have any perf data showing any improvements from this patch?
I wrote a simple test that measures the time it takes to acquire and
release an uncontended mutex (i.e., we always take the fast path)
100k times. I ran it a few times [...]
On Fri, Jun 28, 2013 at 07:12:18AM -0700, H. Peter Anvin wrote:
> On 06/28/2013 07:09 AM, Borislav Petkov wrote:
> >
> > Our testing for asm goto otherwise is a bit more, hmm, hands-on in
> > arch/x86/include/asm/cpufeature.h:
> >
> > #if __GNUC__ > 4 || __GNUC_MINOR__ >= 5
> >
> > Maybe I should change that to the more explicit CC_HAVE_ASM_GOTO then.
On 06/28/2013 07:09 AM, Borislav Petkov wrote:
>
> Our testing for asm goto otherwise is a bit more, hmm, hands-on in
> arch/x86/include/asm/cpufeature.h:
>
> #if __GNUC__ > 4 || __GNUC_MINOR__ >= 5
>
> Maybe I should change that to the more explicit CC_HAVE_ASM_GOTO then.
>
We should... we di[...]
On Fri, Jun 28, 2013 at 01:19:48PM +0200, Ingo Molnar wrote:
>
> * Wedson Almeida Filho wrote:
>
> > The new implementation allows the compiler to better optimize the code; the
> > original implementation is still used when the kernel is compiled with older
> > versions of gcc that don't support asm-goto.
* Wedson Almeida Filho wrote:
> The new implementation allows the compiler to better optimize the code; the
> original implementation is still used when the kernel is compiled with older
> versions of gcc that don't support asm-goto.
>
> Compiling with gcc 4.7.3, the original mutex_lock() is 60 bytes with the
> fast path taking 16 instructions [...]
The new implementation allows the compiler to better optimize the code; the
original implementation is still used when the kernel is compiled with older
versions of gcc that don't support asm-goto.
Compiling with gcc 4.7.3, the original mutex_lock() is 60 bytes with the fast
path taking 16 instructions [...]