On Thu, Feb 02, 2017 at 06:04:56PM +0100, Borislav Petkov wrote:
> I think I stole it from you from some mail thread we had in the past.

Yap, --all-cpus is a bit better in that the difference between the two
kernels is smaller.

For some reason, though, the workload is a bit slower with the patch:
we have more cycles, more branches, ... It is only about 2 sec slower
(144.55 s vs. 142.48 s elapsed), though. I think that's probably because
this box is the first Bulldozer uarch; when you run it on newer versions
of the uarch, it is probably better, due to improvements in the uarch.

Yazen, what BD generation is your machine?

I have one more Bulldozer box: rev C0 on which I could run this over the
weekend.

./tools/perf/perf stat -a \
  -e task-clock,context-switches,cache-misses,cpu-migrations,page-faults,cycles,instructions,branches,branch-misses \
  --repeat 3 --sync --pre ~/bin/pre-build-kernel.sh -- make -s -j17 bzImage

before:

 Performance counter stats for 'system wide' (3 runs):

    2279512.230871      task-clock (msec)         #   15.999 CPUs utilized            ( +-  0.40% )
           714,492      context-switches          #    0.313 K/sec                    ( +-  0.19% )
     6,726,972,836      cache-misses                                                  ( +-  0.15% )
            56,490      cpu-migrations            #    0.025 K/sec                    ( +-  2.98% )
        27,794,829      page-faults               #    0.012 M/sec                    ( +-  0.04% )
 3,719,570,726,045      cycles                    #    1.632 GHz                      ( +-  0.06% )
 2,146,930,432,417      instructions              #    0.58  insn per cycle           ( +-  0.05% )
   476,587,085,009      branches                  #  209.074 M/sec                    ( +-  0.06% )
    25,286,321,575      branch-misses             #    5.31% of all branches          ( +-  0.07% )

     142.475046735 seconds time elapsed                                          ( +-  0.40% )

after:

 Performance counter stats for 'system wide' (3 runs):

    2312821.267459      task-clock (msec)         #   16.000 CPUs utilized            ( +-  0.20% )
           760,839      context-switches          #    0.329 K/sec                    ( +-  0.29% )
     6,769,543,062      cache-misses                                                  ( +-  0.05% )
            68,785      cpu-migrations            #    0.030 K/sec                    ( +-  0.75% )
        27,828,222      page-faults               #    0.012 M/sec                    ( +-  0.04% )
 3,725,704,384,061      cycles                    #    1.611 GHz                      ( +-  0.06% )
 2,149,336,525,435      instructions              #    0.58  insn per cycle           ( +-  0.01% )
   477,157,066,501      branches                  #  206.310 M/sec                    ( +-  0.01% )
    25,289,357,158      branch-misses             #    5.30% of all branches          ( +-  0.07% )

     144.551731453 seconds time elapsed                                          ( +-  0.20% )
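
To put the two runs side by side, here is a quick sketch that computes the
relative change of each counter. The values are copied from the perf stat
output above; the names before, after and percent_change are only
illustrative and not part of any tool.

```python
# Mean counter values copied from the "before" and "after" perf stat runs.
before = {
    "task-clock (msec)": 2279512.230871,
    "context-switches": 714492,
    "cache-misses": 6726972836,
    "cpu-migrations": 56490,
    "page-faults": 27794829,
    "cycles": 3719570726045,
    "instructions": 2146930432417,
    "branches": 476587085009,
    "branch-misses": 25286321575,
    "seconds elapsed": 142.475046735,
}
after = {
    "task-clock (msec)": 2312821.267459,
    "context-switches": 760839,
    "cache-misses": 6769543062,
    "cpu-migrations": 68785,
    "page-faults": 27828222,
    "cycles": 3725704384061,
    "instructions": 2149336525435,
    "branches": 477157066501,
    "branch-misses": 25289357158,
    "seconds elapsed": 144.551731453,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Positive values mean the counter went up with the patch applied.
deltas = {name: percent_change(before[name], after[name]) for name in before}

for name, pct in deltas.items():
    print(f"{name:20s} {pct:+7.2f}%")
```

By this arithmetic, elapsed time and task-clock grow by roughly 1.5%, while
cycles grow by only about 0.16%. The largest relative changes are in
cpu-migrations (about 22%) and context-switches (about 6.5%). With only 3
runs per kernel, this is a rough comparison and not a statistical test.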

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
