On Thu, Oct 15, 2020 at 9:33 PM Andi Kleen <a...@firstfloor.org> wrote:
> On Thu, Oct 15, 2020 at 05:53:40PM +0300, Or Gerlitz wrote:
> > Earlier Intel processors (e.g E5-2650) support the more of classical
> > two stall events (for backend and frontend [1]) and then perf shows
> > the nice measure of stalled cycles per instruction - e.g here where we
> > have IPC of 0.91 and CSPI (see [2]) of 0.68:
>
> Don't use it. It's misleading on a out-of-order CPU because you don't
> know if it's actually limiting anything.
>
> If you want useful bottleneck data use --topdown.

So I ran it again, this time with the parameters below, and got this
output, where the entire rightmost column (backend bound) is colored red.
I wonder what can be said about the amount/ratio of stalls for this app.
Can you maybe recommend some posts of yours that would help me understand
this better? I saw a comment in the perf-stat man page and an LWN
article, but wasn't really able to figure it out.

FWIW, the kernel is 5.5.7-100.fc30.x86_64 and the CPU is E5-2650 0.

$ perf stat --topdown -a taskset -c 0 $APP

[...]

 Performance counter stats for 'system wide':

                  retiring   bad speculation   frontend bound   backend bound
S0-D0-C0     1       24.9%              1.1%            16.1%           57.9%
S0-D0-C1     1       16.3%              1.3%            17.3%           65.1%
S0-D0-C2     1       17.0%              1.2%            15.3%           66.5%
S0-D0-C3     1       18.3%              0.8%             8.2%           72.8%
S0-D0-C4     1       18.1%              0.8%             8.5%           72.6%
S0-D0-C5     1       17.6%              0.8%            10.0%           71.6%
S0-D0-C6     1       18.3%              0.7%             7.4%           73.6%
S0-D0-C7     1       15.4%              1.4%            22.1%           61.2%
S1-D0-C0     1       15.9%              1.4%            16.4%           66.3%
S1-D0-C1     1       21.9%              2.6%            16.9%           58.5%
S1-D0-C2     1       20.8%              3.7%            17.1%           58.4%
S1-D0-C3     1       17.8%              1.0%             9.2%           72.1%
S1-D0-C4     1       17.8%              1.0%             9.0%           72.2%
S1-D0-C5     1       17.8%              1.0%             9.0%           72.2%
S1-D0-C6     1       17.4%              1.4%            12.8%           68.4%
S1-D0-C7     1       23.6%              4.3%            17.2%           55.0%

      13.341823591 seconds time elapsed

Running with perf stat -d, on the other hand, gives this:

$ perf stat -d taskset -c 0 $APP

 Performance counter stats for 'taskset -c 0 ./main.gcc9.3.1':

         15,075.30 msec task-clock                #    0.900 CPUs utilized
               199      context-switches          #    0.013 K/sec
                 1      cpu-migrations            #    0.000 K/sec
           117,987      page-faults               #    0.008 M/sec
    40,907,365,540      cycles                    #    2.714 GHz
    26,431,604,986      stalled-cycles-frontend   #   64.61% frontend cycles idle
    21,734,615,045      stalled-cycles-backend    #   53.13% backend cycles idle
    35,339,765,469      instructions              #    0.86  insn per cycle
                                                  #    0.75  stalled cycles per insn
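
For reference, the derived figures in the perf stat -d output can be reproduced from the raw counters. The following is only a minimal sketch: it assumes that "stalled cycles per insn" is the larger of the two stall counts divided by the instruction count, which is consistent with the numbers in this run. The function and key names are made up for illustration and are not part of perf.

```python
def derived_metrics(cycles, stalled_frontend, stalled_backend, instructions):
    """Recompute the ratios perf stat prints next to the raw counters.

    Assumption (not taken from this thread): stalled cycles per insn uses
    the larger of the frontend and backend stall counts.
    """
    return {
        "insn_per_cycle": instructions / cycles,
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        "backend_idle_pct": 100.0 * stalled_backend / cycles,
        "stalled_cycles_per_insn":
            max(stalled_frontend, stalled_backend) / instructions,
    }


# Raw counter values copied from the perf stat -d run above.
metrics = derived_metrics(
    cycles=40_907_365_540,
    stalled_frontend=26_431_604_986,
    stalled_backend=21_734_615_045,
    instructions=35_339_765_469,
)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the two idle percentages add up to more than 100%, so the frontend and backend stall counts clearly overlap and cannot be read as a breakdown of where the cycles went.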