On Mon, Apr 29, 2024 at 4:26 PM Lucier, Bradley J via Gcc <gcc@gcc.gnu.org> wrote:
>
> The question: How to interpret scheduling info with the compiler listed below.
>
> Specifically, a tight loop that was reported to be scheduled in 23 cycles (as
> I understand it) actually executes in a little over 2 cycles per loop, as I
> interpret two separate experiments.
>
> Am I misinterpreting something here?

Yes, the scheduling model in use here is the cortex-a53 one, as evidenced by
"cortex_a53_slot_" in the dump. Most aarch64 cores don't have a scheduling
model associated with them, especially cores whose support has not been
upstreamed directly by the company that produces them. The default scheduling
model is cortex-a53 anyway. And you didn't use -mtune= or -mcpu=, only
-march=native, which just changes the arch features and not the tuning or
scheduler model. So the 23-cycle total is the scheduler's estimate under the
cortex-a53 model rather than a prediction for the core you are running on.

Thanks,
Andrew Pinski

>
> Thanks.
>
> Brad
>
> The compiler:
>
> [MacBook-Pro:~/programs/gambit/gambit-feeley] lucier% gcc-13 -v
> Using built-in specs.
> COLLECT_GCC=gcc-13
> COLLECT_LTO_WRAPPER=/opt/homebrew/Cellar/gcc/13.2.0/bin/../libexec/gcc/aarch64-apple-darwin23/13/lto-wrapper
> Target: aarch64-apple-darwin23
> Configured with: ../configure --prefix=/opt/homebrew/opt/gcc
> --libdir=/opt/homebrew/opt/gcc/lib/gcc/current --disable-nls
> --enable-checking=release --with-gcc-major-version-only
> --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-13
> --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr
> --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl
> --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 13.2.0'
> --with-bugurl=https://github.com/Homebrew/homebrew-core/issues
> --with-system-zlib --build=aarch64-apple-darwin23
> --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk
> --with-ld=/Library/Developer/CommandLineTools/usr/bin/ld-classic
> Thread model: posix
> Supported LTO compression algorithms: zlib zstd
> gcc version 13.2.0 (Homebrew GCC 13.2.0)
>
> (so perhaps not the standard gcc).
>
> The command line (cut down a bit) is
>
> gcc-13 -save-temps -fverbose-asm -fdump-rtl-sched2 -O1
> -fexpensive-optimizations -fno-gcse -Wno-unused -Wno-write-strings
> -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math
> -fno-math-errno -fschedule-insns2 -foptimize-sibling-calls
> -fomit-frame-pointer -fipa-ra -fmove-loop-invariants -march=native -fPIC
> -fno-common -I"../include" -c -o _num.o -I. _num.c -D___LIBRARY
>
> The scheduling report for the loop is
>
> ;; ======================================================
> ;; -- basic block 10 from 39 to 70 -- after reload
> ;; ======================================================
>
> ;; 0--> b 0: i 39 x4=x2+x7
> :cortex_a53_slot_any
> ;; 0--> b 0: i 46 x1=zxn([sxn(x2)*0x4+x8])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;; 3--> b 0: i 45 x9=zxn([sxn(x4)*0x4+x3])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;; 7--> b 0: i 47 x1=zxn(x6)*zxn(x1)+x9
> :(cortex_a53_slot_any+cortex_a53_imul)
> ;; 9--> b 0: i 48 x1=x1+x5
> :cortex_a53_slot_any
> ;; 9--> b 0: i 53 x5=x12+x2
> :cortex_a53_slot_any
> ;; 10--> b 0: i 50 [sxn(x4)*0x4+x3]=x1
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store
> ;; 10--> b 0: i 57 x4=x2+0x1
> :cortex_a53_slot_any
> ;; 11--> b 0: i 67 x2=x2+0x2
> :cortex_a53_slot_any
> ;; 12--> b 0: i 60 x9=zxn([sxn(x5)*0x4+x3])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;; 13--> b 0: i 61 x4=zxn([sxn(x4)*0x4+x8])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;; 17--> b 0: i 62 x4=zxn(x6)*zxn(x4)+x9
> :(cortex_a53_slot_any+cortex_a53_imul)
> ;; 20--> b 0: i 63 x1=x1 0>>0x20+x4
> :cortex_a53_slot_any
> ;; 20--> b 0: i 65 [sxn(x5)*0x4+x3]=x1
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store
> ;; 22--> b 0: i 66 x5=x1 0>>0x20
> :cortex_a53_slot_any
> ;; 22--> b 0: i 69 cc=cmp(x11,x2)
> :cortex_a53_slot_any
> ;; 23--> b 0: i 70 pc={(cc>0)?L68:pc}
> :(cortex_a53_slot_any+cortex_a53_branch)
> ;; Ready list (final):
> ;; total time = 23
> ;; new head = 39
> ;; new tail = 70
>
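For anyone who wants to check which pipeline description a sched2 dump was
produced with, here is a rough Python sketch that pulls the issue cycle and
the reservation-unit names out of each scheduled-insn line. It is not part of
GCC. The names INSN_LINE and summarize are made up for this example, and the
regular expression is an assumption based only on the report quoted above. It
also assumes the unwrapped dump file, where each insn and its reservation sit
on one line (the mail client wrapped them in the quoted text).

```python
import re

# Hypothetical pattern for one scheduled-insn line of a sched2 report, e.g.
#   ;;   7--> b  0: i  47 x1=zxn(x6)*zxn(x1)+x9  :(cortex_a53_slot_any+cortex_a53_imul)
# Groups: issue cycle, insn uid, insn text, reservation string.
INSN_LINE = re.compile(
    r";;\s*(\d+)-->\s*b\s*\d+:\s*i\s*(\d+)\s+(.*?)\s*:(\S+)\s*$"
)


def summarize(dump_text):
    """Return (issue_cycles, unit_names) for the insn lines in dump_text."""
    issue_cycles = []
    units = set()
    for line in dump_text.splitlines():
        match = INSN_LINE.search(line)
        if match is None:
            continue  # banner, "Ready list", "total time" and other lines
        issue_cycles.append(int(match.group(1)))
        # A reservation such as "(a+b),c" names the units a, b and c.
        units.update(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", match.group(4)))
    return issue_cycles, sorted(units)


if __name__ == "__main__":
    sample = (
        ";;   7--> b  0: i  47 x1=zxn(x6)*zxn(x1)+x9  "
        ":(cortex_a53_slot_any+cortex_a53_imul)\n"
        ";;  23--> b  0: i  70 pc={(cc>0)?L68:pc}  "
        ":(cortex_a53_slot_any+cortex_a53_branch)\n"
    )
    cycles, units = summarize(sample)
    print("last issue cycle:", max(cycles))
    print("reservation units:", ", ".join(units))
```

If every unit name starts with "cortex_a53_", as in the report above, the
cycle numbers come from the cortex-a53 model, whatever core the code actually
runs on.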