https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353
Bug ID: 115353
Summary: Missed thumb2 table branch instruction optimisations
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gus at projectgus dot com
Target Milestone: ---

Created attachment 58351
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58351&action=edit
Minimal test case that previously generated tbb

After updating to gcc 14.1 we noticed that many jump tables were no longer being optimised to use Thumb2 table branch instructions (tbb/tbh) compared to 13.2.

A bisect shows the problem seems to have been introduced by 7006e5d2d7 "arm: Use deltas for Arm switch tables".

## Versions

The 14.1 build we were using was:

> Target: arm-none-eabi
> Configured with: /build/arm-none-eabi-gcc/src/gcc-14.1.0/configure
> --target=arm-none-eabi --prefix=/usr --with-sysroot=/usr/arm-none-eabi
> --with-native-system-header-dir=/include --libexecdir=/usr/lib
> --enable-languages=c,c++ --enable-plugins --disable-decimal-float
> --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath
> --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared
> --disable-threads --disable-tls --with-gnu-as --with-gnu-ld
> --with-system-zlib --with-newlib --with-headers=/usr/arm-none-eabi/include
> --with-python-dir=share/gcc-arm-none-eabi --with-gmp --with-mpfr --with-mpc
> --with-isl --with-libelf --enable-gnu-indirect-function
> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
> --with-pkgversion='Arch Repository'
> --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/arm-none-eabi-gcc/-/issues
> --with-multilib-list=rmprofile
> gcc version 14.1.0 (Arch Repository)

Local bisect builds are configured slightly differently:

> Target: arm-none-eabi
> Configured with: /home/gus/dev/gcc/configure --target=arm-none-eabi
> --prefix=/home/gus/ry/george/tmp/gcc-temp-7006e5d2
> --with-sysroot=/home/gus/ry/george/tmp/gcc-temp-7006e5d2/arm-none-eabi
> --enable-languages=c --enable-plugins --disable-decimal-float
> --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath
> --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared
> --disable-threads --disable-tls --with-gnu-as --with-gnu-ld
> --with-system-zlib --with-newlib
> --with-headers=/home/gus/ry/george/tmp/gcc-temp-7006e5d2/arm-none-eabi/include
> --with-python-dir=share/gcc-arm-none-eabi --with-gmp --with-mpfr --with-mpc
> --with-isl --with-libelf --enable-gnu-indirect-function
> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
> --with-multilib-list=rmprofile
> gcc version 14.0.0 20231026 (experimental) (GCC)

## Test case

Will attach two minimal test cases, one for tbb and one with slightly larger jump table range for tbh.

Compiled with "arm-none-eabi-gcc -mcpu=cortex-m4 -Os -Wall -Wextra".

GCC release 13.2 and parent commit of 7006e5d2d7 both optimise to table branch instructions, i.e.

> jump_around:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         push    {r3, lr}
>         cmp     r0, #6
>         bhi     .L9
>         tbb     [pc, r0]
> .L4:
>         .byte   (.L10-.L4)/2
>         .byte   (.L11-.L4)/2
>         .byte   (.L8-.L4)/2
>         .byte   (.L7-.L4)/2
>         .byte   (.L6-.L4)/2
>         .byte   (.L5-.L4)/2
>         .byte   (.L3-.L4)/2
>         .p2align 1

gcc commit 7006e5d2d7, release 14.1, and recent master branch all generate PC address loads, i.e.

> jump_around:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         push    {r3, lr}
>         cmp     r0, #5
>         bhi     .L8
>         adr     r3, .L4
>         ldr     pc, [r3, r0, lsl #2]
>         .p2align 2
> .L4:
>         .word   .L9+1
>         .word   .L10+1
>         .word   .L7+1
>         .word   .L6+1
>         .word   .L5+1
>         .word   .L3+1
>         .p2align 1

For large jump tables the additional overhead of 2x or 4x code size per entry adds up.
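
As a rough illustration of that per-entry overhead, here is a minimal C sketch of the arithmetic. It is not taken from the attached test cases; the names `table_kind`, `table_bytes` and `word_table_overhead` are made up for this example. It assumes one byte per entry for a tbb table, one halfword per entry for a tbh table, and one 4-byte `.word` per entry for the PC address load form, and it counts table data only (no alignment padding, no dispatch instructions).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per jump-table entry for each dispatch form discussed above.
   These names are illustrative only; they are not GCC identifiers. */
enum table_kind {
    TABLE_TBB  = 1, /* tbb: one .byte offset per entry */
    TABLE_TBH  = 2, /* tbh: one halfword offset per entry */
    TABLE_WORD = 4  /* adr + ldr pc: one .word address per entry */
};

/* Size of the table data alone. Alignment padding and the dispatch
   instructions are deliberately ignored in this sketch. */
size_t table_bytes(size_t entries, enum table_kind kind)
{
    assert(kind == TABLE_TBB || kind == TABLE_TBH || kind == TABLE_WORD);
    return entries * (size_t)kind;
}

/* Extra table bytes the .word form costs relative to a tbb or tbh table
   with the same number of entries. */
size_t word_table_overhead(size_t entries, enum table_kind baseline)
{
    return table_bytes(entries, TABLE_WORD) - table_bytes(entries, baseline);
}
```

For a hypothetical 100-entry table, `table_bytes` gives 100 bytes for tbb, 200 for tbh and 400 for the `.word` form, so `word_table_overhead` is 300 bytes against tbb and 200 bytes against tbh.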