https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353

            Bug ID: 115353
           Summary: Missed thumb2 table branch instruction optimisations
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gus at projectgus dot com
  Target Milestone: ---

Created attachment 58351
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58351&action=edit
Minimal test case that previously generated tbb

After updating to gcc 14.1 we noticed that many jump tables which gcc 13.2
compiled to Thumb2 table branch instructions (tbb/tbh) are no longer optimised
that way.

A bisect points to commit 7006e5d2d7 "arm: Use deltas for Arm switch tables"
as the change that introduced the problem.

## Versions

The 14.1 build we were using was:

> Target: arm-none-eabi
> Configured with: /build/arm-none-eabi-gcc/src/gcc-14.1.0/configure 
> --target=arm-none-eabi --prefix=/usr --with-sysroot=/usr/arm-none-eabi 
> --with-native-system-header-dir=/include --libexecdir=/usr/lib 
> --enable-languages=c,c++ --enable-plugins --disable-decimal-float 
> --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath 
> --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared 
> --disable-threads --disable-tls --with-gnu-as --with-gnu-ld 
> --with-system-zlib --with-newlib --with-headers=/usr/arm-none-eabi/include 
> --with-python-dir=share/gcc-arm-none-eabi --with-gmp --with-mpfr --with-mpc 
> --with-isl --with-libelf --enable-gnu-indirect-function 
> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' 
> --with-pkgversion='Arch Repository' 
> --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/arm-none-eabi-gcc/-/issues
>  --with-multilib-list=rmprofile
> gcc version 14.1.0 (Arch Repository) 

Local bisect builds are configured slightly differently:

> Target: arm-none-eabi
> Configured with: /home/gus/dev/gcc/configure --target=arm-none-eabi 
> --prefix=/home/gus/ry/george/tmp/gcc-temp-7006e5d2 
> --with-sysroot=/home/gus/ry/george/tmp/gcc-temp-7006e5d2/arm-none-eabi 
> --enable-languages=c --enable-plugins --disable-decimal-float 
> --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath 
> --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared 
> --disable-threads --disable-tls --with-gnu-as --with-gnu-ld 
> --with-system-zlib --with-newlib 
> --with-headers=/home/gus/ry/george/tmp/gcc-temp-7006e5d2/arm-none-eabi/include
>  --with-python-dir=share/gcc-arm-none-eabi --with-gmp --with-mpfr --with-mpc 
> --with-isl --with-libelf --enable-gnu-indirect-function 
> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' 
> --with-multilib-list=rmprofile
> gcc version 14.0.0 20231026 (experimental) (GCC) 

## Test case

Two minimal test cases will be attached: one for tbb, and one with a slightly
larger jump table range for tbh.

Compiled with "arm-none-eabi-gcc -mcpu=cortex-m4 -Os -Wall -Wextra".
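
For readers without access to the attachments, the following is a hypothetical function of the same general shape (it is not the attached file): a dense switch over 0..6 with a call in every case. The handler names `case0` to `case6`, `other`, and the `last_case` variable are illustrative assumptions, and whether a given gcc version lowers this exact code to `tbb` is not something the report confirms.

```c
/* Hypothetical stand-in for the attached test case, not the actual file.
 * Each case calls a distinct noinline handler so the switch stays a real
 * jump table rather than being folded into a lookup of constants. */
static int last_case = -1;  /* records which handler ran, for testing */

__attribute__((noinline)) static void case0(void) { last_case = 0; }
__attribute__((noinline)) static void case1(void) { last_case = 1; }
__attribute__((noinline)) static void case2(void) { last_case = 2; }
__attribute__((noinline)) static void case3(void) { last_case = 3; }
__attribute__((noinline)) static void case4(void) { last_case = 4; }
__attribute__((noinline)) static void case5(void) { last_case = 5; }
__attribute__((noinline)) static void case6(void) { last_case = 6; }
__attribute__((noinline)) static void other(void) { last_case = 99; }

/* Dense switch over 0..6 plus a default, the same index range as the
 * "cmp r0, #6" bounds check in the 13.2 listing. Build with the report's
 * flags plus -S to inspect the assembly:
 *   arm-none-eabi-gcc -mcpu=cortex-m4 -Os -Wall -Wextra -S */
void jump_around(int x)
{
    switch (x) {
    case 0: case0(); break;
    case 1: case1(); break;
    case 2: case2(); break;
    case 3: case3(); break;
    case 4: case4(); break;
    case 5: case5(); break;
    case 6: case6(); break;
    default: other(); break;
    }
}
```

Comparing the `-S` output of gcc 13.2 and 14.1 for a function like this should show whether the table is emitted as `.byte` entries after `tbb` or as `.word` entries loaded into `pc`.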

GCC release 13.2 and the parent commit of 7006e5d2d7 both optimise to table
branch instructions, i.e.

> jump_around:
>        @ args = 0, pretend = 0, frame = 0
>        @ frame_needed = 0, uses_anonymous_args = 0
>        push    {r3, lr}
>        cmp     r0, #6
>        bhi     .L9
>        tbb     [pc, r0]
>.L4:
>        .byte   (.L10-.L4)/2
>        .byte   (.L11-.L4)/2
>        .byte   (.L8-.L4)/2
>        .byte   (.L7-.L4)/2
>        .byte   (.L6-.L4)/2
>        .byte   (.L5-.L4)/2
>        .byte   (.L3-.L4)/2
>        .p2align 1

gcc commit 7006e5d2d7, release 14.1, and the recent master branch all instead
load the PC directly from a table of 32-bit addresses, i.e.

>jump_around:
>       @ args = 0, pretend = 0, frame = 0
>       @ frame_needed = 0, uses_anonymous_args = 0
>       push    {r3, lr}
>       cmp     r0, #5
>       bhi     .L8
>       adr     r3, .L4
>       ldr     pc, [r3, r0, lsl #2]
>       .p2align 2
>.L4:
>       .word   .L9+1
>       .word   .L10+1
>       .word   .L7+1
>       .word   .L6+1
>       .word   .L5+1
>       .word   .L3+1
>       .p2align 1
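
For comparison, a minimal C model of the `adr r3, .L4; ldr pc, [r3, r0, lsl #2]` sequence above, as a sketch under stated assumptions. Each entry is a full 32-bit absolute address; on ARMv7-M, loading a value into `pc` interworks on bit 0, which must be 1 to stay in Thumb state (M-profile has no Arm state, so a clear bit faults), hence `.word .Lx+1`. The `struct branch` type and `word_table_target` name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of loading a table entry into pc: the address execution continues
 * at, and whether bit 0 requested Thumb state. */
struct branch {
    uint32_t addr;
    bool thumb;
};

/* Models "ldr pc, [r3, r0, lsl #2]" with r3 = .L4: index * 4 selects a
 * word, bit 0 of that word selects the instruction set, and the remaining
 * bits give the halfword-aligned target address. */
static struct branch word_table_target(const uint32_t *table, uint32_t index)
{
    uint32_t entry = table[index];
    struct branch b = { entry & ~1u, (entry & 1u) != 0u };
    return b;
}
```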

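Each entry in the 32-bit address table takes 4 bytes, versus 1 byte for tbb
and 2 bytes for tbh. For large jump tables this 4x (versus tbb) or 2x (versus
tbh) code size per entry adds up.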
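
The per-entry overhead can be checked with simple arithmetic. The sketch below counts table data only (not the dispatch instructions) and assumes, as in the listings, that the trailing `.p2align 1` rounds an odd-length tbb table up to a halfword; the leading `.p2align 2` before a word table can add up to 2 more bytes of padding, which it ignores. The function names are hypothetical.

```c
#include <stddef.h>

/* Table bytes for an n-entry jump table under each scheme. */
static size_t tbb_table_bytes(size_t n)  { return n + (n & 1u); } /* 1 B/entry, padded to even */
static size_t tbh_table_bytes(size_t n)  { return 2u * n; }       /* 2 B/entry */
static size_t word_table_bytes(size_t n) { return 4u * n; }       /* 4 B/entry */
```

With these, the 13.2 listing's 7 `.byte` entries occupy 8 bytes, while the 14.1 listing's 6 `.word` entries occupy 24; at 300 entries a tbh table would be 600 bytes against 1200 for a word table.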
