[Bug c/39284] New: Computed gotos combined too aggressively

jyasskin at gmail dot com Mon, 23 Feb 2009 14:42:47 -0800

The following uses a file "ceval.i" that I'll upload shortly and a nightly
build of gcc-4.4.0-20090223. The file is a version of Python's core
interpretation loop modified to use computed gotos in order to help the
processor's branch predictor do a better job than with a switch statement.
However, gcc combines the computed gotos into only a few instructions,
producing a result nearly as bad as the switch.


The computed gotos are all of the form "goto *next_label;". About half of the
instances counted are reachable, as measured by the asm volatile farther down.
$ grep -c -F 'goto *next_label' ceval.i
256

I compile with -fno-gcse because
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Optimize-Options.html#index-fgcse-646
says it helps and with "--param max-goto-duplication-insns=100000" because
Diego Novillo suggested tweaking that parameter.
$ gcc-4.4 -m32  -pthread  -fno-strict-aliasing -g -fwrapv -O3 -Wall
-Wstrict-prototypes -fno-gcse --param max-goto-duplication-insns=100000  -S -dA
ceval.i -o ceval.s
$ egrep -c 'jmp[[:space:]]*\*' ceval.s
4


I tried to make the common instruction sequence leading up to the indirection
jump shorter by putting an asm volatile before it, but that didn't work. You
can see the result by running:

$ sed 's!goto \*next_label!asm volatile ("/* Computed goto */":::"memory");goto
*next_label!' < ceval.i > ceval-asm.i
$ gcc-4.4 -m32  -pthread  -fno-strict-aliasing -g -fwrapv -O3 -fno-gcse --param
max-goto-duplication-insns=100000  -S -dA ceval-asm.i -o ceval-asm.s
$ egrep -c 'Computed goto' ceval-asm.s
130
$ egrep -c 'jmp[[:space:]]*\*' ceval-asm.s
4


One of the relevant bits of assembly (search for "Computed goto" in
ceval-asm.s) is:

.L1125:
        # basic block 66
.LBE835:
        # ../src/Python/ceval.c:1000
        .loc 1 1000 0
        jmp     *-72(%ebp)
.LVL558:
        .p2align 4,,7
        .p2align 3
.L1433:
        # basic block 67

...

        # basic block 114
#APP
# 11 "../src/Include/ceval-vm.i" 1
        /* Computed goto */
# 0 "" 2
#NO_APP
        movl    %esi, %ebx
.LVL599:
        .p2align 4,,7
        .p2align 3
.L484:
        # basic block 115
        # ../src/Python/ceval.c:1000
        .loc 1 1000 0
        movl    $1, %eax
.LVL600:
        movl    %ebx, %esi
.LVL601:
        jmp     .L1125

The documentation for max-goto-duplication-insns says:

    The maximum number of instructions to duplicate to a block that jumps to a
computed goto. To avoid O(N^2) behavior in a number of passes, GCC factors
computed gotos early in the compilation process, and unfactors them as late as
possible. Only computed jumps at the end of a basic blocks with no more than
max-goto-duplication-insns are unfactored. The default value is 8. 

Since .L1125 and basic block 66 have only a single instruction, the computed
goto should have been unfactored.

$ gcc-4.4 -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /home/dnovillo/perf/sbox/gcc/local.x86_64/src/configure
--prefix=/home/dnovillo/perf/sbox/gcc/local.x86_64/inst
--srcdir=/home/dnovillo/perf/sbox/gcc/local.x86_64/src
--with-gmp=/home/dnovillo/i686 --with-mpfr=/home/dnovillo/i686
--build=i686-pc-linux-gnu --enable-languages=c++,fortran,objc,obj-c++
Thread model: posix
gcc version 4.4.0 20090223 (experimental) (GCC)


-- 
           Summary: Computed gotos combined too aggressively
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jyasskin at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284

[Bug c/39284] New: Computed gotos combined too aggressively

Reply via email to