Worse code generation for FPU on versions after 6

jakub at gcc dot gnu.org Mon, 20 Feb 2017 00:42:13 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79593


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That said, the reason why there is fld1 followed by fld %st(0) is that 1.0 is
used multiple times:
(insn 41 64 42 8 (set (reg:SF 114)
        (mem/u/c:SF (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [4  S4 A32])) "pr79593.c":17 125 {*movsf_internal}
     (expr_list:REG_EQUAL (const_double:SF 1.0e+0 [0x0.8p+1])
        (nil)))
(insn 42 41 43 8 (set (reg:XF 118 [ delta ])
        (float_extend:XF (reg:SF 114))) "pr79593.c":17 153 {*extendsfxf2_i387}
     (expr_list:REG_EQUAL (const_double:XF 1.0e+0 [0x0.8p+1])
        (nil)))
...
(insn 69 65 47 9 (set (reg:XF 110 [ delta ])
        (float_extend:XF (reg:SF 114))) "pr79593.c":17 153 {*extendsfxf2_i387}
     (expr_list:REG_DEAD (reg:SF 114)
        (expr_list:REG_EQUAL (const_double:XF 1.0e+0 [0x0.8p+1])
            (nil))))
in multiple basic blocks with a conditional jump in between, so the combiner
doesn't combine them into (set (reg:XF ...) (const_double:XF 1.0e+0)).
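
For illustration only, here is a minimal sketch of the kind of source that uses 1.0 once in a comparison and once more in an assignment, so the two uses land in different basic blocks. It is not the testcase attached to the PR: the function name clamp_unit and its shape are assumptions, and whether it yields exactly the RTL above (for example with -m32 -O2 -mfpmath=387) has not been checked and will depend on the compiler version.

```c
/* Hypothetical stand-in for the reporter's code, NOT the PR testcase.
   The constant 1.0f is needed twice: once for the comparison and once
   for the store on the taken branch, i.e. in two basic blocks.  */
float clamp_unit(float x)
{
    /* long double is XFmode on i387; the SFmode constants get extended. */
    long double delta = x;

    if (delta < 0.0f)
        delta = 0.0f;
    if (delta > 1.0f)   /* first use of 1.0: comparison */
        delta = 1.0f;   /* second use of 1.0: assignment in another block */

    return (float)delta;
}
```

Comparing the *.combine and *.peephole2 dumps for a function like this would show whether the SFmode load of 1.0 stays separate from its float_extend.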
Still, in the *.peephole2 dump we have:
(insn 82 64 42 8 (set (reg:SF 10 st(2) [114])
        (const_double:SF 1.0e+0 [0x0.8p+1])) "pr79593.c":17 125 {*movsf_internal}
     (expr_list:REG_EQUAL (const_double:SF 1.0e+0 [0x0.8p+1])
        (nil)))
(insn 42 82 83 8 (set (reg:XF 9 st(1) [orig:118 delta ] [118])
        (float_extend:XF (reg:SF 10 st(2) [114]))) "pr79593.c":17 153 {*extendsfxf2_i387}
     (expr_list:REG_EQUIV (const_double:XF 1.0e+0 [0x0.8p+1])
        (nil)))
...
(insn 69 65 47 9 (set (reg:XF 8 st [orig:110 delta ] [110])
        (float_extend:XF (reg:SF 10 st(2) [114]))) "pr79593.c":17 153 {*extendsfxf2_i387}
     (expr_list:REG_DEAD (reg:SF 10 st(2) [114])
        (expr_list:REG_EQUAL (const_double:XF 1.0e+0 [0x0.8p+1])
            (nil))))
It is only the regstack pass that optimizes those two into one, but it isn't
able to peephole or otherwise combine:
(insn:TI 82 64 42 7 (set (reg:SF 8 st)
        (const_double:SF 1.0e+0 [0x0.8p+1])) "pr79593.c":17 125 {*movsf_internal}
     (expr_list:REG_EQUAL (const_double:SF 1.0e+0 [0x0.8p+1])
        (nil)))
(insn:TI 42 82 83 7 (set (reg:XF 8 st)
        (float_extend:XF (reg:SF 8 st))) "pr79593.c":17 153 {*extendsfxf2_i387}
     (expr_list:REG_EQUIV (const_double:XF 1.0e+0 [0x0.8p+1])
        (nil)))
and there is no peephole2 pass afterwards, so either regstack itself would need
to do this, or the machine reorg pass.
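
To make the missing transformation concrete, here is a toy model of the fold in plain C. It is only a sketch under stated assumptions: struct toy_insn, the enums, and fold_const_extend are hypothetical names invented for this example and are not GCC internals (the real passes work on rtx and would need proper liveness checks). It folds "load SFmode constant into R; float_extend R to XFmode in R" into one XFmode constant load. That is safe in this model because the extend overwrites the only copy of the SFmode value and every SFmode value is exactly representable in XFmode.

```c
#include <stddef.h>

/* Toy instruction model; all names here are hypothetical, not GCC's. */
enum toy_mode { MODE_SF, MODE_XF };
enum toy_op { OP_LOAD_CONST, OP_FLOAT_EXTEND, OP_OTHER };

struct toy_insn {
    enum toy_op op;
    enum toy_mode mode;  /* mode of the destination */
    int dest;            /* destination hard register number */
    int src;             /* source register (used by OP_FLOAT_EXTEND) */
    double value;        /* constant (used by OP_LOAD_CONST) */
};

/* Fold  (set (reg:SF R) (const))  followed directly by
         (set (reg:XF R) (float_extend:XF (reg:SF R)))
   into  (set (reg:XF R) (const)).
   Compacts the array in place and returns the new length.  */
size_t fold_const_extend(struct toy_insn *insns, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n
            && insns[i].op == OP_LOAD_CONST
            && insns[i].mode == MODE_SF
            && insns[i + 1].op == OP_FLOAT_EXTEND
            && insns[i + 1].mode == MODE_XF
            && insns[i + 1].src == insns[i].dest
            && insns[i + 1].dest == insns[i].dest) {
            struct toy_insn folded = insns[i];
            folded.mode = MODE_XF;   /* load the constant in XFmode directly */
            insns[out++] = folded;
            i++;                     /* skip the now redundant extend */
            continue;
        }
        insns[out++] = insns[i];
    }
    return out;
}
```

Applied to the two insns shown after regstack (both targeting st, register 8), this model leaves a single XFmode load of 1.0, which on i387 would correspond to a lone fld1.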

I still have no idea why this is considered a regression; with gcc 5.4.1
20160721 I get:
        subl    $12, %esp
        fldz
        movl    16(%esp), %edx
        movl    20(%esp), %eax
        cmpl    %eax, (%edx)
        jbe     .L2
        flds    global_data
        flds    global_data+4
        fxch    %st(2)
        fcomp   %st(1)
        fnstsw  %ax
        sahf
        ja      .L13
        fxch    %st(1)
        fsubrs  4(%edx)
.L5:
        fdivp   %st, %st(1)
        ftst
        fnstsw  %ax
        sahf
        jnb     .L6
        fstp    %st(0)
        fldz
.L6:
        fld1
        fld     %st(0)
        fcomp   %st(2)
        fnstsw  %ax
        sahf
        jnb     .L14
        fstp    %st(1)
        jmp     .L7
        .p2align 4,,10
        .p2align 3
.L14:
        fstp    %st(0)
.L7:
.L2:
        addl    $12, %esp
        ret

[Bug target/79593] [6/7 Regression] Poor/Worse code generation for FPU on versions after 6
