https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63595

--- Comment #4 from Pat Haugen <pthaugen at gcc dot gnu.org> ---
Created attachment 33775
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33775&action=edit
unreduced bzip2 testcase

CPU2006 benchmark 447.dealII started segfaulting on PowerPC with revision
216305. Sorry for the unreduced testcase, but I wanted to get the info out. ICF
is commoning functions, but doing so incorrectly. The attached bzip2 file can be
compiled with 'g++ -S -m64 -O2 -mcpu=power7 tria.ii' to show the problem.

Looking at the generated assembler, the following 3 functions:
_ZNK13TriangulationILi3EE8end_faceEv
_ZNK13TriangulationILi3EE7end_hexEv
_ZNK13TriangulationILi3EE3endEv
have all been changed to call '_ZNK13TriangulationILi3EE8end_quadEv' instead of
having equivalent inline code. It appears the code for all 4 functions is the
same in r216304, but with r216305 the 3 named functions load gpr3 with the
address of a local stack temp before calling
'_ZNK13TriangulationILi3EE8end_quadEv', so the desired values no longer get
stored through the original gpr3 value passed in (see the '>>>' line below).

Following is generated asm for '_ZNK13TriangulationILi3EE8end_faceEv', the
other two are similar:

r216304:
        li 10,-1
        std 4,8(3)
        stw 10,0(3)
        stw 10,4(3)
        blr


r216305:
        mflr 0
        std 0,16(1)
        stdu 1,-128(1)
        .cfi_def_cfa_offset 128
        .cfi_offset 65, 16
>>>     addi 3,1,112
        bl _ZNK13TriangulationILi3EE8end_quadEv
        nop
        addi 1,1,128
        .cfi_def_cfa_offset 0
        ld 0,16(1)
        mtlr 0
        .cfi_restore 65
        blr
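
For readers without the attachment, here is a minimal sketch of the kind of
source that would give the r216304 stores above. It is not the dealII code:
'Tria', 'RawIter' and 'end_face_fills_caller_object' are hypothetical names,
and the {int, int, pointer} layout is only inferred from the stw 0(3), stw 4(3)
and std 4,8(3) stores. It assumes the result is returned through the pointer
the caller passes in gpr3, with 'this' in gpr4.

```cpp
#include <cassert>

struct Tria;  // hypothetical stand-in for Triangulation<3>

// Hypothetical iterator-like aggregate: two 32-bit fields and a pointer,
// chosen to match the offsets 0, 4 and 8 seen in the r216304 listing.
template <int Kind>
struct RawIter {
    int level;
    int index;
    const Tria *tria;
};

struct Tria {
    // Different return types, but the same three stores into the caller's
    // result object, so the bodies look identical to a folding pass.
    RawIter<0> end_quad() const { return RawIter<0>{-1, -1, this}; }
    RawIter<1> end_face() const { return RawIter<1>{-1, -1, this}; }
};

// What a caller relies on: the values must land in the caller's object,
// not in a temporary inside end_face.
bool end_face_fills_caller_object() {
    Tria t;
    RawIter<1> it = t.end_face();
    return it.level == -1 && it.index == -1 && it.tria == &t;
}
```

With a correct compiler the check returns true. In the r216305 listing, gpr3 is
overwritten before the call, so the caller's object would be left unwritten.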

A side comment about the above ICF transformation: it sure seems like this is
going to degrade performance. We've gone from a simple stackless leaf function
to one that stacks a frame and makes a call.
