https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116410

            Bug ID: 116410
           Summary: fat-lto-objects generates different and inefficient
                    code compared with no-fat-lto-objects
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

It seems unexpected that -ffat-lto-objects generates different code compared
with -fno-fat-lto-objects (which is the default in GCC).  Unfortunately,
-ffat-lto-objects produces code that performs worse than -fno-fat-lto-objects
by about 2% or more.  Even worse, many OS distributions such as Fedora/Red Hat
add -ffat-lto-objects as a global flag when building the OS, which means all
software packages built that way are slowed down.

One typical example is found in zstd and can be reproduced as follows:


git clone https://github.com/facebook/zstd.git

cd zstd/programs
export CFLAGS="-O2 -flto=auto -g"
make zstd V=1 -j
gdb -batch -ex "disassemble/r ZSTD_rescaleFreqs" zstd   > nofat.asm

export CFLAGS="-O2 -flto=auto -g -ffat-lto-objects"
make clean
make zstd V=1 -j
gdb -batch -ex "disassemble/r ZSTD_rescaleFreqs" zstd > fat.asm


Here is one piece of the code generated with -ffat-lto-objects:

   0x000000000043aca3 <+99>:    0f 29 54 24 10  movaps %xmm2,0x10(%rsp)
   0x000000000043aca8 <+104>:   0f 29 04 24     movaps %xmm0,(%rsp)
   0x000000000043acac <+108>:   0f 29 54 24 20  movaps %xmm2,0x20(%rsp)
   0x000000000043acb1 <+113>:   0f 29 54 24 30  movaps %xmm2,0x30(%rsp)
   0x000000000043acb6 <+118>:   0f 29 54 24 40  movaps %xmm2,0x40(%rsp)
   0x000000000043acbb <+123>:   0f 29 54 24 50  movaps %xmm2,0x50(%rsp)
   0x000000000043acc0 <+128>:   0f 29 54 24 60  movaps %xmm2,0x60(%rsp)
   0x000000000043acc5 <+133>:   0f 29 54 24 70  movaps %xmm2,0x70(%rsp)
   0x000000000043acca <+138>:   0f 29 94 24 80 00 00 00 movaps %xmm2,0x80(%rsp)
   0x000000000043acd2 <+146>:   0f 11 00        movups %xmm0,(%rax)
   0x000000000043acd5 <+149>:   66 0f 6f 7c 24 10       movdqa 0x10(%rsp),%xmm7
   0x000000000043acdb <+155>:   66 0f ef c0     pxor   %xmm0,%xmm0
   0x000000000043acdf <+159>:   0f 11 78 10     movups %xmm7,0x10(%rax)
   0x000000000043ace3 <+163>:   66 0f 6f 7c 24 20       movdqa 0x20(%rsp),%xmm7
   0x000000000043ace9 <+169>:   0f 11 78 20     movups %xmm7,0x20(%rax)
   0x000000000043aced <+173>:   66 0f 6f 7c 24 30       movdqa 0x30(%rsp),%xmm7
   0x000000000043acf3 <+179>:   0f 11 78 30     movups %xmm7,0x30(%rax)
   0x000000000043acf7 <+183>:   66 0f 6f 7c 24 40       movdqa 0x40(%rsp),%xmm7
   0x000000000043acfd <+189>:   0f 11 78 40     movups %xmm7,0x40(%rax)
   0x000000000043ad01 <+193>:   66 0f 6f 7c 24 50       movdqa 0x50(%rsp),%xmm7
   0x000000000043ad07 <+199>:   0f 11 78 50     movups %xmm7,0x50(%rax)
   0x000000000043ad0b <+203>:   66 0f 6f 74 24 60       movdqa 0x60(%rsp),%xmm6
   0x000000000043ad11 <+209>:   0f 11 70 60     movups %xmm6,0x60(%rax)
   0x000000000043ad15 <+213>:   66 0f 6f 7c 24 70       movdqa 0x70(%rsp),%xmm7
   0x000000000043ad1b <+219>:   0f 11 78 70     movups %xmm7,0x70(%rax)



The same piece of code generated with -fno-fat-lto-objects:

   0x000000000043ab03 <+99>:    0f 11 50 10     movups %xmm2,0x10(%rax)
   0x000000000043ab07 <+103>:   0f 11 00        movups %xmm0,(%rax)
   0x000000000043ab0a <+106>:   0f 29 04 24     movaps %xmm0,(%rsp)
   0x000000000043ab0e <+110>:   66 0f ef c0     pxor   %xmm0,%xmm0
   0x000000000043ab12 <+114>:   0f 11 50 20     movups %xmm2,0x20(%rax)
   0x000000000043ab16 <+118>:   0f 11 50 30     movups %xmm2,0x30(%rax)
   0x000000000043ab1a <+122>:   0f 11 50 40     movups %xmm2,0x40(%rax)
   0x000000000043ab1e <+126>:   0f 11 50 50     movups %xmm2,0x50(%rax)
   0x000000000043ab22 <+130>:   0f 11 50 60     movups %xmm2,0x60(%rax)
   0x000000000043ab26 <+134>:   0f 11 50 70     movups %xmm2,0x70(%rax)
   0x000000000043ab2a <+138>:   0f 11 90 80 00 00 00    movups %xmm2,0x80(%rax)
   0x000000000043ab31 <+145>:   48 89 e0        mov    %rsp,%rax
   0x000000000043ab34 <+148>:   0f 29 54 24 10  movaps %xmm2,0x10(%rsp)
   0x000000000043ab39 <+153>:   0f 29 54 24 20  movaps %xmm2,0x20(%rsp)
   0x000000000043ab3e <+158>:   0f 29 54 24 30  movaps %xmm2,0x30(%rsp)
   0x000000000043ab43 <+163>:   0f 29 54 24 40  movaps %xmm2,0x40(%rsp)
   0x000000000043ab48 <+168>:   0f 29 54 24 50  movaps %xmm2,0x50(%rsp)
   0x000000000043ab4d <+173>:   0f 29 54 24 60  movaps %xmm2,0x60(%rsp)
   0x000000000043ab52 <+178>:   0f 29 54 24 70  movaps %xmm2,0x70(%rsp)
   0x000000000043ab57 <+183>:   0f 29 94 24 80 00 00 00 movaps %xmm2,0x80(%rsp)
   0x000000000043ab5f <+191>:   90      nop



I did an initial investigation and found that the summary information becomes
different starting from the ipa-modref pass:


  /* Compute no-LTO summaries when local optimization is going to happen.  */
  bool nolto = (!ipa || ((!flag_lto || flag_fat_lto_objects) && !in_lto_p)
                || (in_lto_p && !flag_wpa
                    && flag_incremental_link != INCREMENTAL_LINK_LTO));


nolto is true with -ffat-lto-objects but false with -fno-fat-lto-objects, so
the summaries that follow are computed differently and lead to different alias
analysis information; as a result, DSE fails to remove the redundant
loads/stores to the stack.  Is this a valid bug?
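
To make the nolto observation easier to check, here is a minimal standalone
sketch that evaluates the same boolean expression outside of GCC.  The function
names, the local enum, and the flag values chosen for the compile stage
(ipa and flag_lto set, in_lto_p and flag_wpa clear) are assumptions made for
illustration only; they are not taken from the GCC sources or from a debugger
session.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for GCC's incremental-link setting (assumption: only the
   comparison against INCREMENTAL_LINK_LTO matters for this expression).  */
enum incremental_link
{
  INCREMENTAL_LINK_NONE,
  INCREMENTAL_LINK_NOLTO,
  INCREMENTAL_LINK_LTO
};

/* Same expression as the quoted snippet, with the globals turned into
   parameters so it can be evaluated in isolation.  */
static bool
compute_nolto (bool ipa, bool flag_lto, bool flag_fat_lto_objects,
               bool in_lto_p, bool flag_wpa,
               enum incremental_link flag_incremental_link)
{
  return (!ipa || ((!flag_lto || flag_fat_lto_objects) && !in_lto_p)
          || (in_lto_p && !flag_wpa
              && flag_incremental_link != INCREMENTAL_LINK_LTO));
}

/* Hypothetical helper: nolto as seen while compiling one translation unit
   with -flto, i.e. before the link-time (LTO) stage has started.  */
static bool
nolto_at_compile_time (bool fat_lto_objects)
{
  return compute_nolto (/*ipa=*/true, /*flag_lto=*/true, fat_lto_objects,
                        /*in_lto_p=*/false, /*flag_wpa=*/false,
                        INCREMENTAL_LINK_NONE);
}

/* Checks the behaviour described in the report under the assumptions
   above: true with fat objects, false without.  */
static void
check_report_claim (void)
{
  assert (nolto_at_compile_time (true));
  assert (!nolto_at_compile_time (false));
}
```

Calling check_report_claim () from a small main passes under these assumed
values, which matches the description above: the only input that differs
between the two builds is flag_fat_lto_objects, and it alone flips nolto.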
