[Bug tree-optimization/74585] SRA forces parameters to memory causing awful code generation

rguenth at gcc dot gnu.org Fri, 12 Aug 2016 02:16:53 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-08-12
   Target Milestone|7.0                         |---
            Summary|[5/6/7] Tree-sra forces     |SRA forces parameters to
                   |parameters to memory        |memory causing awful code
                   |causing awful code          |generation
                   |generation                  |
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is not SRA pushing things to memory - it doesn't.  The issue is that
in the GIMPLE IL the parameter appears as "memory" because it is of aggregate type.

The real issue is the RTL expansion of the argument (not sure where that's done),
which doesn't take into account that we could happily expand

  a$vx0_27 = MEM[(struct  *)&a];
  a$vx1_28 = MEM[(struct  *)&a + 16B];
  a$vx2_29 = MEM[(struct  *)&a + 32B];
  a$vx3_30 = MEM[(struct  *)&a + 48B];

if a is expanded to registers.
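To make the pattern concrete, here is a hypothetical C reduction. It is not the test case from this report; the names `struct vecd8`, `v2df`, `rotate_left` and `rotate_left_sra` are invented for illustration. It assumes GCC's generic vector extension and an aggregate of four 16-byte vectors, so the fields sit at the offsets 0, 16, 32 and 48 seen in the loads above. The second function writes out by hand the kind of scalar replacements SRA creates: each field of `a` is loaded once into its own local, and only those locals are used afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* GCC vector extension: two doubles in one 16-byte vector. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical aggregate: four 16-byte vectors, 64 bytes in total. */
struct vecd8 { v2df vx0, vx1, vx2, vx3; };

/* The MEM[(struct *)&a + N] loads in the GIMPLE use these offsets. */
static_assert(offsetof(struct vecd8, vx0) == 0, "vx0 at offset 0");
static_assert(offsetof(struct vecd8, vx1) == 16, "vx1 at offset 16");
static_assert(offsetof(struct vecd8, vx2) == 32, "vx2 at offset 32");
static_assert(offsetof(struct vecd8, vx3) == 48, "vx3 at offset 48");

/* Source form: rotate the four vector members left by one position. */
struct vecd8 rotate_left(struct vecd8 a)
{
    struct vecd8 r = { a.vx1, a.vx2, a.vx3, a.vx0 };
    return r;
}

/* Hand-scalarized form mirroring the a$vxN loads: after these four
   reads, the aggregate parameter a is no longer referenced. */
struct vecd8 rotate_left_sra(struct vecd8 a)
{
    v2df a_vx0 = a.vx0;  /* a$vx0 = MEM[(struct *)&a]       */
    v2df a_vx1 = a.vx1;  /* a$vx1 = MEM[(struct *)&a + 16B] */
    v2df a_vx2 = a.vx2;  /* a$vx2 = MEM[(struct *)&a + 32B] */
    v2df a_vx3 = a.vx3;  /* a$vx3 = MEM[(struct *)&a + 48B] */

    struct vecd8 r;
    r.vx0 = a_vx1;
    r.vx1 = a_vx2;
    r.vx2 = a_vx3;
    r.vx3 = a_vx0;
    return r;
}
```

Both functions compute the same result; the sketch only illustrates the shape of the IL. Whether `a` actually arrives in registers, and so whether these loads could be expanded as plain register moves, depends on the target's calling convention.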

Note that comparing the assembler with and without -fno-tree-sra doesn't really
show me the obvious badness:

 test_vecd8_rotate_left:
-       addi 6,1,-288
-       li 9,160
+       addi 6,1,-224
+       li 7,96
        xxpermdi 34,34,34,2
        xxpermdi 35,35,35,2
-       li 10,176
+       li 8,112
+       li 10,128
        xxpermdi 36,36,36,2
        xxpermdi 37,37,37,2
+       stxvd2x 34,6,7
+       li 9,144
+       vspltisw 0,0
+       stxvd2x 35,6,8
+       stxvd2x 36,6,10
+       stxvd2x 37,6,9
+       lxvd2x 0,6,7
        li 7,32
-       stxvd2x 34,6,9
+       lxvd2x 7,6,8
        li 8,48
-       stxvd2x 35,6,10
-       li 10,192
-       stxvd2x 36,6,10
-       li 10,208
-       stxvd2x 37,6,10
-       lfd 12,-128(1)
-       lxvd2x 0,6,9
-       li 9,96
+       lxvd2x 9,6,10
        li 10,64
-       stfd 12,-184(1)
-       lfd 12,-104(1)
-       xxpermdi 0,0,0,3
-       stfd 0,-144(1)
-       stfd 12,-192(1)
-       lfd 12,-112(1)
-       stfd 12,-168(1)
-       lfd 12,-88(1)
-       lxvd2x 10,6,9
-       li 9,112
-       stxvd2x 10,6,7
-       stfd 12,-176(1)
-       lfd 12,-96(1)
-       stfd 12,-152(1)
-       lfd 12,-72(1)
        lxvd2x 11,6,9
-       li 9,128
-       lxvd2x 34,6,7
-       stxvd2x 11,6,8
-       stfd 12,-160(1)
-       lfd 12,-80(1)
-       xxpermdi 34,34,34,2
-       stfd 12,-136(1)
-       lxvd2x 12,6,9
-       li 9,144
-       lxvd2x 35,6,8
-       stxvd2x 12,6,10
-       lxvd2x 0,6,9
        li 9,80
-       xxpermdi 35,35,35,2
+       fmr 6,0
+       xxpermdi 0,0,0,3
+       fmr 8,7
+       xxpermdi 7,7,7,3
+       fmr 10,9
+       xxpermdi 9,9,9,3
+       fmr 12,11
+       xxpermdi 11,11,11,3
+       xxpermdi 6,6,32,1
+       xxpermdi 8,8,32,1
+       xxpermdi 10,10,32,1
+       xxpermdi 7,6,7,0
+       xxpermdi 12,12,32,1
+       xxpermdi 9,8,9,0
+       xxpermdi 11,10,11,0
+       xxpermdi 8,7,7,2
+       xxpermdi 0,12,0,0
+       xxpermdi 10,9,9,2
+       stxvd2x 8,6,7
+       xxpermdi 12,11,11,2
+       stxvd2x 10,6,8
+       xxpermdi 0,0,0,2
+       stxvd2x 12,6,10
        stxvd2x 0,6,9
+       lxvd2x 34,6,7
+       lxvd2x 35,6,8
        lxvd2x 36,6,10
        lxvd2x 37,6,9
+       xxpermdi 35,35,35,2
+       xxpermdi 34,34,34,2
        xxpermdi 36,36,36,2
        xxpermdi 37,37,37,2
        blr

 t.s |   80
++++++++++++++++++++++++++++++++++----------------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)


Generally, the GIMPLE phase not knowing about calling conventions causes
problems, but I don't see an easy way out here other than lowering things much,
much earlier (with its own downsides, especially considering the "cross-target"
games we play for offloading via LTO).

That said, somebody needs to sit down and see where the copy to memory
is generated and why.
