https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69274

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, I can confirm the bisection to r231814.

The differences are in RA / scheduling, for example (just the first one;
+++ is good, --- is bad):

--- 3dview.s    2016-02-04 15:53:21.906672969 +0100
+++ ../build_peak_amd64-m64-gcc42-nn.0000/3dview.s      2016-02-04 15:53:40.515756755 +0100
@@ -29,14 +29,14 @@
        setnb   %al
        orb     %al, %cl
        je      .L2
-       vbroadcastss    8(%rsi), %xmm1
-       vmulps  32(%rdi), %xmm1, %xmm1
-       vbroadcastss    4(%rsi), %xmm0
+       vbroadcastss    8(%rsi), %xmm0
+       vmulps  32(%rdi), %xmm0, %xmm1
        vmovups 48(%rdi), %xmm2
-       vfmadd231ps     16(%rdi), %xmm0, %xmm1
-       vbroadcastss    (%rsi), %xmm0
-       vfmadd132ps     (%rdi), %xmm2, %xmm0
-       vaddps  %xmm0, %xmm1, %xmm0
+       vbroadcastss    4(%rsi), %xmm0
+       vfmadd132ps     16(%rdi), %xmm1, %xmm0
+       vbroadcastss    (%rsi), %xmm1
+       vfmadd132ps     (%rdi), %xmm2, %xmm1
+       vaddps  %xmm1, %xmm0, %xmm0
        vmovups %xmm0, (%rdx)
        ret

There are differences all over the place, so profiling is needed here.  I will
try to get some data on that.

IRA dump differences are

@@ -578,26 +578,26 @@

   cp0:a21(r195)<->a22(r196)@5:shuffle
   cp1:a20(r197)<->a21(r195)@5:shuffle
-  cp2:a18(r199)<->a19(r198)@5:shuffle
-  cp3:a18(r199)<->a20(r197)@5:shuffle
+  cp2:a18(r199)<->a20(r197)@5:shuffle
+  cp3:a18(r199)<->a19(r198)@5:shuffle
   cp4:a16(r200)<->a17(r201)@5:shuffle
   cp5:a15(r202)<->a16(r200)@5:shuffle
-  cp6:a13(r204)<->a14(r203)@5:shuffle
-  cp7:a13(r204)<->a15(r202)@5:shuffle
+  cp6:a13(r204)<->a15(r202)@5:shuffle
+  cp7:a13(r204)<->a14(r203)@5:shuffle

which doesn't look like useful information to me.  Maybe this part is more telling:

-      Forming thread by copy 14:a1r214-a2r213 (freq=5):
-        Result (freq=160): a1r214(80) a2r213(80)
-      Pushing a18(r199,l0)(cost 0)
+      Forming thread by copy 14:a1r214-a3r212 (freq=5):
+        Result (freq=320): a1r214(80) a3r212(80) a6r210(80) a7r211(80)
       Pushing a19(r198,l0)(cost 0)
-      Pushing a13(r204,l0)(cost 0)
       Pushing a14(r203,l0)(cost 0)
-      Pushing a8(r209,l0)(cost 0)
       Pushing a9(r208,l0)(cost 0)
-      Pushing a1(r214,l0)(cost 0)
       Pushing a2(r213,l0)(cost 0)
       Pushing a21(r195,l0)(cost 0)
+      Pushing a18(r199,l0)(cost 0)
       Pushing a22(r196,l0)(cost 0)
       Pushing a20(r197,l0)(cost 0)
       Pushing a16(r200,l0)(cost 0)
+      Pushing a13(r204,l0)(cost 0)

which looks like spurious ordering differences of same-cost stuff?
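
One way to check the "ordering only" suspicion mechanically is to compare the
removed and added lines of a dump diff as multisets.  The following is a
minimal sketch, not part of GCC or of the analysis above: the helper name
unmatched_changes is hypothetical, and the input is a trimmed subset of the
"Pushing" lines quoted above.

```python
from collections import Counter


def unmatched_changes(diff_text):
    """Return (only_removed, only_added) for a unified diff excerpt.

    Lines that were removed in one place and re-added elsewhere cancel
    out, so two empty results mean the excerpt is a pure reordering.
    """
    removed, added = Counter(), Counter()
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ", "@@")):
            continue  # skip file headers and hunk markers
        body = " ".join(line[1:].split())  # drop marker, normalise spaces
        if line.startswith("-"):
            removed[body] += 1
        elif line.startswith("+"):
            added[body] += 1
    # Counter subtraction keeps only positive counts.
    return removed - added, added - removed


# Trimmed subset of the "Pushing" diff lines from the IRA dump above.
excerpt = """\
-      Pushing a18(r199,l0)(cost 0)
       Pushing a19(r198,l0)(cost 0)
-      Pushing a13(r204,l0)(cost 0)
       Pushing a14(r203,l0)(cost 0)
+      Pushing a18(r199,l0)(cost 0)
+      Pushing a13(r204,l0)(cost 0)
"""

only_removed, only_added = unmatched_changes(excerpt)
print("removed only:", dict(only_removed))
print("added only:  ", dict(only_added))
```

Entries left over in either result are real content changes (or lines whose
counterpart lies outside the excerpt); everything else just moved.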

It is completely mysterious why the patch causes so many differences in RA.
But the resulting scheduling differences can explain the result (I really
suspect just one "unlucky" loop here; I will try to track that down now).
