https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

Filip Kastl <pheeck at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #17 from Filip Kastl <pheeck at gcc dot gnu.org> ---
This is the replacement that causes the slowdown (well, two replacements):

----------------------
Replace:

(insn 2224 2222 2228 20 (set (reg:V4DF 1604)
        (vec_duplicate:V4DF (mem/u/c:DF (symbol_ref/u:DI ("*.LC3") [flags 0x2])
[0  S8 A64]))) 9260 {vec_dupv4df}
     (expr_list:REG_EQUAL (const_vector:V4DF [
                (const_double:DF
2.7777777777777776235801354687282582744956016540527344e-2
[0x0.e38e38e38e38ep-5]) repeated x4
            ])
        (nil)))

with:

(insn 2224 2222 2228 20 (set (reg:V4DF 1604)
        (reg:V4DF 1655)) 2428 {movv4df_internal}
     (expr_list:REG_EQUAL (const_vector:V4DF [
                (const_double:DF
2.7777777777777776235801354687282582744956016540527344e-2
[0x0.e38e38e38e38ep-5]) repeated x4
            ])
        (nil)))

deferring rescan insn with uid = 2224.

Replace:

(insn 2228 2224 377 20 (set (reg:V2DF 1603)
        (vec_duplicate:V2DF (mem/u/c:DF (symbol_ref/u:DI ("*.LC3") [flags 0x2])
[0  S8 A64]))) 7168 {vec_dupv2df}
     (expr_list:REG_EQUAL (const_vector:V2DF [
                (const_double:DF
2.7777777777777776235801354687282582744956016540527344e-2
[0x0.e38e38e38e38ep-5]) repeated x2
            ])
        (nil)))

with:

(insn 2228 2224 377 20 (set (reg:V2DF 1603)
        (subreg:V2DF (reg:V4DF 1655) 0)) 2429 {movv2df_internal}
     (expr_list:REG_EQUAL (const_vector:V2DF [
                (const_double:DF
2.7777777777777776235801354687282582744956016540527344e-2
[0x0.e38e38e38e38ep-5]) repeated x2
            ])
        (nil)))

deferring rescan insn with uid = 2228.
----------------------

These instructions are inside function "main".  However, the last RTL debug
insn before them is

(debug_insn 272 271 273 19 (debug_marker) "lbm.c":275:2 discrim 1 -1
     (nil))

so I expect that function "LBM_performStreamCollideTRT" was inlined into main
and is the original source of these vector instructions.

Hopefully this helps.  If you meant something else by "testcase", do tell me.


What I did in more detail:

I used a custom debug counter.  If I set the 9th call of
ix86_broadcast_inner() to return NULL (I adapted what hjl's patch does), I get
rid of the slowdown.
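
For readers unfamiliar with -fdbg-cnt, here is a toy Python model of the
interval semantics as I understand them from the GCC option documentation:
each name:low-high pair is a closed interval over 1-based invocation numbers.
This is not GCC code.  ToyDebugCounter and selected_calls are hypothetical
names made up for the illustration, and the sketch says nothing about how the
counter was wired into ix86_broadcast_inner().

```python
class ToyDebugCounter:
    """Toy model of one -fdbg-cnt=name:low-high counter (assumption:
    closed interval, invocations numbered from 1)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.count = 0

    def hit(self):
        """Count one invocation; True if its number lies in [low, high]."""
        self.count += 1
        return self.low <= self.count <= self.high


def selected_calls(low, high, total_calls):
    """Return the invocation numbers for which the toy counter fires."""
    counter = ToyDebugCounter(low, high)
    return [n for n in range(1, total_calls + 1) if counter.hit()]


if __name__ == "__main__":
    # Interval 9-9: only the 9th invocation is selected.
    print(selected_calls(9, 9, 20))
    # Interval far beyond the number of calls: nothing is selected.
    print(selected_calls(1000000000, 1000000000, 20))
```

Under this model, 9-9 singles out exactly one invocation, while
1000000000-1000000000 is a range that a realistic compilation never reaches.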

On r16-1644-gaba3b9d3a48a07 I added the debug counter and did:

/home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64 -c -o lbm.o -DSPEC -DNDEBUG
-DSPEC_AUTO_SUPPRESS_OPENMP  -Ofast -march=native -mtune=native -g -flto=32
-fpermissive -std=gnu17              -DSPEC_LP64  lbm.c
-fdbg-cnt=foo_counter:1000000000-1000000000
/home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64 -c -o main.o -DSPEC -DNDEBUG
-DSPEC_AUTO_SUPPRESS_OPENMP  -Ofast -march=native -mtune=native -g -flto=32
-fpermissive -std=gnu17              -DSPEC_LP64  main.c
-fdbg-cnt=foo_counter:1000000000-1000000000

/home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64
-Wl,-rpath,/home/fkastl/gcc/inst/lib64   -Ofast -march=native -mtune=native -g
-flto=32 -fpermissive -std=gnu17         lbm.o main.o             -lm -o lbm_r
-fdbg-cnt=foo_counter:9-9 -fdump-rtl-all

-> 3m43s

/home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64
-Wl,-rpath,/home/fkastl/gcc/inst/lib64   -Ofast -march=native -mtune=native -g
-flto=32 -fpermissive -std=gnu17         lbm.o main.o             -lm -o lbm_r
-fdbg-cnt=foo_counter:1000000000-1000000000 -fdump-rtl-all

-> 2m50s

Then I compared the *.rrvl RTL dumps.  Btw, I had to "backport" the

Replace:
...
with:

and

Add:
...

dumping from a newer commit.
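
A minimal sketch of how such a comparison could be scripted, assuming the
dumps contain "Replace:" records shaped like the ones quoted earlier in this
comment.  The function names are hypothetical and this is not the procedure
that was actually used; it only collects the uid of the insn that follows each
"Replace:" marker so that two dumps can be compared as sets.

```python
import re

# Matches the start of an RTL insn line and captures its uid.
INSN_START = re.compile(r"^\(insn (\d+) ")


def replaced_insn_uids(dump_text):
    """Return uids of insns that directly follow a 'Replace:' marker."""
    uids = []
    expecting_insn = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped == "Replace:":
            expecting_insn = True
        elif expecting_insn and stripped:
            match = INSN_START.match(stripped)
            if match:
                uids.append(int(match.group(1)))
            # Only the first non-blank line after the marker is inspected.
            expecting_insn = False
    return uids


def only_in_first(dump_a, dump_b):
    """Uids replaced in dump_a but not in dump_b, in ascending order."""
    return sorted(set(replaced_insn_uids(dump_a)) - set(replaced_insn_uids(dump_b)))


if __name__ == "__main__":
    sample = """Replace:

(insn 2224 2222 2228 20 (set (reg:V4DF 1604)

with:

(insn 2224 2222 2228 20 (set (reg:V4DF 1604)

Replace:

(insn 2228 2224 377 20 (set (reg:V2DF 1603)
"""
    print(replaced_insn_uids(sample))
    print(only_in_first(sample, ""))
```

The insn after "with:" is deliberately skipped, since it carries the same uid
as the insn being replaced.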
