https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #18 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Filip Kastl from comment #17)
> This is the replacement that causes the slowdown (well, two replacements):
> 
> ----------------------
> Replace:
> 
> (insn 2224 2222 2228 20 (set (reg:V4DF 1604)
>         (vec_duplicate:V4DF (mem/u/c:DF (symbol_ref/u:DI ("*.LC3") [flags 0x2]) [0  S8 A64]))) 9260 {vec_dupv4df}
>      (expr_list:REG_EQUAL (const_vector:V4DF [
>                 (const_double:DF 2.7777777777777776235801354687282582744956016540527344e-2 [0x0.e38e38e38e38ep-5]) repeated x4
>             ])
>         (nil)))
> 
> with:
> 
> (insn 2224 2222 2228 20 (set (reg:V4DF 1604)
>         (reg:V4DF 1655)) 2428 {movv4df_internal}
>      (expr_list:REG_EQUAL (const_vector:V4DF [
>                 (const_double:DF 2.7777777777777776235801354687282582744956016540527344e-2 [0x0.e38e38e38e38ep-5]) repeated x4
>             ])
>         (nil)))
> 
> deferring rescan insn with uid = 2224.
> 
> Replace:
> 
> (insn 2228 2224 377 20 (set (reg:V2DF 1603)
>         (vec_duplicate:V2DF (mem/u/c:DF (symbol_ref/u:DI ("*.LC3") [flags 0x2]) [0  S8 A64]))) 7168 {vec_dupv2df}
>      (expr_list:REG_EQUAL (const_vector:V2DF [
>                 (const_double:DF 2.7777777777777776235801354687282582744956016540527344e-2 [0x0.e38e38e38e38ep-5]) repeated x2
>             ])
>         (nil)))
> 
> with:
> 
> (insn 2228 2224 377 20 (set (reg:V2DF 1603)
>         (subreg:V2DF (reg:V4DF 1655) 0)) 2429 {movv2df_internal}
>      (expr_list:REG_EQUAL (const_vector:V2DF [
>                 (const_double:DF 2.7777777777777776235801354687282582744956016540527344e-2 [0x0.e38e38e38e38ep-5]) repeated x2
>             ])
>         (nil)))
> 
> deferring rescan insn with uid = 2228.
> ----------------------
> 
> These instructions are inside function "main".  However, the last RTL debug
> instruction is
> 
> (debug_insn 272 271 273 19 (debug_marker) "lbm.c":275:2 discrim 1 -1
>      (nil))
> 
> so I expect that function "LBM_performStreamCollideTRT" was inlined into
> main and is the original source of these vector instructions.
> 
> Hopefully this helps.  If you meant something else by "testcase", do tell me.
> 
> 
> What I did in more detail:
> 
> I used a custom debug counter.  If I make the 9th call of
> ix86_broadcast_inner() return NULL (adapting what hjl's patch does), the
> slowdown goes away.
> 
> On r16-1644-gaba3b9d3a48a07, I added the debug counter and ran:
> 
> /home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64 -c -o lbm.o -DSPEC -DNDEBUG \
>   -DSPEC_AUTO_SUPPRESS_OPENMP -Ofast -march=native -mtune=native -g -flto=32 \
>   -fpermissive -std=gnu17 -DSPEC_LP64 lbm.c \
>   -fdbg-cnt=foo_counter:1000000000-1000000000
> /home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64 -c -o main.o -DSPEC -DNDEBUG \
>   -DSPEC_AUTO_SUPPRESS_OPENMP -Ofast -march=native -mtune=native -g -flto=32 \
>   -fpermissive -std=gnu17 -DSPEC_LP64 main.c \
>   -fdbg-cnt=foo_counter:1000000000-1000000000
> 
> /home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64 \
>   -Wl,-rpath,/home/fkastl/gcc/inst/lib64 -Ofast -march=native -mtune=native \
>   -g -flto=32 -fpermissive -std=gnu17 lbm.o main.o -lm -o lbm_r \
>   -fdbg-cnt=foo_counter:9-9 -fdump-rtl-all
> 
> -> 3m43s
> 
> /home/fkastl/gcc/inst/bin/gcc -std=gnu99 -m64 \
>   -Wl,-rpath,/home/fkastl/gcc/inst/lib64 -Ofast -march=native -mtune=native \
>   -g -flto=32 -fpermissive -std=gnu17 lbm.o main.o -lm -o lbm_r \
>   -fdbg-cnt=foo_counter:1000000000-1000000000 -fdump-rtl-all
> 
> -> 2m50s
> 
> Then I compared the *.rrvl RTL dumps.  By the way, I had to "backport" from a
> newer commit the dumping of
>
> Replace:
> ...
> with:
>
> and
>
> Add:
> ...
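A minimal C model of the debug-counter bisection described above. It assumes GCC's documented -fdbg-cnt=NAME:LO-HI semantics (dbg_cnt () returns true only for invocations whose 1-based index lies in the closed range [LO, HI]) and assumes, since the patch itself is not shown, that ix86_broadcast_inner () was changed to return NULL whenever the counter fires. The names model_counter, model_dbg_cnt, model_broadcast_inner and suppressed_call are hypothetical; nothing here is GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one GCC debug counter configured with
   -fdbg-cnt=NAME:LO-HI (assumed semantics: enabled for calls LO..HI).  */
struct model_counter
{
  unsigned long lo, hi;   /* closed interval of enabled invocations */
  unsigned long count;    /* invocations seen so far */
};

/* Models dbg_cnt (): bump the count, report whether it is in range.  */
static bool
model_dbg_cnt (struct model_counter *c)
{
  c->count++;
  return c->count >= c->lo && c->count <= c->hi;
}

/* Hypothetical model of the patched ix86_broadcast_inner (): when the
   counter fires, give up (NULL in GCC, 0 here); otherwise the broadcast
   replacement would go ahead (1 here).  */
static int
model_broadcast_inner (struct model_counter *c)
{
  return model_dbg_cnt (c) ? 0 : 1;
}

/* Simulate N calls and return the 1-based index of the single call that
   was suppressed, or 0 if no call (or more than one) was suppressed.  */
unsigned long
suppressed_call (unsigned long lo, unsigned long hi, unsigned long n)
{
  struct model_counter c = { lo, hi, 0 };
  unsigned long which = 0, hits = 0;

  for (unsigned long i = 1; i <= n; i++)
    if (model_broadcast_inner (&c) == 0)
      {
        hits++;
        which = i;
      }
  return hits == 1 ? which : 0;
}
```

For example, suppressed_call (9, 9, 20) yields 9, matching -fdbg-cnt=foo_counter:9-9, while suppressed_call (1000000000, 1000000000, 20) yields 0, which is presumably why that setting serves as the baseline run.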

Please extract something I can use to reproduce it.
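This is not a reproducer of the slowdown, only a minimal C sketch of what the two replacements do to values. It uses GCC generic vector types (the vector_size attribute) as stand-ins for the V4DF and V2DF modes; the type names v4df/v2df and broadcast_forms_agree () are hypothetical. The check is that the low 128 bits of a 4-lane broadcast equal a 2-lane broadcast of the same constant, which is what reusing (subreg:V2DF (reg:V4DF 1655) 0) relies on. The .LC3 value in the dumps, 2.7777777777777776e-2, is 1.0/36.0 rounded to double.

```c
#include <string.h>

/* GCC generic vector types standing in for the V4DF and V2DF modes.  */
typedef double v4df __attribute__ ((vector_size (32)));
typedef double v2df __attribute__ ((vector_size (16)));

/* Returns 1 if the low two lanes of a 4-lane broadcast of C are
   bit-identical to a 2-lane broadcast of C, 0 otherwise.  */
int
broadcast_forms_agree (double c)
{
  /* Before: two independent broadcasts of the same DF constant,
     like vec_dupv4df and vec_dupv2df of the "*.LC3" load.  */
  v4df wide = { c, c, c, c };
  v2df narrow = { c, c };

  /* After: reuse the V4DF value and read its first 16 bytes (lanes 0
     and 1), like (subreg:V2DF (reg:V4DF 1655) 0), the lowpart on
     little-endian x86.  memcpy avoids type-punning problems.  */
  v2df reused;
  memcpy (&reused, &wide, sizeof reused);

  return memcmp (&narrow, &reused, sizeof narrow) == 0;
}
```

broadcast_forms_agree (1.0 / 36.0) returns 1.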
