[Bug target/69671] [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?

kyukhin at gcc dot gnu.org Thu, 18 Feb 2016 03:17:31 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671


--- Comment #24 from Kirill Yukhin <kyukhin at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #23)
> On Wed, 17 Feb 2016, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671
> > 
> > --- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > Created attachment 37722 [details]
> >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37722&action=edit
> > gcc6-pr69671.patch
> > 
> > Actually, on closer look, I believe the only problem lies in the patterns
> > that use a vector_move_operand "0C" operand inside a vec_select with only
> > constants as the parallel's operands.  fwprop is able to propagate constants
> > into instructions (thus undoing the CSE effect), but does nothing on these,
> > because it also simplifies them, so instead of the expected, say,
> >                 (vec_select:V4QI (const_vector:V16QI [
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                             (const_int 0 [0])
> >                         ])
> >                     (parallel [
> >                             (const_int 0 [0])
> >                             (const_int 1 [0x1])
> >                             (const_int 2 [0x2])
> >                             (const_int 3 [0x3])
> >                         ]))
> > we get the simplified form:
> >                 (const_vector:V4QI [
> >                         (const_int 0 [0])
> >                         (const_int 0 [0])
> >                         (const_int 0 [0])
> >                         (const_int 0 [0])
> >                     ])
> > So, by adding extra patterns for that simplified form, fwprop is able to do
> > its job even if CSE did a better job.
> 
> Of course, I then wonder why we didn't simplify this in the first place
> when generating RTL, instead of waiting for forwprop ...
> 
> But yes, that sounds like the easiest way forward.

Agreed.
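
The fwprop interaction described in comment #22 can be illustrated with a small toy model. This is a hypothetical Python sketch, not GCC code and not the contents of gcc6-pr69671.patch: ConstVector, VecSelect, simplify, try_propagate, vec_select_pattern and const0_pattern are invented names that only mimic the shape of the RTL quoted above. It assumes fwprop substitutes a register's known constant, simplifies the result, and keeps the change only if some insn pattern still recognizes it.

```python
# Toy model of the fwprop interaction described above. All names are
# hypothetical; this is not GCC's API or the attached patch.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class ConstVector:
    mode: str               # e.g. "V16QI" or "V4QI"
    elts: Tuple[int, ...]   # element values


@dataclass(frozen=True)
class VecSelect:
    mode: str                       # result mode
    op: Union[str, ConstVector]     # a register name or a constant vector
    sel: Tuple[int, ...]            # the parallel's constant indices


Rtx = Union[ConstVector, VecSelect]
Pattern = Callable[[Rtx], bool]


def simplify(x: Rtx) -> Rtx:
    """Fold (vec_select (const_vector ...) (parallel [...])) into a
    narrower const_vector; leave anything else unchanged."""
    if isinstance(x, VecSelect) and isinstance(x.op, ConstVector):
        return ConstVector(x.mode, tuple(x.op.elts[i] for i in x.sel))
    return x


def vec_select_pattern(x: Rtx) -> bool:
    # Stand-in for the existing pattern shape: a vec_select with [0 1 2 3].
    return isinstance(x, VecSelect) and x.sel == (0, 1, 2, 3)


def const0_pattern(x: Rtx) -> bool:
    # Stand-in for an extra pattern matching the simplified zero vector.
    return isinstance(x, ConstVector) and x.mode == "V4QI" and not any(x.elts)


def try_propagate(use: VecSelect, reg: str, value: ConstVector,
                  patterns: List[Pattern]) -> Optional[Rtx]:
    """Substitute REG's known constant into USE, simplify the result, and
    keep it only if some pattern still recognizes it (else give up)."""
    if use.op != reg:
        return None
    new = simplify(VecSelect(use.mode, value, use.sel))
    return new if any(p(new) for p in patterns) else None


if __name__ == "__main__":
    zero16 = ConstVector("V16QI", (0,) * 16)
    use = VecSelect("V4QI", "r100", (0, 1, 2, 3))
    # Without the extra pattern the simplified form is not recognized.
    print(try_propagate(use, "r100", zero16, [vec_select_pattern]))
    # prints None
    # With it, the propagation succeeds.
    print(try_propagate(use, "r100", zero16,
                        [vec_select_pattern, const0_pattern]))
    # prints ConstVector(mode='V4QI', elts=(0, 0, 0, 0))
```

In this toy, the fix is modelled by adding const0_pattern rather than changing simplify or try_propagate, which follows the approach agreed on in the thread: teach the target to accept the already-simplified form instead of changing fwprop.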
