On Mon, Mar 24, 2014 at 01:33:11PM +0100, Richard Biener wrote:
> On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> > Hello!
> >
> >> On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper <drep...@gmail.com> wrote:
> >>>> Your patch is correct IMHO, but maybe it is worth adding all the missing
> >>>> `mm512_set1*' stuff?
> >>>>
> >>>> According to trunk and [1] we're still missing (besides those you mentioned)
> >>>> the _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts.
> >>>
> >>> Yes, more are missing, but I think those will need new builtins.  The
> >>> _ps and _pd don't require additional instructions.
> >>>
> >>> _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
> >>> might have to map to vpbroadcastb.  I haven't seen a way to generate
> >>> those instructions if needed and so this work was out of scope for now
> >>> due to time constraints.  I agree, they should be added as quickly as
> >>> possible to avoid releasing headers with incomplete APIs.
> >>>
> >>> What is the verdict on checking these changes in?  Too late for the
> >>> next release?
> >>
> >> This kind of change can also be made for 4.9.1, for example.
> >
> > OTOH, these changes are isolated to intrinsic header files, and we
> > have quite an extensive testsuite for these. I see no problem with
> > checking in these changes even at this stage.
> >
> > So, if there is no better solution I propose to check these changes
> > in, since the benefit to users outweighs the (minor) risk. Would this be
> > OK from RM POV, also weighing in the benefits to users?
> 
> Yes, sure.  I just meant that it's ok to do more work for 4.9.1, too.

But if there is no builtin for, say, _mm512_set1_epi8, just do something
similar to what _mm256_set_epi8 and _mm256_set1_epi8 do; the compiler should
be smart enough to recognize those as broadcasts.

The following is recognized well:

typedef char v32qi __attribute__((vector_size (32)));
v32qi foo (char a)
{
  return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

This isn't:

typedef char v64qi __attribute__((vector_size (64)));
v64qi foo (char a)
{
  return (v64qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

But I believe it has been discussed already that the V32HImode and V64QImode
support is incomplete in 4.9.  While I think there are no direct broadcasts
for these modes, one can e.g. use AVX2 broadcasts and then duplicate into
the 512-bit mode.
See http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00757.html
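
At the source level, the broadcast-then-duplicate idea could be sketched
roughly as below.  This is only an illustration with generic vector
extensions, not the patch from the link: broadcast_v64qi is a made-up helper
name (not an intrinsic or builtin), and the result goes through a pointer
just so the snippet builds without AVX-512 enabled.  I haven't checked what
code the compiler actually emits for it.

```c
#include <string.h>

typedef char v32qi __attribute__((vector_size (32)));
typedef char v64qi __attribute__((vector_size (64)));

/* Hypothetical helper: fill all 64 byte lanes of *OUT with A.
   Step 1 uses the 256-bit initializer form, which is the one the
   compiler is said to recognize as a broadcast.  Step 2 copies that
   256-bit value into the low and the high half of the 512-bit vector.  */
void
broadcast_v64qi (v64qi *out, char a)
{
  v32qi half = (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                         a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };

  memcpy ((char *) out, &half, sizeof half);                /* low 256 bits */
  memcpy ((char *) out + sizeof half, &half, sizeof half);  /* high 256 bits */
}
```

Calling broadcast_v64qi (&v, 'x') on a v64qi v should leave every lane of v
equal to 'x'.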

        Jakub
