On Mon, Mar 24, 2014 at 01:33:11PM +0100, Richard Biener wrote:
> On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> > Hello!
> >
> >> On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper <drep...@gmail.com> wrote:
> >>>> Your patch is correct IMHO, but maybe it worst to add all missing
> >>>> `mm512_set1*' stuff?
> >>>>
> >>>> According to trunk and [1] we're still missing (beside mentioned by you)
> >>>> _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts.
> >>>
> >>> Yes, more are missing, but I think those will need new builtins. The
> >>> _ps and _pd don't require additional instructions.
> >>>
> >>> _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
> >>> might have to map to vpbroadcastb. I haven't seen a way to generate
> >>> those instructions if needed and so this work was out of scope for now
> >>> due to time constraints. I agree, they should be added as quickly as
> >>> possible to avoid releasing headers with incomplete APIs.
> >>>
> >>> What is the verdict on checking these changes in? Too late for the
> >>> next release?
> >>
> >> This kind of changes can also be made for 4.9.1 for example.
> >
> > OTOH, these changes are isolated to intrinsic header files, and we
> > have quite extensive testsuite for these. I see no problem to check-in
> > these changes even at this stage.
> >
> > So, if there is no better solution I propose to check these changes
> > in, since the benefit to users outweight (minor) risk. Would this be
> > OK from RM POV, also weighting in benefits to users?
>
> Yes, sure. I've just meant that it's ok to do more work for 4.9.1, too.

But, if for say _mm512_set1_epi8 you have no intrinsics, just do something
similar to what _mm256_set_epi8 and _mm256_set1_epi8 do, the compiler
should be smart enough to recognize those as broadcasts.

The following is recognized well:

typedef char v32qi __attribute__((vector_size (32)));
v32qi
foo (char a)
{
  return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

This isn't:

typedef char v64qi __attribute__((vector_size (64)));
v64qi
foo (char a)
{
  return (v64qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

But I believe it has been discussed already that the V32HImode and
V64QImode support is incomplete in 4.9. While I think there are no
direct broadcasts for these modes, one can e.g. use AVX2 broadcasts
and then duplicate into the 512-bit mode.

See http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00757.html

	Jakub
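
For illustration only, here is a minimal sketch of the "broadcast in the
256-bit mode, then duplicate into the 512-bit mode" idea at the source
level, using the same GCC vector extension as the examples above. The
helper names broadcast_v32qi and broadcast_v64qi are hypothetical and do
not come from the intrinsic headers or from the patch under discussion.
The sketch only shows the shape of the data flow; it makes no claim about
which instructions a given GCC version emits for it. The helpers write
through pointers so the code also builds without -mavx2 or -mavx512f.

```c
#include <string.h>

typedef char v32qi __attribute__((vector_size (32)));
typedef char v64qi __attribute__((vector_size (64)));

/* Hypothetical helper: broadcast A into all 32 byte lanes, written the
   same way as the 256-bit initializer shown earlier in this mail.  */
static void
broadcast_v32qi (v32qi *out, char a)
{
  *out = (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

/* Hypothetical helper: build the 512-bit result by copying the 256-bit
   broadcast into the low half and again into the high half.  */
static void
broadcast_v64qi (v64qi *out, char a)
{
  v32qi half;

  broadcast_v32qi (&half, a);
  memcpy (out, &half, sizeof half);
  memcpy ((char *) out + sizeof half, &half, sizeof half);
}
```

A caller would declare a v64qi object and pass its address, e.g.
"v64qi v; broadcast_v64qi (&v, 'x');", after which every lane of v holds
'x'.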