On Tuesday 02 May 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 03:15:11PM +0200, Allan Sandfeld Jensen wrote:
> > Okay, I have tried that, and I also made it more obvious how the
> > intrinsics can become non-immediate shift.
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index b58f5050db0..b9406550fc5 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,10 @@
> > +2017-04-22  Allan Sandfeld Jensen  <sandf...@kde.org>
> > +
> > +	* config/i386/emmintrin.h (_mm_slli_*, _mm_srli_*):
> > +	Use vector intrinsics instead of builtins.
> > +	* config/i386/avx2intrin.h (_mm256_slli_*, _mm256_srli_*):
> > +	Use vector intrinsics instead of builtins.
> > +
> >  2017-04-21  Uros Bizjak  <ubiz...@gmail.com>
> >
> >  	* config/i386/i386.md (*extzvqi_mem_rex64): Move above *extzv<mode>.
> >
> > diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
> > index 82f170a3d61..64ba52b244e 100644
> > --- a/gcc/config/i386/avx2intrin.h
> > +++ b/gcc/config/i386/avx2intrin.h
> > @@ -665,13 +665,6 @@ _mm256_slli_si256 (__m256i __A, const int __N)
> >
> >  extern __inline __m256i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm256_slli_epi16 (__m256i __A, int __B)
> > -{
> > -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> > -}
> > -
> > -extern __inline __m256i
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm256_sll_epi16 (__m256i __A, __m128i __B)
> >  {
> >    return (__m256i)__builtin_ia32_psllw256((__v16hi)__A, (__v8hi)__B);
> > @@ -679,9 +672,11 @@ _mm256_sll_epi16 (__m256i __A, __m128i __B)
> >
> >  extern __inline __m256i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm256_slli_epi32 (__m256i __A, int __B)
> > +_mm256_slli_epi16 (__m256i __A, int __B)
> >  {
> > -  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
> > +  if (__builtin_constant_p(__B))
> > +    return ((unsigned int)__B < 16) ? (__m256i)((__v16hi)__A << __B)
> > +                                    : _mm256_setzero_si256();
> > +  return _mm256_sll_epi16(__A, _mm_cvtsi32_si128(__B));
> >  }
>
> The formatting is wrong, missing spaces before function names and opening
> (, too long lines.  Also, you've removed some builtin uses like
> __builtin_ia32_psllwi256 above, but haven't removed those builtins from
> the compiler (unlike the intrinsics, the builtins are not supported and
> can be removed).  But I guess the primary question is on Uros: do we
> want to handle this in the *intrin.h headers, and thus increase the size
> of those (and their parsing time etc.), or do we want to handle this
> in the target folders (tree as well as gimple one), where we'd convert
> e.g. __builtin_ia32_psllwi256 to the shift if the shift count is constant.

Ok. I will await what you decide.
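For readers following along, the pattern under discussion can be sketched outside the headers. This is a minimal, hypothetical stand-in (using the SSE2 `_mm_slli_epi16` shape rather than the AVX2 one, and a locally defined vector type instead of GCC's internal `__v8hi`), not the patch itself: when the count is a compile-time constant, use GCC's generic vector shift so the compiler can fold it; otherwise fall back to the variable-count intrinsic. Counts of 16 or more yield zero, matching the PSLLW instruction.

```c
#include <emmintrin.h>
#include <assert.h>

/* Locally defined 8 x 16-bit vector type, standing in for GCC's
   internal __v8hi.  */
typedef short my_v8hi __attribute__ ((__vector_size__ (16)));

/* Hypothetical sketch of the pattern in the patch, applied to the
   128-bit _mm_slli_epi16 shape.  */
static inline __m128i
my_slli_epi16 (__m128i __A, int __B)
{
  if (__builtin_constant_p (__B))
    /* Constant count: generic vector shift, which GIMPLE can fold.
       Counts >= 16 are undefined for <<, so return zero explicitly,
       as the hardware instruction would.  */
    return ((unsigned int) __B < 16) ? (__m128i) ((my_v8hi) __A << __B)
				     : _mm_setzero_si128 ();
  /* Non-constant count: fall back to the variable-shift intrinsic.  */
  return _mm_sll_epi16 (__A, _mm_cvtsi32_si128 (__B));
}
```

Because the function is inlined, `__builtin_constant_p` is evaluated after inlining, so a literal count takes the vector-shift path while a runtime count takes the intrinsic path.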
Btw, I thought of an alternative idea: make a new set of built-ins, called for instance __builtin_lshift and __builtin_rshift, that translate simply to GIMPLE shifts, just like cpp_shifts currently do, the only difference being that the new shifts (unlike C/C++ shifts) are defined for all shift sizes and on negative values. With this, the variable-shift intrinsics could also be written without builtins. Though doing this would mean making a whole set of them for all integer types, and it would need to be implemented in the c-parser like __builtin_shuffle, not with the other generic builtins. Would that make sense?

Best regards
Allan
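To make the proposed semantics concrete, here is a plain-C sketch of what such a `__builtin_lshift` might mean for one width. The name and exact rules are an assumption drawn from the proposal above (these builtins do not exist in GCC); the point is only that, unlike the C `<<` operator, the operation is defined for every count and for negative operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical semantics for the proposed __builtin_lshift on int32_t:
   - counts >= the type width produce 0 (C << is undefined there);
   - shifting a negative value is well defined (done in unsigned
     arithmetic, then converted back).
   This is an illustrative helper, not an existing GCC builtin.  */
static inline int32_t
proposed_lshift32 (int32_t value, unsigned int count)
{
  if (count >= 32)
    return 0;
  return (int32_t) ((uint32_t) value << count);
}
```

A vector intrinsic like `_mm256_slli_epi16` could then be expressed directly in terms of such a builtin, with no per-count-constness branching in the header.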