On Mon, Apr 24, 2017 at 11:01:29AM +0200, Allan Sandfeld Jensen wrote:
> On Monday 24 April 2017, Jakub Jelinek wrote:
> > On Mon, Apr 24, 2017 at 10:34:58AM +0200, Allan Sandfeld Jensen wrote:
> > > That is a different instruction. That is the vpsllw not vpsllwi
> > > 
> > > The intrinsics I changed is the immediate version, I didn't change the
> > > non- immediate version. It is probably a bug if you can give
> > > non-immediate values to the immediate only intrinsic. At least both
> > > versions handles it, if in different ways, but is is illegal arguments.
> > 
> > The documentation is unclear on that and I've only recently fixed up some
> > cases where these intrinsics weren't able to handle non-constant arguments
> > in some cases, while both ICC and clang coped with that fine.
> > So it is clearly allowed and handled by all the compilers and needs to be
> > supported, people use that in real-world code.
> > 
> Undoubtedly it happens. I just make a mistake myself that created that case. 
> But it is rather unfortunate, and means we make wrong code currently for 
> corner case values.

The intrinsic documentation is poor, usually you have a good documentation
on what the instructions do, and then you just have to guess what the
intrinsics do.  You can of course ask Intel for clarification.

If you try:
#include <x86intrin.h>

__m128i
foo (__m128i a, int b)
{
  return _mm_slli_epi16 (a, b);
}
and call it with 257 from somewhere else, you can see that all the compilers
will give you zero vector.  And similarly if you use 257 literally instead
of b.  So what the intrinsic (unlike the instruction)
actually does is that it compares all bits of the imm8 argument (supposedly
using unsigned comparison) and if it is bigger than 15 (or 7 or 31 or 63
depending on the bitsize of element) it yields 0 vector.

        Jakub

Reply via email to