On Sat, Dec 10, 2016 at 7:59 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:
>
>> On Saturday 10 December 2016, Marc Glisse wrote:
>>>
>>> On Sat, 10 Dec 2016, Marc Glisse wrote:
>>>>
>>>> On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:
>>>>>
>>>>> Replaces the definitions of the shift intrinsics with GCC extension
>>>>> syntax to allow GCC to reason about what the instructions do. Tests
>>>>> are added to ensure the intrinsics still produce the right
>>>>> instructions, and that a few basic optimizations now work.
>>>>
>>>>
>>>> I don't think we can do it in such a straightforward way. Those
>>>> intrinsics are well defined for any shift argument, while operator<< and
>>>> operator>> are considered undefined for many input values. I believe you
>>>> may need to use an unsigned type for the lhs of left shifts, and write a
>>>> << (b & 31) to match the semantics for the rhs (I haven't checked
>>>> Intel's doc). Which might require adding a few patterns in sse.md to
>>>> avoid generating code for that AND.
>>>
>>>
>>> Oops, apparently I got that wrong; I got confused with the scalar case.
>>> Left shift by more than the precision is well defined for vectors, but
>>> it gives 0; it doesn't take the shift count modulo the precision. Which
>>> is even harder to explain to gcc (maybe ((unsigned)b<=31)?a<<b:0...).
>>> Yes, the way we model operations in gcc is often inconvenient.
>>>
>>> There was a similar issue for LLVM (https://reviews.llvm.org/D3353).
>>>
>> Well, it is undefined behaviour by the C standard, but is it also
>> undefined inside GCC (since this specifically uses a GCC extension)?
>> I would assume it just produces 0.
>
>
> In the optimizers, we tend to handle vectors the same as scalars. I don't
> remember if we have any optimization that takes advantage of the limited
> range for the second operand of shifts (mostly we disable optimizations when
> there is a risk they might produce a shift amount larger than prec), but I
> don't think we have specified the behavior for larger-than-prec shifts, so
> it would be dangerous. Note that depending on the platform, gcc may lower
> vector operations to smaller vectors or even scalars, which makes it
> inconvenient to have different semantics for vectors and scalars (not
> impossible, we would for instance need to teach the lowering pass that vec1
> << vec2 lowers to { ((unsigned)vec2[0]<=31)?vec1[0]<<vec2[0]:0,
> ((unsigned)vec2[1]<=31)?vec1[1]<<vec2[1]:0 }). I think on some platforms,
> shifting vectors does use the modulo behavior, or handles negative shift
> values as shifts in the other direction, so we can't look at just one
> target.
>
> We could have a middle-end policy that vec << 1234 is not undefined (we
> cannot assume it doesn't happen) but we don't know what it is defined to,
> so we avoid any optimization that may have it as input or output. That
> might mean disabling all shift optimizations though, which would be
> counterproductive.
>
>> The non-immediate case is simpler, then, because it produces the
>> instructions, which will inherently return the right thing.
>
>
> It will only generate the instructions you expect if it hasn't already
> optimized it to something else, which might not even involve a shift
> anymore.
>
> I expect reviewers (I am only commenting) will need convincing arguments
> that this is safe (I could be wrong, maybe they'll find reasons that I
> missed that make it obviously safe).

I agree that, for all the reasons you stated, it is important to have
vector (and complex!) integer ops behave the same with respect to
overflow.

That we do not yet take advantage of out-of-bounds shifts is merely a
missed optimization.

Richard.

> --
> Marc Glisse
