Sorry, just realised I'd never replied to this.

Marc Poulhies <poulh...@adacore.com> writes:
> Eric Botcazou <botca...@adacore.com> writes:
>>> The new variables seem to be unused, so I think slightly stronger
>>> DCE could remove the calls even after the patch.  Perhaps the containing
>>> functions should take an int32x4_t *ptr or something, with the calls
>>> assigning to different ptr[] indices.
>>
>> We run a minimal DCE pass at -O0 in our compiler to eliminate all the 
>> garbage 
>> generated by the gimplifier for variable-sized types (people care about code 
>> size at -O0 in specific contexts) but it does not touch anything written by 
>> the user (and debugging is unaffected of course).  Given that the builtins 
>> are 
>> pure functions and the arguments have no side effects, it eliminates the 
>> calls, but adding a LHS blocks that because this minimal DCE pass preserves 
>> anything user-related, in particular assignments to user variables.
>>
>>> I think it would be better to do that using new calls though,
>>> and xfail the existing ones when they no longer work.  For example:
>>> 
>>>   /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
>>>   vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
>>>   /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
>>>   ptr[0] = vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
>>> 
>>> That way we don't lose the existing tests.
>>
>> Frankly I'm not quite sure of what we can lose by adding a LHS here, can you 
>> elaborate a bit?  We would need a solution that works out of the box with 
>> our 
>> compiler in the future, i.e. without having to tweak 50 testcases again.
>
> Hi Richard,
>
> Thank for your reply !
>
> As Éric, I'm also wondering why having LHS in the existing tests would
> make us loose them. I guess I'm not familiar enough with this part of
> the testsuite and I'm missing something.

The problem is that we only enforce lane bounds via calls to
__builtin_aarch64_im_lane_boundsi.  In previous releases, the check
only happend at RTL expansion time, so the check would be skipped if
any gimple pass removed the call.  Now we do the checking during
folding, but that still misses cases.  E.g., compare the -O0 and -O1
behaviour for:

#include <arm_neon.h>

void f(int32x4_t *p0, int16x8_t *p1) {
    vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
    //p0[0] = vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
}

-O0 gives the error but -O1 doesn't [https://godbolt.org/z/1KosTY43T].
The -O1 behaviour here is wrong: badly-formed calls should be rejected
with a diagnostic even if the calls are unused.  Clang gets this right
in both cases [https://godbolt.org/z/EGxs8jq97].

I think keeping the lhs-free calls is important for making sure that
the -O0 behaviour doesn't regress without the DCE.

Your DCE will regress it, but that's the fault of the arm_neon.h
implementation rather than the fault of your pass.  Having the
tests but XFAILing them seems like the best way of dealing with that.
Hopefully we'll then see some progression if the arm_neon.h implementation
is improved in future.

Thanks,
Richard

Reply via email to