Hi,

> Sorry, just realised I'd never replied to this.

No worries! I also took a very long time to reply, sorry.

> The problem is that we only enforce lane bounds via calls to
> __builtin_aarch64_im_lane_boundsi.  In previous releases, the check
> only happend at RTL expansion time, so the check would be skipped if
> any gimple pass removed the call.  Now we do the checking during
> folding, but that still misses cases.  E.g., compare the -O0 and -O1
> behaviour for:
>
> #include <arm_neon.h>
>
> void f(int32x4_t *p0, int16x8_t *p1) {
>     vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
>     //p0[0] = vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
> }
>
> -O0 gives the error but -O1 doesn't [https://godbolt.org/z/1KosTY43T].
> The -O1 behaviour here is wrong: badly-formed calls should be rejected
> with a diagnostic even if the calls are unused.  Clang gets this right
> in both cases [https://godbolt.org/z/EGxs8jq97].
>
> I think keeping the lhs-free calls is important for making sure that
> the -O0 behaviour doesn't regress without the DCE.
>
> Your DCE will regress it, but that's the fault of the arm_neon.h
> implementation rather than the fault of your pass.  Having the
> tests but XFAILing them seems like the best way of dealing with that.
> Hopefully we'll then see some progression if the arm_neon.h implementation
> is improved in future.

Ok, thanks for the clarification, I now understand why you prefer to
keep the existing tests.

The initial goal of the patch is to minimize our changes in the tests
results by having the tests PASS both with our internal compiler and
with master. I can apply the modifications you are suggesting (i.e.
adding new calls with assignments) if you think this is an improvement
over the current state (but internally, we would still need to mark
existing tests as XFAIL).

Thanks!
Marc

Reply via email to