xiezhiheng <xiezhih...@huawei.com> writes: >> -----Original Message----- >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> Sent: Friday, August 21, 2020 5:02 PM >> To: xiezhiheng <xiezhih...@huawei.com> >> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions >> emitted at -O3 > > Cut... > >> Looks like the saturating intrinsics might need a bit more thought. >> Would you mind submitting the patch with just the other parts? >> Those were uncontroversial and it would be a shame to hold them >> up over this. > > Okay, I reorganized the existing patch and finished the first half of the > intrinsics > except saturating intrinsics and load intrinsics. > > Bootstrapped and tested on aarch64 Linux platform.
I know this'll be frustrating, sorry, but could you post the 2020-08-17 patch without the saturation changes? It's going to be easier to track and review if each patch deals with similar intrinsics. The non-saturating part of the 2020-08-17 patch was good because it was dealing purely with arithmetic operations. Loads should really be a separate change. BTW, for something like this, it's OK to test and submit several patches at once, so separating the patches doesn't need to mean longer test cycles. It's just that for review purposes, it's easier if one patch does one thing. > For load intrinsics, I have one problem when I set FLAG_READ_MEMORY for them, > some test cases like > gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c > #include <arm_neon.h> > > /* { dg-do compile } */ > /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ > > poly8x8x2_t > f_vld2_lane_p8 (poly8_t * p, poly8x8x2_t v) > { > poly8x8x2_t res; > /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ > res = vld2_lane_p8 (p, v, 8); > /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */ > res = vld2_lane_p8 (p, v, -1); > return res; > } > would fail in regression. Because the first statement > res = vld2_lane_p8 (p, v, 8); > would be eliminated as dead code in gimple phase but the error message is > generated in expand pass. So I am going to replace the second statement > res = vld2_lane_p8 (p, v, -1); > with > res = vld2_lane_p8 (p, res, -1); > or do you have any other suggestions? The test is valid as-is, so it would be better not to change it. I guess this means that we should leave the _lane loads and stores until we implement the range checks in a different way. This is somewhat related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969 , although your example shows that the “dummy const function” approach might not work. So to start with, could you just patch the non-lane loads? > And for test case gcc.target/aarch64/arg-type-diagnostics-1.c, I return the > result > to prevent the statement > result = vrsra_n_s32 (arg1, arg2, a); > from being eliminated by treated as dead code. Hmm. Here too I think the test is valid as-is. I think we need to ensure that the range check still happens even if the call is dead code (similar to PR95969 above). So I guess here too, it might be better to leave the _n forms to a separate patch. That doesn't mean we shouldn't fix the _lane and _n cases (or the previous saturating cases). It's just that each time we find a group of functions that's awkward for some reason, it'd be better to deal with those functions separately. Thanks, Richard