On 10 December 2014 at 10:34, Alan Lawrence <alan.lawre...@arm.com> wrote: > Thanks, Charles. A couple of thoughts. > > I think the approach in patches 2+3+4 of using > __builtin_aarch64_im_lane_boundsi is justified and works quite neatly. > Modulo the question of argument ordering and __AARCH64_LANE_CHECK, those > patches look good. > > However, the SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX, seems a lot of > infrastructure to introduce if we are only going to use it in one place, and > I think I might argue in favour of using ...__im_lane_bound or > AARCH64_LANE_CHECK there also. Of course all of this palaver stems from > using the same builtins for both D- and Q-reg intrinsics, and I suspect some > cleanup may be due to those intrinsics *at some point*, but probably not in > time for gcc 5.0. > > However, this does mean that if I use a D-reg intrinsic > with a lane index that's out of bounds for the Q-reg too, I get a double > error message: e.g. for testcase > > int8x8x4_t > f_vld4_lane (int8_t * p, int8x8x4_t v) > { > int8x8x4_t res; > return vld4_lane_s8 (p, v, 18); > } > > I get output: > > In file included from gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c:5:0: > .../install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h: In function > 'f_vld4_lane': > .../install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:18123:1: > error: lane 18 out of range 0 - 7 > __LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v16qi, qi, s8, > ^ > In function 'vld4_lane_s8', > inlined from 'f_vld4_lane' at > gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c:12:7: > .../install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:18123:1: > error: lane 18 out of range 0 - 15 > __LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v16qi, qi, s8, > ^ > > which (although not serious) could be mildly confusing.
Oh dear, this is rather sad. Aesthetically, I think the builtins should protect themselves from direct misuse, but I can't think of a clean way to prevent this. It could be done like this, but I don't think the end result really justifies it. __o = __builtin_aarch64_ld4_lane##mode ((__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c & (__NUMBER_OF_LANES(__b.val[0]) - 1));