On 10 December 2014 at 10:34, Alan Lawrence <alan.lawre...@arm.com> wrote:
> Thanks, Charles. A couple of thoughts.
>
> I think the approach in patches 2+3+4 of using
> __builtin_aarch64_im_lane_boundsi is justified and works quite neatly.
> Modulo the question of argument ordering and __AARCH64_LANE_CHECK, those
> patches look good.
>
> However, the SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX, seems a lot of
> infrastructure to introduce if we are only going to use it in one place, and
> I think I might argue in favour of using ...__im_lane_bound or
> AARCH64_LANE_CHECK there also. Of course all of this palaver stems from
> using the same builtins for both D- and Q-reg intrinsics, and I suspect some
> cleanup may be due to those intrinsics *at some point*, but probably not in
> time for gcc 5.0.
>
> However, this does mean that if I use a D-reg intrinsic
> with a lane index that's out of bounds for the Q-reg too, I get a double
> error message: e.g. for testcase
>
> int8x8x4_t
> f_vld4_lane (int8_t * p, int8x8x4_t v)
> {
>   int8x8x4_t res;
>   return vld4_lane_s8 (p, v, 18);
> }
>
> I get output:
>
> In file included from gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c:5:0:
> .../install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h: In function
> 'f_vld4_lane':
> .../install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:18123:1:
> error: lane 18 out of range 0 - 7
>  __LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v16qi, qi, s8,
>  ^
> In function 'vld4_lane_s8',
>     inlined from 'f_vld4_lane' at
> gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c:12:7:
> .../install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:18123:1:
> error: lane 18 out of range 0 - 15
>  __LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v16qi, qi, s8,
>  ^
>
> which (although not serious) could be mildly confusing.

Oh dear, this is rather sad. Aesthetically, I think the builtins
should protect themselves from direct misuse, but I can't think of a
clean way to prevent this.

It could be done like this, but I don't think the end result really
justifies it.

  __o = __builtin_aarch64_ld4_lane##mode
((__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c &
(__NUMBER_OF_LANES(__b.val[0]) - 1));

Reply via email to