Andrea Corallo <andrea.cora...@arm.com> writes:
> Richard Sandiford <richard.sandif...@arm.com> writes:
>
>> Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> Hi all,
>>>
>>> Second version of the patch here implementing the bfloat16_t neon
>>> related load intrinsics: vld2_lane_bf16, vld2q_lane_bf16,
>>> vld3_lane_bf16, vld3q_lane_bf16 vld4_lane_bf16, vld4q_lane_bf16.
>>>
>>> This better narrows testcases so they do not cause regressions for the
>>> arm backend where these intrinsics are not yet present.
>>>
>>> Please see refer to:
>>> ACLE <https://developer.arm.com/docs/101028/latest>
>>> ISA  <https://developer.arm.com/docs/ddi0596/latest>
>>
>> The intrinsics are documented to require +bf16, but it looks like this
>> makes the bf16 forms available without that.  (This is enforced indirectly,
>> by complaining that the intrinsic wrapper can't be inlined into a caller
>> that uses incompatible target flags.)
>>
>> Perhaps we should keep the existing intrinsics where they are and
>> just move the #undefs to the end, similarly to __aarch64_vget_lane_any.
>>
>> Thanks,
>> Richard
>
> Hi Richard,
>
> thanks for reviewing.  I was wondering if wouldn't be better to wrap the
> new intrinsic definition into the correct pragma so the macro definition
> stays narrowed.  WDYT?

I guess there's not much in it either way, but IMO it would be more
consistent to keep the +bf16 stuff together.  That's already what we
do for the vget_lane macros.  And the only reason for grouping based
on function rather than based on feature for this patch is because the
functions happen to use macro definitions.  It feels odd for that to be
a determining factor, so that, e.g., the vreinterpret functions and the
full vld2 functions are grouped based on feature, but the vld2_lane
functions are grouped based on function.

Thanks,
Richard

Reply via email to