Re: [PATCH V3] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrisics

Richard Sandiford via Gcc-patches Tue, 27 Oct 2020 04:26:03 -0700

Andrea Corallo <andrea.cora...@arm.com> writes:
> Richard Sandiford <richard.sandif...@arm.com> writes:
>
>> Andrea Corallo <andrea.cora...@arm.com> writes:
>>> Richard Sandiford <richard.sandif...@arm.com> writes:
>>>
>>>> Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>> Hi all,
>>>>>
>>>>> Second version of the patch here implementing the bfloat16_t neon
>>>>> related load intrinsics: vld2_lane_bf16, vld2q_lane_bf16,
>>>>> vld3_lane_bf16, vld3q_lane_bf16 vld4_lane_bf16, vld4q_lane_bf16.
>>>>>
>>>>> This better narrows testcases so they do not cause regressions for the
>>>>> arm backend where these intrinsics are not yet present.
>>>>>
>>>>> Please see refer to:
>>>>> ACLE <https://developer.arm.com/docs/101028/latest>
>>>>> ISA  <https://developer.arm.com/docs/ddi0596/latest>
>>>>
>>>> The intrinsics are documented to require +bf16, but it looks like this
>>>> makes the bf16 forms available without that.  (This is enforced indirectly,
>>>> by complaining that the intrinsic wrapper can't be inlined into a caller
>>>> that uses incompatible target flags.)
>>>>
>>>> Perhaps we should keep the existing intrinsics where they are and
>>>> just move the #undefs to the end, similarly to __aarch64_vget_lane_any.
>>>>
>>>> Thanks,
>>>> Richard
>>>
>>> Hi Richard,
>>>
>>> thanks for reviewing.  I was wondering if wouldn't be better to wrap the
>>> new intrinsic definition into the correct pragma so the macro definition
>>> stays narrowed.  WDYT?
>>
>> I guess there's not much in it either way, but IMO it would be more
>> consistent to keep the +bf16 stuff together.  That's already what we
>> do for the vget_lane macros.  And the only reason for grouping based
>> on function rather than based on feature for this patch is because the
>> functions happen to use macro definitions.  It feels odd for that to be
>> a determining factor, so that, e.g., the vreinterpret functions and the
>> full vld2 functions are grouped based on feature, but the vld2_lane
>> functions are grouped based on function.
>>
>> Thanks,
>> Richard
>
> Hi Richard,
>
> I had a look a little more closely and just moving the #undefs to the
> end of the file is not viable as these macros are: defined, undefined,
> redefined and finally undefined to generate the intrinsic and theier 'q'
> variants.
>
> In the attached patch the pragmas are added around the bfloat intrinsics
> without moving the code.
>
> Other option would be to rename some of these macro so they can be
> undefed at the end of the file without overlapping.  Please let me know
> if you prefer this way, I'll be happy to rework the patches accordingly.


Yeah, that sounds better (sorry).  This file is big enough and hard
enough to parse without overloaded macro names adding to the fun.
Generating the vld2q functions from __LD2Q_LANE_FUNC rather than
__LD2_LANE_FUNC seems more mnemonic as well as solving the undef
problem.

Thanks,
Richard

Re: [PATCH V3] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrisics

Reply via email to