Re: [PATCH V2] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrinsics

Richard Sandiford via Gcc-patches Mon, 26 Oct 2020 08:53:46 -0700

Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> Second version of the patch here implementing the bfloat16_t neon
> related load intrinsics: vld2_lane_bf16, vld2q_lane_bf16,
> vld3_lane_bf16, vld3q_lane_bf16, vld4_lane_bf16, vld4q_lane_bf16.
>
> This better narrows testcases so they do not cause regressions for the
> arm backend where these intrinsics are not yet present.
>
> Please refer to:
> ACLE <https://developer.arm.com/docs/101028/latest>
> ISA  <https://developer.arm.com/docs/ddi0596/latest>


The intrinsics are documented to require +bf16, but it looks like this
patch makes the bf16 forms available without it.  (The +bf16 requirement
is enforced indirectly: the compiler complains that the intrinsic wrapper
can't be inlined into a caller that uses incompatible target flags.)

Perhaps we should keep the existing intrinsics where they are and
just move the #undefs to the end, similarly to __aarch64_vget_lane_any.
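
To make the suggested layout concrete, here is a minimal, portable sketch.
It is not the arm_neon.h code: DEMO_LD_LANE_FUNC and the demo_ld_lane_*
functions are hypothetical stand-ins, and the target-gated region is only
indicated by a comment so that the fragment compiles on any target.

```c
/* Hypothetical helper macro standing in for the lane-load helper macros
   in arm_neon.h.  It generates one "load lane" function per element type. */
#define DEMO_LD_LANE_FUNC(name, type)                       \
  static inline type name (const type *ptr, int lane)       \
  {                                                         \
    return ptr[lane];                                       \
  }

/* Base section: the existing forms stay where they are.  */
DEMO_LD_LANE_FUNC (demo_ld_lane_f32, float)
DEMO_LD_LANE_FUNC (demo_ld_lane_s32, int)

/* If the #undef sat here, the later section could not reuse the macro
   and its users would have to move up into the base section.  */

/* Gated section: in a real header this region is assumed to sit between
   #pragma GCC push_options / #pragma GCC target (...) and
   #pragma GCC pop_options, so these forms keep their feature requirement.  */
DEMO_LD_LANE_FUNC (demo_ld_lane_u16, unsigned short)

/* The #undef moves to the end, after the last user of the macro.  */
#undef DEMO_LD_LANE_FUNC
```

With this ordering the later section reuses the same helper macro without
being hoisted out of its gated region, and the macro still does not leak
past the end of the header.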

Thanks,
Richard
