Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> Second version of the patch here implementing the bfloat16_t neon
> related load intrinsics: vld2_lane_bf16, vld2q_lane_bf16,
> vld3_lane_bf16, vld3q_lane_bf16, vld4_lane_bf16, vld4q_lane_bf16.
>
> This better narrows testcases so they do not cause regressions for the
> arm backend where these intrinsics are not yet present.
>
> Please refer to:
> ACLE <https://developer.arm.com/docs/101028/latest>
> ISA <https://developer.arm.com/docs/ddi0596/latest>

The intrinsics are documented to require +bf16, but it looks like this
patch makes the bf16 forms available without it.  (The requirement is
enforced indirectly: the compiler complains that the intrinsic wrapper
can't be inlined into a caller that uses incompatible target flags.)

Perhaps we should keep the existing intrinsics where they are and just
move the #undefs to the end, similarly to __aarch64_vget_lane_any.

Thanks,
Richard
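
For illustration only, here is a rough, portable sketch of the layout suggested above: the existing wrappers stay where they are, the bf16 forms come later, and the helper macro is only #undef'd after its last user.  All names (DEMO_LOAD_LANE, demo_ld_lane_s32, demo_ld_lane_bf16) are hypothetical stand-ins, not the real arm_neon.h contents, and the target pragma region is shown only in a comment as an assumption about how the +bf16 group would be fenced off.

```c
/* Hypothetical helper macro shared by several wrappers.  It plays the
   role that a macro such as __aarch64_vget_lane_any plays in the real
   header; the name and body here are invented for the sketch.  */
#define DEMO_LOAD_LANE(ptr, lane) ((ptr)[(lane)])

/* Group 1: existing wrappers, left where they already are.  */
static inline int
demo_ld_lane_s32 (const int *p, int lane)
{
  return DEMO_LOAD_LANE (p, lane);
}

/* Group 2: the bf16 forms, placed later in the header.  On AArch64
   this group would be assumed to sit inside a region like

     #pragma GCC push_options
     #pragma GCC target ("arch=armv8.2-a+bf16")
     ...
     #pragma GCC pop_options

   so that always-inline wrappers cannot be inlined into callers built
   without +bf16.  The pragmas are left as a comment so that the sketch
   compiles on any host.  unsigned short stands in for the 16-bit
   bfloat16 storage type.  */
static inline unsigned short
demo_ld_lane_bf16 (const unsigned short *p, int lane)
{
  return DEMO_LOAD_LANE (p, lane);
}

/* The #undef moves to the end, after the last user of the macro,
   instead of sitting between the two groups.  */
#undef DEMO_LOAD_LANE
```

With this ordering the bf16 wrappers can still use the helper, while the macro does not leak out of the header.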