https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107533
            Bug ID: 107533
           Summary: Inefficient code sequence for fp16 testcase on aarch64
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ramana at gcc dot gnu.org
  Target Milestone: ---

Derived from PR92999

struct phalf {
    __fp16 first;
    __fp16 second;
};

struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
struct phalf phalf_copy(struct phalf* src) {
    return *src;
}

Compiling for AArch64 with a recent enough compiler produces:

phalf_copy:
        ldr     w0, [x0]
        ubfx    x1, x0, 0, 16
        lsr     w0, w0, 16
        dup     v0.4h, w1
        dup     v1.4h, w0
        ret

Couldn't it just be

        ldr     h0, [x0]
        ldr     h1, [x0, 2]

IIRC this is in base v8 rather than v8.2

regards
Ramana
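
For reference, a minimal portable sketch of why the two sequences above fetch the same bits. It is an illustration under stated assumptions, not the compiler's logic: uint16_t stands in for __fp16 so the code builds on any target, little-endian byte order is assumed, and the names phalf_bits, split_word and load_halves are hypothetical and do not appear in the testcase.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct phalf: uint16_t replaces __fp16 so the
   sketch builds on any target. Only the raw 16-bit patterns are modelled. */
struct phalf_bits {
    uint16_t first;
    uint16_t second;
};

/* Models the reported sequence: one 32-bit load, then a split in the
   general registers. Assumes little-endian byte order. */
static void split_word(const struct phalf_bits *src, uint16_t *lo, uint16_t *hi)
{
    uint32_t w;
    memcpy(&w, src, sizeof w);       /* ldr  w0, [x0]      */
    *lo = (uint16_t)(w & 0xffffu);   /* ubfx x1, x0, 0, 16 */
    *hi = (uint16_t)(w >> 16);       /* lsr  w0, w0, 16    */
}

/* Models the suggested sequence: two independent 16-bit loads at byte
   offsets 0 and 2. */
static void load_halves(const struct phalf_bits *src, uint16_t *lo, uint16_t *hi)
{
    const unsigned char *p = (const unsigned char *)src;
    memcpy(lo, p + 0, sizeof *lo);   /* ldr h0, [x0]    */
    memcpy(hi, p + 2, sizeof *hi);   /* ldr h1, [x0, 2] */
}
```

Calling both helpers on the same struct should give identical (lo, hi) pairs on a little-endian host, which is the equivalence the suggested two-load sequence relies on. The sketch does not model the dup into v0/v1.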