https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107533

            Bug ID: 107533
           Summary: Inefficient code sequence for fp16 testcase on aarch64
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ramana at gcc dot gnu.org
  Target Milestone: ---

Derived from PR92999 



struct phalf {
    __fp16 first;
    __fp16 second;
};

struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
struct phalf phalf_copy(struct phalf* src) {
    return *src;
}

Compiling for AArch64 with a recent enough compiler produces:

phalf_copy:
        ldr     w0, [x0]
        ubfx    x1, x0, 0, 16
        lsr     w0, w0, 16
        dup     v0.4h, w1
        dup     v1.4h, w0
        ret


Couldn't it just be:

        ldr     h0, [x0]
        ldr     h1, [x0, 2]

IIRC the half-precision load (ldr h<n>) is in base Armv8-A rather than
requiring Armv8.2-A.
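
For illustration only, here is a rough C model of the two sequences. It is a
sketch under assumptions, not what the compiler does internally: the struct
phalf_bits type and both helper names are hypothetical, the fields hold raw
16-bit patterns rather than __fp16 so it builds on any host, and a
little-endian target is assumed (the AArch64 default).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct phalf: raw 16-bit patterns instead of
   __fp16, so the sketch builds on hosts without __fp16 support. */
struct phalf_bits {
    uint16_t first;
    uint16_t second;
};

/* Models the reported sequence: one 32-bit load, then a split done in
   integer registers. Assumes little-endian byte order. */
static struct phalf_bits copy_via_word(const struct phalf_bits *src) {
    uint32_t w;
    struct phalf_bits r;
    memcpy(&w, src, sizeof w);          /* ldr  w0, [x0]      */
    r.first  = (uint16_t)(w & 0xffffu); /* ubfx x1, x0, 0, 16 */
    r.second = (uint16_t)(w >> 16);     /* lsr  w0, w0, 16    */
    return r;                           /* the dups then move these to v0/v1 */
}

/* Models the suggested sequence: two independent 16-bit loads. */
static struct phalf_bits copy_via_halves(const struct phalf_bits *src) {
    struct phalf_bits r;
    memcpy(&r.first,  (const unsigned char *)src,     2); /* ldr h0, [x0]    */
    memcpy(&r.second, (const unsigned char *)src + 2, 2); /* ldr h1, [x0, 2] */
    return r;
}
```

Both helpers return the same pair of bit patterns on a little-endian host.
The difference in the real code is only that the first form goes through the
general-purpose registers before reaching v0 and v1.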


regards
Ramana
