https://bugs.llvm.org/show_bug.cgi?id=50653

            Bug ID: 50653
           Summary: [AArch64] Generate sqdmlal
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedb...@nondot.org
          Reporter: sjoerd.mei...@arm.com
                CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org

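Raising this missed optimisation opportunity in case someone finds it
interesting: for the scalar saturating doubling multiply-accumulate intrinsic,
we emit a separate sqdmull and sqadd rather than a single sqdmlal.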

For this input:

#include "arm_neon.h"
int32_t
t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c) {
  return vqdmlalh_lane_s16 (a, b, c, 0);
}

We do not generate the fused multiply-accumulate instruction, sqdmlal, that
GCC generates:

t_vqdmlalh_lane_s16:
        dup     v2.4h, w1
        fmov    s1, w0
        sqdmlal s1, h2, v0.h[0]
        fmov    w0, s1
        ret

With LLVM trunk we get a separate multiply and saturating add instead:

t_vqdmlalh_lane_s16:                    // @t_vqdmlalh_lane_s16
        fmov    s1, w1
        sqdmull v0.4s, v1.4h, v0.4h
        fmov    s1, w0
        sqadd   s0, s1, s0
        fmov    w0, s0
        ret

See also https://godbolt.org/z/41nMxM5q1
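
For reference, both sequences should compute the same value,
sat32(a + sat32(2 * b * c[0])), so folding sqdmull + sqadd into sqdmlal should
not change the result. Below is a minimal scalar sketch of that computation in
portable C, assuming the two-step saturation that the Arm documentation
describes for SQDMLAL. The names sat32, ref_sqdmull and ref_vqdmlalh_lane_s16
are made up for illustration and are not part of arm_neon.h; the vector
argument is modelled as a plain array of four int16_t.

```c
#include <stdint.h>

/* Saturate a 64-bit intermediate to the int32_t range. */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical scalar model of one sqdmull lane: sat32(2 * b * c).
 * The only input pair that saturates is b == c == -32768. */
static int32_t ref_sqdmull(int16_t b, int16_t c) {
    return sat32(2 * (int64_t)b * (int64_t)c);
}

/* Hypothetical scalar model of vqdmlalh_lane_s16(a, b, c, lane):
 * the doubled product is saturated first, then the accumulate is
 * saturated again, matching the sqdmull + sqadd pair. */
static int32_t ref_vqdmlalh_lane_s16(int32_t a, int16_t b,
                                     const int16_t c[4], int lane) {
    int32_t product = ref_sqdmull(b, c[lane]);
    return sat32((int64_t)a + (int64_t)product);
}
```

Note that the product is saturated before the accumulate: with a = -1 and
b = c[0] = -32768 this model returns INT32_MAX - 1, not INT32_MAX.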
