https://bugs.llvm.org/show_bug.cgi?id=50653
Bug ID: 50653
Summary: [AArch64] Generate sqdmlal
Product: new-bugs
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedb...@nondot.org
Reporter: sjoerd.mei...@arm.com
CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org
Raising this missed optimisation opportunity in case someone finds this
interesting.
For this input:
#include "arm_neon.h"
int32_t
t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c) {
  return vqdmlalh_lane_s16 (a, b, c, 0);
}
We are not generating the sqdmlal multiply-accumulate variant that GCC generates:
t_vqdmlalh_lane_s16:
        dup     v2.4h, w1
        fmov    s1, w0
        sqdmlal s1, h2, v0.h[0]
        fmov    w0, s1
        ret
Instead, we generate a separate sqdmull followed by sqadd:
t_vqdmlalh_lane_s16:                    // @t_vqdmlalh_lane_s16
        fmov    s1, w1
        sqdmull v0.4s, v1.4h, v0.4h
        fmov    s1, w0
        sqadd   s0, s1, s0
        fmov    w0, s0
        ret
See also https://godbolt.org/z/41nMxM5q1
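For reference, here is a small portable C model of what I understand the intrinsic to compute: the doubled product is saturated to 32 bits first, and the accumulate is then saturated again. If that reading of the Arm pseudocode is right, both sequences above produce the same value and the difference is only in instruction count. The helper names sat_s32 and ref_vqdmlalh_lane_s16 are made up for illustration and are not part of ACLE or LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate value to the int32_t range. */
static int32_t sat_s32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical scalar model of vqdmlalh_lane_s16 (a, b, c, lane).
 * The vector operand is modelled as a plain array of four int16_t. */
static int32_t ref_vqdmlalh_lane_s16(int32_t a, int16_t b,
                                     const int16_t c[4], int lane) {
    assert(lane >= 0 && lane < 4);  /* the intrinsic requires a lane in 0..3 */

    /* Step 1: saturating doubling multiply (what sqdmull computes per lane). */
    int32_t product = sat_s32(2 * (int64_t)b * (int64_t)c[lane]);

    /* Step 2: saturating accumulate (what sqadd computes). */
    return sat_s32((int64_t)a + product);
}
```

The only input pair for which step 1 saturates is b == c[lane] == INT16_MIN, where the doubled product is 2^31 and is clamped to INT32_MAX before the accumulate.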