https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599
Bug ID: 87599
Summary: Broadcasting scalar to vector uses stack unnecessarily on x86
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vgatherps at gmail dot com
Target Milestone: ---
When compiled with GCC 8.2 at -O2, the following function:
typedef long long __m128i __attribute__ ((__vector_size__ (16),
__may_alias__));
__m128i vectorize(long val) {
__m128i rval = {val, val};
return rval;
}
generates the following code:
mov QWORD PTR [rsp-16], rdi
movq xmm0, QWORD PTR [rsp-16]
punpcklqdq xmm0, xmm0
ret
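The store to the stack and the reload from it are unnecessary. This could be
replaced with a direct move from the general-purpose register: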
movq xmm0, rdi
punpcklqdq xmm0, xmm0
ret
Interestingly, according to Godbolt (Compiler Explorer), current trunk performs
this optimization with -Os but not with -O2 or -O3.
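
For anyone reproducing this, here is a minimal self-contained sketch of the
test case. The type name v2di and the helpers lane() and check_broadcast()
are hypothetical names added for illustration and are not part of the
original report; only vectorize() is taken from it. The helpers only confirm
that both lanes hold the scalar and say nothing about which instructions are
emitted.

```c
#include <assert.h>

/* Two 64-bit lanes in a 16-byte vector, the same layout as the type in the
   report. The name v2di is a hypothetical stand-in chosen to avoid clashing
   with the __m128i defined by the intrinsics headers. */
typedef long long v2di __attribute__ ((__vector_size__ (16), __may_alias__));

/* Function from the report: broadcast a scalar into both lanes. */
v2di vectorize(long val) {
    v2di rval = {val, val};
    return rval;
}

/* Hypothetical helper: read one lane via GCC vector subscripting. */
long long lane(v2di v, int i) {
    return v[i];
}

/* Hypothetical helper: assert that both lanes equal the input scalar. */
void check_broadcast(long val) {
    v2di v = vectorize(val);
    assert(lane(v, 0) == val);
    assert(lane(v, 1) == val);
}
```

Compiling this file with something like "gcc -O2 -S -masm=intel" and reading
the assembly for vectorize should show whether the stack round trip is still
present for a given compiler version.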