https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85697
            Bug ID: 85697
           Summary: At -Os nontrivial ctor does not use SSE to zero
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

struct alignas(16) A {
    A (void) :a(0),b(0),c(0),d(0) {}
    int a,b,c,d;
};
__attribute__((noinline)) void UseA (A& a) { a.a=1; }
int main (void) {
    A a {};
    UseA (a);
    return a.a;
}

-Os -march=native on Haswell, generates:

main:
    subq    $16, %rsp
    movq    %rsp, %rdi
    movq    $0, (%rsp)
    movq    $0, 8(%rsp)
    call    _Z4UseAR1A
    movl    (%rsp), %eax
    addq    $16, %rsp
    ret

Using 16 bytes to zero A with 2 movq. With -O3:

main:
    subq    $24, %rsp
    vpxor   %xmm0, %xmm0, %xmm0
    movq    %rsp, %rdi
    vmovaps %xmm0, (%rsp)
    call    _Z4UseAR1A
    movl    (%rsp), %eax
    addq    $24, %rsp
    ret

using only 9 bytes for pxor/movaps. With -mno-avx it is 7 bytes for
xorps/movaps. With multiple objects of type A, the savings would be even
greater, since only one pxor would be needed for all and only 4 bytes per
object for zeroing.

Removing A constructor also results in SSE instruction use.
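
The last two observations (several objects of type A, and A without its constructor) can be put side by side in one translation unit. This is only a sketch for comparing the generated code: the names B, UseB, two_with_ctor and two_without_ctor are hypothetical and do not appear in the original test case, and what the compiler emits for each function has to be checked per GCC version and target.

```cpp
// Variant from the report: user-provided constructor that zeroes all members.
struct alignas(16) A {
    A() : a(0), b(0), c(0), d(0) {}
    int a, b, c, d;
};

// Hypothetical counterpart without a constructor; B{} value-initializes
// the aggregate, so all four members are zero as well.
struct alignas(16) B {
    int a, b, c, d;
};

// noinline (a GCC/Clang attribute, as in the report) keeps the objects
// in memory so the zeroing stores cannot be optimized away.
__attribute__((noinline)) void UseA(A& a) { a.a = 1; }
__attribute__((noinline)) void UseB(B& b) { b.a = 1; }

// Two objects of each type, to see whether one zeroed register is
// reused for both stores.
int two_with_ctor() {
    A x{}, y{};
    UseA(x);
    UseA(y);
    return x.a + y.a;
}

int two_without_ctor() {
    B x{}, y{};
    UseB(x);
    UseB(y);
    return x.a + y.a;
}
```

Compiling this with g++ -Os -march=native -S, and again with -O3, and comparing the bodies of the two functions should show whether the zeroing is done with movq immediates or with an SSE/AVX register store in each case.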