https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89739
            Bug ID: 89739
           Summary: pessimizing vectorization at -O3 to load two u64 objects
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/8vIGZ3

using u64 = unsigned long long;
struct u128 {u64 a, b;};

inline u64 load8(void* ptr) {
    u64 out;
    __builtin_memcpy(&out, ptr, 8);
    return out;
}

u128 load(char* basep, u64 n) {
    return {load8(basep), load8(basep+n-8)};
}

At -O2 this emits ideal asm:

        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi-8+rsi]
        ret

At -O3 it is comical:

        movq    xmm0, QWORD PTR [rdi]
        movhps  xmm0, QWORD PTR [rdi-8+rsi]
        movaps  XMMWORD PTR [rsp-24], xmm0
        mov     rax, QWORD PTR [rsp-24]
        mov     rdx, QWORD PTR [rsp-16]
        ret

This seems to have been introduced in gcc7.
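
For anyone comparing the two optimization levels locally rather than on godbolt, a self-contained variant of the testcase might look like the sketch below. It is not part of the original report: the noinline attribute and the helper load_matches_reference are assumptions added here, the former so that load() is still emitted as a standalone function, the latter (a hypothetical name) so that the result can be checked to be identical whichever code sequence the compiler picks.

```cpp
#include <cassert>
#include <cstring>

using u64 = unsigned long long;
struct u128 { u64 a, b; };

// Same helper as in the report: an 8-byte load expressed via memcpy.
inline u64 load8(void* ptr) {
    u64 out;
    __builtin_memcpy(&out, ptr, 8);
    return out;
}

// Same function as in the report. The noinline attribute is an addition
// (assumption) so that the function body survives for disassembly.
__attribute__((noinline)) u128 load(char* basep, u64 n) {
    return {load8(basep), load8(basep + n - 8)};
}

// Hypothetical checker, not from the report: compares load() against two
// independent memcpy reads of the first and last 8 bytes of an n-byte range.
bool load_matches_reference(u64 n) {
    char buf[32];
    assert(n >= 8 && n <= sizeof buf);  // the two loads must stay inside buf
    for (int i = 0; i < 32; ++i)
        buf[i] = static_cast<char>(i + 1);

    u64 first, last;
    std::memcpy(&first, buf, 8);
    std::memcpy(&last, buf + n - 8, 8);

    u128 r = load(buf, n);
    return r.a == first && r.b == last;
}
```

Calling load_matches_reference from main with a few values of n between 8 and 32, building once with -O2 and once with -O3, and disassembling load() in each binary should show whether a given compiler version takes the scalar or the vector path. The check only confirms that both paths return the same values, it says nothing about their cost.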