https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89739

            Bug ID: 89739
           Summary: pessimizing vectorization at -O3 to load two u64
                    objects
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/8vIGZ3

using u64 = unsigned long long;
struct u128 {u64 a, b;};

inline u64 load8(void* ptr) {
    u64 out;
    __builtin_memcpy(&out, ptr, 8);
    return out;
}

u128 load(char* basep, u64 n) {
    return {load8(basep), load8(basep+n-8)};
}

At -O2 this emits ideal asm:
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi-8+rsi]
        ret


At -O3 it is comical; both values are packed into xmm0, spilled to the
stack, then reloaded into rax and rdx:
        movq    xmm0, QWORD PTR [rdi]
        movhps  xmm0, QWORD PTR [rdi-8+rsi]
        movaps  XMMWORD PTR [rsp-24], xmm0
        mov     rax, QWORD PTR [rsp-24]
        mov     rdx, QWORD PTR [rsp-16]
        ret

This seems to have been introduced in GCC 7.
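
For side-by-side comparison, here is a minimal sketch that keeps the original
load() next to a variant with SLP vectorization switched off for that one
function. It assumes the extra instructions come from basic-block SLP
vectorization of the two 8-byte stores into the returned struct; the report
does not confirm which pass is responsible, and this has not been checked
against the affected versions. The names NO_SLP and load_no_slp are
hypothetical and exist only for this illustration. The optimize attribute is
GCC-specific, so it is compiled out on other compilers.

```cpp
using u64 = unsigned long long;
struct u128 { u64 a, b; };

// GCC-only: the optimize attribute prepends -f to the string, so this
// requests -fno-tree-slp-vectorize for the annotated function only.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SLP __attribute__((optimize("no-tree-slp-vectorize")))
#else
#define NO_SLP
#endif

// Unaligned 8-byte load, as in the report.
inline u64 load8(void* ptr) {
    u64 out;
    __builtin_memcpy(&out, ptr, 8);
    return out;
}

// Original function from the report: first and last 8 bytes of an
// n-byte buffer (n >= 8 is assumed; the two loads may overlap).
u128 load(char* basep, u64 n) {
    return {load8(basep), load8(basep + n - 8)};
}

// Hypothetical variant with SLP vectorization disabled (assumption:
// that is the pass producing the xmm0 round trip at -O3).
NO_SLP u128 load_no_slp(char* basep, u64 n) {
    return {load8(basep), load8(basep + n - 8)};
}
```

Both functions return the same values; only the generated code may differ.
Compiling with -O3 -S and comparing the two bodies should show whether the
assumption about the pass holds. Passing -fno-tree-slp-vectorize on the
command line is the whole-file equivalent.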
