https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100769
Bug ID: 100769
Summary: [D] memcmp() == 0 for small constant strings not folded
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: d
Assignee: ibuclaw at gdcproject dot org
Reporter: witold.baryluk+gcc at gmail dot com
Target Milestone: ---

I expect this D code to be quite optimal, but it isn't.

```
extern(C) int memcmp(const void *s1, const void *s2, size_t n);

int recognize3(const char* s) {
  return memcmp(s, "stract class", 12) == 0;
}
```

https://godbolt.org/z/vx17WK9rs

It produces a call to memcmp instead of inlining the comparison and specializing the code for this particular case:

```
int example.recognize3(const(char*)):
        sub     rsp, 8
        mov     edx, 12
        mov     esi, OFFSET FLAT:.LC0
        call    memcmp
        test    eax, eax
        sete    al
        add     rsp, 8
        movzx   eax, al
        ret
```

ldc2 1.24.0 (for D), clang 11.0.1-2 (for C and C++) and gcc 10.2.1 (for C and C++) all produce close to optimal code. The same is true of ldc2 1.26.0 (for D) and gcc 11.1 (for C and C++).

ldc2 1.26.0 (D):

```
int example.recognize3(const(char*)):
        movabs  rcx, 7142836979195081843
        xor     rcx, qword ptr [rdi]
        mov     edx, dword ptr [rdi + 8]
        xor     rdx, 1936941420
        xor     eax, eax
        or      rdx, rcx
        sete    al
        ret
```

gcc 11.1 (C):

```
recognize3:
        movabs  rax, 7142836979195081843
        cmp     QWORD PTR [rdi], rax
        je      .L6
.L2:
        mov     eax, 1
        xor     eax, 1
        ret
.L6:
        xor     eax, eax
        cmp     DWORD PTR [rdi+8], 1936941420
        jne     .L2
        xor     eax, 1
        ret
```

Notice how gcc, clang and ldc2 all compare the first 8 bytes of the input, then the remaining 4 bytes. clang and ldc2 simply xor/or the results and return, with no conditional jumps. gcc does a bit worse, with more conditionals and more jumps, but it is still pretty good and follows the same idea. gdc, however, calls the generic memcmp, which loops and executes about 12 jumps and/or 13 exits.
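For reference, the folded form that ldc2, clang and gcc emit can be written out by hand in C. The two immediates in the listings are just the pattern itself: 7142836979195081843 is `"stract c"` loaded as a little-endian 64-bit word and 1936941420 is `"lass"` loaded as a little-endian 32-bit word. The sketch below is illustrative only, not how any compiler implements the fold; the function names `recognize3_memcmp` and `recognize3_folded` are hypothetical. It assumes, like the original `memcmp(s, ..., 12)` call, that `s` points to at least 12 readable bytes. Instead of hard-coding the x86 constants, it loads the pattern through the same `memcpy` calls, so it stays correct on big-endian targets too.

```c
#include <stdint.h>
#include <string.h>

/* Reference version: the comparison the D code asks for. */
int recognize3_memcmp(const char *s)
{
    return memcmp(s, "stract class", 12) == 0;
}

/* Hand-folded sketch: one 8-byte load and one 4-byte load, combined with
 * xor/or and no branches in the source (mirrors the ldc2/clang listing).
 * Fixed-size memcpy avoids unaligned-access and aliasing problems; at -O1
 * and above, gcc and clang typically lower each call to a single mov. */
int recognize3_folded(const char *s)
{
    /* Exactly 12 bytes, no terminating NUL (valid in C, not in C++). */
    static const char pat[12] = "stract class";
    uint64_t in_lo, pat_lo;
    uint32_t in_hi, pat_hi;

    memcpy(&in_lo, s, 8);        /* bytes 0..7:  "stract c" */
    memcpy(&in_hi, s + 8, 4);    /* bytes 8..11: "lass"     */
    memcpy(&pat_lo, pat, 8);
    memcpy(&pat_hi, pat + 8, 4);

    /* Zero only if every one of the 12 bytes matches. */
    return ((in_lo ^ pat_lo) | (uint64_t)(in_hi ^ pat_hi)) == 0;
}
```

Both functions should return the same result for any input with at least 12 readable bytes; the folded one is what one would hope gdc emits for the `memcmp(...) == 0` pattern.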