https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143
            Bug ID: 111143
           Summary: [missed optimization] unlikely code slows down diffutils
                    x86-64 ASCII processing
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eggert at cs dot ucla.edu
  Target Milestone: ---

Created attachment 55788
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55788&action=edit
source code illustrating the performance problem

This bug report may be related to bug 110823 (also found for diffutils)
but the symptoms differ somewhat so I am reporting it separately. I
observed it with GCC 13.1.1 20230614 (Red Hat 13.1.1-4) on x86-64.

While tuning GNU diffutils I noticed that its loops to process
mostly-ASCII text were not compiled well by GCC on x86-64. For a
stripped-down example of the problem, compile the attached program with:

  gcc -O2 -S code-mcel.c

The result is in the attached file code-mcel.s. Its loop kernel assuming
ASCII text (starting on line 44) looks like this:

  .L6:
          movsbq  (%rbx), %rax
          testb   %al, %al
          js      .L4
          addq    %rax, %r12
          movl    $1, %eax
  .L5:
          addq    %rax, %rbx
          cmpq    %r13, %rbx
          jb      .L6

The "movl $1, %eax" immediately followed by "addq %rax, %rbx" is poorly
scheduled; the resulting dependency makes the code run quite a bit slower
than it should. Replacing it with "addq $1, %rbx" and readjusting the
surrounding code accordingly, as is done in the attached file
code-mcel-opt.s, causes the benchmark to run 38% faster on my laptop's
Intel i5-1335U.

It seems that code that GCC knows is unlikely (because of
__builtin_expect) is causing the kernel, which GCC knows is likely, to be
poorly optimized.
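
For readers without the attachment at hand, here is a minimal sketch of the loop shape that the quoted kernel appears to correspond to. It is not the contents of code-mcel.c: the names sum_chars, sum_chars_split and slow_scan are hypothetical, and slow_scan is only a stand-in for whatever the real unlikely path does. The first function stores the element length in a variable and advances by it after the branch, matching "movl $1, %eax" followed by "addq %rax, %rbx". The second advances inside each branch, a source-level analogue of the hand edit to "addq $1, %rbx". Whether a given GCC version compiles the two differently is not something this report establishes.

```c
#include <stddef.h>

/* Hypothetical stand-in for the unlikely non-ASCII path: treats a lead
   byte plus any following 10xxxxxx bytes as one element, reports the
   lead byte as its value, and returns the element length in bytes. */
static size_t slow_scan(const char *p, const char *lim, long *value)
{
    size_t len = 1;
    while (p + len < lim && ((unsigned char)p[len] & 0xC0) == 0x80)
        len++;
    *value = (unsigned char)p[0];
    return len;
}

/* Loop shape assumed to match the quoted kernel: the length is kept in
   a variable and added to the pointer after the branch rejoins. */
long sum_chars(const char *p, const char *lim)
{
    long sum = 0;
    while (p < lim) {
        signed char c = (signed char)*p;
        size_t len;
        if (__builtin_expect(c < 0, 0)) {   /* GCC/Clang builtin */
            long v;
            len = slow_scan(p, lim, &v);
            sum += v;
        } else {
            sum += c;
            len = 1;
        }
        p += len;
    }
    return sum;
}

/* Same computation with the pointer advance duplicated into each
   branch, so the likely path increments by a constant. */
long sum_chars_split(const char *p, const char *lim)
{
    long sum = 0;
    while (p < lim) {
        signed char c = (signed char)*p;
        if (__builtin_expect(c < 0, 0)) {
            long v;
            p += slow_scan(p, lim, &v);
            sum += v;
        } else {
            sum += c;
            p++;
        }
    }
    return sum;
}
```

Both functions return the same result for any input; they differ only in where the pointer is advanced, which is the property the hand-edited assembly changes.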