https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551
Bug ID: 110551
Summary: [11/12/13/14 Regression] Suboptimal codegen for
128-bit multiplication on x86_64
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: moncef.mechri at gmail dot com
Target Milestone: ---
https://godbolt.org/z/3hdondY6n
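For readers who cannot open the link, here is a minimal sketch of what the code presumably looks like. It is reconstructed from the Boost source cited in [1] and from the constant visible in the assembly further down (-7046029254386353131 is 0x9E3779B97F4A7C15 when read as unsigned). The function names `mulx64` and `mix` are assumptions, and the exact code behind the link may differ.

```cpp
#include <cstdint>

// Assumed reconstruction: full 64x64 -> 128-bit multiply, then fold the
// high and low halves together with XOR. __uint128_t is a GCC/Clang
// extension, not standard C++.
inline std::uint64_t mulx64(std::uint64_t x, std::uint64_t y)
{
    __uint128_t r = static_cast<__uint128_t>(x) * y;
    return static_cast<std::uint64_t>(r) ^ static_cast<std::uint64_t>(r >> 64);
}

// Hypothetical wrapper: multiply the incoming hash by a fixed 64-bit constant.
// The constant is the unsigned form of the immediate loaded by movabs.
std::uint64_t mix(std::uint64_t h)
{
    return mulx64(h, 0x9E3779B97F4A7C15ull);
}
```

Only one operand is a compile-time constant, so a single widening multiply followed by an xor of the two result halves should be enough to implement `mix`.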
Codegen for the code shared above (a mixing step that Boost.Unordered applies
when a non-avalanching hash function is used [1]) has regressed since GCC 11.
I believe there are 2 regressions:
Regression 1:
A redundant move is introduced:
movabs rcx, -7046029254386353131
mov rax, rcx
The regression seems to be present at all optimization levels above -O0
(including -Os and -Og).
Possibly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804
Regression 2:
When using -march=haswell or newer, GCC >= 11 emits mulx. The resulting code is
one instruction longer, with no clear benefit to my untrained eyes. It looks
to me like the code generated by GCC 10 is optimal, even for haswell and newer.
I am reporting both issues in the same bug report because they seem related
enough. Let me know if you want me to split them into 2 bug reports instead.
[1]
https://github.com/boostorg/unordered/blob/9a7d1d336aaa73ad8e5f7c07bdb81b2e793f8d93/include/boost/unordered/detail/mulx.hpp#L111