https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

            Bug ID: 106910
           Summary: roundss not vectorized
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

GCC 7 and newer optimize `std::floor(float)` into `vroundss` with -O2 and -march=skylake, which is great.

However, I can see in Compiler Explorer that the following example:

```
#include <cmath>

struct TVec { float x, y; };
struct TKey { int i, j; };

class TDom
{
private:
  static int Floor(float const x)
    { return static_cast<int>(std::floor(x)); }

public:
  TKey CalcKey(TVec const &) const;
};

TKey TDom::CalcKey(TVec const &r) const
  { return {Floor(r.x), Floor(r.y)}; }
```

produces:

```
        vxorps     %xmm1, %xmm1, %xmm1
        vroundss   $9, (%rsi), %xmm1, %xmm0
        vroundss   $9, 4(%rsi), %xmm1, %xmm1
        vunpcklps  %xmm1, %xmm0, %xmm0
        vcvttps2dq %xmm0, %xmm2
        vmovq      %xmm2, %rax
        ret
```

Couldn’t the pair of `vroundss` have been merged into a single `vroundps`?
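
For illustration, here is a hand-written sketch of the packed form being asked about. It is not what GCC emits and not a proposed patch. The function name `CalcKeyPacked` is hypothetical, and the example assumes the two unused upper lanes can simply be zero-filled. The intrinsics path is compiled only when SSE4.1 is enabled (for example with -march=skylake); otherwise it falls back to the scalar code from the report. The rounding immediate `_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC` is the same value, 9, seen in the `vroundss` instructions above.

```cpp
#include <cmath>
#if defined(__SSE4_1__)
#include <smmintrin.h>  // SSE4.1: _mm_round_ps, _mm_extract_epi32
#endif

struct TVec { float x, y; };
struct TKey { int i, j; };

// Hypothetical hand-vectorized equivalent of TDom::CalcKey.
inline TKey CalcKeyPacked(TVec const &r)
{
#if defined(__SSE4_1__)
  // Place x and y in the two low lanes; the upper lanes are zero (assumption).
  __m128 const v = _mm_setr_ps(r.x, r.y, 0.0f, 0.0f);

  // One packed floor instead of two scalar ones (roundps with immediate 9).
  __m128 const f = _mm_round_ps(v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);

  // Truncating float -> int conversion, as in the vcvttps2dq above.
  __m128i const k = _mm_cvttps_epi32(f);

  return {_mm_cvtsi128_si32(k), _mm_extract_epi32(k, 1)};
#else
  // Scalar fallback, identical to the code in the report.
  return {static_cast<int>(std::floor(r.x)),
          static_cast<int>(std::floor(r.y))};
#endif
}
```

Both paths should give the same keys for finite inputs whose floor fits in `int`, e.g. `CalcKeyPacked({1.5f, -0.5f})` yields `{1, -1}`.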