https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

            Bug ID: 106910
           Summary: roundss not vectorized
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

GCC 7 and newer optimize `std::floor(float)` into `vroundss` with -O2 and
-march=skylake, which is great.

However, I can see in Compiler Explorer that the following example:

```
#include <cmath>

struct TVec { float x, y; };

struct TKey { int i, j; };

class TDom
{
private:
  static int Floor(float const x)
    { return static_cast<int>(std::floor(x)); }

public:
  TKey CalcKey(TVec const &) const;
};

TKey TDom::CalcKey(TVec const &r) const
  { return {Floor(r.x), Floor(r.y)}; }
```

produces:

```
vxorps      %xmm1, %xmm1, %xmm1
vroundss    $9, (%rsi), %xmm1, %xmm0
vroundss    $9, 4(%rsi), %xmm1, %xmm1
vunpcklps   %xmm1, %xmm0, %xmm0
vcvttps2dq  %xmm0, %xmm2
vmovq       %xmm2, %rax
ret
```

Couldn’t the pair of `vroundss` have been merged into a single `vroundps`?
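
For reference, here is a hand-written sketch of what the merged form might look like, using SSE4.1 intrinsics. It is an illustration only, not a claim about what the vectorizer should emit. `CalcKeyPacked`, `CalcKeyScalar` and `CheckPacked` are hypothetical names that do not appear in the testcase above. The `target` attribute is a GCC/Clang extension, used here only so the snippet builds without `-msse4.1`. The sketch assumes both inputs round to values representable as `int`.

```cpp
#include <cassert>
#include <cmath>
#include <immintrin.h>

struct TVec { float x, y; };
struct TKey { int i, j; };

// The $9 immediate in the listing is round-toward-negative-infinity (0x01)
// with precision exceptions suppressed (0x08).
static_assert((_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) == 9,
              "rounding immediate used by vroundss above");

// Scalar reference, equivalent to the Floor() helper in the testcase.
TKey CalcKeyScalar(TVec const &r)
{
  return {static_cast<int>(std::floor(r.x)),
          static_cast<int>(std::floor(r.y))};
}

// Hypothetical packed version: one roundps covers both lanes.
__attribute__((target("sse4.1")))
TKey CalcKeyPacked(TVec const &r)
{
  // Place r.x and r.y in the two low lanes; the upper lanes are zero.
  __m128 const v = _mm_setr_ps(r.x, r.y, 0.0f, 0.0f);

  // Single packed floor, same immediate as the two vroundss instructions.
  __m128 const f = _mm_round_ps(v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);

  // Truncating conversion to int32, as vcvttps2dq does in the listing.
  __m128i const k = _mm_cvttps_epi32(f);

  return {_mm_cvtsi128_si32(k), _mm_extract_epi32(k, 1)};
}

// Asserts that the packed sketch agrees with the scalar reference.
void CheckPacked(TVec const &r)
{
  TKey const a = CalcKeyScalar(r);
  TKey const b = CalcKeyPacked(r);
  assert(a.i == b.i && a.j == b.j);
}
```

With this shape the two scalar roundings and the `vunpcklps` would collapse into a single packed rounding followed by the existing `vcvttps2dq`.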
