https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104665

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-02-23
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There are a couple of reasons.
First, store merging happens too late.
Second, store merging does not work in the loop case.

Take:
#if 1
enum b : unsigned char{};
#else
typedef unsigned char b;
#endif

void serialize_le(b* __restrict dst, const unsigned* __restrict src)
{
   // for (int i = 0; i < 128; ++i, ++src)
    {
        unsigned t = *src;
        *dst++ = static_cast<b>((t >>  0) & 0xff);
        *dst++ = static_cast<b>((t >>  8) & 0xff);
        *dst++ = static_cast<b>((t >> 16) & 0xff);
        *dst++ = static_cast<b>((t >> 24) & 0xff);
    }
}

This gets optimized to one load followed by one store. But once you add the
loop and use -fno-tree-vectorize (because otherwise GCC's vectorizer kicks in,
which causes other issues), the stores are not merged into one.
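
For reference, the loop case is just the test case with the commented-out loop
enabled. The sketch below spells it out; the name serialize_le_loop is only
illustrative and not part of the report, and it assumes the enum variant of b
(the #if 1 branch). Compiling it with something like -O2 -fno-tree-vectorize
and looking at the assembly should show whether the four byte stores in each
iteration were merged.

```cpp
enum b : unsigned char {};

// Loop variant of the test case (hypothetical name, same body as above).
// Each iteration loads one unsigned and writes its four bytes in
// little-endian order; ideally this would become one 32-bit load and
// one 32-bit store per iteration.
void serialize_le_loop(b* __restrict dst, const unsigned* __restrict src)
{
    for (int i = 0; i < 128; ++i, ++src)
    {
        unsigned t = *src;
        *dst++ = static_cast<b>((t >>  0) & 0xff);
        *dst++ = static_cast<b>((t >>  8) & 0xff);
        *dst++ = static_cast<b>((t >> 16) & 0xff);
        *dst++ = static_cast<b>((t >> 24) & 0xff);
    }
}
```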

Also, store merging happens well after loop distribution, so ...
