https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58483
Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-05-28
     Ever confirmed|0                           |1

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
With current trunk, for the vector version, the most obvious optimization
failure is the following:

  _57 = (unsigned long) &MEM[(void *)&._79 + 12B];
  _36 = (unsigned long) &MEM[(void *)&._79 + 4B];
  _51 = _57 - _36;
  _3 = _51 /[ex] 4;
  _26 = _3;
  _27 = _26 + 1;
  _31 = _27 * 4;
etc.

_51 should be folded to 8 (then we have the usual useless /4*4, but with a
constant it would be ok). This part is a 4.9/4.10 regression, gcc-4.8 managed
to get the constant 12:

  __builtin_memcpy (_33, &._80, 12);
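
For illustration only, the arithmetic in the dump above can be written out in C++. This is a rough sketch, not code from the bug report: the function name bytes_to_copy and the parameter base are hypothetical, with base standing in for the address of the temporary (&._79). It shows that the difference of the two addresses does not depend on the base, so _51 is always 8, and the remaining steps then give 12, which appears to correspond to the constant in the gcc-4.8 memcpy call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the GIMPLE sequence quoted in the comment.
// `base` is an assumed stand-in for the address of the temporary (&._79).
std::size_t bytes_to_copy(const char* base) {
    std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(base + 12);  // _57
    std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(base + 4);   // _36
    std::uintptr_t diff = hi - lo;  // _51: the base cancels, leaving 8
    assert(diff % 4 == 0);          // /[ex] is an exact division, no remainder
    std::uintptr_t n = diff / 4;    // _3 (and its copy _26)
    return (n + 1) * 4;             // _27 = n + 1, _31 = _27 * 4
}
```

Since both addresses share the same base, a compiler that folds hi - lo to 8 can reduce the whole function to a constant, which is the folding the comment says is missing on trunk.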