https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #26 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
I ran a test on SPEC2017 xz_r on ppc64le, changing the loop in question by hand.

Original loop (this loop occurs three times in the code):

    while (++len != len_limit)
      if (pb[len] != cur[len])
        break;

Changed loop:

    typedef long long __attribute__((may_alias)) TYPEE;

    for (++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE)) {
      long long a = *((TYPEE*)(cur+len));
      long long b = *((TYPEE*)(pb+len));
      if (a != b) {
        break; // to optimize further, len can be moved forward here.
      }
    }

    for (; len != len_limit; ++len)
      if (pb[len] != cur[len])
        break;

With this change, xz_r runtime improved from 433s to 382s (>12%).
It would be very valuable to do this kind of widened reading/checking.
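
For anyone who wants to check that the two forms agree, here is a minimal standalone sketch. It is not the xz source: the function names `match_len_bytewise` and `match_len_wide` and the use of `size_t` for `len` are my own assumptions for illustration, and I use `memcpy` for the 8-byte loads instead of the `may_alias` cast so the sketch stays portable. Both functions assume `len < len_limit` on entry, as the original loop does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same shape as the original byte-at-a-time loop.
   Returns the first index after len where pb and cur differ, or len_limit. */
static size_t match_len_bytewise(const uint8_t *pb, const uint8_t *cur,
                                 size_t len, size_t len_limit)
{
    assert(len < len_limit);   /* precondition assumed by the original loop */
    while (++len != len_limit)
        if (pb[len] != cur[len])
            break;
    return len;
}

/* Hypothetical helper: the widened form. Compares 8 bytes per iteration,
   then falls back to single bytes for the tail or to locate the mismatch. */
static size_t match_len_wide(const uint8_t *pb, const uint8_t *cur,
                             size_t len, size_t len_limit)
{
    assert(len < len_limit);
    for (++len; len + sizeof(uint64_t) <= len_limit; len += sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, cur + len, sizeof a);   /* unaligned-safe 8-byte load */
        memcpy(&b, pb + len, sizeof b);
        if (a != b)
            break;   /* the mismatch lies within these 8 bytes */
    }
    /* Byte loop: finishes the tail, or pins down the differing byte. */
    for (; len != len_limit; ++len)
        if (pb[len] != cur[len])
            break;
    return len;
}
```

After a wide mismatch the byte loop rescans at most 8 bytes, so the result matches the original loop. On a little-endian target the rescan could be skipped by counting the trailing zero bits of `a ^ b` (for example with `__builtin_ctzll`) and dividing by 8, which I take to be the "len can be moved forward" optimization noted in the comment above.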