https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #37 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 2 Jun 2020, guojiufu at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
>
> --- Comment #36 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
> (In reply to Jakub Jelinek from comment #10)
> > If the compiler knew say from PGO that pos is usually a multiple of certain
> > power of two and that the loop usually iterates many times (I guess the
> > latter can be determined from comparing the bb count of the loop itself and
> > its header), it could emit something like:
> > static int func2(int max, int pos, unsigned char *cur)
> > {
> >   unsigned char *p = cur + pos;
> >   int len = 0;
> >   if (max > 32 && (pos & 7) == 0)
> >     {
> >       int l = ((1 - ((uintptr_t) cur)) & 7) + 1;
> >       while (++len != l)
> >         if (p[len] != cur[len])
> >           goto end;
> >       unsigned long long __attribute__((may_alias)) *p2 = (unsigned long long *) &p[len];
> >       unsigned long long __attribute__((may_alias)) *cur2 = (unsigned long long *) &cur[len];
> >       while (len + 8 < max)
> >         {
> >           if (*p2++ != *cur2++)
> >             break;
> >           len += 8;
> >         }
> >       --len;
> >     }
> >   while (++len != max)
> >     if (p[len] != cur[len])
> >       break;
> > end:
> >   return cur[len];
> > }
> >
> > or so (untested). Of course, it could be done using SIMD too if there is a
> > way to terminate the loop if any of the elts is different and could be done
> > in that case at 16 or 32 or 64 characters at a time etc.
> > But, without knowing that pos is typically some power of two this would just
> > waste code size, dealing with the unaligned cases would be more complicated
> > (one can't read the next elt until proving that the current one is all
> > equal), so it would need to involve some rotations (or permutes for SIMD).
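A minimal, self-contained sketch of the word-at-a-time transformation quoted above, for experimenting with it outside the compiler. Assumptions not taken from the bug report: the names match_bytes and match_words are hypothetical; both return the match length len rather than cur[len] so the two results can be compared directly; and the 8-byte loads go through memcpy, which avoids the may_alias pointer casts and any alignment requirement (on targets with cheap unaligned access GCC usually turns each such memcpy into a single load). The prologue bound used here is ((-(uintptr_t) cur - 1) & 7) + 1, the smallest l >= 1 that puts &cur[l] on an 8-byte boundary; by the same arithmetic, the quoted ((1 - (uintptr_t) cur) & 7) + 1 leaves &cur[l] at 2 mod 8, which with memcpy loads affects only speed, not the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference loop (hypothetical name).  Returns the first
   index in [1, max) where p and cur differ, or max if none do.
   Requires max >= 1 and p[1..max-1], cur[1..max-1] to be readable.  */
static int match_bytes (int max, int pos, const unsigned char *cur)
{
  const unsigned char *p = cur + pos;
  int len = 0;
  assert (max >= 1);
  while (++len != max)
    if (p[len] != cur[len])
      break;
  return len;
}

/* Hypothetical word-at-a-time variant with the same result and the same
   structure as the quoted func2: a byte prologue up to an 8-byte
   boundary, 8-byte compares, then a byte loop for the tail and for
   locating the exact byte inside a mismatching word.  */
static int match_words (int max, int pos, const unsigned char *cur)
{
  const unsigned char *p = cur + pos;
  int len = 0;
  assert (max >= 1);
  if (max > 32 && (pos & 7) == 0)
    {
      /* Smallest l >= 1 with (cur + l) % 8 == 0; since pos is a multiple
         of 8, p + l is then 8-byte aligned as well.  */
      int l = (int) ((-(uintptr_t) cur - 1) & 7) + 1;
      while (++len != l)
        if (p[len] != cur[len])
          return len;
      /* Compare whole words while bytes len..len+7 all lie below max.  */
      while (len + 8 < max)
        {
          uint64_t a, b;
          memcpy (&a, p + len, 8);
          memcpy (&b, cur + len, 8);
          if (a != b)
            break;          /* mismatch somewhere in bytes len..len+7 */
          len += 8;
        }
      --len;                /* the byte loop below pre-increments */
    }
  while (++len != max)
    if (p[len] != cur[len])
      break;
  return len;
}
```

Because the (pos & 7) == 0 test gives p and cur the same alignment, one prologue aligns both streams, and the trailing byte loop both finishes the tail and pins down the first differing byte inside a mismatching word, as in the quoted code.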
>
> Unaligned reads are already supported on some platforms, and reading
> multiple bytes (64/128 bits) at once costs far less than reading 8 bits
> several times; in the extreme case, a dword read may cost the same number
> of cycles as a byte read.
> As discussed above, there are still a few things that need to be taken
> care of. I'm wondering if we could introduce this as a compiler
> optimization in some circumstances.

The challenge, as always, is to avoid regressing too much in the cases
where the transformation ends up unprofitable, rather than just making it
work for SPEC ;)

Luckily, chasing SPEC numbers isn't our primary objective.
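The point about cheap unaligned loads suggests a simpler shape with no alignment prologue and no (pos & 7) == 0 requirement. The sketch below is hypothetical (the name match_unaligned is illustrative): it loads 8 bytes at a time with memcpy and, on little-endian GCC/Clang targets (detected via the predefined __BYTE_ORDER__ macros), applies __builtin_ctzll to the XOR of the two words to locate the first differing byte directly; on other targets it falls back to the byte loop for that word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unaligned word-at-a-time match length.  Returns the first
   index in [1, max) where p and cur differ, or max if none do.
   Requires max >= 1 and p[1..max-1], cur[1..max-1] to be readable.  */
static int match_unaligned (int max, int pos, const unsigned char *cur)
{
  const unsigned char *p = cur + pos;
  int len = 1;
  assert (max >= 1);
  /* Whole words only while bytes len..len+7 all lie below max.  */
  while (len + 8 <= max)
    {
      uint64_t a, b;
      memcpy (&a, p + len, 8);   /* no alignment assumed for either load */
      memcpy (&b, cur + len, 8);
      uint64_t x = a ^ b;
      if (x != 0)
        {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
          /* Little-endian: the lowest-addressed byte is the least
             significant, so the lowest set bit of x lies in the first
             differing byte.  x != 0, so ctz is well defined.  */
          return len + __builtin_ctzll (x) / 8;
#else
          break;                 /* byte loop below finds the exact byte */
#endif
        }
      len += 8;
    }
  for (; len < max; len++)
    if (p[len] != cur[len])
      break;
  return len;
}
```

Whether this beats the plain byte loop on a given target depends on how long matches typically run, which is the profitability question raised in the reply; the quoted code, for instance, only takes its word path when max > 32.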