https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #13 from Wilco <wilco at gcc dot gnu.org> ---
So to add some real numbers to the discussion, the average number of iterations
is 4.31. Frequency stats in % (16 includes all iterations > 16 too):

 1: 29.0
 2:  4.2
 3:  1.0
 4: 36.7
 5:  8.7
 6:  3.4
 7:  3.0
 8:  2.6
 9:  2.1
10:  1.9
11:  1.6
12:  1.2
13:  0.9
14:  0.8
15:  0.7
16:  2.1

So unrolling 4x is perfect for this loop.

Note the official xz version has optimized this loop since 2014(!) using
unaligned accesses:
https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/common/memcmplen.h
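
For illustration, here is a minimal sketch of the unaligned-access idea: compare 8 bytes per iteration and locate the first differing byte from the XOR of the two words. This is not the xz code. The name match_len_sketch and its parameters are hypothetical, and the sketch assumes a little-endian target and a compiler that provides __builtin_ctzll (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the xz function: starting at offset len,
   returns the offset of the first byte where a and b differ, or limit
   if they are equal up to limit.
   Assumptions: little-endian target, GCC/Clang (__builtin_ctzll). */
static size_t match_len_sketch(const uint8_t *a, const uint8_t *b,
                               size_t len, size_t limit)
{
    assert(len <= limit);

    /* Compare 8 bytes per iteration while a full word fits below limit. */
    while (limit - len >= 8) {
        uint64_t x, y;
        /* memcpy is the portable way to express an unaligned load. */
        memcpy(&x, a + len, sizeof x);
        memcpy(&y, b + len, sizeof y);
        uint64_t diff = x ^ y;
        if (diff != 0) {
            /* On little-endian, the lowest set bit of diff lies in the
               first differing byte; divide the bit index by 8. */
            return len + (size_t)(__builtin_ctzll(diff) >> 3);
        }
        len += 8;
    }

    /* Byte-wise tail for the remaining 0..7 bytes. */
    while (len < limit && a[len] == b[len])
        len++;
    return len;
}
```

With an average of 4.31 byte-wise iterations, a single 8-byte comparison would cover most matches in one step. The sketch never reads past limit, so it needs no padding after the buffers, at the cost of a byte-wise tail loop.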