https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122611

--- Comment #6 from Ionuț Nicula <nicula at nicula dot xyz> ---
Never mind, I think I got it.

So basically GCC assumes that the loads within that 64-byte region are
side-effect-free. So the loop in the source code (not the assembly) has 2
possible cases:
1. The bitwise-or of the first 4 bytes is non-zero, so `true` will be returned
no matter what; since the other loads don't have side effects, they can't
change the result and can be 'taken' from the next iteration;
2. The bitwise-or of the first 4 bytes is zero, so we'd go to the next
iteration anyway to read the next 4 bytes (bytes 5-8), which will be valid as
per `i < len`. In this case we apply the same logic from (1) and conclude that
even the following 4 bytes (bytes 9-12) are OK to read. Repeat until the logic
essentially allows reading the entire 64-byte region.
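
For anyone reading along without the test case open, here is a rough sketch of
the loop shape I'm reasoning about. It is not the exact code from the report:
the name `any_nonzero`, the 4-bytes-per-iteration body, and the assumption that
`len` is a multiple of 4 are only there for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the loop shape discussed above, not the
// original test case. Assumes `len` is a multiple of 4.
bool any_nonzero(const std::uint8_t *p, std::size_t len) {
    for (std::size_t i = 0; i < len; i += 4) {
        // Case 1: this group of 4 bytes has a bit set, so we return early.
        // Case 2: it is all zero, so we continue with the next 4 bytes.
        unsigned acc = p[i] | p[i + 1] | p[i + 2] | p[i + 3];
        if (acc != 0) {
            return true;
        }
    }
    return false;
}
```

In the source, bytes past the first non-zero group are never read. The
argument above is about why a vectorized version may load them anyway without
changing the returned value.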

Also pasting the analysis I got from Miguel Young (mcyoung.xyz) on this, since
his explanation might be better.

""" This is not a miscompilation. If GCC can prove that a load is side-effect
free, it can perform that load regardless of whether the load appears in the
original program. In this case, GCC assumes that if you can load any one byte
of an aligned 16-byte region, a load of any of the other 15 (in any order, and
with any load instruction) is side-effect free. The only situation in which
this would not be legal is if that region straddles an x86 page boundary, which
the alignment check precludes. (To exclude pathological cases, GCC further
assumes that loads to the same object can be fused or split into any sequence
of instructions, because it assumes that memory mappings are not changed from
under user code concurrently, as justified by C++ having a data-race-free
memory model.) """
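
The page-boundary part of that argument is plain address arithmetic, which can
be sketched as below. This is only an illustration: the 4096-byte page size is
an assumption (the usual small page on x86), and the helper names are made up
for this example and do not come from GCC.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages, the usual small page size on x86.
constexpr std::uintptr_t kPageSize = 4096;
// Block size taken from the quoted explanation.
constexpr std::uintptr_t kBlockSize = 16;

// Start address of the aligned 16-byte block that contains `addr`.
constexpr std::uintptr_t block_start(std::uintptr_t addr) {
    return addr & ~(kBlockSize - 1);
}

// True if the `n` bytes starting at `start` all lie on one page (n >= 1).
constexpr bool span_within_one_page(std::uintptr_t start, std::uintptr_t n) {
    return start / kPageSize == (start + n - 1) / kPageSize;
}

// The aligned block around any address never straddles a page, because the
// page size is a multiple of the block size.
constexpr bool block_within_one_page(std::uintptr_t addr) {
    return span_within_one_page(block_start(addr), kBlockSize);
}

// The last byte of a page: its aligned block still ends at the page boundary.
static_assert(block_within_one_page(4095), "aligned block stays on one page");
// An unaligned 16-byte span starting at 4090 does cross into the next page.
static_assert(!span_within_one_page(4090, 16), "unaligned span crosses a page");
```

So if one byte of the aligned block is known to be readable, the rest of the
block sits on the same mapped page, which is why the alignment check matters.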

Anyway, sorry for the false alarm and appreciate the quick response!
