https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #33 from Rocco Tormenta <rocco at tormenta dot eu> ---
(In reply to Andrew Pinski from comment #32)
> Note this has always worked to avoid FMA formation since
> __builtin_assoc_barrier was added but has only been documented recently.
> See
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fassoc_005fbarrier .

Happy new year!

Right, that works if you want to disable FMA and make the results consistent. What I was hoping for, though, was the opposite: have both results use FMA.

If anyone wants to do that for the time being, you can use __builtin_fmaf, or fmaf from math.h. Both seem to disable vectorization of the loop (I guess the latter just maps to the former), and from what I've gathered, the vectorization is the real culprit here.