https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57037
--- Comment #1 from Harald Anlauf <anlauf at gmx dot de> ---
(In reply to Harald Anlauf from comment #0)
> gfortran (using -Ofast -fprefetch-loop-arrays) exactly
> reproduces the performance of the Intel compiler without
> temporal stores. It appears that this is an important
> optimization.

I tried a current snapshot from trunk (r219084) and found that
-fprefetch-loop-arrays now gives an additional boost, matching
Intel v15 for the above code, even without the streaming stores.