Hi, I looked at a few performance anomalies between gfortran and Flang, and it appears the two compilers treat array slices differently. Using -frepack-arrays fixed a performance issue in gfortran and didn't cause any regressions. Making input array slices contiguous both improves locality and enables more vectorization.
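As a rough illustration of the kind of situation involved (a made-up example, not one of the cases measured; the names slice_demo and total are hypothetical): a row slice of a column-major array is strided, so a routine that takes it through an assumed-shape dummy cannot assume unit stride at compile time. As documented for gfortran, -frepack-arrays adds code to the function prologue that packs such an argument into a contiguous block.

```fortran
module slice_demo
  implicit none
contains
  ! Hypothetical kernel with an assumed-shape dummy argument.
  ! The caller may pass a strided section, so x is not known to be
  ! unit-stride when this function is compiled.
  function total(x) result(s)
    real, intent(in) :: x(:)
    real :: s
    integer :: i
    s = 0.0
    do i = 1, size(x)
      s = s + x(i)
    end do
  end function total
end module slice_demo

program demo
  use slice_demo
  implicit none
  real :: a(4, 4)
  a = 1.0
  ! Column-major storage: a(:, 1) is contiguous, a(1, :) has stride 4.
  ! is_contiguous is the Fortran 2008 intrinsic.
  print *, is_contiguous(a(:, 1)), is_contiguous(a(1, :))
  ! Both calls go through the same compiled code for total.
  print *, total(a(:, 1)), total(a(1, :))
end program demo
```

Building this with and without -frepack-arrays and comparing the code generated for the loop in total is one way to see the effect on a small case.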
So I wonder whether -frepack-arrays should be enabled by default (at -O3, or just at -Ofast)? Alternatively, would it be feasible in Fortran to version functions or loops on whether all arguments are contiguous slices?

Wilco