On Wed, Feb 12, 2025 at 4:38 PM Visda.Vokhshoori--- via Gcc <gcc@gcc.gnu.org> wrote:
>
> Embench is used for benchmarking on embedded devices.
> This one project matmult-int has a function Multiply. It’s a matrix
> multiplication for 20 x 20 matrix.
> The device is a ATSAME70Q21B which is Cortex-M7
> The compiler is arm branch based on GCC version 13
> We are compiling with O3 which has loop-interchange pass on by default.
>
> When we compile with -fno-loop-interchange we get all 22% back plus 5% speed
> up.
>
> When we do the loop interchange on the one loop nest that get interchanged it
> is slightly (.7%) faster.
>
> Has anyone else seen large degradation as a result of loop interchange?

I would suggest comparing the -fopt-info diagnostic output with and without
-fno-loop-interchange; the interchanged loop might, for example, no longer
vectorize.

Other than that - no, loop interchange isn't applied very often and it has a
very conservative cost model.

Are you able to share a testcase?

Richard.

>
> Thanks
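
For readers without the Embench sources at hand, the following is a minimal sketch of the kind of loop nest under discussion: a 20 x 20 integer matrix multiply written in two loop orders. It is not the Embench matmult-int code, and the second variant is only one possible interchanged form, not necessarily what GCC's loop-interchange pass produces. The names multiply_ijk, multiply_ikj and check_orders_agree are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N 20 /* matrix dimension mentioned in the thread (20 x 20) */

/* i-j-k order: the innermost loop reads b[k][j] with a stride of N ints. */
void multiply_ijk(int a[N][N], int b[N][N], int res[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            res[i][j] = sum;
        }
}

/* i-k-j order: the two inner loops are swapped, so the innermost loop
 * walks b[k][j] and res[i][j] contiguously, but res is now loaded and
 * stored on every iteration instead of being accumulated in a scalar. */
void multiply_ikj(int a[N][N], int b[N][N], int res[N][N])
{
    memset(res, 0, sizeof(int) * N * N);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                res[i][j] += a[i][k] * b[k][j];
}

/* Sanity check: both loop orders must compute the same product. */
void check_orders_agree(void)
{
    static int a[N][N], b[N][N], r1[N][N], r2[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }

    multiply_ijk(a, b, r1);
    multiply_ikj(a, b, r2);
    assert(memcmp(r1, r2, sizeof r1) == 0);
}
```

Building a file like this once with -O3 and once with -O3 -fno-loop-interchange, adding -fopt-info to both, gives two diagnostic logs that can be compared as suggested above.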