On Wed, Feb 12, 2025 at 4:38 PM Visda.Vokhshoori--- via Gcc
<gcc@gcc.gnu.org> wrote:
>
> Embench is used for benchmarking on embedded devices.
> One of its projects, matmult-int, has a function Multiply.  It’s a matrix
> multiplication of 20 x 20 matrices.
> The device is an ATSAME70Q21B, which is a Cortex-M7.
> The compiler is an Arm branch based on GCC version 13.
> We are compiling with -O3, which has the loop-interchange pass on by default.
>
> When we compile with -fno-loop-interchange we get all of the 22% back plus
> a 5% speedup.
>
> When we do the loop interchange on the one loop nest that gets interchanged,
> it is slightly (0.7%) faster.
>
> Has anyone else seen large degradation as a result of loop interchange?

I would suggest comparing the -fopt-info diagnostic output with and
without -fno-loop-interchange; the interchanged loop might, for example,
no longer vectorize.  Other than that, no: loop interchange isn't applied
very often and it has a very conservative cost model.
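
As a rough sketch of that comparison: the kernel below is only a
hypothetical stand-in for the loop nest (it is not the actual Embench
source), and the toolchain name, the -mcpu value and the file names in the
comments are assumptions to adjust for the real setup.

```c
/* matmul.c: hypothetical stand-in for a 20 x 20 integer matrix multiply.
 *
 * Assumed commands (-fopt-info writes its messages to stderr):
 *
 *   arm-none-eabi-gcc -O3 -mcpu=cortex-m7 -fopt-info \
 *       -c matmul.c 2> with-interchange.txt
 *   arm-none-eabi-gcc -O3 -mcpu=cortex-m7 -fopt-info -fno-loop-interchange \
 *       -c matmul.c 2> without-interchange.txt
 *   diff with-interchange.txt without-interchange.txt
 */
#define N 20

void multiply(int a[N][N], int b[N][N], int res[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            res[i][j] = 0;
            /* The innermost loop walks b column-wise, which is the kind of
               access pattern an interchange pass may try to improve. */
            for (int k = 0; k < N; k++)
                res[i][j] += a[i][k] * b[k][j];
        }
    }
}
```

Any line that shows up in only one of the two outputs points at a
transformation that depends on whether interchange ran.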

Are you able to share a testcase?

Richard.

>
> Thanks
