“the interchanged loop might for example no longer vectorize.”

The loops are not vectorized, which is fine because this device has no support for it. I just don’t think a single pass could make the code that much slower on its own.
Loop interchange is supposed to swap an inner loop of the nest with an outer one to improve cache locality. That should help, because the data needed by the next iteration will already be in the cache.

The loop that gets interchanged is at line 143 of the benchmark source:
https://github.com/embench/embench-iot/blob/master/src/matmult-int/matmult-int.c#L143

This loop is where most of the time is spent. It would have been good to have access to hardware tracing, both to see whether the interchanged loop reduces cache misses and to see what is causing it to run this much slower.

Thanks for your reply!

From: Richard Biener <richard.guent...@gmail.com>
Date: Thursday, February 13, 2025 at 2:57 AM
To: Visda Vokhshoori - C51841 <visda.vokhsho...@microchip.com>
Cc: gcc@gcc.gnu.org <gcc@gcc.gnu.org>
Subject: Re: 22% degradation seen in embench:matmult-int

On Wed, Feb 12, 2025 at 4:38 PM Visda.Vokhshoori--- via Gcc <gcc@gcc.gnu.org> wrote:
>
> Embench is used for benchmarking on embedded devices.
> This one project matmult-int has a function Multiply. It’s a matrix
> multiplication for 20 x 20 matrix.
> The device is a ATSAME70Q21B which is Cortex-M7
> The compiler is arm branch based on GCC version 13
> We are compiling with O3 which has loop-interchange pass on by default.
>
> When we compile with -fno-loop-interchange we get all 22% back plus 5% speed
> up.
>
> When we do the loop interchange on the one loop nest that get interchanged it
> is slightly (.7%) faster.
>
> Has anyone else seen large degradation as a result of loop interchange?

I would suggest to compare the -fopt-info diagnostic output with and without -fno-loop-interchange, the interchanged loop might for example no longer vectorize.
Other than that - no, loop interchange isn't applied very often and it has a very conservative cost model. Are you able to share a testcase?

Richard.

>
> Thanks
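
For readers following the thread, here is a rough sketch of the kind of transformation being discussed. It is a simplified stand-in, not the Embench source and not necessarily what GCC's loop-interchange pass emits; the names multiply_ijk, multiply_ikj and orders_agree are made up for illustration, and N = 20 only mirrors the matrix size mentioned above.

```c
#include <string.h>

#define N 20   /* assumed size, matching the 20 x 20 matrices in the thread */

typedef int matrix[N][N];

/* Original order: the innermost k loop reads b[k][j] down a column,
   so consecutive iterations are N ints apart in memory. */
static void multiply_ijk(matrix a, matrix b, matrix res)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            res[i][j] = 0;
            for (int k = 0; k < N; k++)
                res[i][j] += a[i][k] * b[k][j];
        }
}

/* One possible interchanged form: the j and k loops are swapped, so the
   innermost loop walks b[k][j] and res[i][j] along a row (stride 1).
   The zeroing of res has to move into its own loop for this to be legal. */
static void multiply_ikj(matrix a, matrix b, matrix res)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            res[i][j] = 0;
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                res[i][j] += a[i][k] * b[k][j];
    }
}

/* Hypothetical self-check: returns 1 if both loop orders produce the
   same result on a small deterministic input, 0 otherwise. */
int orders_agree(void)
{
    static matrix a, b, r1, r2;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }

    multiply_ijk(a, b, r1);
    multiply_ikj(a, b, r2);
    return memcmp(r1, r2, sizeof r1) == 0;
}
```

Both versions compute the same integer result; only the memory access pattern of the innermost loop differs. Whether the second form is actually faster on a given Cortex-M7 part depends on the cache configuration and on what the rest of the optimizer does with each loop nest, which is the open question in this thread.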