https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99218
anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |8.4.1

--- Comment #3 from anlauf at gcc dot gnu.org ---
Playing with the reduced testcase:

  fTmp = matmul (transpose (G), lambda(::2))

is clean while

  fTmp = matmul (transpose (G), lambda(::1))

is not.

It seems we run into a highly optimized code for rank-2 times rank-2.

The following patch fixes the testcase:

diff --git a/libgfortran/m4/matmul_internal.m4 b/libgfortran/m4/matmul_internal.m4
index 13fd7696238..0e96207a0fc 100644
--- a/libgfortran/m4/matmul_internal.m4
+++ b/libgfortran/m4/matmul_internal.m4
@@ -192,7 +192,8 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
        }
     }
 
-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+  if (rxstride == 1 && axstride == 1 && bxstride == 1
+      && GFC_DESCRIPTOR_RANK (b) != 1)
     {
       /* This block of code implements a tuned matmul, derived from
          Superscalar GEMM-based level 3 BLAS,  Beta version 0.1
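
For illustration only, the effect of the extra rank test can be modelled with a small standalone C sketch. The function names tuned_path_before and tuned_path_after are hypothetical and are not part of libgfortran; the sketch assumes that rxstride and axstride are both 1 for the testcase and only mirrors the condition shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the dispatch test in matmul_internal.m4.
   These helpers are illustrative and are not libgfortran API.  */

/* Condition before the patch: only the strides are inspected.  */
bool
tuned_path_before (ptrdiff_t rxstride, ptrdiff_t axstride,
                   ptrdiff_t bxstride, int b_rank)
{
  (void) b_rank;  /* The rank of b plays no role here.  */
  return rxstride == 1 && axstride == 1 && bxstride == 1;
}

/* Condition after the patch: a rank-1 b never selects the tuned path.  */
bool
tuned_path_after (ptrdiff_t rxstride, ptrdiff_t axstride,
                  ptrdiff_t bxstride, int b_rank)
{
  assert (b_rank == 1 || b_rank == 2);  /* MATMUL arguments have rank 1 or 2.  */
  return rxstride == 1 && axstride == 1 && bxstride == 1 && b_rank != 1;
}
```

Under these assumptions, a rank-1 b with bxstride 1 (modelling lambda(::1)) selects the tuned path before the patch but not after it, while bxstride 2 (modelling lambda(::2)) avoids the tuned path in both versions.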