https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|ra                          |

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Simplified testcase which shows the issue even on x86:
```
typedef float float32_t;

template<int N, int M, int K>
void f(const float32_t *__restrict a, const float32_t *__restrict b, float32_t *c)
{
  for (int i = 0; i < N; ++i) {
    for (int j=0; j < M; ++j) {
      for (int k=0; k < K; ++k) {
        c[i*N + j] += a[i*K + k] * b[k*M + j];
      }
    }
  }
}

template void f<16, 16, 16>(const float32_t *__restrict a, const float32_t *__restrict b, float32_t *c);
```
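
For anyone who wants to check what the kernel computes before looking at the generated code, here is a minimal sketch of a functional check. It is not part of the report: `identity_check_passes` is a hypothetical helper, and the identity-matrix input is only an assumption chosen so the expected result is exact in single precision. The kernel itself is copied unchanged from the testcase.

```cpp
#include <vector>

typedef float float32_t;

// Kernel copied from the testcase in comment #1 (unchanged).
template<int N, int M, int K>
void f(const float32_t *__restrict a, const float32_t *__restrict b, float32_t *c)
{
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < M; ++j) {
      for (int k = 0; k < K; ++k) {
        c[i*N + j] += a[i*K + k] * b[k*M + j];
      }
    }
  }
}

// Hypothetical helper, not from the bug report: runs f<16, 16, 16> with
// a = identity and c = 0, and returns true when c ends up equal to b.
bool identity_check_passes()
{
  constexpr int n = 16;
  std::vector<float32_t> a(n * n, 0.0f), b(n * n), c(n * n, 0.0f);

  for (int i = 0; i < n; ++i) {
    a[i*n + i] = 1.0f;                              // identity matrix
    for (int j = 0; j < n; ++j)
      b[i*n + j] = static_cast<float32_t>(i*n + j); // small integers, exact in float
  }

  f<n, n, n>(a.data(), b.data(), c.data());

  for (int i = 0; i < n * n; ++i)
    if (c[i] != b[i])
      return false;
  return true;
}
```

Calling `identity_check_passes()` from a small `main` and returning nonzero on failure should be enough to compare builds at different optimization levels. This only checks the result, not the quality of the generated code.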