https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|ra                          |

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Simplified testcase which shows the issue even on x86:
```
typedef float float32_t;
template<int N, int M, int K>
void f(const float32_t *__restrict a, const float32_t *__restrict b,
       float32_t *c) {
    for (int i = 0; i < N; ++i) {
        for (int j=0; j < M; ++j) {
            for (int k=0; k < K; ++k) {
                c[i*N + j] += a[i*K + k] * b[k*M + j];
            }
        }
    }
}

template void f<16, 16, 16>(const float32_t *__restrict a,
                            const float32_t *__restrict b, float32_t *c);
```
