On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
...

This is the best I could do: https://run.dlang.io/is/dm8LBP
For some reason, LDC refuses to vectorize or even just unroll the non-parallel version, and using more than one `parallel` corrupts the results. But judging by the results you expected and what you described, you could maybe replace the loop with a series of `c[] = a[] *operator* b[]` array operations. Unless you use conditionals afterwards or do something else that confuses the compiler, it may use SSE/AVX instructions, and at worst it should fall back to basic loop unrolling.
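To make the array-operation idea concrete, here is a minimal sketch. The function name `mulAdd` and the particular formula are made up for illustration and are not taken from the linked snippet; the only point is the shape of the `c[] = a[] op b[]` expression.

```d
/// Hypothetical helper: computes c[i] = a[i] * b[i] + a[i] for every i
/// with a single array-wise expression instead of an explicit loop.
double[] mulAdd(const double[] a, const double[] b)
{
    // Array operations require operands of equal length.
    assert(a.length == b.length, "operands must have equal length");

    auto c = new double[a.length];

    // The whole right-hand side is evaluated element by element.
    // Whether this ends up as SSE/AVX code depends on the compiler
    // and the flags used; that part is not guaranteed.
    c[] = a[] * b[] + a[];
    return c;
}
```

Calling `mulAdd([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` multiplies and adds element by element. If you need several different operators, you would chain a few such statements, one per step.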
