On Thu, Feb 06, 2025 at 08:44:35AM +0000, chiranmoy.bhattacha...@fujitsu.com wrote:
>> Does this hand-rolled loop unrolling offer any particular advantage? What
>> do the numbers look like if we don't do this or if we process, say, 4
>> vectors at a time?
>
> The unrolled version performs better than the non-unrolled one, but
> processing four vectors provides no additional benefit. The numbers
> and code used are given below.
Hm. Any idea why that is? I wonder if the compiler isn't using as many
SVE registers as it could for this.

-- 
nathan
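For readers following the thread without the patch at hand, here is a minimal portable-C sketch of the two loop shapes being compared: a simple one-element-per-iteration loop and a 2-way unrolled loop with a scalar tail. This is not the patch's code, which is not reproduced here. It assumes 64-bit words and the GCC/Clang `__builtin_popcountll` builtin as stand-ins for SVE vectors and per-lane popcounts. The use of two independent accumulators is an assumption about how the unrolled version is structured, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical baseline: one word per iteration, one accumulator.
 * Stand-in for a non-unrolled vector loop (64-bit word ~ one vector).
 */
static uint64_t
popcount_words_simple(const uint64_t *buf, size_t n)
{
	uint64_t	total = 0;

	for (size_t i = 0; i < n; i++)
		total += (uint64_t) __builtin_popcountll(buf[i]);
	return total;
}

/*
 * Hypothetical 2-way unrolled version: two words per iteration, each
 * added into its own accumulator (assumed structure), followed by a
 * tail loop for any leftover word when n is odd.
 */
static uint64_t
popcount_words_unrolled2(const uint64_t *buf, size_t n)
{
	uint64_t	acc0 = 0;
	uint64_t	acc1 = 0;
	size_t		i = 0;

	for (; i + 2 <= n; i += 2)
	{
		acc0 += (uint64_t) __builtin_popcountll(buf[i]);
		acc1 += (uint64_t) __builtin_popcountll(buf[i + 1]);
	}

	/* process the remaining word, if any */
	for (; i < n; i++)
		acc0 += (uint64_t) __builtin_popcountll(buf[i]);

	return acc0 + acc1;
}
```

Both functions must return the same count for any input; the unrolled form only changes how much work is issued per iteration. A 4-way variant would add two more accumulators in the same pattern, which is one way to check whether extra unrolling changes register usage in the generated code.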