I am testing the latest GCC with not-yet-submitted glibc changes that implement libmvec on AArch64.
While trying to run SPEC 2017 (specifically 521.wrf_r) I ran into a case where GCC was generating a call to _ZGVnN2vv_powf, that is, a vectorized powf call for 2 (not 4) elements. This was a problem because I only implemented a 4-element, 32-bit vectorized powf function for libmvec and not a 2-element version.

I think this is due to aarch64_simd_clone_compute_vecsize_and_simdlen, which allows (element count * element size) to be either 64 or 128.

I would like some thoughts on what we should do about this. Should we require glibc/libmvec to provide 2-element, 32-bit floating point vector functions (as well as the 4-element ones), or should we change aarch64_simd_clone_compute_vecsize_and_simdlen to only allow 4-element (128 total bit size) vectors and not 2-element (64 total bit size) ones?

This is obviously a question for the pre-SVE vector instructions; I am not sure how this would be handled in SVE.

Steve Ellcey
sell...@marvell.com

P.S. Here is a test case in Fortran that generated the 2-element vector call. GCC unrolled the loop into one vector call of 2 elements and one scalar call.

      SUBROUTINE FOO(B,W,P)
      REAL, DIMENSION (3) :: W, P
      DO 10 I = 1, 3
      P(I) = W(I) ** B
10    CONTINUE
      END SUBROUTINE FOO
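
To make the size arithmetic concrete, here is a minimal C sketch of the check as described above. It is not the GCC implementation: the helper names (simdlen_from_name, vector_bits, size_accepted) are hypothetical, and the 64-or-128 rule is taken from the description of aarch64_simd_clone_compute_vecsize_and_simdlen in this message rather than from the GCC source. It assumes the usual vector function name layout of _ZGV, one ISA letter, one mask letter, the lane count, the parameter letters, an underscore, and the scalar name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or glibc): extract the lane count
   from a name such as "_ZGVnN2vv_powf".  Returns 0 when the name does not
   start with "_ZGV" or carries no numeric lane count.  */
static int simdlen_from_name(const char *name)
{
    /* Need at least "_ZGV" + ISA letter + mask letter + one digit.  */
    if (strncmp(name, "_ZGV", 4) != 0 || strlen(name) < 7)
        return 0;

    const char *p = name + 6;            /* skip "_ZGV", ISA, mask */
    if (!isdigit((unsigned char)*p))
        return 0;

    return (int)strtol(p, NULL, 10);
}

/* Total vector width in bits: element count * element size.  */
static int vector_bits(int simdlen, int element_bits)
{
    return simdlen * element_bits;
}

/* The rule as described in this message: a total size of either
   64 or 128 bits is allowed.  This is an assumption about the
   behaviour, not a copy of the GCC code.  */
static int size_accepted(int bits)
{
    return bits == 64 || bits == 128;
}
```

With 32-bit elements, _ZGVnN2vv_powf works out to 2 * 32 = 64 bits and _ZGVnN4vv_powf to 4 * 32 = 128 bits, so both pass this check. That is why a libmvec providing only the 4-element variant can still be asked for the 2-element one.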