https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #5)
> __attribute__ ((__simd__ ("notinbranch"), const))
> double cos (double);

So here the backend is then probably responsible to parse this into a valid
list of simdlen cases.

> void foo (float *a, double *b)
> {
>     for (int i = 0; i < 12; i+=3)
>       {
>         b[i] = cos (5.0 * a[i]);
>         b[i+1] = cos (5.0 * a[i+1]);
>         b[i+2] = cos (5.0 * a[i+2]);
>       }
> }
>
> Simple C example that shows the problem.
>
> This seems to happen when SLP succeeds and the group size is a non power of
> two.
> The vectorizer then unrolls to make it a power of two and during
> vectorization
> it seems to destroy the vector, make the call and reconstruct it.
>
> So this seems like an SLP vectorization bug. I can't seem to trigger it
> however on GCC < 14 since SLP consistently fails for all my examples because
> it tries a mode that's larger than the vector size.

On the 13 branch and x86_64 the above results in a large VF and using
_ZGVbN2v_cos, same on trunk.

> So It may be a GCC 14 only regression, but I think it's latent in the
> vectorizer.

I think there's sth odd with the backend here, but I can confirm the
behavior.  Note it analyzes and costs VF == 4 and V2DF resulting in
6 calls but then code generation comes along doing sth different!?
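
For reference, the figure of 6 calls can be reproduced with simple arithmetic, assuming the SLP group size is 3 (the three cos calls per scalar iteration), VF is 4, and a V2DF vector holds 2 doubles. The helper below is a hypothetical illustration only; it is not a GCC function and does not model how the vectorizer actually computes its costs.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): number of SIMD clone calls
   needed to cover one vectorized loop iteration.

   group_size  scalar calls per scalar iteration (SLP group size)
   vf          vectorization factor (scalar iterations per vector iteration)
   lanes       elements handled by one SIMD clone call, e.g. 2 for V2DF  */
static unsigned
simd_calls_per_vector_iteration (unsigned group_size, unsigned vf,
                                 unsigned lanes)
{
  unsigned scalar_calls = group_size * vf;

  /* This sketch only handles the case where the lanes divide evenly.  */
  assert (lanes != 0 && scalar_calls % lanes == 0);

  return scalar_calls / lanes;
}
```

With the values from the comment, simd_calls_per_vector_iteration (3, 4, 2) covers 3 * 4 = 12 scalar calls with 2 lanes each, which gives 6 calls to _ZGVbN2v_cos.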