On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:

>
>
> On 05/02/2024 09:56, Richard Biener wrote:
> > On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
> >
> >>
> >>
> >> On 01/02/2024 07:19, Richard Biener wrote:
> >>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> >>>
> >>>
> >>> The patch didn't come with a testcase so it's really hard to tell
> >>> what goes wrong now and how it is fixed ...
> >>
> >> My bad! I had a testcase locally but never added it...
> >>
> >> However... now I look at it and ran it past Richard S, the codegen isn't
> >> 'wrong', but it does have the potential to lead to some pretty slow
> >> codegen, especially for inbranch simdclones where it transforms the SVE
> >> predicate into an Advanced SIMD vector by inserting the elements one at a
> >> time...
> >>
> >> An example of which can be seen if you do:
> >>
> >> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S
> >>
> >> with the following t.c:
> >> #pragma omp declare simd simdlen(4) inbranch
> >> int __attribute__ ((const)) fn5(int);
> >>
> >> void fn4 (int *a, int *b, int n)
> >> {
> >>   for (int i = 0; i < n; ++i)
> >>     b[i] = fn5(a[i]);
> >> }
> >>
> >> Now I do have to say, for our main usecase of libmvec we won't have any
> >> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course
> >> that doesn't mean user-code will.
> >
> > It seems to use SVE masks with vector(4) <signed-boolean:4> and the
> > ABI says the mask is vector(4) int. You say that's because we choose
> > a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
> >
> > The vectorizer creates
> >
> > _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
> >
> > and then vector lowering decomposes this. That means the vectorizer
> > lacks a check that the target handles this VEC_COND_EXPR.
> >
> > Of course I would expect that SVE with VLS vectors is able to
> > code generate this operation, so it's missing patterns in the end.
> >
> > Richard.
> >
>
> What should we do for GCC-14? Going forward I think the right thing to do is
> to add these patterns. But I am not even going to try to do that right now and
> even though we can codegen for this, the result doesn't feel like it would
> ever be profitable which means I'd rather not vectorize, or well pick a
> different vector mode if possible.
>
> This would be achieved with the change to the targethook. If I change the hook
> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now?

Passing in a mode is OK. I still don't fully understand why the clone
doesn't fully specify 'mode' and, if it does not, why the vectorizer
itself cannot disregard it.

From the past discussion I understood the existing situation isn't
as bad as initially thought and no bad things happen right now?

Thanks,
Richard.
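
For readers following the thread, the VEC_COND_EXPR quoted above can be
modelled in scalar C. This is only a sketch of its semantics, assuming each
mask lane selects 1 or 0 as in the quoted GIMPLE; the function name
mask_to_int_vector and the SIMDLEN constant are hypothetical and not taken
from GCC. The loop mirrors the one-element-at-a-time form that vector
lowering falls back to when the target has no pattern for the operation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: four lanes, matching simdlen(4) in the quoted testcase.  */
enum { SIMDLEN = 4 };

/* Hypothetical scalar model of
     _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
   Each boolean predicate lane becomes one int lane, which is the
   vector(4) int mask layout the Advanced SIMD clone expects.  */
static void
mask_to_int_vector (const bool loop_mask[SIMDLEN], int out[SIMDLEN])
{
  /* One lane per iteration, like the decomposed (lowered) form.  */
  for (size_t i = 0; i < SIMDLEN; ++i)
    out[i] = loop_mask[i] ? 1 : 0;
}
```

A target pattern for this operation would do the same conversion as a single
vector select rather than a per-lane sequence.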