On Tue, 2025-09-23 at 15:56 +0200, Richard Biener wrote:
> On Tue, 23 Sep 2025, Avinash Jayakar wrote:
> 
> > Hi,
> > 
> > I had a question regarding the function vect_pattern_recog that is
> > triggered in the slp/vectorization pass. 
> > In case the original code is already in vector form, for example
> > below
> > is the original gimple dump of a vector function
> > 
> > 
> > ;; Function lshift1_64 (null)
> > ;; enabled by -tree-original
> > 
> > 
> > {
> >   __vector unsigned long long D.4059 = { 2, 2 };
> > 
> >   return <<< Unknown tree: compound_literal_expr
> >         __vector unsigned long long D.4059 = { 2, 2 }; >>> * a;
> > }
> > 
> > 
> > This however does not go through the pattern recognition since 
> > 1. It is already in vector form
> > 2. This check fails in slp pass
> >   if (bb_vinfo->grouped_stores.is_empty ()
> >       && bb_vinfo->roots.is_empty ())
> > 
> > So my main question is, suppose this line of code (in this example
> > v1 *
> > {2, 2}) do go through the pattern recognition, it could have
> > generated
> > better code like v1 + v1 or v1 << {1,1}) which could result in
> > vectorization in certain target which does not have native double
> > word
> > multiplication support, but might have double word shift or add. 
> > Is this intended or is there a way have this code pattern
> > recognized?
> 
> The vectorizer is currently not set up for re-vectorizing already
> vectorized code.  Instead for the situation you describe, a target
> without vector multiplication support, it would be the task of the
> vector lowering pass (tree-vect-generic.cc) to turn this into a
> supported operation.
> 
I looked at the expand_vector_operations_1 function in
tree-vect-generic.cc.


I encountered this while fixing PR119702. If I write simple scalar
code such as

#include <stdint.h>

void lshift1_64(uint64_t *a) {
  a[0] *= 2;
  a[1] *= 2;
}
This does get vectorized as a << {1,1}. However, the explicitly
vectorized version
vector unsigned long long
lshift1_64 (vector unsigned long long a, vector unsigned long long b)
{
  return a * (vector unsigned long long) { 2, 2 };
}
gets lowered to scalar code during the veclower pass.
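
To make the effect concrete, my understanding is that the lowered form
behaves roughly like the per-lane code below. This is only a sketch in C
using GCC's generic vector extension, not the actual GIMPLE that veclower
emits; the typedef v2du and the function name mul2_piecewise are made up
for illustration.

```c
#include <stdint.h>

/* Hypothetical typedef: two 64-bit unsigned lanes (GCC generic vectors).  */
typedef uint64_t v2du __attribute__ ((vector_size (16)));

/* Roughly what piecewise lowering amounts to: extract each lane,
   multiply it as a scalar, and rebuild the vector from the results.  */
static v2du
mul2_piecewise (v2du a)
{
  uint64_t lane0 = a[0] * 2;
  uint64_t lane1 = a[1] * 2;
  return (v2du) { lane0, lane1 };
}
```

The point is that each lane goes through a scalar multiply, even though
a single vector shift or add would give the same result.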

I see two ways of fixing this for multiply expressions:
1. In expand_vector_operations_1, before lowering to scalar code, check
whether the code is MULT_EXPR and whether it can be implemented with
shifts/adds/subs (as done in pattern recognition), similar to what is
already done for LROTATE_EXPR and RROTATE_EXPR.
2. Implement mulv2di3 for this specific target (doing exactly what the
scalar code would do), and let the expand pass (expand_mult) take care
of converting the multiply to shift/add/sub.
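
The equivalence that option 1 would rely on can be checked with a small
sketch. It assumes GCC's generic vector extension; the typedef v2du and
all function names here are hypothetical and do not come from the GCC
sources.

```c
#include <stdint.h>

/* Hypothetical typedef: two 64-bit unsigned lanes (GCC generic vectors).  */
typedef uint64_t v2du __attribute__ ((vector_size (16)));

/* The form from the example: multiply by a constant vector.  */
static v2du mul_by_2 (v2du a) { return a * (v2du) { 2, 2 }; }

/* Candidate replacement using a vector shift.  */
static v2du shl_by_1 (v2du a) { return a << (v2du) { 1, 1 }; }

/* Candidate replacement using a vector add.  */
static v2du add_self (v2du a) { return a + a; }

/* Lane-by-lane comparison of two vectors.  */
static int
same_lanes (v2du x, v2du y)
{
  return x[0] == y[0] && x[1] == y[1];
}

/* Returns nonzero if all three forms produce identical lanes for A.
   Unsigned lanes wrap on overflow, so this holds for any input.  */
static int
all_forms_agree (v2du a)
{
  v2du m = mul_by_2 (a);
  return same_lanes (m, shl_by_1 (a)) && same_lanes (m, add_self (a));
}
```

Calling all_forms_agree on any input, including lanes that overflow,
should return nonzero, which is why a target with only a doubleword
shift or add could still handle the multiply as a vector operation.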

Thanks and regards,
Avinash Jayakar
