On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> [...]
> > > > I can't really insist on the single lane load.. something like:
> > > > 
> > > > vc:V4SI[0] = c
> > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > > 
> > > > Or is there any other way to do this?
> > > 
> > > Can you elaborate on "I can't really insist on the single lane load"?
> > > What's the single lane load in your example? 
> > 
> > Loading just one lane of the vector like this:
> > 
> > vc:V4SI[0] = c // from the above scalar example
> > 
> > or 
> > 
> > vc:V4SI[0] = c[2] 
> > 
> > is what I meant by single lane load. In this example:
> > 
> > t = c[2] 
> > ...
> > vb:v4si = b[0:3] 
> > vc:v4si = { t, t, t, t }
> > va:v4si = vb:v4si <op> vc:v4si 
> > 
> > If we expand the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > force 't' to be a vector, i.e. turn t = c[2] into vect_t[0] = c[2] (which
> > could be seen as vec_select:SI (vect_t 0)).
> > 
> > > I'd expect the instruction
> > > pattern as quoted to just work (and I hope we expand an uniform
> > > constructor { a, a, a, a } properly using vec_duplicate). [a uniform]
> > 
> > As far as I can tell from the code, this is only done through vec_init. It
> > is not expanded as vec_duplicate from, for example, store_constructor() in
> > expr.c.
> 
> Do you see any issues if we expand such a constructor as vec_duplicate
> directly instead of going through vec_init?

Sorry, that was a bad question.

But here's what I would like to propose as a first step. Please tell me if this
is acceptable or if it makes sense:

- Introduce standard pattern names 

"vmulim4" - vector multiply with the second operand as an indexed (single
            lane) operand

Example:

(define_insn "vmuliv4si4"
   [(set (match_operand:V4SI 0 "register_operand")
         (mult:V4SI (match_operand:V4SI 1 "register_operand")
                    (vec_duplicate:V4SI
                      (vec_select:SI
                        (match_operand:V4SI 2 "register_operand")
                        (parallel [(match_operand:SI 3 "immediate_operand")])))))]
 ...
)
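To make the intended lane semantics of vmulim4 concrete, here is a minimal
plain-C sketch. The v4si struct and vmuli_v4si are hypothetical names used
only for illustration, not GCC internals; the function simply computes
op0[i] = op1[i] * op2[op3] for every lane, which is what the
mult / vec_duplicate / vec_select nest above expresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 x 32-bit vector, used only to spell out lane semantics. */
typedef struct { int32_t lane[4]; } v4si;

/* Sketch of "vmuliv4si4": every lane of op1 is multiplied by lane op3 of op2,
   i.e. (mult op1 (vec_duplicate (vec_select op2 (parallel [op3])))). */
static v4si vmuli_v4si(v4si op1, v4si op2, int op3)
{
    assert(op3 >= 0 && op3 < 4);           /* lane index must be in range */
    int32_t elt = op2.lane[op3];           /* vec_select:SI */
    v4si op0;
    for (int i = 0; i < 4; i++)
        op0.lane[i] = op1.lane[i] * elt;   /* mult by the duplicated element */
    return op0;
}
```

For example, vmuli_v4si({1, 2, 3, 4}, {10, 20, 30, 40}, 2) yields
{30, 60, 90, 120}.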

"vlmovmn3" - move where one operand is a specific lane of a vector and the
             other is a scalar.

Example:

(define_insn "vlmovv4sisi3"
  [(set (vec_select:SI (match_operand:V4SI 0 "register_operand")
                       (parallel [(match_operand:SI 1 "immediate_operand")]))
        (match_operand:SI 2 "memory_operand"))]
  ...
)
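A similar plain-C sketch of the lane move, under the assumption (not spelled
out by the pattern itself) that the lanes other than operand 1 keep their
previous contents, as a lane-insert load would. v4si and vlmov_v4si_si are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 x 32-bit vector, used only to spell out lane semantics. */
typedef struct { int32_t lane[4]; } v4si;

/* Sketch of "vlmovv4sisi3": load one SI value from memory into lane idx of
   dst.  Assumption: all other lanes of dst are left unchanged. */
static void vlmov_v4si_si(v4si *dst, int idx, const int32_t *mem)
{
    assert(idx >= 0 && idx < 4);   /* lane index must be in range */
    dst->lane[idx] = *mem;         /* write only the selected lane */
}
```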

- Identify the following idiom and expand through the above standard patterns:

  t = c[m] 
  vc[0:n] = { t, t, t, t }
  a[0:n] = b[0:n] * vc[0:n] 

as 

 (insn (set (vec_select:SI (reg:V4SI 0) (parallel [(const_int 0)])) (mem:SI ... )))
 (insn (set (reg:V4SI 1)
            (mult:V4SI (reg:V4SI 2)
                       (vec_duplicate:V4SI
                         (vec_select:SI (reg:V4SI 0) (parallel [(const_int 0)]))))))
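As a sanity check of the idiom, the sketch below compares the scalar loop
with a version that follows the two insns above: a single lane-0 load of
c[m] (hoisted out of the loop since it is loop-invariant), then a by-element
multiply per 4-lane chunk. The function names are hypothetical, and n is
assumed to be a multiple of 4 to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4 x 32-bit vector, used only to spell out lane semantics. */
typedef struct { int32_t lane[4]; } v4si;

/* Scalar form of the idiom: t = c[m]; a[i] = b[i] * t. */
static void mul_by_element_scalar(int32_t *a, const int32_t *b,
                                  const int32_t *c, size_t m, size_t n)
{
    int32_t t = c[m];
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * t;
}

/* The same computation following the proposed two-insn expansion. */
static void mul_by_element_expanded(int32_t *a, const int32_t *b,
                                    const int32_t *c, size_t m, size_t n)
{
    assert(n % 4 == 0);                     /* sketch handles whole vectors only */
    v4si vc = {{0, 0, 0, 0}};
    vc.lane[0] = c[m];                      /* insn 1: lane-0 load from memory */
    for (size_t i = 0; i < n; i += 4)
        for (int j = 0; j < 4; j++)         /* insn 2: mult by duplicated lane 0 */
            a[i + j] = b[i + j] * vc.lane[0];
}
```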

If this path is acceptable, then I can extend this to support 

"vmaddim4" - multiply and add (with indexed element as multiplier)
"vmsubim4" - multiply and subtract (with indexed element as multiplier)
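One possible reading of these two extensions, sketched in plain C. The
operand order is an assumption and not part of the proposal: the accumulator
is taken to be a separate input, so vmaddi computes acc[i] + x[i] * y[idx]
and vmsubi computes acc[i] - x[i] * y[idx]. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 x 32-bit vector, used only to spell out lane semantics. */
typedef struct { int32_t lane[4]; } v4si;

/* Assumed "vmaddiv4si": acc[i] + x[i] * y[idx] for every lane i. */
static v4si vmaddi_v4si(v4si acc, v4si x, v4si y, int idx)
{
    assert(idx >= 0 && idx < 4);
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = acc.lane[i] + x.lane[i] * y.lane[idx];
    return r;
}

/* Assumed "vmsubiv4si": acc[i] - x[i] * y[idx] for every lane i. */
static v4si vmsubi_v4si(v4si acc, v4si x, v4si y, int idx)
{
    assert(idx >= 0 && idx < 4);
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = acc.lane[i] - x.lane[i] * y.lane[idx];
    return r;
}
```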

Please let me know your thoughts.

Cheers
VP

