On Thu, Jan 11, 2024 at 10:52 AM Robin Dapp <rdapp....@gmail.com> wrote:
>
> On 1/11/24 10:46, juzhe.zh...@rivai.ai wrote:
> > Oh, I see.  I think I got this wrong.
> >
> > I should adjust cost for VEC_EXTRACT not VEC_SET.
> >
> > But it's odd: I don't see the loop vectorizer recording a scalar_to_vec
> > cost in vect.dump.
>
> The slidedown/vmv.x.s part is of course vec_extract but we indeed
> don't seem to cost it as vec_to_scalar here.

It looks like a vectorized live operation, since it's not in the loop body
(and thus largely irrelevant for costing in practice).  The code handling
live operations has

      /* ???  Enable for loop costing as well.  */
      if (!loop_vinfo)
        record_stmt_cost (cost_vec, 1, vec_to_scalar, stmt_info, NULL_TREE,
                          0, vect_epilogue);

so for loop vectorization live ops are not costed at all.  I would suggest
trying to enable this unconditionally.
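
As a rough illustration of what dropping the guard changes, here is a toy
model (not the actual vectorizer code; CostKind, CostWhere, CostEntry,
cost_live_op and count_kind are all made-up names for this sketch):

```cpp
#include <vector>

// Toy stand-ins for the vectorizer's cost kinds and cost locations.
enum class CostKind { VectorStmt, ScalarToVec, VecToScalar };
enum class CostWhere { Prologue, Body, Epilogue };

struct CostEntry {
  int count;
  CostKind kind;
  CostWhere where;
};

// Hypothetical model of costing one vectorized live operation.
// `is_loop` plays the role of `loop_vinfo != NULL`; `cost_always`
// models dropping the `if (!loop_vinfo)` guard.
void cost_live_op(std::vector<CostEntry> &cost_vec, bool is_loop,
                  bool cost_always) {
  if (cost_always || !is_loop)
    cost_vec.push_back({1, CostKind::VecToScalar, CostWhere::Epilogue});
}

// Sum the counts recorded for one cost kind.
int count_kind(const std::vector<CostEntry> &cost_vec, CostKind kind) {
  int n = 0;
  for (const CostEntry &e : cost_vec)
    if (e.kind == kind)
      n += e.count;
  return n;
}
```

With the guard in place a loop records no vec_to_scalar entry for the live
op; with cost_always set it records one in the epilogue, which is what the
target cost hook would then get to see.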

> vmv.v.x corresponds to scalar_to_vec and I'd say 3 seems a
> bit high when a regular vector instruction is "1".
> It should rather depend on the latency between register
> files.  We can't really say in general but I'd say "2" is not so bad.
>
> I would suggest adding special handling in builtin_vectorization_cost
> like:
>
> /* Add register-register latency.  */
> case scalar_to_vec:
>   return common_costs->scalar_to_vec_cost + riscv_register_move_cost (...)
>
> and adjust register_move_cost accordingly.  Instead of using
> register_move_cost we could also use a cost structure directly.
> (E.g. like aarch64's regmove tuning structures.  Those don't
> contain VRs but for us it could make sense to add them).
>
> > +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize -fdump-tree-vect-details" } */
> With a cost of "3" we still vectorize for zvl512b and larger.
> Is that intended?  I don't really see why 512 should be vectorized
> but 256 not.  Disregarding that everything should be optimized
> away, 2 iterations for the whole loop with 256 bits doesn't
> seem that bad.
>
> Regards
>  Robin
>
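
To make the quoted suggestion concrete, a self-contained sketch of a cost
hook that adds the register-file move latency on top of a base cost could
look like the following.  Everything here is hypothetical: RegClass,
VectCost, RegmoveCosts, VectorCosts and both functions are invented for
illustration and are not the RISC-V backend's real types.  Treating
vec_to_scalar symmetrically is also just an assumption; the quoted proposal
only covers scalar_to_vec.

```cpp
// Toy register classes and cost kinds; not GCC's real enums.
enum class RegClass { GR, FP, VR };
enum class VectCost { VectorStmt, ScalarToVec, VecToScalar };

// Hypothetical per-tuning move latencies between register files.
struct RegmoveCosts {
  int gr2vr;  // scalar register -> vector register
  int vr2gr;  // vector register -> scalar register
};

// Hypothetical base costs per statement kind.
struct VectorCosts {
  int vector_stmt_cost;
  int scalar_to_vec_cost;
  int vec_to_scalar_cost;
};

// Latency of moving a value between two register files.
int register_move_cost(const RegmoveCosts &rm, RegClass from, RegClass to) {
  if (from == RegClass::GR && to == RegClass::VR)
    return rm.gr2vr;
  if (from == RegClass::VR && to == RegClass::GR)
    return rm.vr2gr;
  return 0;  // other moves are not modeled in this sketch
}

// Base cost plus register-register latency for cross-file operations.
int vectorization_cost(const VectorCosts &c, const RegmoveCosts &rm,
                       VectCost kind) {
  switch (kind) {
    case VectCost::ScalarToVec:
      return c.scalar_to_vec_cost +
             register_move_cost(rm, RegClass::GR, RegClass::VR);
    case VectCost::VecToScalar:
      return c.vec_to_scalar_cost +
             register_move_cost(rm, RegClass::VR, RegClass::GR);
    case VectCost::VectorStmt:
      return c.vector_stmt_cost;
  }
  return c.vector_stmt_cost;
}
```

With all base costs at 1 and a move latency of 1, scalar_to_vec comes out
at 2 while a regular vector statement stays at 1.  A core with slower
GR-to-VR moves would only need a different RegmoveCosts entry.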
