"丁乐华" <lehua.d...@rivai.ai> writes:
> > I don't think this pattern is correct, because SEL isn't commutative
> > in the vector operands.
>
> Indeed, I think I should invert PRED operand or the comparison
> operator which produce the PRED operand first.

That would work, but it would no longer be a win.  The vectoriser already
has code to try to reuse existing predicates where possible, to increase
the chances that the operand order of VEC_COND_EXPRs is reasonable.

> > I think this should be:
> >
> > if (...)
> >  to = XEXP (to, 0);>
> > and should be before the REG_P test. We don't want to treat
> > arbitrary duplicates as profitable.
>
> Agree, the adjustment is more rigorous.
>
> > It's not obvious that vec_duplicate is special enough that we should
> > treat it differently from other unary operators. For example,
> > zero_extend and sign_extend don't seem fundamentally more expensive
> > than vec_duplicate.
>
> Juzhe and I also discussed offline recently. We also have widened vector
> operator that needs to be added, this can be finished in RTL with forwarding
> instead of adding widen GIMPLE internal function. We think we can add a
> TARGET HOOK, for example: 
> `rtx try_forward (rtx dest, rtx src, rtx use_insn, rtx def_insn)`
>
>
> If it returns NULL_RTX, it means that it cannot be forwarded, otherwise
> it means replace the dest part in use_insn with the returned rtx.
> Letting the backend decide which ones can be forwarded has several
> advantages compared to:
> 1. Let the insn related to TARGET, such as unspec, also can be forwarded,
>   and when forwarding, the corresponding content can be extracted
>   from def_insn instead of the complete src part.
> 2. By default this HOOK returns NULL_TREE, which can reduce compatibility
>   issues.

Personally, I'm not in favour of a hook along these lines.  I think
it would effectively split the pass between target-independent and
target-specific code, which (a) tends to lead to more duplication
between targets and (b) makes it harder to test for correctness
(as opposed to performance) when updating the target-independent code.

If a value can't be forwarded, then either (a) substitution will fail
to give a valid instruction or (b) the new instruction will be more
costly than the old one (as measured by existing hooks).

The possible downsides (e.g. on register pressure, as you mention below)
are something that target-independent code should deal with, since it
can look at the function as a whole.

> > It's a while since I looked at this code, but I assume that, even after
> > this change, we will still require the new in-loop instruction to be
> > no more expensive than the old in-loop instruction. Is that right?
>
>
> Yeah. Forwarding vec_duplicate maybe reduce the use of vector registers,
> but increase the life cycle of scalar registers. If the scalar register 
> pressure
> is higher, this change may become more expensive. This decision does not
> feel very easy to make, is there some way to do this?

Yeah.  But on many architectures, scalar floats are stored in the same
register file as vectors, so whether this is a problem will depend also
on the mode of the scalar.

Also, the cost is different if we eliminate all uses of the duplicate in
the loop vs. if we only eliminate some.

The handling of flag_ira_hoist_pressure is one example of code that
tries to use register pressure to guide optimisation, but I don't
know the code very well.  (Of course, if we did reuse that,
we'd want to commonise it rather than duplicate it.)

Thanks,
Richard

Reply via email to