Richard Biener <rguent...@suse.de> writes:
> The x86 backend looks at the SLP node passed to the add_stmt_cost
> hook when costing vec_construct, looking for elements that require
> a move from a GPR to a vector register and cost that.  But since
> vect_prologue_cost_for_slp decomposes the cost for an external
> SLP node into individual pieces this cost gets applied N times
> without a chance for the backend to know it's just dealing with
> a part of the SLP node.  Just looking at a part is also not perfect
> since the GPR to XMM move cost applies only once per distinct
> element so handling the whole SLP node one more correctly reflects
> cost (albeit without considering other external SLP nodes).
>
> The following addresses the issue by passing down the SLP node
> only for one piece and nullptr for the rest.  The x86 backend
> is currently the only one looking at it.
>
> In the future the cost of external elements is something to deal
> with globally but that would require the full SLP tree be available
> to costing.
>
> It's difficult to write a testcase, at the tipping point not
> vectorizing is better so I'll followup with x86 specific adjustments
> and will see to add a testcase later.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard, we talked about this issue two weeks ago and I was looking
> for a solution that would be OK for backporting if the need arises.
> The following is what I could come up with that retains the whole
> SLP-node wide "CSE" of the element move cost.  Is that OK until
> we come up with a better plan for trunk at some point?

Yeah, seems like a neat workaround to me FWIW.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>       PR tree-optimization/109747
>       * tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
>       the SLP node only once to the cost hook.
> ---
>  gcc/tree-vect-slp.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..a6f277c5e21 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
>      }
>    /* ???  We're just tracking whether vectors in a single node are the same.
>       Ideally we'd do something more global.  */
> +  bool passed = false;
>    for (unsigned int start : starts)
>      {
>        vect_cost_for_stmt kind;
> @@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
>       kind = scalar_to_vec;
>        else
>       kind = vec_construct;
> -      record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
> +      /* The target cost hook has no idea which part of the SLP node
> +      we are costing so avoid passing it down more than once.  Pass
> +      it to the first vec_construct or scalar_to_vec part since for those
> +      the x86 backend tries to account for GPR to XMM register moves.  */
> +      record_stmt_cost (cost_vec, 1, kind,
> +                     (kind != vector_load && !passed) ? node : nullptr,
> +                     vectype, 0, vect_prologue);
> +      if (kind != vector_load)
> +     passed = true;
>      }
>  }

Reply via email to