Richard Biener <rguent...@suse.de> writes: > On Fri, 12 May 2023, Richard Sandiford wrote: > >> Richard Biener <rguent...@suse.de> writes: >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote: >> > >> >> I have dealt with, I think..., most of your comments. There's quite a few >> >> changes, I think it's all a bit simpler now. I made some other changes to >> >> the >> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to >> >> preserve >> >> the same behaviour as we had with the tree codes before. Also added some >> >> extra >> >> checks to tree-cfg.cc that made sense to me. >> >> >> >> I am still regression testing the gimple-range-op change, as that was a >> >> last >> >> minute change, but the rest survived a bootstrap and regression test on >> >> aarch64-unknown-linux-gnu. >> >> >> >> cover letter: >> >> >> >> This patch replaces the existing tree_code widen_plus and widen_minus >> >> patterns with internal_fn versions. >> >> >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and >> >> DEF_INTERNAL_OPTAB_NARROWING_HILO_FN >> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN >> >> respectively >> >> except they provide convenience wrappers for defining conversions that >> >> require >> >> a hi/lo split. Each definition for <NAME> will require optabs for _hi >> >> and _lo >> >> and each of those will also require a signed and unsigned version in the >> >> case >> >> of widening. The hi/lo pair is necessary because the widening and >> >> narrowing >> >> operations take n narrow elements as inputs and return n/2 wide elements >> >> as >> >> outputs. The 'lo' operation operates on the first n/2 elements of input. >> >> The >> >> 'hi' operation operates on the second n/2 elements of input. Defining an >> >> internal_fn along with hi/lo variations allows a single internal function >> >> to >> >> be returned from a vect_recog function that will later be expanded to >> >> hi/lo. >> >> >> >> >> >> For example: >> >> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO >> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> >> >> (u/s)addl2 >> >> IFN_VEC_WIDEN_PLUS_LO -> >> >> vec_widen_<su>add_lo_<mode> >> >> -> (u/s)addl >> >> >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS >> >> tree >> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. >> > >> > What I still don't understand is how we are so narrowly focused on >> > HI/LO? We need a combined scalar IFN for pattern selection (not >> > sure why that's now called _HILO, I expected no suffix). Then there's >> > three possibilities the target can implement this: >> > >> > 1) with a widen_[su]add<mode> instruction - I _think_ that's what >> > RISCV is going to offer since it is a target where vector modes >> > have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead >> > RVV can do a V4HI to V4SI widening and widening add/subtract >> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually >> > done with a widening add of zero - eh). >> > IIRC GCN is the same here. >> >> SVE currently does this too, but the addition and widening are >> separate operations. E.g. in principle there's no reason why >> you can't sign-extend one operand, zero-extend the other, and >> then add the result together. Or you could extend them from >> different sizes (QI and HI). All of those are supported >> (if the costing allows them). > > I see. So why does the target the expose widen_[su]add<mode> at all?
It shouldn't (need to) do that. I don't think we should have an optab for the unsplit operation. At least on SVE, we really want the extensions to be fused with loads (where possible) rather than with arithmetic. We can still do the widening arithmetic in one go. It's just that fusing with the loads works for the mixed-sign and mixed-size cases, and can handle more than just doubling the element size. >> If the target has operations to do combined extending and adding (or >> whatever), then at the moment we rely on combine to generate them. >> >> So I think this case is separate from Andre's work. The addition >> itself is just an ordinary addition, and any widening happens by >> vectorising a CONVERT/NOP_EXPR. >> >> > 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree >> > codes currently support (exclusively) >> > 3) similar, but widen_[su]add{_even,_odd}<mode> >> > >> > that said, things like decomposes_to_hilo_fn_p look to paint us into >> > a 2) corner without good reason. >> >> I suppose one question is: how much of the patch is really specific >> to HI/LO, and how much is just grouping two halves together? > > Yep, that I don't know for sure. > >> The nice >> thing about the internal-fn grouping macros is that, if (3) is >> implemented in future, the structure will strongly encourage even/odd >> pairs to be supported for all operations that support hi/lo. That is, >> I would expect the grouping macros to be extended to define even/odd >> ifns alongside hi/lo ones, rather than adding separate definitions >> for even/odd functions. >> >> If so, at least from the internal-fn.* side of things, I think the question >> is whether it's OK to stick with hilo names for now, or whether we should >> use more forward-looking names. > > I think for parts that are independent we could use a more > forward-looking name. Maybe _halves? Using _halves for the ifn macros sounds good to me FWIW. > But I'm also not sure > how much of that is really needed (it seems to be tied around > optimizing optabs space?) Not sure what you mean by "this". Optabs space shouldn't be a problem though. The optab encoding gives us a full int to play with, and it could easily go up to 64 bits if necessary/convenient. At least on the internal-fn.* side, the aim is really just to establish a regular structure, so that we don't have arbitrary differences between different widening operations, or too much cut-&-paste. Thanks, Richard