https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
>
> --- Comment #18 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> >
> > > Ack, that also tracks with what I tried before, we don't indeed track
> > > ranges for vector ops. The general case can still be handled slightly
> > > better (I think) but it doesn't become as clear of a win as this one.
> > >
> > > > You probably did so elsewhere some time ago, but what exactly are those
> > > > four instructions? (pointers to specifications appreciated)
> > >
> > > For NEON we use:
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
> >
> > so thats a add + pack high
>
> Yes, though with no overflow, the addition is done in twice the precision of
> the original type. So it's more a widening add + pack high which narrows it
> back and zero extends.
>
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-
> >
> > and that unpacks (zero-extends) the high/low part of one operand of an add
> >
> > I wonder if we'd open-code the pack / unpack and use regular add whether
> > combine can synthesize uaddw and addhn? The pack and unpack would be
> > vec_perms on GIMPLE (plus V_C_E).
>
> I don't think so for addhn, because it wouldn't truncate the top bits, it
> truncates the bottom bits.
>
> The instruction does
>   element1 = Elem[operand1, e, 2*esize];
>   element2 = Elem[operand2, e, 2*esize];
>
> So it widens on input.

OK, so that's an ADD_HIGHPART_EXPR then? Though the highpart of an add is
only a single bit, isn't it? For scalar you'd use the carry bit here and
instructions like adc to consume it. Is addhn able to do such a thing on
vectors?

When writing generic vector code, is combine able to synthesize addhn from
widen, plus and pack-high?
As said in the old discussion, I'm not opposed to adding new IFNs, but I'd
like to see useful building blocks (that ideally map to ISAs) instead of
IFN-for-complex-pattern-X. The alternative way was to improve division
expansion in general, which is the can_div_special_by_const_p thing, but we
do not seem to be able to capture the requirements correctly here.

> > So the difficulty here will be to decide whether that's in the end
> > better than what the pattern handling code does now, right? Because
> > I think most targets will be able to do the above but lacking the
> > special adds it will be slower because of the extra packing/unpacking?
> >
> > That said, can we possibly do just that costing (would be a first in
> > the pattern code I guess) with a target hook? Or add optabs for
> > the addh operations so we can query support?
>
> We could, the alternative wouldn't be correct for costing I think.. if we
> generate *+, vec_perm that's gonna be more expensive.

Well, the target cost model can always detect such patterns ... but sure,
using the actual ISA is preferable for costing and also to avoid "breaking"
the combination by later "optimization".

OTOH, at least some basic constant folding for all such ISA IFNs is required
to avoid regressing cases where complete unrolling later allows constant
evaluation but vectorizing first breaks that.