https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
>
> --- Comment #18 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> >
> > > Ack, that also tracks with what I tried before, we don't indeed track
> > > ranges for vector ops. The general case can still be handled slightly
> > > better (I think) but it doesn't become as clear of a win as this one.
> > >
> > > > You probably did so elsewhere some time ago, but what exactly are those
> > > > four instructions? (pointers to specifications appreciated)
> > >
> > > For NEON we use:
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
> >
> > so thats a add + pack high
>
> Yes, though with no overflow, the addition is done in twice the precision of
> the original type. So it's more a widening add + pack high which narrows it
> back and zero extends.
>
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-
> >
> > and that unpacks (zero-extends) the high/low part of one operand of an add
> >
> > I wonder if we'd open-code the pack / unpack and use regular add whether
> > combine can synthesize uaddw and addhn? The pack and unpack would be
> > vec_perms on GIMPLE (plus V_C_E).
>
> I don't think so for addhn, because it wouldn't truncate the top bits, it
> truncates the bottom bits.
>
> The instruction does
>   element1 = Elem[operand1, e, 2*esize];
>   element2 = Elem[operand2, e, 2*esize];
>
> So it widens on input.

OK, so that's an ADD_HIGHPART_EXPR then? Though the highpart of an add is
only a single bit, isn't it? For scalar you'd use the carry bit here and
instructions like adc to consume it. Is addhn able to do such a thing on
vectors?

When writing generic vector code, is combine able to synthesize addhn from
widen, plus and pack-high?
As said in the old discussion, I'm not opposed to adding new IFNs, but I'd
like to see useful building blocks (that ideally map to ISAs) instead of
IFN-for-complex-pattern-X. The alternative way was to improve division
expansion in general, which is the can_div_special_by_const_p thing, but we
do not seem to be able to capture the requirements correctly here.

> > So the difficulty here will be to decide whether that's in the end
> > better than what the pattern handling code does now, right? Because
> > I think most targets will be able to do the above but lacking the
> > special adds it will be slower because of the extra packing/unpacking?
> >
> > That said, can we possibly do just that costing (would be a first in
> > the pattern code I guess) with a target hook? Or add optabs for
> > the addh operations so we can query support?
>
> We could, the alternative wouldn't be correct for costing I think.. if we
> generate *+, vec_perm that's gonna be more expensive.

Well, the target cost model can always detect such patterns ... but sure,
using the actual ISA is preferable for costing and also to avoid "breaking"
the combination by later "optimization".

OTOH, at least some basic constant folding for all such ISA IFNs is required
to avoid regressing cases where complete unrolling later allows constant
evaluation but vectorizing first breaks that.