RE: [RFC] Implementing detection of saturation and rounding arithmetic

Tamar Christina via Gcc Wed, 12 May 2021 02:15:04 -0700

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Wednesday, May 12, 2021 9:48 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc@gcc.gnu.org; Richard Biener <rguent...@suse.de>
> Subject: Re: [RFC] Implementing detection of saturation and rounding
> arithmetic
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The
> > aim is to recognize both scalar and vector variants of typical
> > saturating expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >    signed char sat (signed char a, signed char b)
> >    {
> >       int tmp = a + b;
> >       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 2. Saturating abs:
> >    signed char sat (signed char a)
> >    {
> >       int tmp = abs (a);
> >       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 3. Rounding shifts
> >    char rndshift (char dc, int shift)   /* requires shift >= 1 */
> >    {
> >       int round_const = 1 << (shift - 1);
> >       return (dc + round_const) >> shift;
> >    }
> >
> > etc.
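A small aside on example 2: because abs () is evaluated in int, its result is never negative, so the lower clamp can never fire and only one input actually saturates. The sketch below assumes an 8-bit two's-complement signed char, and the function names are made up for illustration. It checks exhaustively that the widened idiom matches a native-width version:

```c
#include <stdlib.h>

/* Example 2 with signed char made explicit.  abs () is evaluated in int,
   so only abs (-128) == 128 falls outside the signed char range.  */
static signed char sat_abs_wide (signed char a)
{
  int tmp = abs (a);
  return (signed char) (tmp > 127 ? 127 : tmp);
}

/* The same result without a wider type (the semantics of RTL ss_abs
   on an 8-bit value): only -128 needs special handling.  */
static signed char sat_abs_narrow (signed char a)
{
  if (a == -128)
    return 127;
  return (signed char) (a < 0 ? -a : a);
}
```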
> >
> > Of course the first issue is that C does not really have a single
> > idiom for expressing this.
> >
> > At the RTL level we have ss_truncate and us_truncate and
> > float_truncate for truncation.
> >
> > At the Tree level we have nothing for truncation (I believe) for
> > scalars. For Vector code there already seems to be VEC_PACK_SAT_EXPR
> > but it looks like nothing actually generates this at the moment; it's
> > just an unused tree code.
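To make the truncation semantics concrete, here is a rough scalar model in C of signed saturating truncation from 16 to 8 bits, plus a saturating pack built on top of it. This is only an illustrative sketch. The function names are hypothetical, and the element order of the pack is an assumption, not a statement of what VEC_PACK_SAT_EXPR guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of (ss_truncate:QI x) for a 16-bit x: values outside
   the int8_t range are clamped to INT8_MIN / INT8_MAX.  */
static int8_t ss_trunc_16_8 (int16_t x)
{
  if (x > INT8_MAX)
    return INT8_MAX;
  if (x < INT8_MIN)
    return INT8_MIN;
  return (int8_t) x;
}

/* Hypothetical scalar model of a saturating pack: two n-element
   vectors of int16_t become one 2n-element vector of int8_t.
   Placing the first operand in the low half is an assumption.  */
static void pack_sat_16_8 (const int16_t *a, const int16_t *b,
                           size_t n, int8_t *out)
{
  for (size_t i = 0; i < n; i++)
    {
      out[i] = ss_trunc_16_8 (a[i]);
      out[n + i] = ss_trunc_16_8 (b[i]);
    }
}
```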
> >
> > For rounding there doesn't seem to be any existing infrastructure.
> >
> > The proposal for handling these is as follows.  Keep in mind that all of
> > these also exist in scalar form, so the vectorizer would be the wrong
> > place to detect them.
> >
> > 1. Rounding:
> >    a) Use match.pd to rewrite various rounding idioms to shifts.
> >    b) Use backwards or forward prop to rewrite these into internal
> >       functions.  Even if the target does not support these rounding
> >       instructions, it then has a chance to provide a more efficient
> >       implementation than what would be generated normally.
> >
> > 2. Saturation:
> >    a) Use match.pd to rewrite the various saturation expressions into
> >       min/max operations, which opens up the expressions to further
> >       optimizations.
> >    b) Use backwards or forward prop to convert them to internal functions
> >       if the resulting min/max expression still meets the criteria for
> >       being a saturating expression.  This follows the algorithm outlined
> >       in "The Software Vectorization Handbook" by Aart J.C. Bik.
> >
> >       We could get the right instructions by using combine if we don't
> >       rewrite the instructions to an internal function, but then during
> >       vectorization we would overestimate the cost of performing the
> >       saturation.  The constants would also be loaded into registers,
> >       which makes them a lot more difficult to clean up solely in the
> >       backend.
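As an illustration of point 1a, two different C idioms can compute the same rounding right shift. The sketch below assumes a 16-bit signed input, 1 <= shift <= 15, and GCC's arithmetic right shift of negative values (implementation-defined in ISO C). The function names are hypothetical, and this is not the contents of any existing match.pd pattern.

```c
#include <stdint.h>

/* Rounding right shift via a wider intermediate, as in the rndshift
   example: add half of 2^shift, then shift.  */
static int16_t rshr_round_wide (int16_t x, int shift)
{
  int32_t round_const = (int32_t) 1 << (shift - 1);
  return (int16_t) (((int32_t) x + round_const) >> shift);
}

/* Another idiom for the same value: truncating shift plus the last
   bit shifted out.  Every intermediate value fits in int16_t.  */
static int16_t rshr_round_narrow (int16_t x, int shift)
{
  return (int16_t) ((x >> shift) + ((x >> (shift - 1)) & 1));
}
```

The second form is the interesting one when the input is already the widest type the target handles natively, since it needs no wider intermediate.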
> >
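For point 2a, the clamp in example 1 is just a MIN/MAX pair. Below is a minimal sketch that checks the two forms agree on every input. It assumes an 8-bit signed char and uses hypothetical helpers imin/imax in place of MIN_EXPR/MAX_EXPR.

```c
/* Hypothetical helpers standing in for MIN_EXPR / MAX_EXPR.  */
static int imin (int a, int b) { return a < b ? a : b; }
static int imax (int a, int b) { return a > b ? a : b; }

/* Example 1 from the RFC, with signed char made explicit.  */
static signed char sat_add_ternary (signed char a, signed char b)
{
  int tmp = a + b;
  return (signed char) (tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp));
}

/* The same computation as a clamp: MIN (MAX (tmp, -128), 127).  */
static signed char sat_add_minmax (signed char a, signed char b)
{
  int tmp = a + b;
  return (signed char) imin (imax (tmp, -128), 127);
}

/* Exhaustively compare both forms; returns 1 if they always agree.  */
static int sat_add_forms_agree (void)
{
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      if (sat_add_ternary ((signed char) a, (signed char) b)
          != sat_add_minmax ((signed char) a, (signed char) b))
        return 0;
  return 1;
}
```

The MIN/MAX form is a saturating add only because the clamp bounds are exactly the range of signed char. With any other bounds it stays an ordinary MIN/MAX.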
> > The one thing I am wondering about is whether we would need an
> > internal function for all operations supported, or if it should be
> > modelled as an internal FN which just "marks" the operation as
> > rounding/saturating. After all, the only difference between a normal
> > and saturating expression in RTL is the xx_truncate RTL surrounding
> > the expression.  Doing so would also mean that all targets that have
> > saturating instructions would automatically benefit from this.
> 
> I might have misunderstood what you meant here, but the *_truncate RTL
> codes are true truncations: the operand has to be wider than the result.
> Using this representation for general arithmetic is a problem if you're
> operating at the maximum size that the target supports natively.
> E.g. representing a 64-bit saturating addition as:
> 
>   - extend to 128 bits
>   - do a 128-bit addition
>   - truncate to 64 bits
> 

Ah, that wasn't clear from the documentation.  The entry for the normal truncate
mentions that the operand mode has to be wider, but ss_truncate and friends
don't mention this constraint.  That would indeed not work.

> is going to be hard to cost and code-generate on targets that don't support
> native 128-bit operations (or at least, don't support them cheaply).
> This might not be a problem when recognising C idioms, since the C source
> code has to be able to do the wider operation before truncating the result, but
> it could be a problem if we provide built-in functions or if we want to
> introduce compiler-generated saturating operations.
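Richard's point about the widening formulation can be seen in C. The extend/add/truncate form of a 64-bit saturating add would need a 128-bit type, whereas the per-operation semantics can be computed at native width. The sketch below assumes GCC or Clang for __builtin_add_overflow, and the function name is hypothetical.

```c
#include <stdint.h>

/* Signed saturating 64-bit addition (the semantics of RTL ss_plus:DI)
   computed without any 128-bit arithmetic.  __builtin_add_overflow
   stores the wrapped sum and returns true if the addition overflowed.  */
static int64_t ss_add_64 (int64_t a, int64_t b)
{
  int64_t sum;
  if (__builtin_add_overflow (a, b, &sum))
    /* Overflow is only possible when a and b share a sign;
       saturate towards that sign.  */
    return a < 0 ? INT64_MIN : INT64_MAX;
  return sum;
}
```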
> 
> RTL already has per-operation saturation such as ss_plus/us_plus,
> ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
> ss_ashift/us_ashift and ss_abs.  I think we should do the same in gimple,
> using internal functions like you say.

Oh, I didn't know about those. Indeed, those look like a better fit here.
Having all the operations separate in RTL already does seem to imply
that separate internal fns are the way to go.
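The unsigned per-operation forms (us_plus, us_minus) also have well-known branch-free C idioms. These are the kind of patterns such detection would probably have to cope with as well. A minimal sketch for 32-bit values, with hypothetical function names:

```c
#include <stdint.h>

/* Unsigned saturating add (us_plus): on wrap-around the sum is smaller
   than a, and -(uint32_t) 1 is an all-ones mask.  */
static uint32_t us_add_32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum | -(uint32_t) (sum < a);
}

/* Unsigned saturating subtract (us_minus): if b > a the difference
   wraps to a value larger than a, and the mask clamps it to zero.  */
static uint32_t us_sub_32 (uint32_t a, uint32_t b)
{
  uint32_t diff = a - b;
  return diff & -(uint32_t) (diff <= a);
}
```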

Thanks!

Regards,
Tamar

> 
> Thanks,
> Richard