Re: [RFC] Implementing detection of saturation and rounding arithmetic

Richard Biener Wed, 12 May 2021 02:28:27 -0700

On Wed, 12 May 2021, Richard Sandiford wrote:

> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The aim is
> > to recognize both scalar and vector variants of typical saturating
> > expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >    signed char sat (signed char a, signed char b)
> >    {
> >       int tmp = a + b;
> >       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 2. Saturating abs:
> >    signed char sat (signed char a)
> >    {
> >       int tmp = abs (a);
> >       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 3. Rounding shift:
> >    char rndshift (char dc, int shift)
> >    {
> >       int round_const = 1 << (shift - 1);
> >       return (dc + round_const) >> shift;
> >    }
> >
> > etc.
> >
> > Of course the first issue is that C does not really have a single idiom for
> > expressing this.
> >
> > At the RTL level we have ss_truncate and us_truncate and float_truncate for
> > truncation.
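For reference, the two saturating truncation codes clamp an out-of-range value to the bounds of the narrower mode instead of discarding its high bits. The C sketch below models them for a 32-bit to 8-bit narrowing. The function names are hypothetical, and the unsigned variant assumes an unsigned operand. Written this way, example 1 becomes an ordinary wide addition followed by a saturating truncation.

```c
#include <stdint.h>

/* Hypothetical C model of (ss_truncate:QI x) for a 32-bit operand:
   values outside [INT8_MIN, INT8_MAX] clamp to the nearest bound.  */
static int8_t ss_truncate_si_qi (int32_t x)
{
  if (x > INT8_MAX)
    return INT8_MAX;
  if (x < INT8_MIN)
    return INT8_MIN;
  return (int8_t) x;
}

/* Hypothetical C model of (us_truncate:QI x), assuming x is unsigned:
   values above UINT8_MAX clamp to UINT8_MAX.  */
static uint8_t us_truncate_si_qi (uint32_t x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}

/* Example 1 as plain addition followed by saturating truncation;
   the 32-bit sum of two 8-bit values cannot itself overflow.  */
static int8_t sat_add_via_truncate (int8_t a, int8_t b)
{
  return ss_truncate_si_qi ((int32_t) a + b);
}
```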
> >
> > At the tree level we have nothing for truncation (I believe) for scalars.
> > For vector code there already seems to be VEC_PACK_SAT_EXPR, but it looks
> > like nothing actually generates it at the moment; it's just an unused
> > tree code.
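Roughly, VEC_PACK_SAT_EXPR takes two vectors with the same number of integral elements and produces one vector with twice as many elements of half the width, narrowing each element with saturation. A scalar C model for 16-bit to 8-bit signed elements might look like this. The function names are hypothetical, and placing the first operand's elements in the low half of the result is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow one 16-bit element to the signed 8-bit range with saturation.  */
static int8_t sat16to8 (int16_t x)
{
  return x > INT8_MAX ? INT8_MAX : x < INT8_MIN ? INT8_MIN : (int8_t) x;
}

/* Hypothetical scalar model of VEC_PACK_SAT_EXPR on n-element inputs:
   the inputs are concatenated (lo first, by assumption) and every
   element is narrowed with saturation, giving 2 * n results in out.  */
static void vec_pack_sat_model (const int16_t *lo, const int16_t *hi,
                                int8_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = sat16to8 (lo[i]);
  for (size_t i = 0; i < n; i++)
    out[n + i] = sat16to8 (hi[i]);
}
```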
> >
> > For rounding there doesn't seem to be any existing infrastructure.
> >
> > The proposal for handling these is as follows.  Keep in mind that all of
> > these also exist in scalar form, so the vectorizer would be the wrong
> > place to detect them.
> >
> > 1. Rounding:
> >    a) Use match.pd to rewrite various rounding idioms to shifts.
> >    b) Use backward or forward prop to rewrite these to internal functions,
> >       so that even targets without these rounding instructions have a
> >       chance to provide a more efficient implementation than what would
> >       be generated normally.
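To illustrate step 1b: the idiom in example 3 needs a wider intermediate so that adding the rounding constant cannot overflow, yet the same result can be computed in the original width, because the rounding term is simply bit (shift - 1) of the input. The sketch below, for unsigned 32-bit values, shows one such expansion a generic fallback could use. It is an illustration under those assumptions, not necessarily what GCC would emit, and the function names are hypothetical.

```c
#include <stdint.h>

/* Reference rounding right shift: add half of 2^shift, then shift.
   Computed in 64 bits so the addition cannot overflow.
   Requires 1 <= shift <= 31.  */
static uint32_t rnd_shift_wide (uint32_t x, unsigned shift)
{
  return (uint32_t) (((uint64_t) x + (1ull << (shift - 1))) >> shift);
}

/* Same result without widening: adding 2^(shift-1) carries into bit
   'shift' exactly when bit (shift - 1) of x is set, so that bit is the
   rounding increment.  Requires 1 <= shift <= 31.  */
static uint32_t rnd_shift_narrow (uint32_t x, unsigned shift)
{
  return (x >> shift) + ((x >> (shift - 1)) & 1u);
}
```

Both functions agree for every input in the stated shift range; the narrow form avoids both the wide addition and the materialized rounding constant.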
> >
> > 2. Saturation:
> >    a) Use match.pd to rewrite the various saturation expressions into
> >       min/max operations, which opens up the expressions to further
> >       optimizations.
> >    b) Use backward or forward prop to convert to internal functions if the
> >       resulting min/max expression still meets the criteria for being a
> >       saturating expression.  This follows the algorithm outlined in
> >       "The Software Vectorization Handbook" by Aart J.C. Bik.
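Step 2a can be checked on example 1: the nested conditional clamp is the same as a MAX against the lower bound followed by a MIN against the upper bound. The sketch below compares the two forms exhaustively for signed char inputs. MIN and MAX are local macros and the function names are hypothetical. The MIN/MAX form is also what step 2b would inspect: bounds of -128 and 127 around a sum of two signed char values match a signed 8-bit saturating add.

```c
#include <stdbool.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Saturating add written as in example 1 (nested conditionals).  */
static signed char sat_add_cond (signed char a, signed char b)
{
  int tmp = a + b;
  return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
}

/* The same clamp as a MIN/MAX pair, the form step 2a would produce.  */
static signed char sat_add_minmax (signed char a, signed char b)
{
  int tmp = a + b;
  return MIN (MAX (tmp, -128), 127);
}

/* Exhaustively check that the two forms agree for every input pair.  */
static bool sat_add_forms_agree (void)
{
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      if (sat_add_cond (a, b) != sat_add_minmax (a, b))
        return false;
  return true;
}
```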
> >
> >       We could get the right instructions by using combine if we don't
> >       rewrite the instructions to an internal function; however, during
> >       vectorization we would then overestimate the cost of performing the
> >       saturation.  The constants would also be loaded into registers,
> >       which makes the code much harder to clean up solely in the backend.
> >
> > The one thing I am wondering about is whether we would need an internal
> > function for every supported operation, or whether it should be modelled
> > as an internal FN which just "marks" the operation as rounding/saturating.
> > After all, the only difference between a normal and a saturating
> > expression in RTL is the xx_truncate RTL surrounding the expression.
> > Doing so would also mean that all targets that have saturating
> > instructions would automatically benefit from this.
> 
> I might have misunderstood what you meant here, but the *_truncate
> RTL codes are true truncations: the operand has to be wider than the
> result.  Using this representation for general arithmetic is a problem
> if you're operating at the maximum size that the target supports natively.
> E.g. representing a 64-bit saturating addition as:
> 
>   - extend to 128 bits
>   - do a 128-bit addition
>   - truncate to 64 bits
> 
> is going to be hard to cost and code-generate on targets that don't support
> native 128-bit operations (or at least, don't support them cheaply).
> This might not be a problem when recognising C idioms, since the C source
> code has to be able to do the wider operation before truncating the result,
> but it could be a problem if we provide built-in functions or if we want
> to introduce compiler-generated saturating operations.
> 
> RTL already has per-operation saturation such as ss_plus/us_plus,
> ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
> ss_ashift/us_ashift and ss_abs.  I think we should do the same
> in gimple, using internal functions like you say.
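To make the point about per-operation codes concrete: a 64-bit saturating addition (the semantics of ss_plus or us_plus in DImode) does not need a 128-bit intermediate, because only overflow of the 64-bit sum has to be detected. A sketch using the GCC/Clang builtin __builtin_add_overflow, with hypothetical function names:

```c
#include <stdint.h>

/* 64-bit signed saturating addition with no 128-bit intermediate.
   __builtin_add_overflow stores the wrapped sum and returns true if
   the mathematically exact sum did not fit.  */
static int64_t ss_add64 (int64_t a, int64_t b)
{
  int64_t sum;
  if (__builtin_add_overflow (a, b, &sum))
    /* Signed overflow only happens when a and b share a sign, so the
       sign of b selects the bound to clamp to.  */
    return b < 0 ? INT64_MIN : INT64_MAX;
  return sum;
}

/* Unsigned counterpart: overflow always means the exact sum exceeded
   UINT64_MAX.  */
static uint64_t us_add64 (uint64_t a, uint64_t b)
{
  uint64_t sum;
  return __builtin_add_overflow (a, b, &sum) ? UINT64_MAX : sum;
}
```

A truncation-based representation would instead describe the same operation as a 128-bit addition followed by a saturating truncation to 64 bits, which is the costing problem described above.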


I think that for follow-up optimizations, using regular arithmetic
ops and just new saturating truncations is better.  Maybe we can
also do both, first matching only the actual saturation
with a new tree code and then later matching the optabs the target
actually supports (in ISEL for example)?

Truly saturating ops might provide an interesting example of how
to deal with -ftrapv - one might think we can now simply
use the trapping optabs as internal functions to reflect
-ftrapv onto the IL ...
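The connection to -ftrapv is that a trapping add and a saturating add share the same overflow check and differ only in what happens on overflow. A source-level sketch of the trapping case, using the GCC/Clang builtins __builtin_add_overflow and __builtin_trap and a hypothetical function name (this illustrates the semantics only, not how GCC implements -ftrapv internally):

```c
#include <stdint.h>

/* Signed 32-bit addition with -ftrapv-style semantics: trap instead of
   wrapping when the exact sum does not fit.  A saturating add would
   clamp to INT32_MIN/INT32_MAX at the same point instead.  */
static int32_t trapping_add32 (int32_t a, int32_t b)
{
  int32_t sum;
  if (__builtin_add_overflow (a, b, &sum))
    __builtin_trap ();
  return sum;
}
```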

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
