On Tue, 11 May 2021, Tamar Christina wrote:

> Hi All,
> 
> We are looking to implement saturation support in the compiler.  The aim is to
> recognize both scalar and vector variants of typical saturating expressions.
> 
> As an example:
> 
> 1. Saturating addition:
>    char sat (char a, char b)
>    {
>       int tmp = a + b;
>       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
> 
> 2. Saturating abs:
>    char sat (char a)
>    {
>       int tmp = abs (a);
>       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
> 
> 3. Rounding shifts
>    char rndshift (char dc)
>    {
>       int round_const = 1 << (shift - 1);
>       return (dc + round_const) >> shift;
>    }
> 
> etc.
> 
> Of course the first issue is that C does not really have a single idiom for
> expressing this.
> 
> At the RTL level we have ss_truncate and us_truncate and float_truncate for
> truncation.
> 
> At the Tree level we have nothing for truncation (I believe) for scalars. For
> Vector code there already seems to be VEC_PACK_SAT_EXPR but it looks like
> nothing actually generates this at the moment; it's just an unused tree code.
> 
> For rounding there doesn't seem to be any existing infrastructure.
> 
> The proposal to handle these is as follows.  Keep in mind that all of these also
> exist in their scalar form, so detecting them in the vectorizer would be
> the wrong place.
> 
> 1. Rounding:
>    a) Use match.pd to rewrite various rounding idioms to shifts.
>    b) Use backwards or forward prop to rewrite these to internal functions
>       where even if the target does not support these rounding instructions they
>       have a chance to provide a more efficient implementation than what would
>       be generated normally.
> 
> 2. Saturation:
>    a) Use match.pd to rewrite the various saturation expressions into min/max
>       operations which opens up the expressions to further optimizations.
>    b) Use backwards or forward prop to convert to internal functions if the
>       resulting min/max expression still meet the criteria for being a
>       saturating expression.  This follows the algorithm as outlined in "The
>       Software Vectorization handbook" by Aart J.C. Bik.
> 
>       We could get the right instructions by using combine if we don't rewrite
>       the instructions to an internal function, however then during vectorization
>       we would overestimate the cost of performing the saturation.  The constants
>       will then also be loaded into registers and so become a lot more difficult
>       to clean up solely in the backend.
> 
> The one thing I am wondering about is whether we would need an internal function
> for all operations supported, or if it should be modelled as an internal FN which
> just "marks" the operation as rounding/saturating. After all, the only difference
> between a normal and saturating expression in RTL is the xx_truncate RTL surrounding
> the expression.  Doing so would also mean that all targets that have saturating
> instructions would automatically benefit from this.
> 
> But it does mean a small adjustment to the costing, which would need to cost the
> internal function call and the argument together as a whole.
> 
> Any feedback is appreciated to minimize the number of changes required to the
> final patch.  Any objections to the outlined approach?

I think it makes sense to pattern-match the operations on GIMPLE
and follow the approach taken by __builtin_add_overflow & friends.
Maybe quickly check whether clang provides some builtins already
which we could implement.
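
For illustration, a minimal sketch of how the quoted saturating addition
could be written on top of the existing __builtin_add_overflow.  The
helper name sat_add_s8 is hypothetical and is not a proposed builtin; it
only shows the shape of the approach referred to above.

```c
#include <limits.h>

/* Hypothetical helper (not an existing or proposed builtin): saturating
   addition of two signed chars using __builtin_add_overflow.  */
static signed char
sat_add_s8 (signed char a, signed char b)
{
  signed char res;

  /* The builtin performs the addition as if in infinite precision and
     returns true when the result does not fit in the type of *res.  */
  if (__builtin_add_overflow (a, b, &res))
    /* Overflow is only possible when both operands have the same sign,
       so the sign of A tells us which bound to clamp to.  */
    return a < 0 ? SCHAR_MIN : SCHAR_MAX;

  return res;
}
```

With an 8-bit signed char, sat_add_s8 (100, 100) gives 127 and
sat_add_s8 (-100, -100) gives -128.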

There's some appeal to mimicking what RTL does - thus have
the saturation be represented as saturating truncation.
Maybe that's what users expect of builtins as well.
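
As a rough sketch of that representation in C: the operation is done in
a wider type and a separate saturating truncation narrows the result.
The names ss_truncate_s8 and sat_add_via_trunc are hypothetical and only
borrow the RTL name for readability; this is not how GCC implements
ss_truncate.

```c
#include <limits.h>

/* Hypothetical helper modelled on the idea of RTL's ss_truncate:
   narrow an int to signed char, clamping to the range of the
   narrow type instead of wrapping.  */
static signed char
ss_truncate_s8 (int x)
{
  if (x > SCHAR_MAX)
    return SCHAR_MAX;
  if (x < SCHAR_MIN)
    return SCHAR_MIN;
  return (signed char) x;
}

/* Saturating addition expressed as "wide add, then saturating
   truncation".  The operands are promoted to int, so the addition
   itself cannot overflow.  */
static signed char
sat_add_via_trunc (signed char a, signed char b)
{
  return ss_truncate_s8 (a + b);
}
```

Written this way, any other wide operation (abs, subtraction, a shift)
can reuse the same truncation step, which is the appeal of marking only
the truncation as saturating.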

I'm not sure what the rounding shift would do - 'shift' isn't
an argument to rndshift here.  It feels like it's a
rounding division but only by powers of two.  Does
ROUND_DIV_EXPR already provide the desired semantics?
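
To make the question concrete, here is a small sketch comparing the
quoted rounding shift, with the shift count passed as a parameter (an
assumption, since the original snippet leaves it undeclared), against a
division by 2^shift that rounds to nearest with ties away from zero.
That tie rule is chosen here only for comparison; whether it matches
ROUND_DIV_EXPR is not established in this thread.  Both helper names are
hypothetical.

```c
/* Hypothetical variant of the quoted rndshift with SHIFT as a parameter.
   Assumes 1 <= shift < 31 and that x + round_const does not overflow.  */
static int
rnd_shift (int x, int shift)
{
  int round_const = 1 << (shift - 1);

  /* Right-shifting a negative int is implementation-defined in ISO C;
     GCC documents it as an arithmetic (sign-extending) shift, which
     makes this floor ((x + 2^(shift-1)) / 2^shift).  */
  return (x + round_const) >> shift;
}

/* Division by 2^shift rounding to nearest, ties away from zero.
   The tie rule is an assumption made for this comparison only.  */
static int
round_div_pow2 (int x, int shift)
{
  int d = 1 << shift;
  int half = d / 2;

  return x >= 0 ? (x + half) / d : -((-x + half) / d);
}
```

Under these assumptions the two agree except on negative ties: for
x = -3 and shift = 1 the shift form yields -1 while the division form
yields -2.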

Richard.
