On Tue, 11 May 2021, Tamar Christina wrote:

> Hi All,
>
> We are looking to implement saturation support in the compiler.  The aim is
> to recognize both Scalar and Vector variants of typical saturating
> expressions.
>
> As an example:
>
> 1. Saturating addition:
>    char sat (char a, char b)
>    {
>      int tmp = a + b;
>      return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
>
> 2. Saturating abs:
>    char sat (char a)
>    {
>      int tmp = abs (a);
>      return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
>
> 3. Rounding shifts
>    char rndshift (char dc)
>    {
>      int round_const = 1 << (shift - 1);
>      return (dc + round_const) >> shift;
>    }
>
> etc.
>
> Of course the first issue is that C does not really have a single idiom for
> expressing this.
>
> At the RTL level we have ss_truncate and us_truncate and float_truncate for
> truncation.
>
> At the Tree level we have nothing for truncation (I believe) for scalars.
> For Vector code there already seems to be VEC_PACK_SAT_EXPR but it looks
> like nothing actually generates this at the moment.  It's just an unused
> tree code.
>
> For rounding there doesn't seem to be any existing infrastructure.
>
> The proposal to handle these is as follows.  Keep in mind that all of these
> also exist in their scalar form, as such detecting them in the vectorizer
> would be the wrong place.
>
> 1. Rounding:
>    a) Use match.pd to rewrite various rounding idioms to shifts.
>    b) Use backwards or forward prop to rewrite these to internal functions
>       where even if the target does not support these rounding instructions
>       they have a chance to provide a more efficient implementation than
>       what would be generated normally.
>
> 2. Saturation:
>    a) Use match.pd to rewrite the various saturation expressions into
>       min/max operations which opens up the expressions to further
>       optimizations.
>    b) Use backwards or forward prop to convert to internal functions if the
>       resulting min/max expressions still meet the criteria for being a
>       saturating expression.  This follows the algorithm as outlined in
>       "The Software Vectorization Handbook" by Aart J.C. Bik.
>
>       We could get the right instructions by using combine if we don't
>       rewrite the instructions to an internal function, however then during
>       Vectorization we would overestimate the cost of performing the
>       saturation.  The constants will then also be loaded into registers
>       and so become a lot more difficult to clean up solely in the backend.
>
> The one thing I am wondering about is whether we would need an internal
> function for all operations supported, or if it should be modelled as an
> internal FN which just "marks" the operation as rounding/saturating.  After
> all, the only difference between a normal and saturating expression in RTL
> is the xx_truncate RTL surrounding the expression.  Doing so would also mean
> that all targets that have saturating instructions would automatically
> benefit from this.
>
> But it does mean a small adjustment to the costing, which would need to cost
> the internal function call and the argument together as a whole.
>
> Any feedback is appreciated to minimize the number of changes required to
> the final patch.  Any objections to the outlined approach?
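
For reference, here is a minimal plain-C sketch of what step 2a amounts to for the saturating addition example: the nested conditional is the same thing as a MAX (MIN (tmp, 127), -128) clamp. This is only an illustration, not GCC code. The helper names (min_int, max_int, sat_add_cond, sat_add_minmax, check_sat_forms) are made up, and it uses signed char where the example uses plain char so the [-128, 127] range holds regardless of target.

```c
#include <assert.h>

/* Hypothetical helpers, not GCC code: plain-C stand-ins for MIN_EXPR and
   MAX_EXPR on int.  */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Saturating addition written with nested conditionals, as in the
   proposal.  signed char is an assumption made here so that the
   [-128, 127] range holds on every target.  */
static signed char sat_add_cond(signed char a, signed char b)
{
    int tmp = a + b;
    return (signed char)(tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp));
}

/* The same operation expressed as a clamp: MAX (MIN (tmp, 127), -128).  */
static signed char sat_add_minmax(signed char a, signed char b)
{
    int tmp = a + b;
    return (signed char)max_int(min_int(tmp, 127), -128);
}

/* Exhaustively assert that both forms agree for every input pair.  */
static void check_sat_forms(void)
{
    for (int a = -128; a <= 127; a++)
        for (int b = -128; b <= 127; b++)
            assert(sat_add_cond((signed char)a, (signed char)b)
                   == sat_add_minmax((signed char)a, (signed char)b));
}
```

Calling check_sat_forms () walks all 65536 input pairs, so the equivalence of the two spellings is checked rather than assumed.
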

I think it makes sense to pattern-match the operations on GIMPLE and follow
the approach taken by __builtin_add_overflow & friends.  Maybe quickly check
whether clang already provides some builtins which we could implement.

There's some appeal to mimicking what RTL does, thus having the saturation
represented as a saturating truncation.  Maybe that's what users expect of
builtins as well.

I'm not sure what the rounding shift would do: 'shift' isn't an argument to
rndshift here.  It feels like it's a rounding division, but only by powers
of two.  Does ROUND_DIV_EXPR already provide the desired semantics?

Richard.
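
P.S. To make the rounding-shift question concrete, here is a small sketch with the shift count passed as an explicit argument, next to a round-to-nearest division by 2^shift that breaks ties away from zero. Both function names are hypothetical, and the tie rule of the reference function is an assumption picked for comparison, not a statement about what ROUND_DIV_EXPR does. The sketch also assumes that >> on a negative int is an arithmetic shift (implementation-defined in ISO C, documented behaviour in GCC) and that the inputs are small enough that nothing overflows.

```c
#include <assert.h>

/* Rounding right shift with the shift count as an explicit argument.
   Assumes 1 <= shift <= 30, no overflow in x + round_const, and an
   arithmetic >> for negative values.  */
static int rnd_shift(int x, int shift)
{
    assert(shift >= 1 && shift <= 30);
    int round_const = 1 << (shift - 1);
    return (x + round_const) >> shift;
}

/* Hypothetical reference: x / 2^shift rounded to nearest, with ties
   rounded away from zero.  Assumes x is not INT_MIN.  */
static int round_div_pow2_away(int x, int shift)
{
    assert(shift >= 1 && shift <= 30);
    int d = 1 << shift;
    int half = d / 2;
    return x >= 0 ? (x + half) / d : -((-x + half) / d);
}
```

Under those assumptions the two agree whenever the quotient is not exactly halfway between two integers, and for non-negative ties. They differ on negative ties: for x = -5 and shift = 1 the shift idiom gives -2 (the tie goes towards +infinity) while the away-from-zero division gives -3. So whether a rounding division is a drop-in replacement depends on its tie rule.
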