On Wed, 12 May 2021, Richard Sandiford wrote:

> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The aim is
> > to recognize both scalar and vector variants of typical saturating expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >    char sat (char a, char b)
> >    {
> >      int tmp = a + b;
> >      return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 2. Saturating abs:
> >    char sat (char a)
> >    {
> >      int tmp = abs (a);
> >      return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 3. Rounding shifts:
> >    char rndshift (char dc, int shift)
> >    {
> >      int round_const = 1 << (shift - 1);
> >      return (dc + round_const) >> shift;
> >    }
> >
> > etc.
> >
> > Of course the first issue is that C does not really have a single idiom for
> > expressing this.
> >
> > At the RTL level we have ss_truncate, us_truncate and float_truncate for
> > truncation.
> >
> > At the Tree level we have nothing for truncation (I believe) for scalars.
> > For vector code there already seems to be VEC_PACK_SAT_EXPR, but it looks
> > like nothing actually generates this at the moment.  It's just an unused
> > tree code.
> >
> > For rounding there doesn't seem to be any existing infrastructure.
> >
> > The proposals to handle these are as follows.  Keep in mind that all of these
> > also exist in their scalar form, so detecting them in the vectorizer would be
> > the wrong place.
> >
> > 1. Rounding:
> >    a) Use match.pd to rewrite various rounding idioms to shifts.
> >    b) Use backwards or forward prop to rewrite these to internal functions,
> >       where even if the target does not support these rounding instructions
> >       it has a chance to provide a more efficient implementation than what
> >       would be generated normally.
> >
> > 2. Saturation:
> >    a) Use match.pd to rewrite the various saturation expressions into min/max
> >       operations, which opens up the expressions to further optimizations.
> >    b) Use backwards or forward prop to convert to internal functions if the
> >       resulting min/max expression still meets the criteria for being a
> >       saturating expression.  This follows the algorithm as outlined in "The
> >       Software Vectorization Handbook" by Aart J.C. Bik.
> >
> > We could get the right instructions by using combine if we don't rewrite
> > the instructions to an internal function; however, during vectorization we
> > would then overestimate the cost of performing the saturation.  The constants
> > will also be loaded into registers and so become a lot more difficult to
> > clean up solely in the backend.
> >
> > The one thing I am wondering about is whether we would need an internal
> > function for all operations supported, or if it should be modelled as an
> > internal FN which just "marks" the operation as rounding/saturating.  After
> > all, the only difference between a normal and a saturating expression in RTL
> > is the xx_truncate RTL surrounding the expression.  Doing so would also mean
> > that all targets that have saturating instructions would automatically
> > benefit from this.
>
> I might have misunderstood what you meant here, but the *_truncate
> RTL codes are true truncations: the operand has to be wider than the
> result.  Using this representation for general arithmetic is a problem
> if you're operating at the maximum size that the target supports natively.
> E.g. representing a 64-bit saturating addition as:
>
> - extend to 128 bits
> - do a 128-bit addition
> - truncate to 64 bits
>
> is going to be hard to cost and code-generate on targets that don't support
> native 128-bit operations (or at least, don't support them cheaply).
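To make the 64-bit point concrete, here is an illustrative sketch (not code from the thread) of a signed 64-bit saturating addition written two ways. The first follows the extend/add/truncate shape and relies on the GCC/Clang `__int128` extension, which only exists on targets with 128-bit integer support. The second stays at 64 bits and uses the GCC/Clang `__builtin_add_overflow` builtin. The names `sat_add64_wide` and `sat_add64_native` are hypothetical.

```c
#include <stdint.h>

/* Form 1: extend to 128 bits, add, clamp, truncate back to 64 bits.
   Needs the __int128 extension, so it is unavailable on many targets.  */
static int64_t sat_add64_wide (int64_t a, int64_t b)
{
  __int128 tmp = (__int128) a + b;
  if (tmp > INT64_MAX)
    return INT64_MAX;
  if (tmp < INT64_MIN)
    return INT64_MIN;
  return (int64_t) tmp;
}

/* Form 2: stay at 64 bits.  Signed addition can only overflow when both
   operands have the same sign, so the sign of A picks the bound.  */
static int64_t sat_add64_native (int64_t a, int64_t b)
{
  int64_t res;
  if (__builtin_add_overflow (a, b, &res))
    return a < 0 ? INT64_MIN : INT64_MAX;
  return res;
}
```

Both forms compute the same result.  The second never needs a type wider than its operands, which matches what a per-operation code such as ss_plus expresses directly in the operand's mode.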
>
> This might not be a problem when recognising C idioms, since the C source
> code has to be able to do the wider operation before truncating the result,
> but it could be a problem if we provide built-in functions or if we want
> to introduce compiler-generated saturating operations.
>
> RTL already has per-operation saturation such as ss_plus/us_plus,
> ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
> ss_ashift/us_ashift and ss_abs.  I think we should do the same
> in gimple, using internal functions like you say.
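As an illustration of proposal 2a, and of the "regular arithmetic plus saturating truncation" form discussed below, the following sketch (hypothetical helper names, not code from the thread) expresses the quoted saturating-addition idiom as a clamp, MIN (MAX (tmp, -128), 127), followed by a truncation to 8 bits. It uses `int8_t` instead of `char` because the signedness of plain `char` is implementation-defined (it is unsigned on AArch64, for example), and the -128..127 bounds only make sense for a signed type.

```c
#include <stdint.h>

/* Hypothetical min/max helpers, for illustration only.  */
static inline int min_int (int a, int b) { return a < b ? a : b; }
static inline int max_int (int a, int b) { return a > b ? a : b; }

/* The saturating-addition idiom from the thread, using int8_t.  */
static int8_t sat_add_ternary (int8_t a, int8_t b)
{
  int tmp = a + b;
  return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
}

/* Saturating truncation: clamp a wide value to [-128, 127], then
   narrow.  The conversion can no longer lose information.  */
static int8_t sat_trunc_s8 (int x)
{
  return (int8_t) min_int (max_int (x, INT8_MIN), INT8_MAX);
}

/* The same addition as a plain wide add followed by the saturating
   truncation, i.e. the min/max form of the idiom.  */
static int8_t sat_add_minmax (int8_t a, int8_t b)
{
  return sat_trunc_s8 (a + b);
}
```

Since `a + b` is computed in `int`, it cannot overflow for 8-bit inputs, so clamping and then narrowing gives exactly the ternary idiom's result for every input pair.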

I think that for followup optimizations, using regular arithmetic ops
and just new saturating truncations is better.  Maybe we can also do
both, first matching only the actual saturation with a new tree code
and then later matching the optabs the target actually supports
(in ISEL, for example)?

Truly saturating ops might provide an interesting example of how to deal
with -ftrapv; one might think we can now simply use the trapping optabs
as internal functions to reflect -ftrapv onto the IL ...

Richard.

> Thanks,
> Richard

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)