Hi All,

We are looking to implement saturation support in the compiler. The aim is to recognize both the scalar and vector variants of typical saturating expressions.
As an example (assuming a signed char):

1. Saturating addition:

   char sat (char a, char b)
   {
     int tmp = a + b;
     return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
   }

2. Saturating abs:

   char sat (char a)
   {
     int tmp = abs (a);
     return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
   }

3. Rounding shifts (where shift is some constant >= 1):

   char rndshift (char dc)
   {
     int round_const = 1 << (shift - 1);
     return (dc + round_const) >> shift;
   }

etc.

Of course, the first issue is that C does not really have a single idiom for expressing this. At the RTL level we have ss_truncate and us_truncate for saturating truncation (and float_truncate for floating-point truncation). At the tree level I believe we have nothing for saturating truncation of scalars. For vector code there already is VEC_PACK_SAT_EXPR, but nothing seems to generate it at the moment; it's just an unused tree code. For rounding there doesn't seem to be any existing infrastructure.

The proposal for handling these is as follows. Keep in mind that all of these also exist in scalar form, so detecting them in the vectorizer would be the wrong place.

1. Rounding:
   a) Use match.pd to rewrite the various rounding idioms into shifts.
   b) Use backwards or forward prop to rewrite these into internal functions, so that even targets that do not support these rounding instructions have a chance to provide a more efficient implementation than what would be generated normally.

2. Saturation:
   a) Use match.pd to rewrite the various saturation expressions into min/max operations, which opens the expressions up to further optimizations.
   b) Use backwards or forward prop to convert them into internal functions if the resulting min/max expression still meets the criteria for being a saturating expression. This follows the algorithm outlined in "The Software Vectorization Handbook" by Aart J.C. Bik.

We could get the right instructions from combine if we don't rewrite the expressions into internal functions; however, during vectorization we would then overestimate the cost of performing the saturation.
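The following is a minimal C sketch, not the proposed GCC implementation, of what steps 1a, 2a and 2b are meant to rely on. It assumes a signed 8-bit char and GCC's arithmetic right shift of negative values; all helper names (sat_trunc_s8, classify_clamp, rndshift_bit, count_mismatches, and so on) are hypothetical, min_int/max_int merely stand in for MIN_EXPR/MAX_EXPR, and the shift amount of example 3 is made a parameter. It shows that examples 1 and 2 are the same clamp MIN (MAX (x, -128), 127) wrapped around a plain operation, that such a clamp can be classified as a saturating truncation from its bounds alone (a hypothetical criterion, not necessarily Bik's), and that two common rounding-shift spellings agree.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Example 1 as written in the mail (signed 8-bit char assumed).  */
static signed char sat_add_ref (signed char a, signed char b)
{
  int tmp = a + b;
  return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
}

/* Stand-ins for MIN_EXPR / MAX_EXPR (step 2a).  */
static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: saturating truncation of an int to signed char,
   written as MIN (MAX (x, lo), hi).  */
static signed char sat_trunc_s8 (int x)
{
  return (signed char) min_int (max_int (x, SCHAR_MIN), SCHAR_MAX);
}

/* Examples 1 and 2: the same truncation wrapped around + and abs.  */
static signed char sat_add_minmax (signed char a, signed char b)
{
  return sat_trunc_s8 (a + b);
}

static signed char sat_abs_minmax (signed char a)
{
  return sat_trunc_s8 (abs (a));
}

/* Step 2b, hypothetical criterion: MIN (MAX (x, lo), hi) is a saturating
   truncation to an N-bit type exactly when [lo, hi] is that type's range.  */
enum sat_kind { SAT_NONE, SAT_SIGNED, SAT_UNSIGNED };

static enum sat_kind classify_clamp (long long lo, long long hi, int *bits)
{
  static const int widths[] = { 8, 16, 32 };
  assert (bits != NULL);
  for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++)
    {
      int n = widths[i];
      if (lo == -(1LL << (n - 1)) && hi == (1LL << (n - 1)) - 1)
        { *bits = n; return SAT_SIGNED; }
      if (lo == 0 && hi == (1LL << n) - 1)
        { *bits = n; return SAT_UNSIGNED; }
    }
  return SAT_NONE;
}

/* Example 3 with the shift as a parameter (1 <= shift <= 7 assumed).
   Right-shifting a negative int is implementation-defined in C;
   GCC shifts arithmetically.  */
static int rndshift_add (signed char dc, int shift)
{
  int round_const = 1 << (shift - 1);
  return (dc + round_const) >> shift;
}

/* Another spelling of the same rounding shift: the truncating shift plus
   the last bit shifted out.  Step 1a would canonicalize such variants.  */
static int rndshift_bit (signed char dc, int shift)
{
  return (dc >> shift) + ((dc >> (shift - 1)) & 1);
}

/* Compares the forms over every signed char input; returns the number
   of mismatches (0 when all forms agree).  */
static int count_mismatches (void)
{
  int bad = 0;
  for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
    {
      for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
        bad += sat_add_ref (a, b) != sat_add_minmax (a, b);
      bad += sat_abs_minmax (a) != (a == SCHAR_MIN ? SCHAR_MAX : abs (a));
      for (int s = 1; s <= 7; s++)
        bad += rndshift_add (a, s) != rndshift_bit (a, s);
    }
  return bad;
}
```

Calling count_mismatches () should return 0; classify_clamp (SCHAR_MIN, SCHAR_MAX, &bits) reports a signed 8-bit saturation, while a clamp such as [-100, 100] is rejected and would stay a plain min/max.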
Relying on combine has a further drawback: the clamp constants will also be loaded into registers, which makes the expression a lot more difficult to clean up solely in the backend.

The one thing I am wondering about is whether we would need an internal function for every supported operation, or whether it should be modelled as a single internal function that just "marks" the operation as rounding/saturating. After all, the only difference between a normal and a saturating expression in RTL is the xx_truncate RTL surrounding the expression. Doing so would also mean that all targets that have saturating instructions would automatically benefit. It does, however, require a small adjustment to the costing, which would need to cost the internal function call and its argument together as a whole.

Any feedback is appreciated, to minimize the number of changes required for the final patch. Any objections to the outlined approach?

Thanks,
Tamar