On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
As RV has rounding instructions, it is reasonable to use them instead of
calling the library functions.
With my patch, for the following C code:
double foo(double a) {
    return ceil(a);
}
GCC generates the following assembly (previously it was a tail call):
foo:
        fabs.d      fa4,fa0
        lui         a5,%hi(.LC0)
        fld         fa3,%lo(.LC0)(a5)
        flt.d       a5,fa4,fa3
        beq         a5,zero,.L3
        fcvt.l.d    a5,fa0,rup
        fcvt.d.l    fa4,a5
        fsgnj.d     fa0,fa4,fa0
.L3:
        ret
.LC0:
        .word       0
        .word       1127219200 // 0x4330000000000000
I have evaluated the patch on SPEC2017, counting dynamic instructions,
and got the following improvements:
Benchmark         Fewer instructions   Improvement
510.parest_r      262 million          -
511.povray_r      2.1 billion          0.04%
521.wrf_r         269 million          -
526.blender_r     3 billion            0.1%
527.cam4_r        15 billion           0.6%
538.imagick_r     365 billion          7.6%
Overall, 385 billion fewer instructions were executed, which is 0.5%.
A few more notes.
The sequence Jivan is using is derived from LLVM. The condition in the
generated code tests for those values which are supposed to pass through
unaltered. The condition in the pattern ensures we do something
sensible WRT FE_INEXACT and mirrors how other ports handle these insns.
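
For anyone following along, here is a rough C rendering of what the
quoted sequence computes. It is only a sketch under a few assumptions:
the name ceil_sketch is made up for illustration, the round-up
conversion (fcvt.l.d with rup) is emulated with a truncating cast plus
an adjustment, and floating-point exception flags such as FE_INEXACT
are not modeled at all.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper; not part of the patch. It mirrors the quoted
   instruction sequence step by step. */
static double ceil_sketch(double a)
{
    /* 2^52, i.e. the bit pattern 0x4330000000000000 stored at .LC0.
       Doubles with magnitude >= 2^52 have no fractional bits. */
    const double two52 = 4503599627370496.0;

    /* fabs.d + flt.d + beq: the comparison is false for NaN, so NaN,
       infinities and large values are returned unaltered. */
    if (!(fabs(a) < two52))
        return a;

    /* Emulation of fcvt.l.d ...,rup: the cast truncates toward zero,
       then we step up by one if truncation went below the input. */
    int64_t i = (int64_t)a;
    if ((double)i < a)
        i++;

    /* fcvt.d.l: back to double. */
    double r = (double)i;

    /* fsgnj.d: take the sign from the input, so that inputs in
       (-1, 0) and -0.0 yield -0.0 rather than +0.0. */
    return copysign(r, a);
}
```

The final copysign step matters only for the zero result: without it,
ceil_sketch(-0.5) would return +0.0 where ceil returns -0.0.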
Our internal testing shows a benefit well beyond the 7% reduction in
icounts, presumably due to fewer calls, fewer transfers across the
register files, better scheduling around the call site, etc.
Obviously for Zfa we'll use the more efficient instructions for that
extension. But there's no reason to not go forward with this change for
gcc-15.
Jeff