On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
As RV has rounding instructions, it is reasonable to use them instead of calling the library functions.

With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following assembly (previously it was a tail call to ceil):
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup
         fcvt.d.l        fa4,a5
         fsgnj.d fa0,fa4,fa0
.L3:
         ret

.LC0:
         .word   0
         .word   1127219200     // 0x4330000000000000


I have evaluated the patch on SPEC2017.
I counted dynamic instructions and got the following improvements:

Benchmark         Fewer insns    Reduction
510.parest_r      262 m          -
511.povray_r      2.1 b          0.04%
521.wrf_r         269 m          -
526.blender_r     3 b            0.1%
527.cam4_r        15 b           0.6%
538.imagick_r     365 b          7.6%

Overall, 385 billion fewer instructions were executed, which is 0.5%.

A few more notes.

The sequence Jivan is using is derived from LLVM. The condition in the generated code tests for those values which are supposed to pass through unaltered. The condition in the pattern ensures we do something sensible WRT FE_INEXACT and mirrors how other ports handle these insns.
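
For anyone who wants to reason about the sequence without reading the assembly, here is a rough C sketch of what the generated code for ceil computes. The function name ceil_inline_sketch is made up for illustration and is not part of the patch, and the truncate-then-adjust step only emulates the rup rounding mode in portable C. It says nothing about FE_INEXACT behavior.

```c
#include <math.h>

/* Hypothetical helper, not part of the patch: mirrors the generated
   sequence for ceil() in portable C. */
static double ceil_inline_sketch(double a)
{
    /* .LC0 is 0x4330000000000000, i.e. 2^52. At or above this magnitude
       every double is already an integer. */
    const double two52 = 4503599627370496.0;

    /* flt.d is false for NaN, so NaN, infinities and large values
       skip the conversion and pass through unaltered. */
    if (fabs(a) < two52) {
        /* fcvt.l.d with rup followed by fcvt.d.l. C casts truncate toward
           zero, so bump the result by one when truncation rounded down. */
        long long i = (long long)a;
        double r = (double)i;
        if (r < a)
            r += 1.0;
        /* fsgnj.d: copy the sign of the input, so -0.5 yields -0.0. */
        return copysign(r, a);
    }
    return a;
}
```

With this sketch, ceil_inline_sketch(-0.5) comes back as negative zero, which is why the final fsgnj.d is needed, and inputs such as NaN or 2^52 are returned as they came in.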

Our internal testing shows a benefit well beyond the 7% reduction in icounts, presumably due to fewer calls, fewer transfers across the register files, better scheduling around the call site, etc.

Obviously for Zfa we'll use the more efficient instructions for that extension. But there's no reason to not go forward with this change for gcc-15.


Jeff
