Hello,

It's time for the CRLibm developers to step into this discussion.

We confirm that CRLibm is as fast as other portable libraries, or faster, and that it keeps improving (some benchmarks below). When we are slower, it is because we wanted cleaner code or smaller tables, or because we tuned the code for a given processor and it later turned out to be a bad choice on others. In any case, we can still copy the faster code. We do not claim to be the best overall, but we do care about performance.

We confirm that CRLibm can be turned into a 0.503 ulp (more or less) library at the cost of a few #ifs. We might even add a flag for that in the next release. It would still be a proven 0.503 ulp: the proof won't go away. On x86 you would gain 5-20 cycles on average (depending on the function), and much more in worst-case time.
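
To make the "few #ifs" concrete, here is a minimal sketch of where such a flag could sit. It is not the actual CRLibm code: the flag name SKETCH_CORRECT_ROUNDING, the function names and the placeholder accurate phase are all assumptions made for illustration. It assumes the quick phase has already produced its result as an unevaluated sum yh + yl, that rncst is a constant slightly above 1 derived from the proven error bound, and that double arithmetic is evaluated in double precision (as with SSE2).

```c
/* Hypothetical build flag (not an existing CRLibm option):
   1 keeps the accurate phase, 0 returns the quick-phase result as is. */
#ifndef SKETCH_CORRECT_ROUNDING
#define SKETCH_CORRECT_ROUNDING 1
#endif

/* Counts how often the slow path runs; only here to make the sketch observable. */
static int sketch_accurate_calls = 0;

/* Placeholder for the accurate phase. A real one would re-evaluate the
   function with much more precision; this one only rounds yh + yl. */
static double sketch_accurate_phase(double yh, double yl)
{
    sketch_accurate_calls++;
    return yh + yl;
}

/* Final step of a two-phase evaluation (illustrative names throughout). */
double sketch_round(double yh, double yl, double rncst)
{
#if SKETCH_CORRECT_ROUNDING
    /* Rounding test: if inflating yl by the error bound does not move the
       rounded sum away from yh, then yh is safe to return. */
    if (yh == yh + yl * rncst)
        return yh;
    /* Otherwise the quick phase cannot decide: fall back to the slow path. */
    return sketch_accurate_phase(yh, yl);
#else
    /* Flag off: skip both the test and the slow path. */
    (void)yl;
    (void)rncst;
    return yh;
#endif
}
```

With the flag set to 0, both the rounding test and the slow path disappear from the compiled function, which is where the saving in average and worst-case time would come from.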

Our opinion is that reproducibility (through correct rounding and, more generally, C99 and standards compliance) should be the default for a system,
AND that flags should be able to disable it if wanted.


Now we would like to know what GCC people mean by "having a libm with GCC". Is it
 a one-size-fits-all libm written in C? With GCC-controlled #ifs? With builtins, etc.?
 a library written in GIMPLE?
 a library generator within GCC?
etc...

What we are interested in writing at the moment is a library generator (let us call it metalibm) that can output libms for mainstream precisions and optimize for various objectives (correct rounding or not, speed, memory footprint, register usage, parallelism, whatever).

We could right now write a simple metalibm that mostly removes the repetitive work from CRLibm development. This would be mostly printfs and ifs, and it would be simple enough that one may imagine it ending up in the GCC code base.
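
To give an idea of what "mostly printfs and ifs" could look like, here is a minimal sketch of a generator that emits the C source of a polynomial evaluation in Horner form. It is an illustration only, not existing metalibm or CRLibm code: the names sketch_append and sketch_emit_poly, the use_fma option and the choice to write into a caller-supplied buffer are all assumptions.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Appends formatted text to buf at *pos.
   Returns 0 on success, -1 if the text does not fit. */
static int sketch_append(char *buf, size_t size, size_t *pos,
                         const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *pos, size - *pos, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *pos)
        return -1;
    *pos += (size_t)n;
    return 0;
}

/* Emits C source for coef[0] + coef[1]*x + ... + coef[degree]*x^degree
   in Horner form. Returns the number of characters written, or -1 on error.
   use_fma stands for one possible target-dependent choice (the "ifs");
   code generated with it needs #include <math.h>. */
int sketch_emit_poly(char *buf, size_t size, const char *name,
                     const double *coef, int degree, int use_fma)
{
    size_t pos = 0;
    int i;

    if (size == 0 || degree < 0)
        return -1;
    if (sketch_append(buf, size, &pos, "double %s(double x)\n{\n", name))
        return -1;
    /* %a prints hexadecimal floating point, so each coefficient is
       reproduced exactly in the generated source. */
    if (sketch_append(buf, size, &pos, "    double r = %a;\n", coef[degree]))
        return -1;
    for (i = degree - 1; i >= 0; i--) {
        const char *step = use_fma ? "    r = fma(r, x, %a);\n"
                                   : "    r = r * x + %a;\n";
        if (sketch_append(buf, size, &pos, step, coef[i]))
            return -1;
    }
    if (sketch_append(buf, size, &pos, "    return r;\n}\n"))
        return -1;
    return (int)pos;
}
```

Calling sketch_emit_poly with the coefficients {1.0, 1.0, 0.5}, degree 2 and use_fma set to 0 writes a three-step Horner function into the buffer. A real generator would also have to emit the range reduction, the table lookups and the error-bound constants, which is where the repetitive work in CRLibm lies.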

Then, since we have fairly mature expertise in the automatic generation of optimized polynomial approximations, we could enhance it to include such optimizations and to target single, double-extended and quad precision. Range reduction exploration is also within reach, but more target-specific optimizations are not. But then this metalibm begins to depend on so many libraries, and has such an unpredictable runtime (it should even have an ATLAS-like profiling mode), that nobody would want it in the GCC code base. And it is a lot more development, of course. And it is more interesting.

So what should we start with?

We also confirm that many functions are missing from CRLibm. This is mostly a matter of workforce. We'd like to add some of the missing ones as a demonstration of a metalibm. Some are more difficult than others, but the only really difficult one is gamma.
We may start with asinh and acosh, is that OK?

    Christoph and Florent

Here is a sample of performance results obtained on an Opteron
( Linux 2.6.17-2-amd64
  gcc (GCC) 4.1.2 20060901 (prerelease) (Debian 4.1.1-13) )

     log     avg time max time
default libm   210    150919      Rem: 12KB of tables
 CRLibm        146       903      Rem: using double-extended arithmetic
                                   (this was an experiment)
 CRLibm        266      1459      Rem: this one using SSE2 doubles

    exp      avg time max time
default libm   141   1105468      Rem: 13KB of tables
 CRLibm        184      1210

     sin     avg time max time
default libm   147   1018622
 CRLibm        171      3895

     cos     avg time max time
default libm   152    028302
 CRLibm        171      3752

     tan     avg time max time
default libm   244   1101230
 CRLibm        280      9949

    asin     avg time max time
default libm   107   1018823   Rem: 20KB of tables shared with acos
 CRLibm        316      2296

    acos     avg time max time
default libm    83   1018786
 CRLibm        264      3660

    atan     avg time max time
default libm   144    100066     Rem: 43KB of tables
 CRLibm        243      4724

   log10     avg time max time
default libm   249      1061    NOT correctly rounded
 CRLibm        321      1706

    log2     avg time max time
default libm   174       274    NOT correctly rounded
 CRLibm        320      1663



(While building this table I just noticed a bug on line 27 of sysdeps/ieee754/dbl-64/sincos.tbl: 40 should be 440.)




The same, on an IBM power5 server (time units are arbitrary)


     log     avg time max time
default libm    61     23471
 CRLibm         57       307   Rem: using double-precision arithmetic

    exp      avg time max time
default libm    41     25019
 CRLibm         42       242

     sin     avg time max time
default libm    37    132435
 CRLibm         43      1910

     cos     avg time max time
default libm    38    134045
 CRLibm         44      1946

     tan     avg time max time
default libm    65    141885
 CRLibm         72      4671

    asin     avg time max time
default libm    29    132912
 CRLibm         62       465

    acos     avg time max time
default libm    24    132798
 CRLibm         57       765

    atan     avg time max time
default libm    35     18785
 CRLibm         53      2315

   log10     avg time max time
default libm    65       153
 CRLibm         65       355

    log2     avg time max time
default libm    47        59
 CRLibm         65       351
