Hello,
It's time for the CRLibm developers to step into this discussion.
We confirm that CRLibm is as fast as other portable libraries, or
faster, and that it keeps improving (some benchmarks below). Where we are
slower, it is because we wanted cleaner code or smaller tables, or because
we tuned the code for a given processor and it later turned out to be a
bad choice on others. In any case, we can still copy the faster code.
We do not claim to be the best overall, but we do care about
performance.
We confirm that CRLibm can be turned into a 0.503 ulp (more or less)
library at the cost of a few #ifs. We might even add a flag for that in
the next release. It would still be a proven 0.503 ulp: the proof won't
go away. On x86 you would gain 5-20 cycles on average (depending on the
function), and much more in worst-case time.
Our opinion is that reproducibility (through correct rounding and, more
generally, C99 and standards compliance) should be the default for a
system, AND that flags should be able to disable it if wanted.
Now we would like to know what GCC people mean by "having a libm with
GCC". Is it
 - a one-size-fits-all libm written in C? With GCC-controlled #ifs?
   With builtins, etc.?
 - a library written in GIMPLE?
 - a library generator within GCC?
 - etc.
What we are interested in writing at the moment is a library generator
(let us call it metalibm) that can output libms for mainstream
precisions and optimize for various objectives (correct rounding or not,
speed, memory footprint, register usage, parallelism, whatever).
We could right now write a simple metalibm that mostly removes the
repetitive work from CRLibm development. This would be mostly printfs
and ifs, and it would be simple enough that one could imagine it ending
up in the GCC code base.
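As a rough idea of what "mostly printfs and ifs" means, here is a minimal sketch of a generator that prints the C source of a Horner-scheme polynomial evaluator. The function gen_horner, its parameters and the use_fma option are hypothetical names chosen for this example, not part of CRLibm or of any existing metalibm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: writes into buf the C source of a function
   "double name(double x)" that evaluates
   coef[0] + coef[1]*x + ... + coef[degree]*x^degree by Horner's scheme.
   Returns the number of characters written, or -1 if buf is too small.
   If use_fma is nonzero, the generated code calls fma() and therefore
   needs <math.h> where it is compiled. */
static int gen_horner(char *buf, size_t size, const char *name,
                      const double *coef, int degree, int use_fma)
{
    size_t pos = 0;
    int n;

    assert(degree >= 0);

    /* Coefficients are printed with %a (hexadecimal floating point)
       so that no bit is lost in the generated source. */
    n = snprintf(buf + pos, size - pos,
                 "double %s(double x)\n{\n    double r = %a;\n",
                 name, coef[degree]);
    if (n < 0 || (size_t)n >= size - pos)
        return -1;
    pos += (size_t)n;

    for (int i = degree - 1; i >= 0; i--) {
        if (use_fma)   /* one of the "ifs": pick the target's best step */
            n = snprintf(buf + pos, size - pos,
                         "    r = fma(r, x, %a);\n", coef[i]);
        else
            n = snprintf(buf + pos, size - pos,
                         "    r = r * x + %a;\n", coef[i]);
        if (n < 0 || (size_t)n >= size - pos)
            return -1;
        pos += (size_t)n;
    }

    n = snprintf(buf + pos, size - pos, "    return r;\n}\n");
    if (n < 0 || (size_t)n >= size - pos)
        return -1;
    pos += (size_t)n;

    return (int)pos;
}
```

Calling gen_horner(buf, sizeof buf, "p_exp", c, 2, 0) with c = {1.0, 1.0, 0.5} fills buf with a three-coefficient evaluator that can be written to a .c file. A real generator would also have to choose the coefficients and emit the range reduction, which is where the extra dependencies discussed next come in.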
Then, since we have fairly mature expertise in the automatic generation
of optimized polynomial approximations, we could enhance it so that it
would also include such optimizations, and target single,
double-extended and quad. Range reduction exploration is also within
reach, but more target-specific optimisations are not. But then, this
metalibm begins to depend on so many libraries and has such an
unpredictable runtime (it should even have an ATLAS-like profiling mode)
that nobody would want it in the GCC code base. And it is a lot more
development, of course. And it is more interesting.
So what should we start with?
We also confirm that many functions are missing from CRLibm. This is
mostly a matter of manpower. We'd like to add some of the missing ones
as a demonstration of a metalibm. Some are more difficult than
others, but the only really difficult one is gamma.
We may start with asinh and acosh, is that OK?
Christoph and Florent
Here is a sample of performance results obtained on an Opteron
( Linux 2.6.17-2-amd64
gcc (GCC) 4.1.2 20060901 (prerelease) (Debian 4.1.1-13) )
log             avg time   max time
default libm         210     150919   Rem: 12KB of tables
CRLibm               146        903   Rem: using double-extended arithmetic
                                           (this was an experiment)
CRLibm               266       1459   Rem: this one using SSE2 doubles

exp             avg time   max time
default libm         141    1105468   Rem: 13KB of tables
CRLibm               184       1210

sin             avg time   max time
default libm         147    1018622
CRLibm               171       3895

cos             avg time   max time
default libm         152     028302
CRLibm               171       3752

tan             avg time   max time
default libm         244    1101230
CRLibm               280       9949

asin            avg time   max time
default libm         107    1018823   Rem: 20KB of tables shared with acos
CRLibm               316       2296

acos            avg time   max time
default libm          83    1018786
CRLibm               264       3660

atan            avg time   max time
default libm         144     100066   Rem: 43KB of tables
CRLibm               243       4724

log10           avg time   max time
default libm         249       1061   NOT correctly rounded
CRLibm               321       1706

log2            avg time   max time
default libm         174        274   NOT correctly rounded
CRLibm               320       1663

(While building this table I just noticed a bug on line 27 of
sysdeps/ieee754/dbl-64/sincos.tbl: 40 should be 440.)
The same, on an IBM power5 server (time units are arbitrary)

log             avg time   max time
default libm          61      23471
CRLibm                57        307   Rem: using double-precision arithmetic

exp             avg time   max time
default libm          41      25019
CRLibm                42        242

sin             avg time   max time
default libm          37     132435
CRLibm                43       1910

cos             avg time   max time
default libm          38     134045
CRLibm                44       1946

tan             avg time   max time
default libm          65     141885
CRLibm                72       4671

asin            avg time   max time
default libm          29     132912
CRLibm                62        465

acos            avg time   max time
default libm          24     132798
CRLibm                57        765

atan            avg time   max time
default libm          35      18785
CRLibm                53       2315

log10           avg time   max time
default libm          65        153
CRLibm                65        355

log2            avg time   max time
default libm          47         59
CRLibm                65        351