Hi, 

Over the past couple of years, I have hand-assembled a new floating point 
library for the ARM Cortex M0 architecture.  I know the M0 is not generally 
regarded as a number-crunching machine, but I felt it deserved at least some of 
the attention that has previously been bestowed on the AVR architecture.  As 
this work has been incidental to my employer's line of business, they have 
tentatively agreed to assign the copyright and facilitate a release of this 
library as open source.  

I have efficient implementations of all of the integer and single-precision 
AEABI functions:

*  clzsi2, clzdi2, umulsidi3, mulsidi3, muldi3 (aeabi_lmul)
*  ashldi3 (aeabi_llsl), lshrdi3 (aeabi_llsr), ashrdi3 (aeabi_lasr)
*  aeabi_lcmp, aeabi_ulcmp
*  udivsi3 (aeabi_uidivmod), divsi3 (aeabi_idivmod), udivdi3 (aeabi_uldivmod), 
divdi3 (aeabi_ldivmod)
*  addsf3 (aeabi_fadd), subsf3 (aeabi_fsub, aeabi_frsub), mulsf3 (aeabi_fmul), 
divsf3 (aeabi_fdiv), fdimf
*  cmpsf2 (aeabi_fcmpun), eqsf2 (aeabi_fcmpeq), nesf2 (aeabi_fcmpne), gesf2 
(aeabi_fcmpge), gtsf2, unordsf2
*  floatundisf (aeabi_ul2f), floatunsisf (aeabi_ui2f), floatdisf 
(aeabi_l2f), floatsisf (aeabi_i2f)
*  fixsfdi (aeabi_f2lz), fixunssfdi (aeabi_f2ulz), fixsfsi (aeabi_f2iz), 
fixunssfsi (aeabi_f2uiz)
*  aeabi_f2d, aeabi_d2f, aeabi_h2f, aeabi_f2h

I also have efficient implementations of several of the simpler libm functions:

*  frexpf, ldexpf, scalbnf
*  fmaxf, fminf
*  rintf, lrintf, ulrintf, llrintf, ullrintf, roundf, lroundf, ulroundf, 
llroundf, ullroundf
*  truncf, ceilf, floorf
*  fpclassifyf, isnormalf, isnanf, isinff, isfinitef, isposf, isnegf
*  ilogbf, logbf, modff
*  sqrtf, cbrtf
*  log2f, logf, log10f, log1p2f, log1pf, log1p10f, logXf, log1pXf
*  sinf, cosf, sincosf, sinpif, cospif, sincospif
*  tanf, cotf, tanpif, cotpif

Presently, the library comprises about 40 files with about 8000 lines of asm 
(unified syntax).  The test vectors weigh significantly more.  All of the 
floating point functions are IEEE 754 compliant.  I can provide more complete 
performance statistics on request, but here are a few highlights: 

* Small: Less than 3 KB for everything above.  Only 450 bytes for basic addsf3, 
subsf3, mulsf3, divsf3, and cmpsf2.
* Fast: addsf3 = 75 instruction cycles, subsf3 = 80, mulsf3 = 95, divsf3 = 260 
to 360, cmpsf2 = 35.
* Correct: Simultaneous calculation of sincosf() in less than 500 instruction 
cycles, accurate within +/- 1 ulp, including arbitrarily large values of 'x'.
* Bonus: round10iff(x, n) (a non-standard function) correctly rounds floating 
point values 'x' to an integer power of 10 'n'; this function simulates 
conversion to a decimal string, truncation, and conversion back to binary32 
without any string-handling overhead.
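
To make the round10iff description above concrete, here is a minimal C sketch 
of the string round trip that the function is said to simulate.  It is only a 
reference model, not the library's implementation: the name round10_ref is 
hypothetical, it handles only n <= 0 (rounding to 10^n with -n decimal 
places), and it inherits whatever rounding the C library's printf applies, 
which may differ from the rounding rule of round10iff itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical reference model (not part of the library): round 'x' to a
 * multiple of 10^n by printing it with -n decimal places and parsing the
 * text back to binary32.  This sketch assumes n <= 0.
 */
float round10_ref(float x, int n)
{
    char buf[512];

    assert(n <= 0);               /* sketch limitation, see the lead-in */

    /* Binary32 to decimal string, keeping -n digits after the point. */
    snprintf(buf, sizeof buf, "%.*f", -n, (double)x);

    /* Decimal string back to the nearest binary32 value. */
    return strtof(buf, NULL);
}
```

For example, round10_ref(3.14159f, -2) goes through the string "3.14" and 
returns the binary32 value nearest to 3.14.  The point of round10iff is to 
produce the same kind of result without the snprintf/strtof overhead.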

To date, I have only built this library as part of a user space embedded 
application.  I have not attempted to build or patch the GCC toolchain itself.  
If accepted, I suspect there will be at least a little work to restructure it 
for inclusion with libgcc.  But, before proceeding with that work, I need to 
have some idea of direction and goal.  

The first question, then, is what might the best home for this library be?  
Many of the lower level functions (e.g. clzsi2, addsf3) replace the generic 
implementations of libgcc.  However, the higher level functions (e.g. ldexpf, 
sincosf) traditionally link from libm, which I don't believe is typically 
distributed with gcc.  The compact nature of this library of course follows 
from a tight integration between higher and lower level functions.  I have 
considered a few strategies: 

* Add everything into the base libgcc, 
* Add everything into libm (newlib?) and rely on link order to supersede 
libgcc, 
* Split the implementation with some magic to ensure that libm functions only 
link in the presence of the correct libgcc,
* Establish an independent library specific to the Cortex M0 architecture, or
* Something else entirely...

If there is any interest in incorporating this work into GCC, please advise.  

Thanks,
Daniel Engel
