On Tue, Nov 6, 2018, 10:32 PM Daniel Engel <lib...@danielengel.com> wrote:
> Hi,
>
> Over the past couple of years, I have hand-assembled a new floating point
> library for the ARM Cortex M0 architecture. I know the M0 is not generally
> regarded as a number-crunching machine, but I felt it deserved at least
> some of the attention that has previously been bestowed on the AVR
> architecture. As this work has been incidental to my employer's line of
> business, they have tentatively agreed to assign the copyright and
> facilitate a release of this library as open source.
>
> I have efficient implementations of all of the integer and
> single-precision AEABI functions:
>
> * clzsi2, clzdi2, umulsidi3, mulsidi3, muldi3 (aeabi_lmul)
> * ashldi3 (aeabi_llsl), lshrdi3 (aeabi_llsr), ashrdi3 (aeabi_lasr)
> * aeabi_lcmp, aeabi_ulcmp
> * udivsi3 (aeabi_uidivmod), divsi3 (aeabi_idivmod), udivdi3
>   (aeabi_uldivmod), divdi3 (aeabi_ldivmod)
> * addsf3 (aeabi_fadd), subsf3 (aeabi_fsub, aeabi_frsub), mulsf3
>   (aeabi_fmul), divsf3 (aeabi_fdiv), fdimf
> * cmpsf2 (aeabi_fcmpun), eqsf2 (aeabi_fcmpeq), nesf2 (aeabi_fcmpne),
>   gesf2 (aeabi_fcmpge), gtsf2, unordsf2
> * floatundisf (aeabi_ul2f), floatunsisf (aeabi_ui2f), floatdisf
>   (aeabi_l2f), floatsisf (aeabi_i2f)
> * fixsfdi (aeabi_f2lz), fixunssfdi (aeabi_f2ulz), fixsfsi (aeabi_f2iz),
>   fixunssfsi (aeabi_f2uiz)
> * aeabi_f2d, aeabi_d2f, aeabi_h2f, aeabi_f2h
>
> I also have efficient implementations of several of the simpler libm
> functions:
>
> * frexpf, ldexpf, scalbnf
> * fmaxf, fminf
> * rintf, lrintf, ulrintf, llrintf, ullrintf, roundf, lroundf, ulroundf,
>   llroundf, ullroundf
> * truncf, ceilf, floorf
> * fpclassifyf, isnormalf, isnanf, isinff, isfinitef, isposf, isnegf
> * ilogbf, logbf, modff
> * sqrtf, cbrtf
> * log2f, logf, log10f, log1p2f, log1pf, log1p10f, logXf, log1pXf
> * sinf, cosf, sincosf, sinpif, cospif, sincospif
> * tanf, cotf, tanpif, cotpif
>
> Presently, the library comprises about 40 files with about 8000 lines of
> asm (unified syntax). The test vectors weigh significantly more.
> All of the floating point functions are IEEE754 compliant. I can provide more
> complete performance statistics on request, but here are a few highlights:
>
> * Small: Less than 3 KB for everything above. Only 450 bytes for basic
>   addsf3, subsf3, mulsf3, divsf3, and cmpsf2.
> * Fast: addsf3 = 75 instruction cycles, subsf3 = 80, mulsf3 = 95, divsf3 =
>   260 to 360, cmpsf2 = 35.
> * Correct: Simultaneous calculation of sincosf() in less than 500
>   instruction cycles, accurate within +/- 1 ulp, including arbitrarily large
>   values of 'x'.
> * Bonus: round10iff(x, n) (a non-standard function) correctly rounds
>   floating point values 'x' to an integer power of 10 'n'; this function
>   simulates conversion to a decimal string, truncation, and conversion back
>   to binary32 without any string-handling overhead.
>

This sounds like a nice body of work. Congratulations. Does paranoia pass?

> To date, I have only built this library as part of a user space embedded
> application. I have not attempted to build or patch the GCC toolchain
> itself. If accepted, I suspect there will be at least a little work to
> restructure it for inclusion with libgcc. But, before proceeding with that
> work, I need to have some idea of direction and goal.
>
> The first question, then, is what might the best home for this library
> be? Many of the lower level functions (e.g. clzsi2, addsf3) replace the
> generic implementations of libgcc. However, the higher level functions
> (e.g. ldexpf, sincosf) traditionally link from libm, which I don't believe
> is typically distributed with gcc. The compact nature of this library of
> course follows from a tight integration between higher and lower level
> functions. I have considered a few strategies:
>
> * Add everything into the base libgcc,
> * Add everything into libm (newlib?) and rely on link order to supersede
>   libgcc,
>

This will almost certainly break at some point, for someone, and it will be
hard to even notice that it happened, because the code will still work but
just be bigger or slower.

> * Split the implementation with some magic to ensure that libm functions
>   only link in the presence of the correct libgcc,
>

I think this is the proper solution. It just puts better implementations in
the place where the infrastructure already supports a target-specific
option.

> * Establish an independent library specific to the Cortex M0 architecture,
>   or
>

This is likely to get you the smallest number of users. People have to find
it and then integrate it on their own. Don't make it hard for folks to find
and use your work.

> * Something else entirely...
>
> If there is any interest in incorporating this work into GCC, please
> advise.
>

I think so, but I am just one voice from the RTEMS community. But I think
any M0 user would be pleased.

--joel

>
> Thanks,
> Daniel Engel
>
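
The "accurate within +/- 1 ulp" figure quoted above is the kind of claim a reviewer may want to spot-check on their own build. What follows is only a minimal sketch in C of one common way to count the binary32 steps between a computed result and a reference value. The names ordered_bits, ulp_distance and within_ulps are hypothetical and are not part of the library described in this thread, and nothing here is taken from its test vectors.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a binary32 bit pattern onto an ordered integer line so that
   adjacent representable floats differ by exactly 1.
   Assumption: float is IEEE 754 binary32 on the host. */
static int64_t ordered_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                 /* well-defined type pun */
    if (u & 0x80000000u)
        return -(int64_t)(u & 0x7FFFFFFFu);   /* mirror negative values */
    return (int64_t)u;
}

/* Number of representable binary32 steps between a and b.
   +0 and -0 count as the same point. NaN inputs are rejected. */
int64_t ulp_distance(float a, float b)
{
    assert(!isnan(a) && !isnan(b));
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

/* Hypothetical helper: nonzero if 'got' lies within max_ulps steps of a
   double precision reference after that reference is rounded to binary32. */
int within_ulps(float got, double reference, int64_t max_ulps)
{
    return ulp_distance(got, (float)reference) <= max_ulps;
}
```

A harness would call within_ulps(sinf(x), sin((double)x), 1) over a set of inputs. Because the reference is rounded to binary32 before the comparison, the count can differ from the true error by up to half an ulp, so a stricter check would compare against the unrounded double.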