Hi, On Fri, 13 Nov 2020 at 00:03, Daniel Engel <lib...@danielengel.com> wrote: > > Hi, > > This patch adds an efficient assembly-language implementation of IEEE-754 > compliant floating point routines for Cortex M0 EABI (v6m, thumb-1). This is > the libgcc portion of a larger library originally described in 2018: > > https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html > > Since that time, I've separated the libm functions for submission to newlib. > The remaining libgcc functions in the attached patch have the following > characteristics: > > Function(s) Size (bytes) Cycles Stack > Accuracy > __clzsi2 42 23 0 > exact > __clzsi2 (OPTIMIZE_SIZE) 22 55 0 > exact > __clzdi2 8+__clzsi2 4+__clzsi2 0 > exact > > __umulsidi3 44 24 0 > exact > __mulsidi3 30+__umulsidi3 24+__umulsidi3 8 > exact > __muldi3 (__aeabi_lmul) 10+__umulsidi3 6+__umulsidi3 0 > exact > __ashldi3 (__aeabi_llsl) 22 13 0 > exact > __lshrdi3 (__aeabi_llsr) 22 13 0 > exact > __ashrdi3 (__aeabi_lasr) 22 13 0 > exact > > __aeabi_lcmp 20 13 0 > exact > __aeabi_ulcmp 16 10 0 > exact > > __udivsi3 (__aeabi_uidiv) 56 72 – 385 0 > < 1 lsb > __divsi3 (__aeabi_idiv) 38+__udivsi3 26+__udivsi3 8 > < 1 lsb > __udivdi3 (__aeabi_uldiv) 164 103 – 1394 16 > < 1 lsb > __udivdi3 (OPTIMIZE_SIZE) 142 120 – 1392 16 > < 1 lsb > __divdi3 (__aeabi_ldiv) 54+__udivdi3 36+__udivdi3 32 > < 1 lsb > > __shared_float 178 > __shared_float (OPTIMIZE_SIZE) 154 > > __addsf3 (__aeabi_fadd) 116+__shared_float 31 – 76 8 > <= 0.5 ulp > __addsf3 (OPTIMIZE_SIZE) 112+__shared_float 74 8 > <= 0.5 ulp > __subsf3 (__aeabi_fsub) 8+__addsf3 6+__addsf3 8 > <= 0.5 ulp > __aeabi_frsub 8+__addsf3 6+__addsf3 8 > <= 0.5 ulp > __mulsf3 (__aeabi_fmul) 112+__shared_float 73 – 97 8 > <= 0.5 ulp > __mulsf3 (OPTIMIZE_SIZE) 96+__shared_float 93 8 > <= 0.5 ulp > __divsf3 (__aeabi_fdiv) 132+__shared_float 83 – 361 8 > <= 0.5 ulp > __divsf3 (OPTIMIZE_SIZE) 120+__shared_float 263 – 359 8 > <= 0.5 ulp > > __cmpsf2/__lesf2/__ltsf2 72 33 0 > exact > __eqsf2/__nesf2 4+__cmpsf2 3+__cmpsf2 0 > exact > __gesf2/__gesf2 4+__cmpsf2 3+__cmpsf2 0 > exact > __unordsf2 (__aeabi_fcmpun) 4+__cmpsf2 3+__cmpsf2 0 > exact > __aeabi_fcmpeq 4+__cmpsf2 3+__cmpsf2 0 > exact > __aeabi_fcmpne 4+__cmpsf2 3+__cmpsf2 0 > exact > __aeabi_fcmplt 4+__cmpsf2 3+__cmpsf2 0 > exact > __aeabi_fcmple 4+__cmpsf2 3+__cmpsf2 0 > exact > __aeabi_fcmpge 4+__cmpsf2 3+__cmpsf2 0 > exact > > __floatundisf (__aeabi_ul2f) 14+__shared_float 40 – 81 8 > <= 0.5 ulp > __floatundisf (OPTIMIZE_SIZE) 14+__shared_float 40 – 237 8 > <= 0.5 ulp > __floatunsisf (__aeabi_ui2f) 0+__floatundisf 1+__floatundisf 8 > <= 0.5 ulp > __floatdisf (__aeabi_l2f) 14+__floatundisf 7+__floatundisf 8 > <= 0.5 ulp > __floatsisf (__aeabi_i2f) 0+__floatdisf 1+__floatdisf 8 > <= 0.5 ulp > > __fixsfdi (__aeabi_f2lz) 74 27 – 33 0 > exact > __fixunssfdi (__aeabi_f2ulz) 4+__fixsfdi 3+__fixsfdi 0 > exact > __fixsfsi (__aeabi_f2iz) 52 19 0 > exact > __fixsfsi (OPTIMIZE_SIZE) 4+__fixsfdi 3+__fixsfdi 0 > exact > __fixunssfsi (__aeabi_f2uiz) 4+__fixsfsi 3+__fixsfsi 0 > exact > > __extendsfdf2 (__aeabi_f2d) 42+__shared_float 38 8 > exact > __aeabi_d2f 56+__shared_float 54 – 58 8 <= > 0.5 ulp > __aeabi_h2f 34+__shared_float 34 8 > exact > __aeabi_f2h 84 23 – 34 0 > <= 0.5 ulp > > Copyright assignment is on file with the FSF. > > I've built the gcc-arm-none-eabi cross-compiler using the 20201108 snapshot > of GCC plus this patch, and successfully compiled a test program: > > extern int main (void) > { > volatile int x = 1; > volatile unsigned long long int y = 10; > volatile long long int z = x / y; // 64-bit division > > volatile float a = x; // 32-bit casting > volatile float b = y; // 64 bit casting > volatile float c = z / b; // float division > volatile float d = a + c; // float addition > volatile float e = c * b; // float multiplication > volatile float f = d - e - c; // float subtraction > > if (f != c) // float comparison > y -= (long long int)d; // float casting > } > > As one point of comparison, the test program links to 876 bytes of libgcc > code from the patched toolchain, vs 10276 bytes from the latest released > gcc-arm-none-eabi-9-2020-q2 toolchain. That's a 90% size reduction.
This looks awesome! > > I have extensive test vectors, and have passed these tests on an STM32F051. > These vectors were derived from UCB [1], Testfloat [2], and IEEECC754 [3] > sources, plus some of my own creation. Unfortunately, I'm not sure how "make > check" should work for a cross compiler run time library. > > Although I believe this patch can be incorporated as-is, there are at least > two points that might bear discussion: > > * I'm not sure where or how they would be integrated, but I would be happy to > provide sources for my test vectors. > > * The library is currently built for the ARM v6m architecture only. It is > likely that some of the other Cortex variants would benefit from these > routines. However, I would need some guidance on this to proceed without > introducing regressions. I do not currently have a test strategy for > architectures beyond Cortex M0, and I have NOT profiled the existing thumb-2 > implementations (ieee754-sf.S) for comparison. I tried your patch, and I see many regressions in the GCC testsuite because many tests fail to link with errors like: ld: /gcc/thumb/v6-m/nofp/libgcc.a(_arm_cmpdf2.o): in function `__clzdi2': /libgcc/config/arm/cm0/clz2.S:39: multiple definition of `__clzdi2';/gcc/thumb/v6-m/nofp/libgcc.a(_thumb1_case_sqi.o):/libgcc/config/arm/cm0/clz2.S:39: first defined here This happens with a toolchain configured with --target arm-none-eabi, default cpu/fpu/mode, --enable-multilib --with-multilib-list=rmprofile and running the tests with -mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m Does it work for you? Thanks, Christophe > > I'm naturally hoping for some action on this patch before the Nov 16th > deadline for GCC-11 stage 3. Please review and advise. > > Thanks, > Daniel Engel > > [1] http://www.netlib.org/fp/ucbtest.tgz > [2] http://www.jhauser.us/arithmetic/TestFloat.html > [3] http://win-www.uia.ac.be/u/cant/ieeecc754.html