Reviewing the ltoa sources [0] shows that the implementation's speed is not award winning: it uses 32-bit div+mod.
There was propose [1] to speed ltoa by means of LUT but that approach tries to use lookup tables for special radices. The new proposal inlines __divmodsi4 leading to an ~35% tick gain without increasing code size: #define VAL 22 #define STR 20 #define RADIX 18 #define REST 26 #define CNT 19 DEFUN my_ltoa movw r30, STR mov r27, STR+1 mov TMP, STR cpi RADIX, 36+1 cpc RADIX+1, ZERO brsh 99f cpi RADIX, 2 brlo 99f clt cpi RADIX, 10 brne 1f bst VAL+3, 7 brtc 1f com VAL+3 com VAL+2 com VAL+1 neg VAL sbci VAL+1, -1 sbci VAL+2, -1 sbci VAL+3, -1 1: clr REST ldi CNT, 32 2: lsl VAL rol VAL+1 rol VAL+2 rol VAL+3 rol REST cp REST, RADIX brlo 3f sub REST, RADIX inc VAL 3: dec CNT brne 2b subi REST, -'0' cpi REST, '9'+1 brlo 4f subi REST, '0'-'a'+10 4: st Z+, REST sbiw VAL+2, 0 sbci VAL+1, 0 sbci VAL, 0 brne 1b brtc 99f ldi CNT, '-' st Z+, CNT 99: mov r24, TMP mov r25, r27 st Z+, ZERO jmp strrev ENDF my_ltoa Some notes on the code: * The code is not yet documented or ready to be dropped into avr-libc sources (coding style, parametrize hardware) * Speed gain 34%..39% see [1] and [2] for detail analysis * Code size is same as old implementation * Won't drag in __divmodsi4, thus code size decrease of > 100 bytes if application otherwise does not need 32-bit div * Less stack usage * Strict radix checking: original version is not strict, e.g. 0x100 + 10 will slip through * Code size can be decreased further by moving the sanity check to C like so: static inline __attribute__((always_inline)) char* ltoa (long x, char *str, int radix) { if (radix < 2 || radix > 36) { *str = '\0'; return str; } else { extern char* ltoa_asm (long, char*, unsigned char); return ltoa_asm (x, str, radix); } } Presumably almost all use cases use radix known at compile time. Thus, the compiler can optimize away the sanity check. Moreover, radix can be passed as 8-bit value: no need to pass 10 as 0x000a. More notes: * Similar improvement can be used for ultoa * Code between [u]ltoa could be shared. * There is ultoa_invert [4] that - str[] will NOT 0 terminated. - Sequence of digits is inverted. - It returns pointer to first byte after a string. All these is readily available in ltoa, too. Again, code could be shared. A similar method could be used to speed to utoa/itoa with a speedup of > 20%. But it is harder to work against code size increase, and without the help of C inlining the code increases by ~10-16 bytes. Again, code increase is no issue if the application does not need 16-bit divmod, as current i/ltoa will then drag that code. Johann -- [0] AVR-Libc: ltoa.S http://svn.savannah.nongnu.org/viewvc/trunk/avr-libc/libc/misc/ltoa.S?revision=1944&root=avr-libc&view=markup [1] avr-libc: optimization for ltoa/printf http://lists.gnu.org/archive/html/avr-libc-dev/2011-06/msg00000.html [2] Plot: http://sourceforge.net/apps/mediawiki/mobilechessboar/nfs/project/m/mo/mobilechessboar/4/48/Ltoa.png [3] Plot Description http://sourceforge.net/apps/mediawiki/mobilechessboar/index.php?title=File:Ltoa.png#Description [4] AVR-Libc: ultoa_invert.S http://svn.savannah.nongnu.org/viewvc/trunk/avr-libc/libc/stdio/ultoa_invert.S?revision=2191&root=avr-libc&view=markup _______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list