Hi George,

> (Am I annoying everyone by using this mailing list as my personal
> coding blog?)

No, though I can only speak for myself. Some of us are interested and
reading, but don't have time to comment or to work through the code.
Your efforts are appreciated.

> After considerable rearranging (and fixing one nasty logic bug in
> the first algorithm posted), I have octal conversion working to my
> satisfaction.
>
> The logic bug was that I assumed I'd need at most one byte of zero-padding
> to print a number.  But I was checking for termination before printing
> a digit.  That ended up not working with 1-byte octal numbers where the
> top digit is non-zero.  By the time I was ready to print the fourth digit
> (when the termination check would fire), the lsbyte wanted to hold bits
> 9..16, and that meant loading a *second* byte (bits 16..23).
>
> So I changed to checking for termination *after* printing a digit,
> which I knew would save time, but I unexpectedly found additional
> space savings, too.
>
> Not counting preamble code shared with decimal printing (all the
> stuff before the label "3:"), it's down to 29 instructions.  Still
> a bit more than 20, but I'm satisfied.
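For anyone following along without time to trace the assembly, here is a
rough C model of the "check for termination after printing a digit" idea
as I understand it. It is only a sketch and not a translation of your
routine: the function name, the 32-bit window, and the fixed lowercase
hex digits are my own assumptions, and it refills a bit window instead of
using your msb sentinel trick. Bytes past the end of the input simply
read as zero padding, so the "how many padding bytes do I need" question
never comes up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of avr-libc): write the digits of a
 * little-endian byte array in base 2^bits (bits = 1, 3 or 4), least
 * significant digit first, and return the number of digits written.
 * out needs room for (8*len + bits - 1) / bits characters, minimum 1. */
size_t pow2_digits_lsd_first(const uint8_t *bin, size_t len,
                             unsigned bits, char *out)
{
    const unsigned mask = (1u << bits) - 1;
    uint32_t window = 0;    /* bits not yet printed, lsbit first */
    unsigned avail = 0;     /* number of valid bits in window */
    size_t pos = 0;         /* index of the next byte to load */
    size_t n = 0;

    /* Strip most-significant zero bytes, keeping at least one byte. */
    while (len > 1 && bin[len - 1] == 0)
        len--;

    for (;;) {
        /* Refill the window; bytes beyond the input count as zero. */
        while (avail < bits) {
            uint32_t next = (pos < len) ? bin[pos] : 0;
            pos++;
            window |= next << avail;
            avail += 8;
        }

        unsigned d = window & mask;
        out[n++] = (char)(d < 10 ? '0' + d : 'a' + (d - 10));
        window >>= bits;
        avail -= bits;

        /* Termination is tested only after a digit has been emitted,
         * so a zero input still produces a single '0'. */
        if (window == 0 && pos >= len)
            return n;
    }
}
```

With the single byte 0xFF and bits = 3 this writes '7', '7', '3' (0377
read from the least-significant end), which is the 1-byte octal case with
a non-zero top digit that you describe.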
>
> It's even slightly faster than the previous code:
>
> Bits   Old   New
>    0    56    42
>    8   144   113
>   16   276   232
>   24   364   314
>   32   496   430
>   40         546
>   48         628
>   56         744
>   64         860
>
>
> /* Arguments */
> #define out     X       /* Arrives in r24:r25, but we move it immediately */
> #define out_lo  r26
> #define out_hi  r27
> #define bin     Z       /* Arrives in r22:r23, but we move it immediately */
> #define bin_lo  r30
> #define bin_hi  r31
> #define len     r20
> #define flags   r18     /* Mask, after removing two lsbits */
>
> /* Local variables */
> #define msb     r25     /* Overlaps input */
> #define lsb     r24     /* Overlaps input */
> #define digit   r23     /* Overlaps input */
> #define delta   r22     /* Overlaps input */
> #define tmask   r21
> //      len = r20
> #define k       r19
> //      flags = r18
>
>         .text
>         .global binprint
>         .type binprint, @function
> binprint:
>         movw    out_lo, r24
>         movw    bin_lo, r22
> #if 1
>         add     bin_lo, len
>         adc     bin_hi, __zero_reg__
> #else
>         mov     tmask, len
>         ; Conditional negate using the standard identity -x = ~x + 1.
>         ; Given mask of -1 or 0, (x ^ mask) - mask returns -x or x.
>         ; However, we would need the carry bit clear to start this, and
>         ; forming "mask" from the carry bit in one instruction preserves
>         ; the carry bit.  So instead add zero with carry.
>         lsr     flags           ; Lsbit is negate flag
>         sbc     k, k            ; Set to 0 or -1, carry preserved
> 1:
>         ld      __tmp_reg__, bin
>         eor     __tmp_reg__, k
>         adc     __tmp_reg__, __zero_reg__
>         st      bin+, __tmp_reg__
>         dec     tmask
>         brne    1b
> #endif
>         ; Strip trailing (most-significant) zeros from bin
> 2:      dec     len
>         breq    3f              ; If we've reached the end, stop
>         ld      __tmp_reg__, -bin
>         or      __tmp_reg__, __tmp_reg__
>         breq    2b              ; Continue as long as bytes are zero
>
> 3:      movw    bin_lo, r22     ; Reset bin to lsbyte
>         ; Len is now pre-decremented
>
>         ; Done with args in r22-r25; now allowed to use delta, digit, lsb, msb
>         ldi     delta, 'A'-'0'-10
>         lsr     flags
>         brcc    4f
>         ldi     delta, 'a'-'0'-10
> 4:      ldi     msb, 1
>         ld      lsb, bin+
>
> .L_digit_out:   ; Spit out a digit
>         mov     digit, lsb
>         and     digit, flags
>         cpi     digit, 10
>         brcs    5f
>         add     digit, delta    ; Hex digit > 9
> 5:      subi    digit, -'0'
>         st      X+, digit
>         ; Check for done: is len:lsb < 0:flags?
>         cp      flags, lsb
>         cpc     __zero_reg__, len
>         brcc    .L_epilogue     ; if (!lsb && !len) return X
>         mov     tmask, flags
> .L_bitloop:
>         lsr     msb
>         brne    7f              ; if ((msb >>= 1) == 0) get another byte
>         ; Fetch another byte
>         or      len, len        ; Preserves carry
>         breq    6f
>         dec     len             ; Preserves carry
>         ld      msb, Z+
> 6:      ror     msb             ; Shift carry=1 into msbit
> 7:      ror     lsb
>         lsr     tmask
>         brne    .L_bitloop      ; if ((tmask >>= 1)== 0) {
>         rjmp    .L_digit_out
>         .size binprint, .-binprint
>
> _______________________________________________
> AVR-libc-dev mailing list
> AVR-libc-dev@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/avr-libc-dev