(cc'ing gcc@gcc.gnu.org)

On Nov 21, 2007 2:38 AM, Wouter van Gulik <[EMAIL PROTECTED]> wrote:
> Also consider the fuse bit get routine. This scheme gives more knowledge
> to the compiler, unfortunately gcc fails to see the loading of r31 can
> done once:
>
> using this:
>
> =========================================================================
> static inline uint8_t boot_lock_fuse_bits_new(uint16_t address)
> {
>     uint8_t result;
>     register uint16_t adr asm("r30") = address; //make sure it's in z register aka r30:r31
>
>     asm volatile(
>         "sts %1, %2\n\t"
>         "lpm %0, Z"
>         : "=r" (result)
>         : "i" (_SFR_MEM_ADDR(__SPM_REG)),
>           "r" ((uint8_t)__BOOT_LOCK_BITS_SET),
>           "z" (adr)
>         : "r0"
>     );
>     return result;
> }
>
> uint8_t bar(void)
> {
>     uint8_t temp;
>     uint16_t adr = 0;
>     temp = boot_lock_fuse_bits_new(adr++);
>     temp += boot_lock_fuse_bits_new(adr++);
>     temp += boot_lock_fuse_bits_new(adr++);
>     temp += boot_lock_fuse_bits_new(adr++);
>     return temp;
> }
>
> =========================================================================
>
> It gives this assembler output:
> .global bar
>         .type bar, @function
> bar:
> /* prologue: frame size=0 */
> /* prologue end (size=0) */
>         ldi r30,lo8(0)   ; 8 *movhi/4 [length = 2]
>         ldi r31,hi8(0)
>         ldi r25,lo8(9)   ; 10 *movqi/2 [length = 1]
> /* #APP */
>         sts 87, r25
>         lpm r24, Z
> /* #NOAPP */
>         ldi r30,lo8(1)   ; 16 *movhi/4 [length = 2]
>         ldi r31,hi8(1)
> /* #APP */
>         sts 87, r25
>         lpm r30, Z
> /* #NOAPP */
>         add r24,r30      ; 22 addqi3/1 [length = 1]
>         ldi r30,lo8(2)   ; 24 *movhi/4 [length = 2]
>         ldi r31,hi8(2)
> /* #APP */
>         sts 87, r25
>         lpm r18, Z
> /* #NOAPP */
>         ldi r30,lo8(3)   ; 29 *movhi/4 [length = 2]
>         ldi r31,hi8(3)
> /* #APP */
>         sts 87, r25
>         lpm r25, Z
> /* #NOAPP */
>         add r25,r18      ; 36 addqi3/1 [length = 1]
>         add r24,r25      ; 37 addqi3/1 [length = 1]
>         clr r25          ; 45 zero_extendqihi2/1 [length = 1]
> /* epilogue: frame size=0 */
>         ret
> /* epilogue end (size=1) */
> /* function bar size 30 (29) */
>         .size bar, .-bar
>
>
> This is not smaller nor faster but it could have been. If gcc would
> leave r31, or do a adiw
> I tried against 4.1.2 using -Wall -Os -mmcu=atmega16. Maybe 4.2.2 or
> 4.3.0 is better?
>
> It does however use r30 as output which could save some speed and code
> when no other register is available.
>
> HTH,
>
> Wouter

I have also noticed that a series of p = buf; *p++; *p++; *p++; gets
optimized to buf[0]; buf[1]; buf[2]; which may be faster on some
architectures, but loading constants is quite expensive on the AVR.

I don't know a great deal about GCC optimisations, but I suspect this is
related to constant pool management: GCC would need to realise that we
already have a 2 in the constant pool, and that the cheapest way to
introduce a 3 is to increment that 2.

Cheers,
Shaun
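
P.S. To make the pointer pattern concrete, here is a minimal sketch in
portable C. The names sum3_postinc, sum3_indexed and check_same are made
up for illustration and do not come from any real code base. The first
function is the form as written in the source; the second is roughly the
indexed form the optimiser seems to rewrite it into. Both have to return
the same value, so any difference shows up only in the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read three bytes through a post-incremented pointer. */
static uint8_t sum3_postinc(const uint8_t *buf)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;

    sum += *p++;    /* reads buf[0], then advances p */
    sum += *p++;    /* reads buf[1], then advances p */
    sum += *p++;    /* reads buf[2], then advances p */
    return sum;
}

/* Hypothetical helper: the same three reads written as constant offsets. */
static uint8_t sum3_indexed(const uint8_t *buf)
{
    uint8_t sum = 0;

    sum += buf[0];
    sum += buf[1];
    sum += buf[2];
    return sum;
}

/* Hypothetical check: both forms must agree for any 3-byte buffer. */
static void check_same(const uint8_t *buf)
{
    assert(sum3_postinc(buf) == sum3_indexed(buf));
}
```

Compiling both functions with something like -Os -S and comparing the
two listings is one way to see which form the compiler actually emits
for a given target and GCC version.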