Hello,

I'd like to reimplement the __builtin_avr_delay_cycles() function in inline assembly. The reason is that __builtin_avr_delay_cycles() checks its operand too early, so for example

    static __attribute__((__always_inline__)) inline void my_delay(int cycles)
    {
        __builtin_avr_delay_cycles(cycles);
    }

will still complain that the argument of __builtin_avr_delay_cycles(cycles) is not constant, even for calls like my_delay(10). This could be resolved by having __delay_cycles() as a normal function, so I did the following for starters:

    ALWAYS_INLINE void __delay_cycles2(long delay)
    {
        uint16_t d = delay >> 2;
        asm volatile(
            "1:          \n"
            "sbiw %0, 1  \n"
            "brne 1b     \n"
            : : "w" (d)
        );
    }

The problem is that gcc doesn't know that the "w" register value is trashed after this assembly code executes, so

    __delay_cycles2(100000);
    __delay_cycles2(100000);

leads to:

    9c: 88 ea    ldi  r24, 0xA8  ; 168
    9e: 91 e6    ldi  r25, 0x61  ; 97
    a2: 01 97    sbiw r24, 0x01  ; 1
    a4: f1 f7    brne .-4        ; 0xa2 <main+0x8>
    a8: 01 97    sbiw r24, 0x01  ; 1
    aa: f1 f7    brne .-4        ; 0xa8 <main+0xe>

(I actually have slightly more complicated code than just two __delay_cycles2() calls in a row; the unrelated asm is not shown above.) The second loop starts with r24:r25 already counted down by the first one, so the register pair is never reloaded.

Well, what is needed is a clobber constraint. But how do I add one? Having ': : "w" (d) : "r25", "r26"' just makes gcc use r26/r27, with the same effect. Trying a matching constraint in the clobber list, ': : "w" (d) : "0"', seems to be simply ignored, leading to the same code as above.

After some looking, I found an example workaround at http://www.nongnu.org/avr-libc/user-manual/inline_asm.html ("void delay(uint8_t ms)" in there). So, using ': "=&w" (d) : "0" (d)' at least doesn't produce broken code. But that really looks like a workaround: the code above *does not* produce any result, so telling the compiler that it should store a "result" back into the variable looks ugly, and it leaves me praying for good liveness tracking (that gcc sees the "result" is not used anywhere and doesn't try to emit stores). But I wonder if gcc does its job well.
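Written out in full, the workaround version looks roughly like the sketch below. Some parts are assumptions made only so the snippet is self-contained: the ALWAYS_INLINE definition is a guess at what the macro expands to, delay_iterations() is a hypothetical helper that just wraps the `delay >> 2` computation, and the non-AVR branch is a stand-in so the file also builds with a host compiler.

```c
#include <stdint.h>

/* Assumed expansion of ALWAYS_INLINE; the real macro may differ. */
#define ALWAYS_INLINE static inline __attribute__((__always_inline__))

/* Hypothetical helper: loop count for a requested delay, using the
 * same "delay >> 2" scaling as the original function. */
ALWAYS_INLINE uint16_t delay_iterations(long delay)
{
    return (uint16_t)(delay >> 2);
}

ALWAYS_INLINE void __delay_cycles2(long delay)
{
    uint16_t d = delay_iterations(delay);
#if defined(__AVR__)
    /* "=&w"(d) declares an early-clobber output in an upper register
     * pair; "0"(d) ties the input to that same operand, so gcc knows
     * the register no longer holds the original value afterwards. */
    __asm__ __volatile__(
        "1:          \n"
        "sbiw %0, 1  \n"
        "brne 1b     \n"
        : "=&w" (d)
        : "0" (d)
    );
#else
    /* Host stand-in (not timing accurate): count down in plain C. */
    volatile uint16_t n = d;
    while (n) {
        n--;
    }
#endif
}
```

With this version, __delay_cycles2(100000) loads 25000 (0x61A8) into the counter and __delay_cycles2(100004) loads 25001 (0x61A9).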
With the "=&w" version, the two identical calls from above actually compile quite well:

    9a: 28 ea    ldi  r18, 0xA8  ; 168
    9c: 31 e6    ldi  r19, 0x61  ; 97
    a0: c9 01    movw r24, r18
    a2: 01 97    sbiw r24, 0x01  ; 1
    a4: f1 f7    brne .-4        ; 0xa2 <main+0xa>
    a8: c9 01    movw r24, r18
    aa: 01 97    sbiw r24, 0x01  ; 1
    ac: f1 f7    brne .-4        ; 0xaa <main+0x12>

So, gcc sees a "common subexpression" and caches it in another register pair. However, using different values:

    __delay_cycles2(100000);
    __delay_cycles2(100004);

leads to:

    9a: 48 ea    ldi  r20, 0xA8  ; 168
    9c: 51 e6    ldi  r21, 0x61  ; 97
    9e: 29 ea    ldi  r18, 0xA9  ; 169
    a0: 31 e6    ldi  r19, 0x61  ; 97
    a4: ca 01    movw r24, r20
    a6: 01 97    sbiw r24, 0x01  ; 1
    a8: f1 f7    brne .-4        ; 0xa6 <main+0xe>
    ac: c9 01    movw r24, r18
    ae: 01 97    sbiw r24, 0x01  ; 1
    b0: f1 f7    brne .-4        ; 0xae <main+0x16>

That doesn't look optimal at all: I'd expect the compiler to load each value directly into r24:r25 just before it is used, instead of staging both constants in other register pairs and copying them with movw. So, I wonder whether "=&w" plays a role in this, and whether there's a better way to do it (like specifying exactly that the input register is clobbered)?

Oh, and by the way, my very first attempt was to use "ldi r24, $0" with an "M" constraint, but I hit the same issue as with __builtin_avr_delay_cycles(): the "M" constraint apparently expects a literal integer value, short-sightedly rejecting symbols which may be just "const int". (For comparison, the "load immediate" approach works well with mspgcc: https://github.com/pfalcon/PeripheralTemplateLibrary/blob/master/include/delay_static_msp430.hpp#L73 )

Thanks,
Paul
mailto:pmis...@gmail.com