When the compiler needs to allocate stack space for a function, it uses the following assembly fragment (commented by me):
    in   r28,0x3d   ; get stack pointer low
    in   r29,0x3e   ; get stack pointer high
    sbiw r28,N      ; decrement value by N
    in   r0,0x3f    ; get status register
    cli             ; disable interrupts
    out  0x3e,r29   ; write new stack pointer high
    out  0x3f,r0    ; enable interrupts, it takes one more insn to enable them
    out  0x3d,r28   ; write new stack pointer low, interrupt is still disabled

Unfortunately, there is an AVR feature that the chip manual does not mention, but that we could confirm with a logic analyser attached to the chip. If an interrupt arrives just after the cli but before the out 0x3f,r0, then the AVR does execute the out 0x3d,r28 before the interrupt is accepted. However, it does NOT decrement the stack pointer after pushing the return address to the stack. That is, if the content of r29:r28 that is written to the SP is 0x1234, then the return address will be pushed to locations 0x1234 and 0x1233, but the stack pointer value at the start of the interrupt routine will be 0x1234 instead of 0x1232. This naturally causes the interrupt to fetch its return address from 0x1235 and 0x1236, locations that contain unrelated data.

We analysed this problem in an AT90CAN128 chip, an avr5 core. The error manifests itself only under very special circumstances: you need a function that allocates stack space (most AVR code does not use large on-stack blocks, so the compiler keeps everything in registers) and an interrupt that arrives after the "cli" but before the "out 0x3d,r28" instruction, a very narrow, 7 clock cycle window.

The solution is to change the order of instructions from

    out  0x3f,r0
    out  0x3d,r28

to

    out  0x3d,r28
    out  0x3f,r0

in gcc/config/avr/avr.c, in the functions out_set_stack_ptr() and output_movhi(). The same bug exists in gcc version 3.4.x.
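
For reference, with the proposed reordering applied, the whole stack-allocation fragment would look roughly like this. This is only a sketch derived from the fragment at the top of this report, not the actual output of a patched compiler:

    in   r28,0x3d   ; get stack pointer low
    in   r29,0x3e   ; get stack pointer high
    sbiw r28,N      ; decrement value by N
    in   r0,0x3f    ; get status register
    cli             ; disable interrupts
    out  0x3e,r29   ; write new stack pointer high
    out  0x3d,r28   ; write new stack pointer low, interrupts still disabled
    out  0x3f,r0    ; restore status register, interrupts enabled again if they were before

With this order both halves of the stack pointer are written while interrupts are still disabled, so the status register is restored only after the stack pointer is complete.
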
To confirm the bug, boot the chip and then do the following:

    ; write some code that causes a timer interrupt to happen in
    ; 9 clock cycles, then:
    sei             ; interrupts are enabled, clock is +1
    in   r24,0x3d   ; get current stack pointer, clock is +3
    in   r25,0x3e   ; clock is +5
    in   r0,0x3f    ; get status, clock is +7
    cli             ; disable interrupt, clock is +8
    out  0x3e,r25   ; write sp high, clock is +10, interrupt just arrives
    out  0x3f,r0    ; write status, that enables the interrupt after 1 more insn
    out  0x3d,r24   ; stack low, interrupt accepted immediately after this insn

and then in your interrupt service routine:

    isr_entry:
    in   r26,0x3d
    in   r27,0x3e

You will find that r27:r26 and r25:r24 are the same, even though there should be a difference of 2 bytes (the return address) between them. We checked the actual bus activity using external RAM as stack, and we also confirmed the behaviour using the software-only method above.

We don't know whether this behaviour of the AVR core is specific to the AT90CAN128 or whether all avr5 cores do the same; we don't have other chips handy.

This bug is a ticking time bomb: it manifests itself under very special circumstances that are very hard to reproduce in a production system but that can nonetheless happen (as they did in our case).

--
Summary: A gcc primitive, under special circumstances, can crash the AVR
Product: gcc
Version: 4.0.1
Status: UNCONFIRMED
Severity: critical
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: zoltan at bendor dot com dot au
CC: gcc-bugs at gcc dot gnu dot org,zoltan at bendor dot com dot au
GCC host triplet: i386-elf-linux
GCC target triplet: avr-elf-unknown

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24027