When the compiler needs to allocate stack space for a function,
it uses the following assembly fragment (commented by me):

in r28,0x3d ; get stack pointer low
in r29,0x3e ; get stack pointer high
sbiw r28,N ; decrement value by N
in r0,0x3f ; get status register
cli ; disable interrupts
out 0x3e,r29 ; write new stack pointer high
out 0x3f,r0 ; enable interrupts, it takes one more insn to enable them
out 0x3d,r28 ; write new stack pointer low, interrupt is still disabled

Unfortunately, there is an AVR feature that the chip manual does not
mention, but that we could confirm with a logic analyser slapped onto
the chip.

If an interrupt arrives just after the cli but before the out 0x3f,r0,
the AVR does execute the out 0x3d,r28 before the interrupt is accepted.
However, it does NOT decrement the stack pointer after pushing the
return address to the stack.

That is, if the content of r29:r28 that is written to the SP is 0x1234,
then the return address will be pushed to locations 0x1234 and 0x1233,
but the stack pointer value at the start of the interrupt routine will
be 0x1234 instead of 0x1232. This naturally causes the interrupt to fetch
its return address from 0x1235 and 0x1236, locations that contain
unrelated data.

We analysed this problem on an AT90CAN128 chip, an avr5 core. The error
manifests itself only under very special circumstances: you need a function
that allocates stack space (most AVR code does not use large on-stack
blocks, so the compiler keeps everything in registers) and an
interrupt that arrives after the "cli" but before the "out 0x3d,r28"
instruction, a very narrow, 7 clock cycle window.

The solution is to change the order of instructions from

out 0x3f,r0
out 0x3d,r28

to

out 0x3d,r28
out 0x3f,r0

in gcc/config/avr/avr.c in functions out_set_stack_ptr() and output_movhi().
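
With that change applied, the complete stack allocation sequence would
read roughly as follows. This is only a sketch obtained by swapping the
last two instructions of the fragment at the top of this report; it is
not literal output from a patched compiler:

in r28,0x3d ; get stack pointer low
in r29,0x3e ; get stack pointer high
sbiw r28,N ; decrement value by N
in r0,0x3f ; get status register
cli ; disable interrupts
out 0x3e,r29 ; write new stack pointer high
out 0x3d,r28 ; write new stack pointer low, interrupts still disabled
out 0x3f,r0 ; restore status register only after SP is fully written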

The same bug exists in gcc version 3.4.x.

To confirm the bug you should boot the chip, then do the following:

; write some code that causes a timer interrupt to happen in
; 9 clock cycles, then:

  sei ; interrupts are enabled clock is +1
  in r24,0x3d  ; get current stack pointer clock is +3
  in r25,0x3e ; clock is +5
  in r0,0x3f ; get status, clock is +7
  cli ; disable interrupt, clock is +8
  out 0x3e,r25 ; write sp high, clock is +10, interrupt just arrives
  out 0x3f,r0 ; write status, that enables the interrupt after 1 more insn
  out 0x3d,r24 ; stack low, interrupt accepted immediately after this insn

and then in your interrupt service routine:
isr_entry:
   in r26,0x3d
   in r27,0x3e

and you will find that r27:r26 and r25:r24 are the same, even though
there should be a difference of 2 bytes (the return address) between
them. (We checked the bus activity using external RAM as stack, and
we also confirmed the behaviour using the above software-only method.)
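
As a sketch of how the ISR could flag the condition by itself, the two
values can be compared directly. The label sp_not_decremented is a
placeholder of ours, and r26, r27 and the flags are clobbered without
being saved, which is acceptable only in a throwaway test like this:

isr_entry:
   in r26,0x3d ; SP low as seen on ISR entry
   in r27,0x3e ; SP high as seen on ISR entry
   cp r26,r24 ; compare low byte with the value written to SP
   cpc r27,r25 ; compare high byte, carrying the result of cp
   breq sp_not_decremented ; equal: SP did not move past the return address

On a correct interrupt entry r27:r26 should be 2 less than r25:r24, so
the branch is not taken.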

We don't know whether this behaviour of the AVR core is specific to the
AT90CAN128 or whether all avr5 cores do the same; we don't have other
chips handy.

This bug is a ticking timebomb: it manifests itself under very special
circumstances that are very hard to reproduce in a production system but
that can nonetheless happen (as it did in our case).

-- 
           Summary: A gcc primitive, under special circumstances,  can crash
                    the AVR
           Product: gcc
           Version: 4.0.1
            Status: UNCONFIRMED
          Severity: critical
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: zoltan at bendor dot com dot au
                CC: gcc-bugs at gcc dot gnu dot org,zoltan at bendor dot com
                    dot au
  GCC host triplet: i386-elf-linux
GCC target triplet: avr-elf-unknown


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24027
