Hello KR,

No need to apologize! Thank you for all your help on this issue. I actually
didn't know about these "trampoline" sections, I had to look them up after
your email. If you think it might make the code clearer, I can update the
comments about `_eronly` to reflect that it is no longer the end of .text
and .rodata, and instead just the start of .data? I can also mention these
trampoline sections in between.

What do you think?
Matteo

On Wed, May 28, 2025 at 8:11 PM <kr....@kerogit.eu> wrote:

> Well, I think I managed to figure it out.
>
> First, an apology - I didn't read the patch in #16457 properly and
> thought it just moves the _eronly assignment while it, in fact, changes
> ABSOLUTE to LOADADDR too. Sorry for that.
>
> After spotting this, I went and tried to find out how the change makes
> things different because from my understanding, it should do the same
> thing - assignment of current address where .data section begins should
> be the same as assignment of address of the .data section.
>
> Turns out there is a difference - the linker also generated a
> .trampoline section and placed it between _eronly assignment and
> beginning of .data section. My guess is that when this section is not
> mentioned in the linker script, the linker has free reign as to where to
> put it...? Anyway, that made _eronly value incorrect by .trampoline
> section size and things went badly from there.
>
> The issue can therefore also be fixed by this patch:
>
> diff --git a/boards/avr/atmega/arduino-mega2560/scripts/flash.ld
> b/boards/avr/atmega/arduino-mega2560/scripts/flash.ld
> index f365646dfa..f9ceeb3a08 100644
> --- a/boards/avr/atmega/arduino-mega2560/scripts/flash.ld
> +++ b/boards/avr/atmega/arduino-mega2560/scripts/flash.ld
> @@ -122,6 +122,12 @@ SECTIONS
>           _etext = . ;
>       } > flash
>
> +    .trampolines     :
> +    {
> +        *(.trampolines)
> +        *(.trampolines.*)
> +    } > flash
> +
>       _eronly = ABSOLUTE(.);
>
>       .data            :
>
> The reason why I wasn't able to reproduce this on mega1284p-xplained is
> then simple - there is no .trampolines section for that chip in the
> compiled binary.
>
> In my opinion, the proposed patch in #16457 is a better solution. While
> it does not preserve what the code comments say about the _eronly value
> ("this is where .text + .rodata ends"), it is a better match to what the
> _eronly value is actually used for ("this is start of .data section")
>
> On 2025-05-28 11:59, kr....@kerogit.eu wrote:
> > Alright, I looked into the possible bug mentioned in the last paragraph
> > of my previous message and it seems the code is actually correct. What
> > I thought was that if register pair Z overflows, subsequent reads would
> > read from incorrect memory locations. However, that does not seem to be
> > the case - according to AVR Instruction Set Manual DS40002198A, ELPM
> > instruction post-increment "applies to the entire 24-bit concatenation
> > of the RAMPZ and Z-Pointer Registers"
> >
> > As for the change of _eronly however, I still think that is incorrect.
> > Will try to look into that more during the evening unless someone
> > figures something else out.
> >
> > What compiler are you using anyway? From the kernel version mentioned
> > in GitHub I assume gcc 14.2? Might be the reason why everything works
> > fine for me and not for anyone else, I am still on 5.4 from Debian
> > stable. (Can try that too, Debian testing also has 14.2.)
> >
> > On 2025-05-28 09:16, kr....@kerogit.eu wrote:
> >> Hello,
> >>
> >> I don't that is a correct fix and in fact, from a quick look at things
> >> it seems to me that it actually introduces another bug. As described
> >> in src/atmega/atmega_head.S, that value is supposed to be "Start of
> >> .data section in FLASH". You changed it to end of .data section in
> >> FLASH. It is used after __do_copy_data label as an address where
> >> initialized data start in flash memory. So if I read this correctly,
> >> you are copying incorrect data to your initialized variables.
> >>
> >> If the program starts working this way, that might suggest that the
> >> initialization data are placed incorrectly in the image? Anyway, the
> >> linker script is pretty much identical to linker script of
> >> mega1284p-xplained and that one works out of the box without need to
> >> move _eronly.
> >>
> >> Hm, looking at this, I think I just spotted a bug in __do_copy_data
> >> but that would only hit if your initialized data region crossed a 64kB
> >> boundary. Is that the case in any chance? It would cause _some_
> >> initialized variables to contain random garbage...
> >>
> >> On 2025-05-28 07:17, Matteo Golin wrote:
> >>> Managed to find a fix here:
> >>> https://github.com/apache/nuttx/pull/16457
> >>>
> >>> On Tue, May 27, 2025 at 11:24 PM Matteo Golin
> >>> <matteo.go...@gmail.com>
> >>> wrote:
> >>>
> >>>> Thank you for your help KR!
> >>>>
> >>>> I also noticed this CONFIG_RAM_START=0x800100. Unfortunately,
> >>>> changing
> >>>> that value to the proper 0x800200 did not really help the issue.
> >>>>
> >>>> I went ahead and generated the disassembly like you suggested, and
> >>>> it does
> >>>> appear that the .text section starts with jump instructions for all
> >>>> the
> >>>> interrupt vectors:
> >>>>
> >>>> 00000000 <_stext>:
> >>>>        0:       0c 94 72 00     jmp     0xe4    ; 0xe4 <__start>
> >>>>        4:       0c 94 95 00     jmp     0x12a   ; 0x12a
> >>>> <atmega_int0>
> >>>>        8:       0c 94 98 00     jmp     0x130   ; 0x130
> >>>> <atmega_int1>
> >>>>        c:       0c 94 9b 00     jmp     0x136   ; 0x136
> >>>> <atmega_int2>
> >>>>       10:       0c 94 9e 00     jmp     0x13c   ; 0x13c
> >>>> <atmega_int3>
> >>>>       14:       0c 94 a1 00     jmp     0x142   ; 0x142
> >>>> <atmega_int4>
> >>>>       18:       0c 94 a4 00     jmp     0x148   ; 0x148
> >>>> <atmega_int5>
> >>>>       1c:       0c 94 a7 00     jmp     0x14e   ; 0x14e
> >>>> <atmega_int6>
> >>>>       20:       0c 94 aa 00     jmp     0x154   ; 0x154
> >>>> <atmega_int7>
> >>>>       24:       0c 94 ad 00     jmp     0x15a   ; 0x15a
> >>>> <atmega_pcint0>
> >>>>       28:       0c 94 b0 00     jmp     0x160   ; 0x160
> >>>> <atmega_pcint1>
> >>>>       2c:       0c 94 b3 00     jmp     0x166   ; 0x166
> >>>> <atmega_pcint2>
> >>>>       30:       0c 94 b6 00     jmp     0x16c   ; 0x16c <atmega_wdt>
> >>>>       34:       0c 94 b9 00     jmp     0x172   ; 0x172
> >>>> <atmega_tim2_compa>
> >>>>       38:       0c 94 bc 00     jmp     0x178   ; 0x178
> >>>> <atmega_tim2_compb>
> >>>>       3c:       0c 94 bf 00     jmp     0x17e   ; 0x17e
> >>>> <atmega_tim2_ovf>
> >>>>       40:       0c 94 c2 00     jmp     0x184   ; 0x184
> >>>> <atmega_tim1_capt>
> >>>>       44:       0c 94 c5 00     jmp     0x18a   ; 0x18a
> >>>> <atmega_tim1_compa>
> >>>>       48:       0c 94 c8 00     jmp     0x190   ; 0x190
> >>>> <atmega_tim1_compb>
> >>>> ...and so on...
> >>>>
> >>>> `g_idle_topstack` is 0x800e2b in the disassembly. With RAM_START
> >>>> being
> >>>> 0x800200, and a size of 8192 bytes, that gives a `heap_size` of 5076
> >>>> bytes.
> >>>> That all seems fine.
> >>>>
> >>>> However, printing out the value of `g_idle_topstack` (in binary
> >>>> using
> >>>> `up_putc`) I can see that the value does not match. The printed
> >>>> value is
> >>>> "1000011001110110" while the binary representation of 0x800e2b is
> >>>> "100000000000111000101011".
> >>>> Printing out the `heap_size` tells me it's "1111100111001101", which
> >>>> is
> >>>> not 5076.
> >>>>
> >>>> It seems your suspicion about a mismatch is probably correct,
> >>>> although
> >>>> it's not due to a misplaced vector table it seems.
> >>>>
> >>>> I did, for fun, try enabling `CONFIG_DEBUG_OPT_UNUSED_SECTIONS` and
> >>>> it did
> >>>> remove the vectors at the start of the assembly file. That image
> >>>> didn't
> >>>> work.
> >>>> Then I kept `CONFIG_DEBUG_OPT_UNUSED_SECTIONS` enabled and put the
> >>>> `KEEP(*(.vectors))` suggestion in the linker script. Vectors are
> >>>> kept in
> >>>> the image, and `g_idle_topstack` changes to 0x800be4. This image
> >>>> didn't
> >>>> work either.
> >>>>
> >>>> That got me thinking if there was another section getting
> >>>> unexpectedly
> >>>> tossed, so I put `KEEP(...)` around basically everything in the
> >>>> linker
> >>>> script. That didn't produce a different result, and still had the
> >>>> `g_idle_topstack` of 0x800e2b.
> >>>>
> >>>> I will continue to debug this, but if you have any other suggestions
> >>>> please let me know. Thank you very much for all your help so far!
> >>>>
> >>>> Matteo
> >>>>
> >>>> On Tue, May 27, 2025 at 7:33 PM <kr....@kerogit.eu> wrote:
> >>>>
> >>>>> Thanks to you as well.
> >>>>>
> >>>>> I spotted an error in the config posted both in the issue #16444
> >>>>> and PR
> >>>>> #16443. It is the
> >>>>>
> >>>>> CONFIG_RAM_START=0x800100
> >>>>>
> >>>>> Unless I overlooked something, the RAM starts at address 0x200
> >>>>> (0x800200) for AtMega 2560.
> >>>>>
> >>>>> Not sure if that's what causing the problem though - as far as I
> >>>>> can
> >>>>> see, that value is actually only used as an input to CONFIG_RAM_END
> >>>>> calculation and incorrect value in this macro simply reduces heap
> >>>>> size.
> >>>>>
> >>>>> The symptoms of the bug would match though - the program would be
> >>>>> trying
> >>>>> to use I/O registers as RAM.
> >>>>>
> >>>>> Thing is - seeing the bug was narrowed down to heap initialization
> >>>>> in
> >>>>> nx_start - I remember that I ran into pretty much the same issue
> >>>>> when
> >>>>> adding support for AVR DA/DB family... and I can't remember what
> >>>>> was the
> >>>>> problem.
> >>>>>
> >>>>> Maybe check the disassembly file? (RAW_DISASSEMBLY needs to be set
> >>>>> for
> >>>>> the file to be generated.) Verify that the .text section starts
> >>>>> with jmp
> >>>>> instructions for interrupt vectors. This is only a vague memory but
> >>>>> I
> >>>>> think that I ran into the issue because the linker removed .vectors
> >>>>> section, thinking it was unreferenced. (But that should not happen
> >>>>> here,
> >>>>> the linked config shows CONFIG_DEBUG_OPT_UNUSED_SECTIONS is not
> >>>>> set.)
> >>>>>
> >>>>> Anyway, when that happened, the whole program got "shifted" -
> >>>>> except for
> >>>>> program memory that stored values of initialized variables. The
> >>>>> data
> >>>>> copy loop in initialization code (atmega_head.S) then read the
> >>>>> values
> >>>>> from the shifted location leading to a mismatch - variables were
> >>>>> populated with wrong data. One of those variables was
> >>>>> g_idle_topstack
> >>>>> which directly sets start of heap. I think it initialized it to
> >>>>> zero...?
> >>>>>
> >>>>> Again, all of this is based on vague memory. TLDR version is: if
> >>>>> able,
> >>>>> check g_idle_topstack in up_allocate_heap and verify that it is
> >>>>> correct.
> >>>>>
> >>>>>
> >>>>> On 2025-05-27 15:28, Alan C. Assis wrote:
> >>>>> > Thank you KR,
> >>>>> >
> >>>>> > I posted your message in that issue for reference.
> >>>>> >
> >>>>> > I have a BK-AVR128 board:
> >>>>> > https://aliexpress.com/item/1005006234334573.html
> >>>>> > and will test NuttX on it to confirm.
> >>>>> >
> >>>>> > If everything is working as expected I will submit the board
> support to
> >>>>> > the
> >>>>> > mainline.
> >>>>> >
> >>>>> > BR,
> >>>>> >
> >>>>> > Alan
> >>>>> >
> >>>>> > On Tue, May 27, 2025 at 6:39 AM <kr....@kerogit.eu> wrote:
> >>>>> >
> >>>>> >> Hello,
> >>>>> >>
> >>>>> >> as mentioned in one of my previous e-mails to this list (RFC:
> >>>>> >> decoupling
> >>>>> >> ability to always panic from board_reset), I noticed #16444 and
> #16443
> >>>>> >> on GitHub.
> >>>>> >>
> >>>>> >> I found some time to look into this and I believe I found the
> reason
> >>>>> >> for
> >>>>> >> "If you add avr_lowputc calls in the board initialization code,
> you'll
> >>>>> >> see that the TX LED stays stuck on indefinitely."
> >>>>> >>
> >>>>> >> Provided the board initialization code that sentence is talking
> about
> >>>>> >> is
> >>>>> >> atmega_boardinitialize() in avr_boot.c, then avr_lowputc() likely
> does
> >>>>> >> not work because of the configuration used. According to #16444,
> the
> >>>>> >> configuration has:
> >>>>> >>
> >>>>> >> CONFIG_DEV_CONSOLE=y
> >>>>> >> CONFIG_CONSOLE_SYSLOG=y
> >>>>> >>
> >>>>> >> These two are processed in src/atmega/atmega_config.h
> >>>>> >>
> >>>>> >> #ifndef CONFIG_DEV_CONSOLE
> >>>>> >> #  undef  USE_SERIALDRIVER
> >>>>> >> #  undef  USE_EARLYSERIALINIT
> >>>>> >> #else
> >>>>> >> #  if defined(CONFIG_CONSOLE_SYSLOG)
> >>>>> >> #    undef  USE_SERIALDRIVER
> >>>>> >> #    undef  USE_EARLYSERIALINIT
> >>>>> >> #  elif defined(HAVE_USART_DEVICE)
> >>>>> >> #    define USE_SERIALDRIVER 1
> >>>>> >> #    define USE_EARLYSERIALINIT 1
> >>>>> >> #  else
> >>>>> >> #    undef  USE_SERIALDRIVER
> >>>>> >> #    undef  USE_EARLYSERIALINIT
> >>>>> >> #  endif
> >>>>> >> #endif
> >>>>> >>
> >>>>> >> With the configuration above, the outer ifndef is not true and
> first
> >>>>> >> if
> >>>>> >> defined in else block is true, which results into:
> >>>>> >>
> >>>>> >> undef  USE_SERIALDRIVER
> >>>>> >> undef  USE_EARLYSERIALINIT
> >>>>> >>
> >>>>> >> Since CONFIG_STANDARD_SERIAL is also set, the undef of
> >>>>> >> USE_SERIALDRIVER
> >>>>> >> is reverted by define USE_SERIALDRIVER 1 later. However,
> >>>>> >> USE_EARLYSERIALINIT remains unset.
> >>>>> >>
> >>>>> >> This causes avr_earlyserialinit() to not be built nor called from
> >>>>> >> avr_lowinit(). Serial port peripheral is therefore not
> initialized yet
> >>>>> >> when atmega_boardinitialize() is called. I don't know what exactly
> >>>>> >> happens when you attempt to transmit data with the port not
> enabled
> >>>>> >> but
> >>>>> >> my guess would be that "transmit data register empty" status flag
> is
> >>>>> >> just never cleared and the program ends up in a loop waiting for
> that
> >>>>> >> to
> >>>>> >> happen.
> >>>>> >>
> >>>>> >>
> >>>>> >> Other than that - I recently tested NSH on mega1284p-xplained
> (well, a
> >>>>> >> breadboard with the chip stuck in it actually) and it worked for
> me.
> >>>>> >> As
> >>>>> >> far as I can see, all AtMega devices use the same code for
> managing
> >>>>> >> serial ports so it should work out of the box.
> >>>>> >>
> >>>>> >> Someone somewhere in some forum on the net has or had a footer in
> his
> >>>>> >> posts saying something along the lines of that non-functional
> serial
> >>>>> >> port is 99% mismatching baud rates, might be worth a re-check.
> >>>>> >>
> >>>>> >> As for the PR itself (copying from e-mail mentioned at the
> beginning)
> >>>>> >> -
> >>>>> >> I would recommend trying to use KEEP(*(.vectors)) as seen in
> >>>>> >> boards/avr/avrdx/breadxavr/scripts/breadxavr.ld - the default
> config
> >>>>> >> should then not need the "# CONFIG_DEBUG_OPT_UNUSED_SECTIONS is
> not
> >>>>> >> set"
> >>>>> >> line.
> >>>>> >>
> >>>>>
> >>>>
>

Reply via email to