On 16 December 2014 at 16:21, Peter Howard <p...@northern-ridge.com.au> wrote:
> On Tue, 2014-12-16 at 17:27 +1100, Peter Howard wrote:
>> On Wed, 2014-12-10 at 19:10 -0700, Simon Glass wrote:
>> > Hi Peter,
>> >
>> > On 10 December 2014 at 18:37, Simon Glass <s...@chromium.org> wrote:
>> > > Hi Peter,
>> > >
>> > > On Dec 10, 2014 6:23 PM, "Peter Howard" <p...@northern-ridge.com.au>
>> > > wrote:
>> > >>
>> > >> On Wed, 2014-12-10 at 17:49 -0700, Simon Glass wrote:
>> > >> > Hi Peter,
>> > >> >
>> > >> > On 10 December 2014 at 17:19, Peter Howard
>> > >> > <p...@northern-ridge.com.au> wrote:
>> > >> > > On Wed, 2014-12-10 at 15:43 -0700, Simon Glass wrote:
>> > >> > >> Hi Peter,
>> > >> > >>
>> > >> > >> On 10 December 2014 at 15:17, Peter Howard
>> > >> > >> <p...@northern-ridge.com.au> wrote:
>> > >> > >> >
>> > >> > >> > On Tue, 2014-12-09 at 17:45 -0700, Simon Glass wrote:
>> > >> > >> > > Hi Peter,
>> > >> > >> > >
>> > >> > >> > > On 9 December 2014 at 17:13, Peter Howard
>> > >> > >> > > <p...@northern-ridge.com.au> wrote:
>> > >> > >> > > >
>> > >> > >> > > > On Wed, 2014-12-03 at 14:20 -0800, Simon Glass wrote:
>> > >> > >> > > > > Hi Peter,
>> > >> > >> > > > >
>> > >> > >> > > > > On 3 December 2014 at 13:53, Peter Howard
>> > >> > >> > > > > <p...@northern-ridge.com.au> wrote:
>> > >> > >> > > > > > On Wed, 2014-12-03 at 06:38 -0700, Simon Glass wrote:
>> > >> > >> > > > > >> Hi Peter,
>> > >> > >> > > > > >>
>> > >> > >> > > > > >> On 2 December 2014 at 14:59, Peter Howard
>> > >> > >> > > > > >> <p...@northern-ridge.com.au> wrote:
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > I'm trying to make two changes to building u-boot for
>> > >> > >> > > > > >> > the da850evm.
>> > >> > >> > > > > >> > * Use the generic board code to get rid of the
>> > >> > >> > > > > >> >   warning, and
>> > >> > >> > > > > >> > * Enable libfdt to allow booting of linux with a
>> > >> > >> > > > > >> >   standalone dtb image.
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > The first part appears to be simple. Just adding
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > #define CONFIG_SYS_GENERIC_BOARD
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > in include/configs/da850evm.h works with no obvious
>> > >> > >> > > > > >> > side-effects.
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > However, adding
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > #define CONFIG_OF_LIBFDT
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > is a different story. It appears to introduce memory
>> > >> > >> > > > > >> > corruption when loading the environment. On first boot
>> > >> > >> > > > > >> > it gives the "bad CRC!" warning and uses the default
>> > >> > >> > > > > >> > environment. If you *don't* save the environment you
>> > >> > >> > > > > >> > can boot fine (including manual editing of the
>> > >> > >> > > > > >> > environment). However if you save the environment via
>> > >> > >> > > > > >> > saveenv bad things happen on the next boot.
>> > >> > >> > > > > >> > An example log:
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > U-Boot SPL 2015.01-rc1 (Nov 27 2014 - 14:30:26)
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > U-Boot 2015.01-rc1 (Nov 27 2014 - 14:30:26)
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > I2C: ready
>> > >> > >> > > > > >> > DRAM: 64 MiB
>> > >> > >> > > > > >> > WARNING: Caches not enabled
>> > >> > >> > > > > >> > MMC: davinci: 0
>> > >> > >> > > > > >> > SF: Detected M25P64 with page size 256 Bytes, erase size 64 KiB, total 8 MiB
>> > >> > >> > > > > >> > In: serial
>> > >> > >> > > > > >> > Out: serial
>> > >> > >> > > > > >> > Err: serial
>> > >> > >> > > > > >> > SF: Detected M25P64 with page size 256 Bytes, erase size 64 KiB, total 8 MiB
>> > >> > >> > > > > >> > Warning: Invalid MAC address read from SPI flash
>> > >> > >> > > > > >> > Net: DaVinci-EMAC
>> > >> > >> > > > > >> > Error: DaVinci-EMAC address not set.
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > U-Boot > help
>> > >> > >> > > > > >> > data abort
>> > >> > >> > > > > >> > pc : [<c108ffd8>] lr : [<c10900b4>]
>> > >> > >> > > > > >> > sp : c3e5f838 ip : 00000000 fp : c3e5fda4
>> > >> > >> > > > > >> > r10: c10b1f28 r9 : c3e5ff08 r8 : 0000000e
>> > >> > >> > > > > >> > r7 : c10b22c4 r6 : c10aa2a0 r5 : 00000000 r4 : 0000001b
>> > >> > >> > > > > >> > r3 : c10b8f70 r2 : 00000001 r1 : c3e5f840 r0 : ffffffff
>> > >> > >> > > > > >> > Flags: Nzcv IRQs off FIQs off Mode SVC_32
>> > >> > >> > > > > >> > Resetting CPU ...
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > If I rebuild with CONFIG_OF_LIBFDT removed again from
>> > >> > >> > > > > >> > da850evm.h the problem disappears. And you can see
>> > >> > >> > > > > >> > that the saveenv worked (i.e.
>> > >> > >> > > > > >> > the environment is what was saved before the reboot
>> > >> > >> > > > > >> > and data abort).
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > I've traced the problem as far as the inline version
>> > >> > >> > > > > >> > of console_puts() in common/console.c. The table
>> > >> > >> > > > > >> > dispatch there and the fact that the problem appears
>> > >> > >> > > > > >> > only when you load the environment makes me think it's
>> > >> > >> > > > > >> > memory corruption.
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > Note: if you do *not* specify CONFIG_SYS_GENERIC_BOARD
>> > >> > >> > > > > >> > you still get the data abort, however it takes a bit
>> > >> > >> > > > > >> > more effort to trigger (like actually looking at the
>> > >> > >> > > > > >> > environment :-) )
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > (Note: This is building against the u-boot-2015.01-rc1
>> > >> > >> > > > > >> > tree)
>> > >> > >> > > > > >> >
>> > >> > >> > > > > >> > Suggestions?
>> > >> > >> > > > > >>
>> > >> > >> > > > > >> In case it helps, I got the same symptom (help crashes)
>> > >> > >> > > > > >> and it was due to BSS not being cleared. Stefan (on cc)
>> > >> > >> > > > > >> found this problem - he said something to do with GDT
>> > >> > >> > > > > >> calculation or handling. However it is just a guess and
>> > >> > >> > > > > >> probably has nothing to do with your issue.
>> > >> > >> > > > > >
>> > >> > >> > > > > > I may be missing something, but the GDT appears to be
>> > >> > >> > > > > > x86-specific whereas I'm building for ARMv5.
>> > >> > >> > > > >
>> > >> > >> > > > > OK for some reason I thought this was PPC!
>> > >> > >> > > > >
>> > >> > >> > > > > Maybe you can find your pc in System.map and work out where
>> > >> > >> > > > > it is going wrong? Are you hitting some image size limit?
>> > >> > >> > > > >
>> > >> > >> > > > > pc : [<c108ffd8>]
>> > >> > >> > > >
>> > >> > >> > > >
>> > >> > >> > > > Sorry, been distracted on other stuff for a few days.
>> > >> > >> > > >
>> > >> > >> > > > First, I now understand the global descriptor a bit better.
>> > >> > >> > > > For ARMv5 it's stored in r9 and still looks sane. The
>> > >> > >> > > > relevant info:
>> > >> > >> > > >
>> > >> > >> > > > (gdb) print/x *((gd_t *)$r9)
>> > >> > >> > > > $1 = {bd = 0xc3e5ffb0, flags = 0x183, baudrate = 0x1c200, cpu_clk = 0x0,
>> > >> > >> > > > bus_clk = 0x0, pci_clk = 0x0, mem_clk = 0x0, have_console = 0x1,
>> > >> > >> > > > env_addr = 0xc10a8fcc, env_valid = 0x1, ram_top = 0xc4000000,
>> > >> > >> > > > relocaddr = 0xc3f80000, ram_size = 0x4000000, mon_len = 0x6ffb0,
>> > >> > >> > > > irq_sp = 0xc3e5fef0, start_addr_sp = 0xc3e5fee0, reloc_off = 0x2f00000,
>> > >> > >> > > > new_gd = 0xc3e5ff08, fdt_blob = 0x0, new_fdt = 0x0, fdt_size = 0x0,
>> > >> > >> > > > jt = 0xc3e601c0, env_buf = {0x31, 0x31, 0x35, 0x32, 0x30, 0x30,
>> > >> > >> > > > 0x0 <repeats 26 times>}, cur_i2c_bus = 0x0, timebase_h = 0x0,
>> > >> > >> > > > timebase_l = 0x0, arch = {timer_rate_hz = 0x16e360, tbu = 0x0,
>> > >> > >> > > > tbl = 0x4cc62, lastinc = 0x0, timer_reset_value = 0x0,
>> > >> > >> > > > tlb_addr = 0xc3ff0000, tlb_size = 0x4000}}
>> > >> > >> > > >
>> > >> > >> > > >
>> > >> > >> > > > The pc is definitely bogus. The reloc address is 0xc3f80000
>> > >> > >> > > > whereas that would be a pre-reloc address (starting at
>> > >> > >> > > > 0xc1080000).
>> > >> > >> > > > And it's definitely relocated by the time of failure. The
>> > >> > >> > > > only other bit of information I have right now is that
>> > >> > >> > > > adding CONFIG_OF_LIBFDT drops the reloc address from
>> > >> > >> > > > 0xc3f85000 to 0xc3f80000.
>> > >> > >> > > >
>> > >> > >> > > > Don't know if any of that gives additional insight.
>> > >> > >> > > > Meanwhile I continue tracing.
>> > >> > >> > >
>> > >> > >> > > Yes, continue tracing.
>> > >> > >> > >
>> > >> > >> > > If ram_size is 0x4000000 and ram_top is 0xc4000000 then your
>> > >> > >> > > RAM presumably starts at 0xc0000000. Then the relocation
>> > >> > >> > > address actually seems reasonable to me.
>> > >> > >> > >
>> > >> > >> > > I don't know why the reloc address changes when you add
>> > >> > >> > > CONFIG_OF_LIBFDT.
>> > >> > >> > >
>> > >> > >> > > You can add '#define DEBUG' at the very top of board_f/r.c to
>> > >> > >> > > see addresses.
>> > >> > >> >
>> > >> > >> > I'm not sure what you meant by board_f/r.c as that file doesn't
>> > >> > >> > seem to
>> > >> > >>
>> > >> > >> common/board_f.c
>> > >> > >> common/board_r.c
>> > >> > >>
>> > >> > >> >
>> > >> > >> > exist. I whacked '#define DEBUG' in da850evm.h and got a wealth
>> > >> > >> > of output. However, the only new bit of information I've gleaned
>> > >> > >> > is that the lower that the reloc address goes, the faster things
>> > >> > >> > die. It goes lower in -rc3 (0xc3f7f000), and it doesn't make it
>> > >> > >> > to the prompt on a reset after saving the environment. Likewise
>> > >> > >> > with '#define DEBUG'; after saving the environment it doesn't
>> > >> > >> > get back to the prompt on the next reset. All the addresses
>> > >> > >> > printed seem reasonable.
>> > >> > >> >
>> > >> > >> > The only thing that doesn't look right is that the command
>> > >> > >> > function pointers all look to be pre-reloc addresses. Though I
>> > >> > >> > don't see how this change would cause a failure that wouldn't
>> > >> > >> > happen already.
>> > >> > >> >
>> > >> > >> > So it seems that _something_ is being overwritten by the
>> > >> > >> > environment load, but I'm yet to get an idea of what.
>> > >> > >> >
>> > >> > >> > --
>> > >> > >> > Peter Howard <p...@northern-ridge.com.au>
>> > >> > >> >
>> > >> > >>
>> > >> > >> Me neither. But you do have a data abort so may be able to look
>> > >> > >> around there and figure out where exactly it died. Better if you
>> > >> > >> can use a debugger.
>> > >> > >>
>> > >> > >
>> > >> > >
>> > >> > > Here's what appears to be happening with a death on typing
>> > >> > > "help" (-rc1): The logic flow gets to the (unrelocated) fputs() -
>> > >> > > and into the inline version of console_putc(). It looks up
>> > >> > > stdio_devices[1] (again, unrelocated addr) which is a valid
>> > >> > > pointer - sort of. The value is 0x2081004 which is outside of RAM,
>> > >> > > and the contents of the address are, according to gdb, zeroed out.
>> > >> > > Which means stdio_devices[1]->putc() is a jump to 0x0. I've
>> > >> > > stepped through that using JTAG+openocd+gdb.
>> > >> > >
>> > >> > > With extra debug statements, console output seems to cause a hang
>> > >> > > from somewhere in himport_r() (which is using relocated addresses
>> > >> > > including data).
>> > >> > >
>> > >> > > All this, to me, points to an issue with the unrelocated locations
>> > >> > > being used after environment import, but I don't know enough about
>> > >> > > u-boot structure to know if that is right or not . . .
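For anyone following the address arithmetic in this thread, here is a minimal sketch in C (not U-Boot code) that classifies an address using the values from the gd dump quoted earlier. The enum and function names are made up for illustration, and treating mon_len as the length of the image at both the pre-relocation and post-relocation base is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the gdb dump of gd quoted in this thread. */
#define RAM_TOP    0xc4000000u
#define RAM_SIZE   0x04000000u
#define RELOCADDR  0xc3f80000u
#define RELOC_OFF  0x02f00000u
#define MON_LEN    0x0006ffb0u  /* assumed to cover the whole image */

/* Sanity checks so the subtractions below cannot wrap around. */
static_assert(RAM_SIZE <= RAM_TOP, "RAM size larger than RAM top");
static_assert(RELOC_OFF <= RELOCADDR, "reloc offset larger than reloc address");

/* Hypothetical labels, for illustration only. */
enum addr_kind {
    ADDR_OUTSIDE_RAM,   /* not backed by DRAM at all */
    ADDR_PRE_RELOC,     /* inside the image at its load address */
    ADDR_POST_RELOC,    /* inside the relocated copy */
    ADDR_OTHER_RAM      /* DRAM, but in neither copy (stack, heap, gd) */
};

enum addr_kind classify(uint32_t addr)
{
    uint32_t ram_base = RAM_TOP - RAM_SIZE;      /* 0xc0000000 */
    uint32_t pre_base = RELOCADDR - RELOC_OFF;   /* 0xc1080000 */

    if (addr < ram_base || addr >= RAM_TOP)
        return ADDR_OUTSIDE_RAM;
    if (addr >= pre_base && addr - pre_base < MON_LEN)
        return ADDR_PRE_RELOC;
    if (addr >= RELOCADDR && addr - RELOCADDR < MON_LEN)
        return ADDR_POST_RELOC;
    return ADDR_OTHER_RAM;
}
```

With these numbers the faulting pc 0xc108ffd8 falls in the pre-relocation copy, and the stdio_devices[1] value 0x2081004 is outside RAM altogether.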
>> > >> > >
>> > >> > >
>> > >> > > Peter Howard <p...@northern-ridge.com.au>
>> > >> > >
>> > >> >
>> > >> > Perhaps look at how it gets to the unrelocated fputs()? If it can call
>> > >> > the correct fputs() before initr_env() then you can perhaps narrow it
>> > >> > down.
>> > >> >
>> > >> > But I can't see how you would be able to type at the console with this
>> > >> > problem, since fputs() is used by the command line editor.
>> > >> >
>> > >> > I suspect you are actually seeing a symptom of something else. You
>> > >> > could try enabling CONFIG_CONSOLE_MUX and see if that changes the bug.
>> > >>
>> > >> Hmmm. That produces a new failure - it goes into an endless loop in
>> > >> fgetc(). And it does that:
>> > >> * With CONFIG_SYS_GENERIC_BOARD and CONFIG_OF_LIBFDT - both with and
>> > >>   without saving the environment
>> > >> * With CONFIG_SYS_GENERIC_BOARD only,
>> > >> * Without CONFIG_SYS_GENERIC_BOARD.
>> > >>
>> > >> :-)
>> > >
>> > > so just adding the console config changes the behavior on your board?
>> > > Does your BSS work? Do you have a custom link script? Are you writing
>> > > to BSS before relocation?
>> >
>> > I see a few things:
>> >
>> > - 4KB stack (should be enough I suppose)
>> > - SPL link script, but it doesn't look like it does anything useful.
>> >   Maybe drop it?
>> >
>> > But I'm pretty sure this is nothing to do with it. This is a bit of a
>> > long shot, but if your relocation is broken you might be corrupting
>> > BSS - the variables in System.map between __rel_dyn_start and
>> > __rel_dyn_end. This can happen if you write to a BSS variable before
>> > relocation. You can check the area (e.g. by checksumming it) early in
>> > board_init_f() - e.g. setup_mon_len(). Put the result in a new member
>> > of struct global_data (gd) - then checksum again and compare before
>> > relocation in setup_reloc().
>> >
>> > Probably nothing else but to keep digging.
>>
>> OK, I _think_ I have a handle on this.
>> But hopefully there's someone
>> out there who understands better than me how the da850 SPI flash is
>> set up wrt. u-boot usage.
>>
>> It appears that the damage occurs with the actual writing of the env via
>> saveenv (i.e. not the reading back of it next time round). Why?
>> Because stepping through crt0.S and relocate.S shows different results
>> by relocate_done: in relocate.S. When the environment is not read, all
>> the relocations are correct. After saveenv is done and the board is
>> reset, various relocations are incomplete - i.e. the addresses in the
>> relocated tables point to the pre-relocation addresses. Which then get
>> trashed when the environment is read (afterwards).
>>
>> I'm guessing that the problem is the size of the u-boot image is now
>> overlapping in spi flash with the location of the environment. So
>> saving the environment actually trashes part of the u-boot image.
>> Further guessing is it involves the __rel_dyn area, so the address
>> fixups don't happen.
>>
>> Does that sound believable?
>
> And yes, that _was_ the problem. After all that, a 2 line fix (apart
> from the enabling of the generic platform and libfdt). Patch to follow.
>
Great!

- Simon
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot
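The overlap diagnosed above comes down to an interval check on flash offsets. A minimal sketch in C, assuming the layout is expressed as offset/size pairs; the function names are hypothetical and this is not the patch mentioned in the thread. The only number taken from the thread is the 64 KiB erase size in the boot log; real offsets and sizes would have to come from the board config and the built image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the half-open flash ranges [a_off, a_off + a_size) and
 * [b_off, b_off + b_size) share at least one byte. Offsets are assumed
 * small enough (an 8 MiB part here) that the additions cannot overflow.
 */
bool regions_overlap(uint32_t a_off, uint32_t a_size,
                     uint32_t b_off, uint32_t b_size)
{
    return a_off < b_off + b_size && b_off < a_off + a_size;
}

/*
 * Round a flash offset up to the next erase-sector boundary, i.e. the
 * first place an environment sector could start without touching the
 * data that ends at 'offset'.
 */
uint32_t round_up_to_sector(uint32_t offset, uint32_t sect_size)
{
    assert(sect_size != 0);
    return (offset + sect_size - 1) / sect_size * sect_size;
}
```

For example, with a hypothetical image occupying 0x8000..0x74000 and an environment sector at 0x70000, regions_overlap(0x8000, 0x6c000, 0x70000, 0x10000) reports a clash, and round_up_to_sector(0x74000, 0x10000) gives 0x80000 as the first sector that is clear of the image.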