Hi Heinrich,

xypron.g...@gmx.de wrote on Thu, 20 Jul 2023 19:55:39 +0200:

> On 20 July 2023 18:39:17 CEST, Miquel Raynal <miquel.ray...@bootlin.com> wrote:
> >Hello,
> >
> >qianfangui...@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >
> >> It's very strange, and I can't tell whether it's a bug in usb or in dlmalloc.
> >>
> >> 1. Starting u-boot and running dhcp via am335x's ethernet (cpsw driver): it's ok.
> >>
> >> 2. Starting u-boot and running dhcp via am335x's usb net: data abort.
> >>
> >> 3. Starting fastboot, pressing CTRL-C right away, then running dhcp via
> >> am335x's usb net: it's ok.
> >
> >I am sorry to re-open a thread that is one year old but this is
> >still an open bug. The BBB is affected. In particular the BBBW
> >because there is no Ethernet connector, which makes the Eth-over-USB
> >emulation even more important. All U-Boots since 2021 are affected:
> >spurious data aborts, usually at the end of network interactions (tftp,
> >ping). I could not bisect it because the boot was deeply broken as
> >well on a significant range of commits :-/.
> >
> >On my side I narrowed it down to an env update which fails in malloc as
> >well. If I comment out the env update, it fails a bit later. It really
> >looks like a stack corruption which is either related to the Ethernet
> >USB gadget or the USB controller driver itself. Network transfers on
> >the BBBW using regular Ethernet do not trigger any error.
> >
> >I also observe the very strange "fix" mentioned above: starting and
> >killing fastboot makes all tftp pass... If anyone has more details to
> >share, or perhaps a subsequent thread giving more details, I would
> >really like to see this fixed upstream, I suppose I am not the only one
> >:-)
> >
> >Thanks,
> >Miquèl
>
> Can this problem be reproduced on QEMU?

I haven't tried on QEMU, what do you have in mind? What should we try to do?

Thanks,
Miquèl

> Best regards
>
> Heinrich
>
> >>
> >> On 2022/3/24 17:33, qianfan wrote:
> >> >
> >> > On 2022/3/23 18:12, Heinrich Schuchardt wrote:
> >> >> On 3/23/22 11:07, qianfan wrote:
> >> >>>
> >> >>> On 2022/3/23 17:51, Heinrich Schuchardt wrote:
> >> >>>> On 3/23/22 10:13, qianfan wrote:
> >> >>>>>
> >> >>>>> On 2022/3/23 16:02, qianfan wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 2022/3/23 15:45, qianfan wrote:
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On 2022/3/23 10:28, qianfan wrote:
> >> >>>>>>>>
> >> >>>>>>>> Hi:
> >> >>>>>>>>
> >> >>>>>>>> I have a custom AM335X board connected to my computer by usbnet. It
> >> >>>>>>>> always reports a data abort on 'dhcp'.
> >> >>>>>>>>
> >> >>>>>>>> Here is the log:
> >> >>>>>>>>
> >> >>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
> >> >>>>>>>>
> >> >>>>>>>> CPU : AM335X-GP rev 2.1
> >> >>>>>>>> Model: WISDOM AM335X CCT
> >> >>>>>>>> DRAM: 512 MiB
> >> >>>>>>>> NAND: 256 MiB
> >> >>>>>>>> MMC: OMAP SD/MMC: 0
> >> >>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using default environment
> >> >>>>>>>>
> >> >>>>>>>> Net: Could not get PHY for ethernet@4a100000: addr 0
> >> >>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
> >> >>>>>>>> Hit any key to stop autoboot: 0
> >> >>>>>>>> => setenv autoload no
> >> >>>>>>>> => dhcp
> >> >>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> >> >>>>>>>> MAC de:ad:be:ef:00:01
> >> >>>>>>>> HOST MAC de:ad:be:ef:00:00
> >> >>>>>>>> RNDIS ready
> >> >>>>>>>> musb-hdrc: peripheral reset irq lost!
> >> >>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> >> >>>>>>>> USB RNDIS network up!
> >> >>>>>>>> BOOTP broadcast 1
> >> >>>>>>>> BOOTP broadcast 2
> >> >>>>>>>> BOOTP broadcast 3
> >> >>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
> >> >>>>>>>> data abort
> >> >>>>>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
> >> >>>>>>>> reloc pc : [<808130a2>] lr : [<80833c3f>]
> >> >>>>>>>> sp : 9de53410 ip : 9de53578 fp : 00000001
> >> >>>>>>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
> >> >>>>>>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
> >> >>>>>>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
> >> >>>>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
> >> >>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
> >> >>>>>>>> Resetting CPU ...
> >> >>>>>>>>
> >> >>>>>>>> resetting ...
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Is there any doc about how to debug a data abort? Or is the bug
> >> >>>>>>>> already fixed?
> >> >>>>>>>>
> >> >>>>>>>> Thanks
> >> >>>>>>>>
> >> >>>>>>> This bug is not fixed on master. I found that v2021.01 is good and
> >> >>>>>>> v2021.04-rc2 is bad.
> >> >>>>>>>
> >> >>>>>>> I also tested this on a beaglebone black with am335x_evm_defconfig;
> >> >>>>>>> it has a similar problem.
> >> >>>>>>>
> >> >>>>>>> I looked for the first bad commit via 'git bisect': it told me that commit
> >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But that is very
> >> >>>>>>> strange, because this commit doesn't touch any dhcp or network code.
> >> >>>>>>>
> >> >>>>>>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
> >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
> >> >>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
> >> >>>>>>> Author: Heinrich Schuchardt <xypron.g...@gmx.de>
> >> >>>>>>> Date: Wed Jan 20 22:21:53 2021 +0100
> >> >>>>>>>
> >> >>>>>>>     fs: fat: consistent error handling for flush_dir()
> >> >>>>>>>
> >> >>>>>>>     Provide function description for flush_dir().
> >> >>>>>>>     Move all error messages for flush_dir() from the callers to the
> >> >>>>>>>     function.
> >> >>>>>>>     Move mapping of errors to -EIO to the function.
> >> >>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
> >> >>>>>>>
> >> >>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
> >> >>>>>>>
> >> >>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de>
> >> >>>>>>>
> >> >>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
> >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
> >> >>>>>>>
> >> >>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling for flush_dir()
> >> >>>>>>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
> >> >>>>>>> |\
> >> >>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver
> >> >>>>>>>
> >> >>>>>>> I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine.
> >> >>>>>>>
> >> >>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
> >> >>>>>>>
> >> >>>>>>> CPU : AM335X-GP rev 2.1
> >> >>>>>>> Model: TI AM335x BeagleBone Black
> >> >>>>>>> DRAM: 512 MiB
> >> >>>>>>> WDT: Started with servicing (60s timeout)
> >> >>>>>>> NAND: 0 MiB
> >> >>>>>>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
> >> >>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
> >> >>>>>>> Net: eth2: ethernet@4a100000, eth3: usb_ether
> >> >>>>>>> Hit any key to stop autoboot: 0
> >> >>>>>>> => dhcp
> >> >>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> >> >>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> >> >>>>>>> MAC de:ad:be:ef:00:01
> >> >>>>>>> HOST MAC de:ad:be:ef:00:00
> >> >>>>>>> RNDIS ready
> >> >>>>>>> musb-hdrc: peripheral reset irq lost!
> >> >>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> >> >>>>>>> USB RNDIS network up!
> >> >>>>>>> BOOTP broadcast 1
> >> >>>>>>> BOOTP broadcast 2
> >> >>>>>>> BOOTP broadcast 3
> >> >>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
> >> >>>>>>> Using usb_ether device
> >> >>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
> >> >>>>>>> Filename 'u-boot.img'.
> >> >>>>>>> Load address: 0x82000000
> >> >>>>>>> Loading:
> >> >>>>>>> #################################################################
> >> >>>>>>> #################################################################
> >> >>>>>>> #################################################################
> >> >>>>>>> #########################
> >> >>>>>>> 2.5 MiB/s
> >> >>>>>>> done
> >> >>>>>>> Bytes transferred = 1123888 (112630 hex)
> >> >>>>>>> =>
> >> >>>>>>>
> >> >>>>> "data abort" messages:
> >> >>>>>
> >> >>>>> data abort
> >> >>>>> pc : [<9ff8196c>] lr : [<9ffa1cd7>]
> >> >>>>> reloc pc : [<8081496c>] lr : [<80834cd7>]
> >> >>>>> sp : 9df38e60 ip : 9df38fc8 fp : 00000001
> >> >>>>> r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
> >> >>>>> r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
> >> >>>>> r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
> >> >>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
> >> >>>>> Code: 0303 60ca 4403 6091 (685a) f042
> >> >>>>> Resetting CPU ...
> >> >>>>>
> >> >>>>> objdump of u-boot: pc is in malloc and lr is in env_attr_walk
> >> >>>>>
> >> >>>>>     unlink(victim, bck, fwd);
> >> >>>>> 80814966: 60ca str r2, [r1, #12]
> >> >>>>>     set_inuse_bit_at_offset(victim, victim_size);
> >> >>>>> 80814968: 4403 add r3, r0
> >> >>>>>     unlink(victim, bck, fwd);
> >> >>>>> 8081496a: 6091 str r1, [r2, #8]
> >> >>>>>     set_inuse_bit_at_offset(victim, victim_size);
> >> >>>>> 8081496c: 685a ldr r2, [r3, #4]
> >> >>>>> 8081496e: f042 0201 orr.w r2, r2, #1
> >> >>>>> 80814972: 605a str r2, [r3, #4]
> >> >>>>>
> >> >>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>> I have seen crashes in common/dlmalloc.c before after double free() or
> >> >>>> free() with an incorrect pointer.
> >> >>>>
> >> >>>> The assert() statements in do_check_inuse_chunk() are meant to catch
> >> >>>> this but assert() as defined in include/log.h does not stop the code
> >> >>>> and does not even print without _DEBUG=1.
> >> >>>>
> >> >>>> You should be able to get the assert output with
> >> >>>>
> >> >>>> #include <common.h>
> >> >>>> #define _DEBUG 1
> >> >>>> #include <log.h>
> >> >>>>
> >> >>>> at the top of common/dlmalloc.c.
> >> >>>>
> >> >>>> You should get full malloc debug output with
> >> >>>
> >> >>> Hi: I tried adding the DEBUG macro before <log.h>, and no other malloc
> >> >>> message
> >> >>
> >> >> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> >> >> define _DEBUG.
> >> >
> >> > Finally I got a malloc error message on the console:
> >> >
> >> > TFTP from server 192.168.200.1; our IP address is 192.168.200.39
> >> > Filename 'u-boot.img'.
> >> > Load address: 0x82000000
> >> > Loading:
> >> > #################################################################
> >> > #################################################################
> >> > #################################################################
> >> > ###################################################### 0 Bytes
> >> > 1.9 MiB/s
> >> > done
> >> > Bytes transferred = 1274816 (1373c0 hex)
> >> > common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.
> >> >
> >> > I tried many times. do_check_chunk does not always fail in the same place;
> >> > sometimes it reports common/dlmalloc.c:802: do_check_chunk: Assertion
> >> > `!chunk_is_mmapped(p)' failed. The situation is not the same each time.
> >> >
> >> > I got a backtrace when malloc failed:
> >> >
> >> > (gdb) bt
> >> > #0 0x9ffb5684 in panic_finish () at lib/panic.c:23
> >> > #1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
> >> > #2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
> >> > #3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
> >> > #4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
> >> > #5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
> >> > #6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
> >> > #7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
> >> > #8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
> >> > #9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
> >> > #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
> >> > #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
> >> > #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
> >> > #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
> >> > #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
> >> > #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
> >> > #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
> >> > #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
> >> > #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
> >> > #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
> >> > #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
> >> > #21 0x9ff77c1a in cli_loop () at common/cli.c:229
> >> > #22 0x9ff70d3e in main_loop () at common/main.c:66
> >> > #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
> >> > #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
> >> > #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
> >> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> >> >
> >> >>
> >> >> Best regards
> >> >>
> >> >> Heinrich
> >> >>
> >> >>> printed.
> >> >>>
> >> >>>>
> >> >>>> #define DEBUG 1
> >> >>>> #include <common.h>
> >> >>>> #include <log.h>
> >> >>>>
> >> >>>> Best regards
> >> >>>>
> >> >>>> Heinrich
> >> >>>