On Mon, 4 Nov 2024 at 13:16, Cédric Le Goater <c...@kaod.org> wrote:
>
> On 11/4/24 13:19, Cédric Le Goater wrote:
> > On 11/4/24 13:12, Thomas Huth wrote:
> >> On 28/10/2024 18.14, Peter Maydell wrote:
> >>> Trying a "make check-functional" I find that the func-arm-arm_aspeed
> >>> test seems to hit a timeout:
> >>>
> >>> 18/18 qemu:func-thorough+func-arm-thorough+thorough /
> >>> func-arm-arm_aspeed              TIMEOUT        600.08s   killed by
> >>> signal 15 SIGTERM
> >>>
> >>> This is with commit cea8ac78545a.
> >>
> >> Cédric, is it working reliable for you?
> >
> > I have never seen this issue on the systems I use. Let me try again.
>
> On a (slow) ARM SBC, I am seeing a timeout indeed.
>
> The log file contains :
>
> 2024-11-04 13:59:01,219: Starting dropbear sshd: OK
> 2024-11-04 13:59:01,628: Aspeed AST2600 EVB
> 2024-11-04 13:59:01,829: ast2600-evb login: root
> 2024-11-04 13:59:01,847: passw0rd
> 2024-11-04 13:59:01,850: echo lm75 0x4d > 
> /sys/class/i2c-dev/i2c-3/device/new_device
> 2024-11-04 14:00:13,916: Password:
> 2024-11-04 14:00:13,917: Login timed out after 60 seconds
> 2024-11-04 14:00:15,418: Aspeed AST2600 EVB
>
> This means that the sleep workaround failed :/
>
>          # the line before login:
>          self.wait_for_console_pattern(pattern)
>          time.sleep(0.1)
>          exec_command(self, 'root')
>          time.sleep(0.1)
>          exec_command(self, "passw0rd")

Those sleeps are definitely dubious, but I did try a local
change of bumping them to 2 seconds and it didn't fix the problem.
Looking at
 
tests/functional/arm/test_arm_aspeed.AST2x00Machine.test_arm_ast2500_evb_buildroot/console.log
in the build tree (which I assume is the relevant log)
it did log in OK, and proceeded through the test to try
to power the machine off:

2024-11-04 14:39:59,092: cat /sys/class/hwmon/hwmon1/temp1_input
2024-11-04 14:39:59,110: # cat /sys/class/hwmon/hwmon1/temp1_input
2024-11-04 14:39:59,131: 18000
2024-11-04 14:39:59,131: poweroff
2024-11-04 14:39:59,139: # poweroff
2024-11-04 14:39:59,323: # Stopping dropbear sshd: OK
2024-11-04 14:40:00,615: Stopping network:

but that's where the log ends.

Digging even further I have just found

tests/functional/arm/test_arm_aspeed.AST2x00Machine.test_arm_ast2500_evb_buildroot/qemu-machine-cgjbonzx/7f39a8515f90.log

which has what looks like QEMU's stderr output, which reveals
the culprit:

../../net/checksum.c:106:9: runtime error: member access within
misaligned address 0x55a2d1887a5e for type 'struct ip_header', which
requires 4 byte alignment
0x55a2d1887a5e: note: pointer points here
 04 a8 08 00 45 00  01 48 37 9a 40 00 40 11  e9 fa 0a 00 02 0f 0a 00
02 02 00 44 00 43 01 34  19 56
             ^
    #0 0x55a2cc1f87d4 in net_checksum_calculate
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../net/checksum.c:106:9
    #1 0x55a2cbcf00ad in ftgmac100_do_tx
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../hw/net/ftgmac100.c:611:17
    #2 0x55a2cbcf00ad in ftgmac100_write
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../hw/net/ftgmac100.c:843:13
    #3 0x55a2cc7af5af in memory_region_write_accessor
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/memory.c:497:5
    #4 0x55a2cc7aec6a in access_with_adjusted_size
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/memory.c:573:18
    #5 0x55a2cc7ae717 in memory_region_dispatch_write
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/memory.c
    #6 0x55a2cc7e5c85 in flatview_write_continue_step
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/physmem.c:2786:18
    #7 0x55a2cc7dba0c in flatview_write_continue
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/physmem.c:2816:19
    #8 0x55a2cc7dba0c in flatview_write
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/physmem.c:2847:12
    #9 0x55a2cc7e4efa in subpage_write
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/physmem.c:2425:12
    #10 0x55a2cc7afa66 in memory_region_write_with_attrs_accessor
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/memory.c:518:12
    #11 0x55a2cc7aec6a in access_with_adjusted_size
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/memory.c:573:18
    #12 0x55a2cc7ae717 in memory_region_dispatch_write
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../system/memory.c
    #13 0x55a2cc86f2c8 in int_st_mmio_leN
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../accel/tcg/cputlb.c:2489:13
    #14 0x55a2cc86f2c8 in do_st_mmio_leN
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../accel/tcg/cputlb.c:2524:12
    #15 0x55a2cc85d2e0 in do_st_4
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../accel/tcg/cputlb.c:2694:9
    #16 0x55a2cc85d2e0 in do_st4_mmu
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../accel/tcg/cputlb.c:2770:9
    #17 0x7fb73a213e05  (<unknown module>)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../../net/checksum.c:106:9 in


So the network device in this board has hit an undefined
behaviour which clang's sanitizer has detected and
made QEMU abort.

The test framework here seems to be severely lacking:

(1) If QEMU exits with failure or crashes then we should
report the test as a failure, not a timeout
(2) If QEMU prints warnings or errors to stderr, we
should collect these in the main log file, not buried
under a rock with no pointers from the main log file
to that rock
(3) Ideally we should print at least a summary of that
failure to the terminal so the user can clearly see
"oh, this test failed because QEMU asserted or whatever"
without having to dig through all the log files

If the actual cause of the test failure had been
better reported to the user running the tests I would
have figured this out a lot faster.

thanks
-- PMM

Reply via email to