We've been seeing a lot of the qemuppc shutdown issue and I decided to look into it. The really worrying thing looking at the logs locally is the serial ports are showing irq issues and becoming disabled as nobody would handle them. Errors like:
[ 9.194886] irq 36: nobody cared (try booting with the "irqpoll" option) [ 9.198712] CPU: 0 PID: 127 Comm: bootlogd Not tainted 5.10.30-yocto-standard #1 [ 9.202283] Call Trace: [ 9.205611] [d1005f00] [c00a0da8] __report_bad_irq+0x50/0x138 (unreliable) [ 9.209347] [d1005f30] [c00a0cc0] note_interrupt+0x324/0x378 [ 9.212855] [d1005f70] [c009d138] handle_irq_event+0xe8/0x104 [ 9.216353] [d1005fa0] [c00a1d9c] handle_fasteoi_irq+0xc0/0x29c [ 9.219960] [d1005fc0] [c009b798] generic_handle_irq+0x40/0x5c [ 9.223496] [d1005fd0] [c00075d0] __do_irq+0x58/0x188 [ 9.226948] [d1005ff0] [c0010040] call_do_irq+0x20/0x38 [ 9.230391] [d29eda60] [c0007788] do_IRQ+0x88/0xfc [ 9.233860] [d29eda90] [c0016454] ret_from_except+0x0/0x14 [ 9.237288] --- interrupt: 501 at __setup_irq+0x3c4/0x838 [ 9.237288] LR = __setup_irq+0x790/0x838 [ 9.244155] [d29edb88] [c009f0a4] request_threaded_irq+0x114/0x1c8 [ 9.247672] [d29edbb8] [c07a5a18] pmz_startup+0x17c/0x32c [ 9.251203] [d29edbd8] [c07a1140] uart_port_startup+0x184/0x2f8 [ 9.254651] [d29edc08] [c07a1974] uart_port_activate+0x78/0xf4 [ 9.258141] [d29edc28] [c07839f8] tty_port_open+0xd4/0x170 [ 9.261579] [d29edc58] [c079db74] uart_open+0x2c/0x48 [ 9.265116] [d29edc68] [c077a288] tty_open+0x168/0x640 [ 9.268574] [d29edcd8] [c0280be8] chrdev_open+0x138/0x2a4 [ 9.272123] [d29edd18] [c027421c] do_dentry_open+0x228/0x410 [ 9.275643] [d29edd48] [c028e9f4] path_openat+0xb04/0xf28 [ 9.279184] [d29eddd8] [c02917e4] do_filp_open+0x120/0x164 [ 9.282535] [d29ede98] [c0276238] do_sys_openat2+0xd8/0x19c [ 9.285790] [d29edee8] [c0276574] sys_openat+0x88/0xdc [ 9.289096] [d29edf38] [c00160d8] ret_from_syscall+0x0/0x34 [ 9.292620] --- interrupt: c01 at 0xfec3738 [ 9.292620] LR = 0xfec36e0 [ 9.299035] handlers: [ 9.302312] [<7f7f7da8>] pmz_interrupt [ 9.305541] Disabling IRQ #36 (and the irqpoll option does not help) This is problematic as the shutdown test uses the serial interface to shut down the system. If the serial interface fails to login or run the command, game over for the test. Looking at the defconfig, the serial stuff is handled through CONFIG_SERIAL_PMACZILOG and the pmac_zliog.c driver. There is an interesting section of code in there for CONFIG_SERIAL_PMACZILOG_CONSOLE which had me asking "really?" about what it does with interrupts. Chucking: # CONFIG_SERIAL_PMACZILOG_CONSOLE is not set into a .cfg for the kernel meant the above traces went away. Whether than was due to the console logging changing or it really fixed some kind of race, hard to say. My instinct says we should disable the above and test for a bit, see if the shutdown issues don't happen so much. Bruce: Any thoughts on that? I suspect there is also some qmp related serial console race as ever since we added the qmp code, the shutdown timeout issues spread to other arches. Cheers, Richard
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#151329): https://lists.openembedded.org/g/openembedded-core/message/151329 Mute This Topic: https://lists.openembedded.org/mt/82609331/21656 Group Owner: openembedded-core+ow...@lists.openembedded.org Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-