On Tue, Jul 5, 2022 at 7:59 PM Jan Beulich <jbeul...@suse.com> wrote:
> Nothing useful in there. Yet independent of that I guess we need to
> separate the issues you're seeing. Otherwise it'll be impossible to
> know what piece of data belongs where.

Yep, I think I'm seeing several different issues here:

1. The FLR-related DPC / AER message, seen only on the first attempt when pciback tries to seize and release the SN570. Later pciback operations appear just fine.
2. The MSI-X preparation failure message that shows up each time the SN570 is seized by pciback or passed to a domU.
3. Xen tries to map BARs from two devices to the same page.
4. The "write-back to unknown field" message in the QEMU log, which goes away with the permissive=1 passthrough config.
5. The "irq 16: nobody cared" message, which shows up *sometimes* in a pattern that I haven't figured out (see attached).
6. The FreeBSD domU sees the device but fails to use it, because low-level commands sent to it are aborted.
7. The device does not return to the pci-assignable-list when the domU it was assigned to shuts down (see attached).

#3 appears to be a known issue that can be worked around with patches from the list.

I suspect #1 may have something to do with the device itself. It's still not clear whether it's fatal or just annoying. I updated the firmware to the latest version and confirmed that the new firmware made no noticeable difference.

I suspect issues #2, #4, #5, #6 and #7 may be related, and that the pass-through was not completely successful...

Should I expect a debug build of the Xen hypervisor to give better diagnostic messages, without the debug patch that Roger mentioned?

Thanks,
Rui

[59213.312849] xenbr0: port 3(vif3.0) entered disabled state //domU shutdown sequence start from here
[59215.247393] pciback 0000:05:00.0: xen_pciback: removing
[59215.247395] pciback 0000:05:00.0: xen_pciback: found device to remove
[59215.247396] pciback 0000:05:00.0: xen_pciback: pcistub_device_release
[59215.352893] pciback 0000:05:00.0: xen_pciback: MSI-X release failed (-16)
[59215.353199] xen_pciback: removed 0000:05:00.0 from seize list
[59216.474139] pciback 0000:05:00.0: xen_pciback: probing...
[59728.150053] xen_pciback: wants to seize 0000:05:00.0 //manual xl pci-assignable-add 05:00.0
[59728.150074] pciback 0000:05:00.0: xen_pciback: probing...
[59728.150075] pciback 0000:05:00.0: xen_pciback: seizing device
[59728.150076] pciback 0000:05:00.0: xen_pciback: pcistub_device_alloc
[59728.150076] pciback 0000:05:00.0: xen_pciback: initializing...
[59728.150077] pciback 0000:05:00.0: xen_pciback: initializing config
[59728.150165] pciback 0000:05:00.0: xen_pciback: enabling device
[59728.150247] xen: registering gsi 16 triggering 0 polarity 1
[59728.150250] Already setup the GSI :16
[59728.150293] pciback 0000:05:00.0: xen_pciback: MSI-X preparation failed (-6)
[59728.150582] pciback 0000:05:00.0: xen_pciback: save state of device
[59728.150731] pciback 0000:05:00.0: xen_pciback: resetting (FLR, D3, etc) the device
[59728.257558] pciback 0000:05:00.0: xen_pciback: reset device

[ 3742.440487] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 3742.440516] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.120.gaia.78.xenpcibackdbg #4
[ 3742.440516] Hardware name: Gigabyte Technology Co., Ltd. C246N-WU2/C246N-WU2-CF, BIOS F1 10/02/2019
[ 3742.440517] Call Trace:
[ 3742.440518] <IRQ>
[ 3742.440522] dump_stack+0x6b/0x83
[ 3742.440524] __report_bad_irq+0x30/0xa2
[ 3742.440525] note_interrupt.cold+0xb/0x61
[ 3742.440527] handle_irq_event+0x9f/0xb0
[ 3742.440528] handle_fasteoi_irq+0x73/0x1c0
[ 3742.440529] generic_handle_irq+0x42/0x50
[ 3742.440531] __evtchn_fifo_handle_events+0x155/0x170
[ 3742.440533] __xen_evtchn_do_upcall+0x61/0xa0
[ 3742.440535] __xen_pv_evtchn_do_upcall+0x11/0x20
[ 3742.440536] asm_call_irq_on_stack+0x12/0x20
[ 3742.440537] </IRQ>
[ 3742.440538] xen_pv_evtchn_do_upcall+0xa2/0xc0
[ 3742.440539] exc_xen_hypervisor_callback+0x8/0x10
[ 3742.440540] RIP: e030:xen_hypercall_sched_op+0xa/0x20
[ 3742.440542] Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 3742.440542] RSP: e02b:ffffffff82403de0 EFLAGS: 00000246
[ 3742.440543] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff810023aa
[ 3742.440544] RDX: 0000000002d1a31a RSI: 0000000000000000 RDI: 0000000000000001
[ 3742.440544] RBP: ffffffff82415940 R08: 00000066a173b5fc R09: 000003676ebf842f
[ 3742.440545] R10: 00000000000340ee R11: 0000000000000246 R12: 0000000000000000
[ 3742.440545] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3742.440546] ? xen_hypercall_sched_op+0xa/0x20
[ 3742.440548] ? xen_safe_halt+0xc/0x20
[ 3742.440549] ? default_idle+0x5/0x10
[ 3742.440550] ? default_idle_call+0x33/0xc0
[ 3742.440551] ? do_idle+0x1e9/0x260
[ 3742.440553] ? cpu_startup_entry+0x14/0x20
[ 3742.440555] ? start_kernel+0x503/0x526
[ 3742.440556] ? xen_start_kernel+0x60f/0x61b
[ 3742.440556] ? startup_xen+0x3e/0x3e
[ 3742.440557] handlers:
[ 3742.440570] [<000000008e20908e>] i801_isr [i2c_i801]
[ 3742.440585] Disabling IRQ #16
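
To help keep the data for each issue apart, here is a minimal Python sketch that sorts dmesg lines into buckets by message. It is only an illustration: the function name `bucket_by_message` and the bucket labels are hypothetical, and the patterns are simply the message texts copied from the logs above, not an exhaustive or authoritative list.

```python
import re
from collections import defaultdict

# Bucket labels are hypothetical; the message texts are copied
# verbatim from the attached dom0 kernel logs.
PATTERNS = {
    "msix_preparation_failed": re.compile(r"MSI-X preparation failed \((-?\d+)\)"),
    "msix_release_failed": re.compile(r"MSI-X release failed \((-?\d+)\)"),
    "irq_nobody_cared": re.compile(r"irq (\d+): nobody cared"),
}

# Matches "[59215.352893] message" as well as "[ 3742.440487] message".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def bucket_by_message(lines):
    """Group dmesg lines by which known message they contain.

    Returns a dict mapping bucket label -> list of (timestamp, message).
    Lines without a dmesg timestamp, or matching no pattern, are skipped.
    """
    buckets = defaultdict(list)
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        for label, pattern in PATTERNS.items():
            if pattern.search(message):
                buckets[label].append((timestamp, message))
    return dict(buckets)
```

Feeding it the output of `dmesg` (for example `bucket_by_message(open("dmesg.txt"))`, with `dmesg.txt` being a saved copy) gives one timestamped list per message, which makes it easier to see which events line up with a seize, a domU start, or a domU shutdown.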