On 04/06/2020 08:08, Jan Beulich wrote:
> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>> Then, we get the main issue:
>>
>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>> (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>> (XEN) domain_crash called from io.c:178
>>
>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> I'd guess an issue with the shutdown deferral logic. Did you / can
> you check whether XEN_DMOP_remote_shutdown managed to pause all
> CPUs (I assume it didn't, since once they're paused there shouldn't
> be any I/O there anymore, and hence no I/O emulation)?

The vcpu in question is talking to Qemu, so will have v->defer_shutdown
intermittently set, and skip the pause in domain_shutdown().

I presume this lack of pause is to allow the vcpu in question to still
be scheduled to consume the IOREQ reply?  (It's fairly opaque logic with
zero clarifying details.)

What *should* happen is that, after consuming the reply, the vcpu should
notice and pause itself, at which point it would yield to the
scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().

Evidently, this is not happening.

Marek: can you add a BUG() after the weird PIO printing?  That should
confirm whether we're getting into handle_pio() via the
handle_hvm_io_completion() path, or via the vmexit path (in which case,
we're fully re-entering the guest).

I suspect you can drop the debugging of XEN_DOMCTL_destroydomain - I
think it's just noise atm.

However, it would be very helpful to see the vcpus which fall into
domain_shutdown()'s "else if ( v->defer_shutdown ) continue;" path.

> Another question though: In 4.13 the log message next to the
> domain_crash() I assume you're hitting is "Weird HVM ioemulation
> status", not "Weird PIO status", and the debugging patch you
> referenced doesn't have any change there. Andrew's recent
> change to master, otoh, doesn't use the word "weird" anymore. I
> can therefore only guess that the value logged is still
> hvmemul_do_pio_buffer()'s return value, i.e. X86EMUL_UNHANDLEABLE.
> Please confirm.

It's the first draft of the patch which I did, before submitting to
xen-devel.  We do have X86EMUL_UNHANDLEABLE at this point, but it's not
terribly helpful - there are loads of paths which fail silently with
this error.

~Andrew