So it looks like the problem is:
- in one thread we call vcpu_set_state_locked() [from a VM_MAP_PPTDEV_MMIO call
from userspace]
-- both the new and old states are VCPU_FROZEN
-- the thread enters a loop while vcpu->state != VCPU_IDLE
-- it gets stuck here forever since nothing will ever change the state to
VCPU_IDLE
-- apparently this is to stop two ioctl()s acting on the same vCPU
simultaneously, but I don't see any other ioctl against the vCPU in kgdb.
- in all the other threads, we sit in vm_handle_rendezvous()
-- these threads are waiting for the rendezvous to complete
-- every vCPU has completed the rendezvous except for the one stuck in
vcpu_set_state_locked()
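
To make the suspected cycle concrete, here is a toy userspace model in C. It is
only a sketch and is not taken from the vmm source: every toy_* name is
hypothetical. It assumes that a thread parked in the set-state loop cannot
check in to the rendezvous and that nothing else moves the state to idle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the vCPU states; hypothetical, not the vmm definitions. */
enum toy_state { TOY_IDLE, TOY_FROZEN, TOY_RUNNING };

struct toy_vcpu {
    enum toy_state state;   /* current state of this vCPU */
    bool waiting_for_idle;  /* a thread is parked in the set-state loop */
    bool rendezvous_done;   /* this vCPU has checked in to the rendezvous */
};

/* The set-state loop only exits once the state reads idle. */
static bool toy_set_state_can_exit(const struct toy_vcpu *v)
{
    return v->state == TOY_IDLE;
}

/* The rendezvous finishes only when every vCPU has checked in. */
static bool toy_rendezvous_complete(const struct toy_vcpu *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!v[i].rendezvous_done)
            return false;
    }
    return true;
}

/*
 * Assumed deadlock condition: the rendezvous is still open, and at least one
 * vCPU that has not checked in is parked in a set-state loop that cannot exit.
 */
static bool toy_is_deadlocked(const struct toy_vcpu *v, size_t n)
{
    if (toy_rendezvous_complete(v, n))
        return false;
    for (size_t i = 0; i < n; i++) {
        if (v[i].waiting_for_idle && !v[i].rendezvous_done &&
            !toy_set_state_can_exit(&v[i]))
            return true;
    }
    return false;
}
```

With one vCPU frozen and parked in the loop, and every other vCPU already
checked in, toy_is_deadlocked() reports true. Flipping the parked vCPU to idle,
or letting it check in, clears the condition in the model.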
I see a lot of commits in -CURRENT since my cut of -STABLE, but nothing that
looks too relevant. I'll try against CURRENT next.
— RHC.
--- Original Message ---
On Tuesday, January 3rd, 2023 at 23:54, Robert Crowston wrote:
> Still investigating this. AMD 1700, FreeBSD 13.1 stable@3dd6497894. VM is
> Windows 11 22H2.
>
> It happens on the setup disk -- at the TianoCore logo, before the "ring" has
> finished its first rotation -- so very early in the boot process. It's
> eventually happened for every Win 11 install I have made. If I remove the
> passthrough devices, install Windows, and then re-add the devices, the
> fresh install will boot with the passthrough devices a few times, but then
> shows the same hang behaviour forever after. Windows Boot Repair also hangs.
> On the host, bhyvectl --destroy hangs. gdb cannot stop bhyve and just hangs
> as well. None of these hangs show any CPU use. kldunload vmm crashes the host
> with a page fault. Only a reboot of the host will kill the guest.
>
> Setting the guest cpu count to 1, or removing all the passthrough devices
> allows Windows 11 to boot. The same behaviour happens for two different USB
> controllers I have and two different GPUs. The same bhyve configurations
> reliably boot Windows Server 2022 and Windows 10 with passthrough working.
>
> Debugging in userspace, I can see that Windows 11 does PCI enumeration in
> parallel across multiple cores, and sometimes during boot one vCPU writes a
> PCI config register at approximately the same time as another vCPU reads that
> exact register. The hang seems to be aligned with this synchronized
> write/read. Also, I can sometimes boot successfully under gdb when single
> stepping PCI cfg register writes, but it's difficult to be sure because my
> debugging is probably disturbing the timing. I looked at the bhyve code and I
> don't see what here could be racing in user space. In any event, it's a
> kernel-side bug.
>
> Spinning up the kernel debugger, what I always see is:
> 1. 1 bhyve thread in vioapic_mmio_write() -> ... -> vm_handle_rendezvous() ->
> _sleep()
>
> 2. 1 bhyve thread in vcpu_lock_one() -> ... -> vcpu_set_state_locked() ->
> msleep_spin_sbt()
>
> 3. All remaining bhyve threads, if any, in vm_run() -> vm_handle_rendezvous()
> -> _sleep().
>
>
> Example backtrace attached.
>
> So it looks like we have some kind of a deadlock between vcpu_lock_one() and
> vioapic_mmio_write()? Anyone seen anything like it?
>
> — RHC.