Commit 4be23939ab0d7019c7e59a37485b416fbbf0f073 makes ehci instantly zap any unlinked queue heads when the guest rings the dorbell.
While hacking up uas support this turned out to be a problem. The linux kernel can unlink and instantly relink the very same queue head, thereby killing any async packets in flight. That alone isn't an issue yet, the packet will canceled and resubmitted and everything is fine. We'll run into trouble though in case the async packet is completed already, so we can't cancel it any more. The transaction is simply lost then. usb_ehci_qh_ptrs q (nil) - QH @ 39c4f000: next 39c4f122 qtds 00000000,00000001,39c50000 usb_ehci_qh_fields QH @ 39c4f000 - rl 0, mplen 0, eps 0, ep 0, dev 0 usb_ehci_qh_ptrs q 0x7f95feba90a0 - QH @ 39c4f000: next 39c4f122 qtds 00000000,00000001,39c50000 usb_ehci_qh_fields QH @ 39c4f000 - rl 0, mplen 0, eps 0, ep 0, dev 0 usb_ehci_qh_ptrs q 0x7f95fe515210 - QH @ 39c4f120: next 39c4f0c2 qtds 29dbce40,29dbc4e0,00000009 usb_ehci_qh_fields QH @ 39c4f120 - rl 4, mplen 512, eps 2, ep 1, dev 2 usb_ehci_packet_action q 0x7f95fe515210 p 0x7f95fdec32a0: alloc usb_packet_state_change bus 0, port 2, ep 1, packet 0x7f95fdec32e0, state undef -> setup usb_ehci_packet_action q 0x7f95fe515210 p 0x7f95fdec32a0: process usb_uas_command dev 2, tag 0x2, lun 0, lun64 00000000-00000000 scsi_req_parsed target 0 lun 0 tag 2 command 42 dir 2 length 16384 scsi_req_parsed_lba target 0 lun 0 tag 2 command 42 lba 5933312 scsi_req_alloc target 0 lun 0 tag 2 scsi_req_continue target 0 lun 0 tag 2 scsi_req_data target 0 lun 0 tag 2 len 16384 usb_uas_scsi_data dev 2, tag 0x2, bytes 16384 usb_uas_write_ready dev 2, tag 0x2 usb_packet_state_change bus 0, port 2, ep 1, packet 0x7f95fdec32e0, state setup -> complete usb_ehci_packet_action q 0x7f95fe515210 p 0x7f95fdec32a0: free usb_ehci_qh_ptrs q 0x7f95fdec3210 - QH @ 39c4f0c0: next 39c4f002 qtds 29dbce40,00000001,00000009 usb_ehci_qh_fields QH @ 39c4f0c0 - rl 4, mplen 512, eps 2, ep 2, dev 2 usb_ehci_queue_action q 0x7f95fe5152a0: free usb_packet_state_change bus 0, port 2, ep 2, packet 0x7f95feba9170, state async -> complete ^^^ async packets completes. usb_ehci_packet_action q 0x7f95fdec3210 p 0x7f95feba9130: wakeup usb_ehci_qh_ptrs q (nil) - QH @ 39c4f000: next 39c4f122 qtds 00000000,00000001,39c50000 usb_ehci_qh_fields QH @ 39c4f000 - rl 0, mplen 0, eps 0, ep 0, dev 0 usb_ehci_qh_ptrs q 0x7f95feba90a0 - QH @ 39c4f000: next 39c4f122 qtds 00000000,00000001,39c50000 usb_ehci_qh_fields QH @ 39c4f000 - rl 0, mplen 0, eps 0, ep 0, dev 0 usb_ehci_qh_ptrs q 0x7f95fe515210 - QH @ 39c4f120: next 39c4f002 qtds 29dbc4e0,29dbc8a0,00000009 usb_ehci_qh_fields QH @ 39c4f120 - rl 4, mplen 512, eps 2, ep 1, dev 2 usb_ehci_queue_action q 0x7f95fdec3210: free usb_ehci_packet_action q 0x7f95fdec3210 p 0x7f95feba9130: free ^^^ endpoint #2 queue head removed from schedule, doorbell makes ehci zap the queue, the (completed) usb packet is freed too and gets lost. usb_ehci_qh_ptrs q (nil) - QH @ 39c4f000: next 39c4f0c2 qtds 00000000,00000001,39c50000 usb_ehci_qh_fields QH @ 39c4f000 - rl 0, mplen 0, eps 0, ep 0, dev 0 usb_ehci_qh_ptrs q 0x7f95feba90a0 - QH @ 39c4f000: next 39c4f0c2 qtds 00000000,00000001,39c50000 usb_ehci_qh_fields QH @ 39c4f000 - rl 0, mplen 0, eps 0, ep 0, dev 0 usb_ehci_queue_action q 0x7f9600dff570: alloc usb_ehci_qh_ptrs q 0x7f9600dff570 - QH @ 39c4f0c0: next 39c4f122 qtds 29dbce40,00000001,00000009 usb_ehci_qh_fields QH @ 39c4f0c0 - rl 4, mplen 512, eps 2, ep 2, dev 2 usb_ehci_packet_action q 0x7f9600dff570 p 0x7f95feba9130: alloc usb_packet_state_change bus 0, port 2, ep 2, packet 0x7f95feba9170, state undef -> setup usb_ehci_packet_action q 0x7f9600dff570 p 0x7f95feba9130: process usb_packet_state_change bus 0, port 2, ep 2, packet 0x7f95feba9170, state setup -> async usb_ehci_packet_action q 0x7f9600dff570 p 0x7f95feba9130: async ^^^ linux kernel relinked the queue head, ehci creates a new usb packet, but we should have delivered the completed one instead. usb_ehci_qh_ptrs q 0x7f95fe515210 - QH @ 39c4f120: next 39c4f002 qtds 29dbc4e0,29dbc8a0,00000009 usb_ehci_qh_fields QH @ 39c4f120 - rl 4, mplen 512, eps 2, ep 1, dev 2 Simply not zapping queue heads on doorbell rings fixes the issue, but of course re-introduces the risk of using cached but stale information. Cc: Hans de Goede <hdego...@redhat.com> Signed-off-by: Gerd Hoffmann <kra...@redhat.com> --- hw/usb/hcd-ehci.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 9fb6423..6310630 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -2227,8 +2227,10 @@ static void ehci_advance_async_state(EHCIState *ehci) * (section 4.8.2) */ if (ehci->usbcmd & USBCMD_IAAD) { +#if 0 /* Remove all unseen qhs from the async qhs queue */ ehci_queues_rip_unused(ehci, async, 1); +#endif DPRINTF("ASYNC: doorbell request acknowledged\n"); ehci->usbcmd &= ~USBCMD_IAAD; ehci_set_interrupt(ehci, USBSTS_IAA); -- 1.7.1