Well... the plot thickens. I have found the smoking gun. At some point during implementation of the STM USB stack I must have run into the same problem with the communications choking during MSD init. At the time (which involved a LOT of Wireshark comparisons against a real USB drive on the host and on the DWC2 Rpi2 stack) I'd added the QEMU_PACKED directive to the usb_msd_csw struct.
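For anyone who wants to see the effect in isolation, here is a minimal sketch of what packing changes. The field layout follows the 13-byte CSW from the bulk-only transport spec; the struct names and the csw_reply_len() helper are made up for illustration, and I'm assuming QEMU_PACKED boils down to the GCC/Clang packed attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields twice; the only difference is the packed attribute.
 * Struct names are illustrative, not the ones used in QEMU. */
struct csw_padded {
    uint32_t sig;
    uint32_t tag;
    uint32_t residue;
    uint8_t  status;
};  /* the compiler may append tail padding after status */

struct csw_packed {
    uint32_t sig;
    uint32_t tag;
    uint32_t residue;
    uint8_t  status;
} __attribute__((packed));  /* assumption: this is what QEMU_PACKED expands to */

/* The wire format is 4 + 4 + 4 + 1 bytes. */
static_assert(sizeof(struct csw_packed) == 13, "packed CSW must be 13 bytes");

/* Hypothetical helper: bytes copied when the length is taken as the
 * smaller of the struct size and the receiver's buffer size. */
static size_t csw_reply_len(size_t struct_size, size_t buf_size)
{
    return struct_size < buf_size ? struct_size : buf_size;
}
```

With a 64-byte receive buffer, csw_reply_len(sizeof(struct csw_padded), 64) returns the padded size (typically 16 on ABIs where uint32_t is 4-byte aligned), while the packed variant returns 13. With a buffer of exactly 13 bytes both return 13, which would hide the difference.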
Naturally, when the definitions were relocated to a header, that was lost when resolving the merge conflict because I wasn't paying attention. (D'oh!)

So while it's working again, the question still remains why I had to make this tweak in the first place, so I dug further. I just re-compared with the MSD setup of a real drive on the PC. The status section of the CSW is only a single byte for the real drive, but is 4 bytes in the packet received by the guest without the PACKED directive. Wireshark also cannot decode this "oversize" packet properly, which further suggests that this may be an actual bug in the MSD implementation. Naturally, it would only manifest on systems/compilers where the data alignment causes the struct to be padded.

Further code inspection suggests this originates with the following line in usb-storage.c:

> len = MIN(sizeof(s->csw), p->iov.size);

specifically, when iov.size is larger than the MSD CSW. And that's where my knowledge of the QEMU USB system ends and I defer to those more knowledgeable.

I'm guessing this means that things are fine if the receiving endpoint configures itself to exactly the expected CSW size, but break if the endpoint is configured with a larger receive buffer. At least for the ARM kernel being run, this isn't custom code; it's part of the generic HAL and USB stack STM offers, so I have a hard time believing this is an unusual practice.

I even went so far as to eliminate dev-storage from the picture entirely and use USB-host to connect a real physical USB drive to the simulated platform. Again, the Wireshark capture shows that the status field of the CSW is a single byte, which looks like the "correct" behaviour regardless of how the device sets up its receive buffers.

Thoughts?

~ VintagePC