On 12/14/2017 06:29 AM, Denis V. Lunev wrote:
>> If this has been broken since 2.9, 2.11-rc3 is too late for a bandaid
>> applied to something I can't diagnose. Let's discuss this for 2.12 and I
>> will keep trying to figure out what the root cause is.
> I have read the entire letter in 2 subsequent attempts, but
> unfortunately I can not say much more additionally :(
>
No problem, sometimes I don't understand myself. And the IDE code isn't
exactly the nicest stuff to read. If I were smart enough I'd refactor the
whole thing, but without breaking migration it's a little hard :(

>> Some questions for you:
>>
>> (1) Is the guest Linux? Do we know why this one machine might be
>> tripping up QEMU? (Is it running a fuzzer, a weird OS, etc...?)
> This is running by the end-user by our customer and we do not have
> access to that machine and customer. This is anonymized crash report
> from the node. This is not a single crash. We observe 1-2 reports with
> this crash in a day.
>

Yikes. Is this still on a 2.9-based VM, or have you upgraded to 2.10 or
2.11 at this point? (From memory, this was a problem with a 2.9-based
machine.)

>> (2) Does the VM actually have a CDROM inserted at some point? Is it
>> possible we're racing on some kind of eject or graph manipulation failure?
> unclear but IMHO probable.
>

If they're using a 2.10+ based VM, could you look at some trace points?

Either:
  trace_ide_atapi_cmd (just SCSI byte 0), or
  trace_ide_atapi_cmd_packet (the entire SCSI CDB)

and trace_ide_exec_cmd

The actual command bytes never get saved in the state struct, so it's hard
to tell from traces what commands were being processed, but these traces
help.

>> (3) Is this using AHCI or IDE?
> IDE. This is known 120%. We do not provide ability to enable AHCI
> without manual tweaking.
>

At least that helps narrow down the path...

>> If I can't figure it out within a week or so from here I'll just check
>> in the band-aid with some /* FIXME */ comments attached.
> No prob. We are going to ship my band-aid and see to report statistics.
>
> Thank you in advance,
> Den

I'll stage the band-aid with some FIXME comments, and maybe some scary
error_report prints with some information in them. I'll send it to the list.