On Mon, Mar 17, 2014 at 03:28:08PM +0100, Laszlo Ersek wrote:
> On 03/17/14 07:02, Dave Airlie wrote:
> > So I'm looking at how best to do virtio gpu device error reporting,
> > and how to deal with illegal stuff,
> >
> > I've two levels of errors I want to support,
> >
> > a) unrecoverable or bad guest kernel programming errors,
> >
> > b) per 3D context errors from the renderer backend,
> >
> > (b) I can easily report in an event queue and the guest kernel can in
> > theory blow away the offenders, this is how GL works with some
> > extensions,
> >
> > For (a) I can expect a response from every command I put into the main
> > GPU control queue, the response should always be no error, but in some
> > cases it will be because the guest hit some host resource error, or
> > asked for something insane, (guest kernel drivers would be broken in
> > most of these cases).
> >
> > Alternately I can use the separate event queue to send async errors
> > when the guest does something bad,
> >
> > I'm also considering adding some sort of flag in config space saying
> > the device needs a reset before it will continue doing anything,
> >
> > The main reason I'm considering this stuff is for security reasons if
> > the guest asks for something really illegal or crazy what should the
> > expected behaviour of the host be? (at least secure I know that).
>
> exit(1).
>
> If you grep qemu for it, you'll find such examples. Notably,
> "hw/virtio/virtio.c" is chock full of them; if the guest doesn't speak
> the basic protocol, there's nothing for the host to do. See also
> virtio-blk.c (missing or incorrect headers), virtio-net.c (similar
> protocol violations), virtio-scsi.c (wrong header size, bad config etc).
>
> For later, we have a use case on the horizon where all such exits -- not
> just virtio, but exit(1) or abort() on invalid guest behavior in general
> -- should be optionally coupled (dependent on the qemu command line)
> with an automatic dump-guest-memory, in order to help debugging the guest.

Please don't use exit(1).

Instead you can put the device into a "broken" state and wait for the
guest to reset it.

exit(1) is nasty because a failure in one driver shouldn't bring down
the entire VM if we can prevent it. Also, it's a denial-of-service if
we ever allow virtio passthrough to nested guests - a nested guest
could kill its parent and hence all other nested guests.

Stefan
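
To make the "broken" state idea concrete, here is a rough sketch of what
I mean. It is not QEMU code: struct toy_dev, its fields, the helper
names and the 16-byte header size are all made up for illustration. On a
protocol violation the device marks itself broken and ignores further
requests until the guest resets it, instead of taking the whole VM down.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device model; these names are illustrative, not QEMU's API. */
struct toy_dev {
    bool broken;        /* set when the guest violates the protocol */
    unsigned handled;   /* number of requests processed successfully */
};

/* Record the error and mark the device broken instead of calling exit(1). */
static void toy_dev_error(struct toy_dev *d, const char *msg)
{
    fprintf(stderr, "toy_dev: %s; device needs reset\n", msg);
    d->broken = true;
}

/* Returns 0 on success, -1 if the request was rejected or ignored. */
static int toy_dev_handle(struct toy_dev *d, unsigned hdr_len)
{
    if (d->broken) {
        return -1;              /* ignore everything until the guest resets */
    }
    if (hdr_len != 16) {        /* assumed fixed header size, for illustration */
        toy_dev_error(d, "bad header size");
        return -1;
    }
    d->handled++;
    return 0;
}

/* A guest-initiated reset clears the broken state. */
static void toy_dev_reset(struct toy_dev *d)
{
    d->broken = false;
    d->handled = 0;
}
```

The point of the design is that the failure stays contained in the one
device: other devices and the rest of the VM keep running, and a
well-behaved guest driver can recover by resetting the device.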