On Mar 20, 2014, at 8:51 AM, Michael S. Tsirkin <m...@redhat.com> wrote:

> On Wed, Mar 19, 2014 at 11:04:19AM +1030, Rusty Russell wrote:
>> Dave Airlie <airl...@gmail.com> writes:
>>> So I'm looking at how best to do virtio gpu device error reporting,
>>> and how to deal with illegal stuff,
>>> 
>>> I've two levels of errors I want to support,
>>> 
>>> a) unrecoverable or bad guest kernel programming errors,
>> 
>> The QEMU standard approach is to exit at this point.  No, really.
> 
> It's easy on the hypervisor but often not very friendly for driver writers
> who might not be qemu experts.
> QEMU's moving away from exiting on errors and it would be nice
> to have a robust way to report driver bugs.
> How about setting VIRTIO_CONFIG_S_DEVICE_FAILED ?
> 
> Another idea, which the Windows driver implemented, is reporting a
> failure reason hint. They wrote it out to ISR; specifically,
> they notified the host about watchdog timer expiration for the net device
> this way.

I removed it for now and really would like to have an official way to bring it 
back.

Also, going back to the original question: Windows can handle graphics card
hardware errors by reloading the driver and resetting the device (starting
from Vista).

> 
>>> b) per 3D context errors from the renderer backend,
>>> 
>>> (b) I can easily report in an event queue and the guest kernel can in
>>> theory blow away the offenders, this is how GL works with some
>>> extensions,
>> 
>> That's probably sanest.
> 
> If it's possible to identify the offenders, I agree
> a VQ is better than config space for that.
> Need to make sure the queue is big enough to avoid
> underrun of that queue though. Is that always possible?
> 
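To make the queue-sizing concern concrete, here is a minimal sketch in C of a
fixed-size per-context error queue that records when it had to drop an event.
Everything in it is an assumption for illustration: the event layout, the
names (gpu_err_event, evq_push, evq_pop) and the tiny EVQ_SIZE are
hypothetical and are not taken from any virtio specification or from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event layout; not taken from any virtio spec. */
struct gpu_err_event {
    uint32_t ctx_id;   /* 3D context that triggered the error */
    uint32_t err_code; /* renderer-defined error code */
};

#define EVQ_SIZE 4 /* deliberately tiny so that overflow is easy to hit */

struct gpu_err_queue {
    struct gpu_err_event ring[EVQ_SIZE];
    size_t head;     /* index of the oldest queued event */
    size_t count;    /* number of events currently queued */
    bool overflowed; /* set once an event had to be dropped */
};

void evq_init(struct gpu_err_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->overflowed = false;
}

/* Host side: queue an error. Returns false and records the overflow
 * if no slot is free, so the loss is at least visible to the guest. */
bool evq_push(struct gpu_err_queue *q, uint32_t ctx_id, uint32_t err_code)
{
    if (q->count == EVQ_SIZE) {
        q->overflowed = true;
        return false;
    }
    size_t tail = (q->head + q->count) % EVQ_SIZE;
    q->ring[tail].ctx_id = ctx_id;
    q->ring[tail].err_code = err_code;
    q->count++;
    return true;
}

/* Guest side: fetch the oldest event. Returns false when the queue is empty. */
bool evq_pop(struct gpu_err_queue *q, struct gpu_err_event *out)
{
    if (q->count == 0)
        return false;
    *out = q->ring[q->head];
    q->head = (q->head + 1) % EVQ_SIZE;
    q->count--;
    return true;
}
```

With a sticky overflow flag the guest cannot tell which contexts it missed,
only that it missed something, so it would presumably have to fall back to a
coarser recovery in that case.
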
>>> GPU control queue, the response should always be no error, but in some
>>> cases it will be because the guest hit some host resource error, or
>>> asked for something insane, (guest kernel drivers would be broken in
>>> most of these cases).
>>> 
>>> Alternately I can use the separate event queue to send async errors
>>> when the guest does something bad,
>>> 
>>> I'm also considering adding some sort of flag in config space saying
>>> the device needs a reset before it will continue doing anything,
>> 
>> I generally dislike error codes which Never Happen; it's like making
>> every void function return int just in case: the caller has no idea what
>> to do if it fails.
>> 
>> The litmus test: does *your* guest handle failures other than by giving
>> up on the device?  If so, sure, you need to have a sane error-reporting
>> strategy.
> 
> Right but driver development is also a valid need.
> 
>>> The main reason I'm considering this stuff is for security reasons if
>>> the guest asks for something really illegal or crazy what should the
>>> expected behaviour of the host be? (at least secure I know that).
>> 
>> If the guest userspace can do it, don't exit.  If the kernel only, and
>> it should have known better, abort is OK.
> 
> I second that, at least for now.
> Maybe we will add more capabilities in virtio 1.0, or
> after that.
> 
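The rule of thumb above can be written down as a small decision function.
This is only a sketch of the policy as discussed in this thread, not of any
existing QEMU code: the enum names and the allow_abort switch (standing in
for "the hypervisor prefers a failed-device status over exiting") are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Who is able to trigger the offending request (hypothetical names). */
enum fault_origin {
    FAULT_GUEST_USERSPACE,  /* reachable from guest userspace */
    FAULT_GUEST_KERNEL_ONLY /* only a buggy guest kernel driver can do it */
};

/* What the host does in response (hypothetical names). */
enum host_action {
    ACT_REPORT_CONTEXT_ERROR, /* report via the event queue, keep running */
    ACT_MARK_DEVICE_FAILED,   /* flag the device as needing a reset */
    ACT_ABORT_VM              /* exit the hypervisor process */
};

enum host_action choose_action(enum fault_origin origin, bool allow_abort)
{
    assert(origin == FAULT_GUEST_USERSPACE ||
           origin == FAULT_GUEST_KERNEL_ONLY);

    /* Guest userspace must never be able to take the whole VM down. */
    if (origin == FAULT_GUEST_USERSPACE)
        return ACT_REPORT_CONTEXT_ERROR;

    /* Kernel-only errors: abort if permitted, otherwise mark the device. */
    return allow_abort ? ACT_ABORT_VM : ACT_MARK_DEVICE_FAILED;
}
```
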
>> Sure that doesn't help much!
>> Rusty.
