On Wed, 17 Nov 2010, Isaku Yamahata wrote:
> > Because of such it seems like the only way to maintain consistency between
> > the assigned device and it's corresponding driver is to perform the error
> > detection/recovery phase in lockstep with the host?
>
> Maybe. At least at the first implementation, I suppose.
> Then we would learn from its experience, then move on to next generation
> implementation.
>
> To be honest, what I have in my mind very vaguely is
> - something like pcie aer fd driver.
>   or enhancement to vfio
>   qemu polls the fd.

I'm currently working on a PCIe AER driver. A few weeks ago I sent some RFC
patches, and I'm about to send another version. It's basically a simple
UIO-based pci-stub driver for AER and PM. Notification goes through an
eventfd, and the error code / error result are mmapped directly over a
'logical' BAR. Qemu consumes the eventfd, or it goes directly to the guest
with irqfd.

> - error recovery in host will be directed by qemu
>   in concert with guest recovery action.

To my view, this is the tricky part. Error recovery can indeed be directed
by qemu, but how do you get the information about the guest recovery action
for every error callback? I think that because AER handling effectively
'merges' the callback return codes from multiple sources, it's hard to
discriminate what value should be given back to the host for the
corresponding assigned device (at least from the qemu side).

>   For latency necessary information would be shared by
>   qemu and host kernel, so that the aer driver in host kernel
>   could take responsibility to eliminate the latency caused by
>   qemu process.

I'm sorry, but I'm not sure I follow here. Can you elaborate more on this
topic?

> I suppose there is no single right way for recovery action
> in host/guest. So there should be room for recovery policies.

Yes, I agree. There is already a policy argument as part of the UIO pci_stub
driver that I'm working on.

-Etienne
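
P.S. To make the merging concern above concrete, here is a rough sketch in
C. It is not the kernel's code: the enum, the severity ordering and the
function names (ers_result, severity, merge_result, merge_all) are all
assumptions made up for illustration, and the real merge rules are more
involved. The only point is that once several callback results are folded
into one value, that value alone no longer says what any single device
answered.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical result codes, loosely modelled on the idea of the kernel's
 * PCI error recovery results. Names and values are assumptions for this
 * sketch, not a real API.
 */
enum ers_result {
    ERS_NONE = 0,      /* driver expressed no opinion */
    ERS_RECOVERED,     /* driver considers the device recovered */
    ERS_CAN_RECOVER,   /* driver can recover without a reset */
    ERS_NEED_RESET,    /* driver asks for a slot/link reset */
    ERS_DISCONNECT     /* driver gives up on the device */
};

/* Assumed severity ranking: a higher number is a "worse" answer. */
int severity(enum ers_result r)
{
    switch (r) {
    case ERS_NONE:        return 0;
    case ERS_RECOVERED:   return 1;
    case ERS_CAN_RECOVER: return 2;
    case ERS_NEED_RESET:  return 3;
    case ERS_DISCONNECT:  return 4;
    }
    return 4; /* unknown value: treat as the most severe */
}

/* Simplified "worst answer wins" merge of two callback results. */
enum ers_result merge_result(enum ers_result orig, enum ers_result new_vote)
{
    return severity(new_vote) > severity(orig) ? new_vote : orig;
}

/* Fold the results of n callbacks into a single value. */
enum ers_result merge_all(const enum ers_result *votes, size_t n)
{
    enum ers_result merged = ERS_NONE;

    assert(votes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        merged = merge_result(merged, votes[i]);
    return merged;
}
```

With this sketch, the vote sets {ERS_CAN_RECOVER, ERS_NEED_RESET} and
{ERS_NEED_RESET, ERS_RECOVERED} both merge to ERS_NEED_RESET, so whoever
only sees the merged value cannot tell which answer belonged to the
assigned device.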