On Tue, Nov 16, 2010 at 09:31:28PM -0800, Etienne Martineau wrote:
>
> On Wed, 17 Nov 2010, Isaku Yamahata wrote:
>
> > > Because of such it seems like the only way to maintain consistency
> > > between the assigned device and it's corresponding driver is to
> > > perform the error detection/recovery phase in lockstep with the host?
> >
> > Maybe. At least at the first implementation, I suppose.
> > Then we would learn from its experience, then move on to next generation
> > implementation.
> >
> > To be honest, what I have in my mind very vaguely is
> > - something like pcie aer fd driver.
> >   or enhancement to vfio
> >   qemu polls the fd.
>
> I'm currently working on a pcie aer driver. Few weeks ago I sent some rfc
> patches. I'm about to send another version.
>
> It's basically a simple UIO based pci-stub driver for AER and PM.
> Notification goes through eventfd and error code / error result are mmap
> directly over a 'logical' BAR. Qemu consume the eventfd or it goes
> directly to the guest with irqfd.

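A minimal user-space sketch of the "qemu polls the fd" side of the eventfd
notification described above, for illustration only. It assumes the host
driver signals an error by incrementing an eventfd counter. The helper name
wait_for_aer_event is hypothetical and is not taken from the RFC patches;
only eventfd(2), poll(2) and read(2) are real interfaces here.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for an error notification
 * on the eventfd efd.
 *
 * Returns the number of notifications signalled since the last read,
 * 0 on timeout, or -1 on failure.
 */
static int64_t wait_for_aer_event(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret;

	assert(efd >= 0);

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;	/* poll failed */
	if (ret == 0)
		return 0;	/* timed out, nothing signalled */

	/*
	 * Without EFD_SEMAPHORE, reading an eventfd returns the whole
	 * counter and resets it to zero.
	 */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	return (int64_t)count;
}
```

A caller would create the descriptor with eventfd(0, EFD_NONBLOCK), hand it
to whatever signals the error, and then call wait_for_aer_event(efd, 1000)
from its event loop. The irqfd path mentioned above bypasses this loop
entirely and is not shown.
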
You mean '[RFC PATCH] kvm: BSimple stub driver with AER capabilities'.
Yes, that is exactly what I had in mind. Can you please add me to CC?

> > - error recovery in host will be directed by qemu
> >   in concert with guest recovery action.
>
> To my view, this is the tricky part. Error recovery can be directed by
> qemu indeed but how do you get the information about the guest recovery
> action for every error callback?
>
> I think that because aer handling effectively 'merge' callback return code
> from multiple source it's hard to discriminate what value should be given
> back to the host for the corresponding assigned device (at least from the
> qemu side)

I don't have a clear idea yet. Like you, I just figured it would be tricky.
Listing the kinds of AER recovery we want to support would probably help.

> >   For latency necessary information would be shared by
> >   qemu and host kernel, so that the aer driver in host kernel
> >   could take responsibility to eliminate the latency caused by
> >   qemu process.
> I'm sorry but I'm not sure to follow here. Can you elaborate more on this
> topic?

I mean that we might want to move the AER recovery code from qemu into the
KVM host kernel to reduce latency, something like the in-kernel PIC emulator.

> > I suppose there is no single right way for recovery action
> > in host/guest. So there should be room for recovery policies.
>
> Yes I agree. There is already a policy argument part of the uio
> pci_stub driver that I'm working on.

Great.
-- 
yamahata