On 28.05.14 23:54, Benjamin Herrenschmidt wrote:
On Wed, 2014-05-28 at 13:24 +0200, Alexander Graf wrote:
int (*eeh_handler)(PCIDevice *pdev, int opcode, int option);
If we need a PCI device level callback, yes.
I think a PCI device level callback is preferable regardless of whether
the current VFIO creates a PHB for a group/domain/PE or not. Emulated
devices might behave differently, etc.
So in terms of layering, the way I understand EEH is that it basically
operates on a mystical PE level. We basically have
PHB =n=> PE =m=> Devices
One PE can contain VFIO devices that all share the same "container" and
emulated devices.
What I think we need to do for emulated EEH is to implement it on a PE
level. What EEH really does is that it unlinks a device's access to
memory and disallows config space access.
Both of these operations really occur on the PE / PHB level. So we could
for example have a bit in the PE that says "you're broken". When that
bit is set, config space reads return -1 and writes become no-ops. When
we set that bit, we also disable all IOMMU translations, so devices
behind that PE can no longer do DMA either.
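
To make the "you're broken" bit concrete, here is a rough standalone
sketch. All names (VirtPE, virt_pe_config_read and friends) are made up
for illustration and are not existing QEMU APIs; the config space is a
toy byte array and the "IOMMU" is an identity mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical virtual PE state; not an existing QEMU structure. */
typedef struct VirtPE {
    bool broken;            /* the "you're broken" bit */
    uint8_t config[256];    /* toy config space, for illustration only */
} VirtPE;

/* Config reads return all ones (-1) while the PE is broken. */
static uint32_t virt_pe_config_read(const VirtPE *pe, unsigned addr,
                                    unsigned len)
{
    uint32_t val = 0;
    unsigned i;

    assert(len == 1 || len == 2 || len == 4);
    if (pe->broken || addr > sizeof(pe->config) - len) {
        return 0xffffffffu >> (32 - 8 * len);
    }
    for (i = 0; i < len; i++) {
        val |= (uint32_t)pe->config[addr + i] << (8 * i);
    }
    return val;
}

/* Config writes are silently dropped while the PE is broken. */
static void virt_pe_config_write(VirtPE *pe, unsigned addr, uint32_t val,
                                 unsigned len)
{
    unsigned i;

    assert(len == 1 || len == 2 || len == 4);
    if (pe->broken || addr > sizeof(pe->config) - len) {
        return;
    }
    for (i = 0; i < len; i++) {
        pe->config[addr + i] = (uint8_t)(val >> (8 * i));
    }
}

/* DMA translation fails while the PE is broken (identity map otherwise). */
static bool virt_pe_dma_translate(const VirtPE *pe, uint64_t iova,
                                  uint64_t *phys)
{
    if (pe->broken) {
        return false;
    }
    *phys = iova;
    return true;
}
```

The point is only that a single flag gates both config space and DMA,
and that clearing it brings the previous state back untouched.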
For the VFIO container, we have to notify it when something like this
happens. So when we set the "you're broken" bit in the virtual PE, we
have to notify the container to also break any real PE that VFIO devices
behind that virtual PE are on. The same goes for recovery operations and
so on.
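
The notification could look roughly like the following. Again this is
only a sketch under assumptions: the listener list, the event enum and
FakeContainer are hypothetical stand-ins, and a real VFIO container
would issue the corresponding host EEH operation instead of flipping a
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events a virtual PE reports to whoever sits behind it. */
typedef enum { VIRT_PE_EVENT_FREEZE, VIRT_PE_EVENT_RECOVER } VirtPEEvent;

typedef void (*VirtPENotify)(void *opaque, VirtPEEvent ev);

#define VIRT_PE_MAX_LISTENERS 4

typedef struct VirtPE {
    bool broken;
    struct {
        VirtPENotify fn;
        void *opaque;
    } listeners[VIRT_PE_MAX_LISTENERS];
    size_t nr_listeners;
} VirtPE;

/* Register a listener, e.g. one per VFIO container behind this PE. */
static bool virt_pe_add_listener(VirtPE *pe, VirtPENotify fn, void *opaque)
{
    assert(fn != NULL);
    if (pe->nr_listeners == VIRT_PE_MAX_LISTENERS) {
        return false;
    }
    pe->listeners[pe->nr_listeners].fn = fn;
    pe->listeners[pe->nr_listeners].opaque = opaque;
    pe->nr_listeners++;
    return true;
}

/* Flip the "you're broken" bit and tell every listener about the change. */
static void virt_pe_set_broken(VirtPE *pe, bool broken)
{
    size_t i;

    if (pe->broken == broken) {
        return; /* no state change, nothing to notify */
    }
    pe->broken = broken;
    for (i = 0; i < pe->nr_listeners; i++) {
        pe->listeners[i].fn(pe->listeners[i].opaque,
                            broken ? VIRT_PE_EVENT_FREEZE
                                   : VIRT_PE_EVENT_RECOVER);
    }
}

/* Toy stand-in for a VFIO container; only records what it was told. */
typedef struct FakeContainer {
    bool host_pe_frozen;
    int events;
} FakeContainer;

static void fake_container_notify(void *opaque, VirtPEEvent ev)
{
    FakeContainer *c = opaque;

    c->host_pe_frozen = (ev == VIRT_PE_EVENT_FREEZE);
    c->events++;
}
```

Recovery goes through the same path, so the container sees freeze and
recover in the order the virtual PE changes state.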
Today, we don't really model multiple PEs, but assume that there's one
PE per emulated PHB. So the model becomes easier in the sense that we
can ignore the PHB -> PE abstraction and just maintain the code and
state that the PE entity would have inside the PHB device.
Later, if we model it nicely now, it should be trivial to rip apart the
PHB and PE entities and make them separate.
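
One way to keep that later split cheap, sketched with made-up names
(EmuPHB, emu_phb_pe_for_devfn; neither exists in QEMU): keep the PE
state in its own struct embedded in the PHB, and have callers always go
through a lookup helper rather than touching the PHB fields directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PE state, kept separate from the PHB on purpose. */
typedef struct VirtPE {
    bool broken;
} VirtPE;

/* Hypothetical emulated PHB with exactly one embedded PE for now. */
typedef struct EmuPHB {
    VirtPE pe;
} EmuPHB;

/*
 * Look up the PE a device belongs to. Today every devfn maps to the
 * single embedded PE; with multiple PEs only this helper would change.
 */
static VirtPE *emu_phb_pe_for_devfn(EmuPHB *phb, uint8_t devfn)
{
    assert(phb != NULL);
    (void)devfn;
    return &phb->pe;
}
```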
I hope that makes sense so far, and I hope we're all on the same page
about the direction here.
Alex