On 02/07/2014 16:00, Konrad Rzeszutek Wilk wrote:
With this long thread I lost a bit of context about the challenges
that exist. But let me try summarizing them here - which will hopefully
get some consensus.
1). Fix IGD hardware to not use Southbridge magic addresses.
We can moan and moan but I doubt it is going to change.
There are two problems:
- Northbridge (i.e. MCH i.e. PCI host bridge) configuration space addresses
- Southbridge (i.e. PCH i.e. ISA bridge) vendor/device ID; some versions
of the driver identify it by class, some versions identify it by slot
(1f.0).
To solve the first, make a new machine type, PIIX4-based, and pass
through the registers you need. The patch must document _exactly_ why
the registers are safe to pass. If they are not reserved on PIIX4, the
patch must document what the same offsets mean on PIIX4, and why it's
sensible to assume that firmware for virtual machines will not read/write
them. Bonus points for also documenting the same for Q35.
Regarding the second, fixing IGD hardware to not rely on chipset magic
is a no-go, I agree. I disagree that it's a no-go to define a
"backdoor" that lets a hypervisor pass the right information to the
driver without hacking the chipset device model.
The hardware folks would have to give us a place for a pair of registers
(something like data/address), and a bit somewhere else that would be
always 0 on hardware and always 1 if the hypervisor is implementing the
pair of registers. This is similar to CPUID, which has the HYPERVISOR
bit + hypervisor-defined leaves at 0x40000000.
The data/address pair could be in a BAR, in configuration space, in the
low VGA ports at 0x3c0-0x3df, wherever. The hypervisor bit can be in
the same place or somewhere else---again, whatever is convenient for the
hardware folks. We just need *one bit* that is known-zero on all
hardware, and 8 bytes in a reserved area. I don't think it's too hard
to find this space, and I really, really would like Intel to follow up
on a paravirtualized backdoor.
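
To make the idea concrete, here is a rough sketch of how a guest driver
could use such a backdoor. Everything in it is an assumption made up for
illustration: the register offsets, the index values and the toy device
model do not correspond to any existing hardware or hypervisor interface.
It only shows the "one known-zero bit plus an address/data pair" scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register offsets and indices; no such interface exists today. */
#define PV_REG_FLAGS       0x00u  /* bit 0 reads 0 on hardware, 1 under a hypervisor */
#define PV_REG_ADDRESS     0x04u  /* guest writes the index of the value it wants */
#define PV_REG_DATA        0x08u  /* guest reads the selected value back */
#define PV_FLAG_HYPERVISOR 0x1u

#define PV_INDEX_PCH_VENDOR_ID 0u
#define PV_INDEX_PCH_DEVICE_ID 1u

/* Toy device model standing in for the reserved area of the IGD. */
struct pv_backdoor {
    bool hypervisor;         /* false models bare hardware */
    uint32_t address;        /* last index written by the guest */
    uint32_t pch_vendor_id;  /* values the hypervisor wants to expose */
    uint32_t pch_device_id;
};

static uint32_t pv_read(const struct pv_backdoor *dev, uint32_t reg)
{
    if (!dev->hypervisor)
        return 0;            /* reserved area reads as zero on hardware */

    switch (reg) {
    case PV_REG_FLAGS:
        return PV_FLAG_HYPERVISOR;
    case PV_REG_ADDRESS:
        return dev->address;
    case PV_REG_DATA:
        if (dev->address == PV_INDEX_PCH_VENDOR_ID)
            return dev->pch_vendor_id;
        if (dev->address == PV_INDEX_PCH_DEVICE_ID)
            return dev->pch_device_id;
        return 0;
    default:
        return 0;
    }
}

static void pv_write(struct pv_backdoor *dev, uint32_t reg, uint32_t value)
{
    /* Only the address register is writable, and only under a hypervisor. */
    if (dev->hypervisor && reg == PV_REG_ADDRESS)
        dev->address = value;
}

/* Guest side: returns false on bare hardware, so the driver can fall back
 * to its usual southbridge probing. */
static bool pv_query_pch(struct pv_backdoor *dev,
                         uint32_t *vendor, uint32_t *device)
{
    if (!(pv_read(dev, PV_REG_FLAGS) & PV_FLAG_HYPERVISOR))
        return false;

    pv_write(dev, PV_REG_ADDRESS, PV_INDEX_PCH_VENDOR_ID);
    *vendor = pv_read(dev, PV_REG_DATA);
    pv_write(dev, PV_REG_ADDRESS, PV_INDEX_PCH_DEVICE_ID);
    *device = pv_read(dev, PV_REG_DATA);
    return true;
}
```

The point of the flag bit is that the driver never has to guess: if it
reads 0, the address/data pair is simply not there.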
That said, we have the problem of existing guests, so I agree something
else is needed.
a) Two bridges - one 'passthrough' and the legacy ISA bridge
that QEMU emulates. Both Linux and Windows are OK with
two bridges (even though it is pretty weird).
This is pretty much the only solution for existing Linux guests that
look up the southbridge by class.
The proposed solution here is to define a new "pci stub" device in QEMU
that lets you define a do-nothing device with your desired vendor ID,
device ID, class and optionally subsystem IDs.
The new machine type (the one that instantiates the special
IGD-passthrough-enabled northbridge) can then instantiate this stub
device at 1f.0 with the desired vendor ID, device ID and class ID.
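
As a rough illustration of what such a stub amounts to, here is a
self-contained sketch. It is not QEMU code and does not use the QEMU
device API; the struct and function names are made up. Only the
configuration space offsets and the ISA bridge class code (0x0601) are
the standard PCI ones.

```c
#include <stdint.h>
#include <string.h>

#define PCI_CONFIG_SIZE          256
#define PCI_VENDOR_ID            0x00
#define PCI_DEVICE_ID            0x02
#define PCI_CLASS_DEVICE         0x0a
#define PCI_SUBSYSTEM_VENDOR_ID  0x2c
#define PCI_SUBSYSTEM_ID         0x2e
#define PCI_CLASS_BRIDGE_ISA     0x0601
#define PCI_DEVFN(slot, fn)      ((uint8_t)((((slot) & 0x1f) << 3) | ((fn) & 0x07)))

/* Hypothetical stub device: identification registers and nothing else. */
struct stub_pci_dev {
    uint8_t devfn;
    uint8_t config[PCI_CONFIG_SIZE];
};

/* PCI configuration space is little-endian. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void stub_init(struct stub_pci_dev *d, uint8_t devfn,
                      uint16_t vendor, uint16_t device, uint16_t class_id,
                      uint16_t subsys_vendor, uint16_t subsys_id)
{
    memset(d, 0, sizeof(*d));
    d->devfn = devfn;
    put_le16(&d->config[PCI_VENDOR_ID], vendor);
    put_le16(&d->config[PCI_DEVICE_ID], device);
    put_le16(&d->config[PCI_CLASS_DEVICE], class_id);
    put_le16(&d->config[PCI_SUBSYSTEM_VENDOR_ID], subsys_vendor);
    put_le16(&d->config[PCI_SUBSYSTEM_ID], subsys_id);
}

static uint16_t stub_config_read16(const struct stub_pci_dev *d, uint8_t offset)
{
    if (offset > PCI_CONFIG_SIZE - 2)
        return 0xffff;       /* would run past the end of config space */
    return (uint16_t)(d->config[offset] | (d->config[offset + 1] << 8));
}

static void stub_config_write16(struct stub_pci_dev *d, uint8_t offset,
                                uint16_t value)
{
    /* Do-nothing device: all writes are silently dropped. */
    (void)d;
    (void)offset;
    (void)value;
}
```

With this, the machine type would do the equivalent of
stub_init(&d, PCI_DEVFN(0x1f, 0), vendor, device, PCI_CLASS_BRIDGE_ISA, 0, 0)
using the IDs of the host's southbridge.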
If we cannot get the paravirtualized backdoor, it would also make sense to:
- have drivers standardize on a single way to probe the southbridge
- make this be neither by class (because the firmware wants to
distinguish the actual ISA bridge from the stub, and it can do so by
looking up the class), nor by slot (because this conflicts with the Q35
chipset model that has the southbridge at 1f.0).
mst's proposal was to probe by subsystem ID. I'm not sure I understood
the details exactly, but I trust him. :) However, in case it wasn't
clear, I think a paravirtualized backdoor would still be better.
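
To show why the lookup method matters, here is a toy comparison of the
three ways of probing on a made-up bus. The structures and helpers are
hypothetical, and the subsystem ID variant is only my guess at what such
a probe could look like, not mst's actual proposal.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_CLASS_BRIDGE_ISA     0x0601
#define PCI_DEVFN(slot, fn)      ((uint8_t)((((slot) & 0x1f) << 3) | ((fn) & 0x07)))

/* Hypothetical, heavily simplified view of a device on the bus. */
struct toy_pci_dev {
    uint8_t devfn;
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t class_id;
    uint16_t subsys_vendor_id;
    uint16_t subsys_id;
};

/* By class: returns the first match, which is ambiguous as soon as both the
 * emulated ISA bridge and the stub carry the ISA bridge class. */
static const struct toy_pci_dev *
find_by_class(const struct toy_pci_dev *bus, size_t n, uint16_t class_id)
{
    for (size_t i = 0; i < n; i++)
        if (bus[i].class_id == class_id)
            return &bus[i];
    return NULL;
}

/* By slot: unambiguous, but hardcodes 1f.0, where a Q35 machine already has
 * its own emulated southbridge. */
static const struct toy_pci_dev *
find_by_slot(const struct toy_pci_dev *bus, size_t n, uint8_t devfn)
{
    for (size_t i = 0; i < n; i++)
        if (bus[i].devfn == devfn)
            return &bus[i];
    return NULL;
}

/* By subsystem ID: depends on neither the class nor the slot of the stub. */
static const struct toy_pci_dev *
find_by_subsys(const struct toy_pci_dev *bus, size_t n,
               uint16_t subsys_vendor_id, uint16_t subsys_id)
{
    for (size_t i = 0; i < n; i++)
        if (bus[i].subsys_vendor_id == subsys_vendor_id &&
            bus[i].subsys_id == subsys_id)
            return &bus[i];
    return NULL;
}
```

With an emulated ISA bridge at 01.0 and a stub at 1f.0, find_by_class()
returns whichever comes first in the scan, while find_by_subsys() picks
the stub no matter where it sits or what class it advertises.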
b) One bridge - the one that QEMU emulates - and let's emulate
more of the registers (by emulate - I mean for some, get the
data from the real hardware).
b1). We can't use the legacy because the registers are
above 256 (is that correct? Did I miss something?)
As I understand it, mst brought up Q35 because the northbridge
configuration space layout might be more similar to what the driver
expects than for PIIX4. But I don't think anyone really said whether
this is true or false.
I think Q35 is absolutely not a requirement for IGD passthrough,
especially until this statement is either proved or disproved.
4). Code does a bit of sysfs that could use some refactoring with
the KVM code.
Problem: More time needed to do the code restructuring.
FWIW, I don't really care about code sharing with KVM. That's a
separate problem and it's not necessary to bring it up and muddy the
waters even further.
Paolo