On 10/04/16 18:10, Laine Stump wrote:
> On 10/04/2016 11:40 AM, Laszlo Ersek wrote:
>> On 10/04/16 16:59, Daniel P. Berrange wrote:
>>> On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:
>>>> On 09/01/16 15:22, Marcel Apfelbaum wrote:
>>>>> +2.3 PCI only hierarchy
>>>>> +======================
>>>>> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices
>>>>> +or into a DMI-PCI bridge. PCI-PCI bridges can be plugged into
>>>>> +DMI-PCI bridges and can be nested to a depth of 6-7. DMI-PCI
>>>>> +bridges should be plugged only into the pcie.0 bus.
>>>>> +
>>>>> +   pcie.0 bus
>>>>> +   ----------------------------------------------
>>>>> +        |                            |
>>>>> +   -----------               ------------------
>>>>> +   | PCI Dev |               | DMI-PCI BRIDGE |
>>>>> +   -----------               ------------------
>>>>> +                               |            |
>>>>> +                      -----------    ------------------
>>>>> +                      | PCI Dev |    | PCI-PCI Bridge |
>>>>> +                      -----------    ------------------
>>>>> +                                       |           |
>>>>> +                                 -----------   -----------
>>>>> +                                 | PCI Dev |   | PCI Dev |
>>>>> +                                 -----------   -----------
>>>>
>>>> Works for me, but I would again elaborate a little bit on keeping
>>>> the hierarchy flat.
>>>>
>>>> First, in order to preserve compatibility with libvirt's current
>>>> behavior, let's not plug a PCI device directly into the DMI-PCI
>>>> bridge, even if that's possible otherwise. Let's just say
>>>>
>>>> - there should be at most one DMI-PCI bridge (if a legacy PCI
>>>>   hierarchy is required),
>>>
>>> Why do you suggest this? If the guest has multiple NUMA nodes and
>>> you're creating a PXB for each NUMA node, then it looks valid to
>>> want to have a DMI-PCI bridge attached to each PXB, so you can have
>>> legacy PCI devices on each NUMA node, instead of putting them all on
>>> the PCI bridge without NUMA affinity.
>>
>> You are right. I meant the above within one PCI Express root bus.
>>
>> Small correction to your wording though: you don't want to attach the
>> DMI-PCI bridge to the PXB device, but to the extra root bus provided
>> by the PXB.
>
> This made me realize something - the root bus on a pxb-pcie controller
> has a single slot, and that slot can accept either a pcie-root-port
> (ioh3420) or a dmi-to-pci-bridge. If you want to have both express and
> legacy PCI devices on the same NUMA node, then you would either need
> to create one pxb-pcie for the pcie-root-port and another for the
> dmi-to-pci-bridge, or you would need to put the pcie-root-port and the
> dmi-to-pci-bridge onto different functions of the single slot. Should
> the latter work properly?
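For concreteness, the latter arrangement -- pcie-root-port and
dmi-to-pci-bridge as two functions of slot 0 on the pxb-pcie root bus
-- might look something like this on the QEMU command line (a sketch
only; the IDs and the bus_nr, numa_node, chassis and slot values are
made up for illustration):

  -device pxb-pcie,id=pxb1,bus=pcie.0,bus_nr=0x40,numa_node=1 \
  -device ioh3420,id=rp1,bus=pxb1,addr=0x0.0x0,multifunction=on,chassis=1,slot=1 \
  -device i82801b11-bridge,id=dmi1,bus=pxb1,addr=0x0.0x1 \
  -device pci-bridge,id=br1,bus=dmi1,chassis_nr=1,addr=0x1 \
  -device e1000e,bus=rp1 \
  -device e1000,bus=br1,addr=0x2

Here the root port sits at function 0 of the pxb's single slot (with
multifunction=on) and the DMI-PCI bridge at function 1 of the same
slot; the express NIC hangs off the root port, and the legacy NIC off
a hotplug-capable PCI-PCI bridge behind the DMI-PCI bridge.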
Yes, I expect so. (Famous last words? :))

>>>> - only PCI-PCI bridges should be plugged into the DMI-PCI bridge,
>>>
>>> What's the rationale for that, as opposed to plugging devices
>>> directly into the DMI-PCI bridge, which seems to work?
>>
>> The rationale is that libvirt used to do it like this.
>
> Nah, that's just the *result* of the rationale that we wanted the
> devices to be hotpluggable. At some later date we learned that hotplug
> on a pci-bridge device doesn't work on a Q35 machine anyway, so it was
> kind of pointless (but we still do it because we hold out hope that
> hotplug of legacy PCI devices into a pci-bridge on Q35 machines will
> work one day).
>
>> And the rationale for *that* is that DMI-PCI bridges cannot accept
>> hotplugged devices, while PCI-PCI bridges can.
>>
>> Technically nothing forbids (AFAICT) cold-plugging PCI devices into
>> DMI-PCI bridges, but this document is expressly not just about
>> technical constraints -- it's a policy document. We want to simplify
>> / trim the supported PCI and PCI Express hierarchies as much as
>> possible.
>>
>> All valid *high-level* topology goals should be permitted / covered
>> one way or another by this document, but in as few ways as possible
>> -- hopefully only one way. For example, if you read the rest of the
>> thread, flat hierarchies are preferred to deeply nested hierarchies,
>> because flat ones save on bus numbers
>
> Do they?

Yes. Nesting implies bridges, and bridges take up bus numbers. For
example, in a PCI Express switch, the upstream port of the switch
consumes a bus number with no practical usefulness.

IIRC we collectively devised a flat pattern elsewhere in the thread
where you could exhaust the 0..255 bus number space such that almost
every bridge (= taking up a bus number) would also be capable of
accepting a hot-plugged or cold-plugged PCI Express device. That is,
practically no wasted bus numbers.

Hm... search this message for "population algorithm":

https://www.mail-archive.com/qemu-devel@nongnu.org/msg394730.html

and then Gerd's big improvement / simplification on it, with
multifunction:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg395437.html

In Gerd's scheme, you'd need only one or two (I'm too lazy to count
exactly :)) PCI Express switches to exhaust all bus numbers. Minimal
waste due to upstream ports. (A command-line sketch of the
multifunction idea follows at the end of this message.)

Thanks
Laszlo

>> , are easier to set up and understand, probably perform better, and
>> don't lose any generality for cold- or hotplug.
>>
>> Thanks
>> Laszlo
>>
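To make the bus number arithmetic concrete, here is a sketch of the
multifunction idea (an illustration only, not the exact scheme from
Gerd's message; the IDs and the chassis, slot and addr values are made
up):

  # A PCI Express switch burns one bus number on its upstream port
  # before the first downstream port can accept anything: four bus
  # numbers for two hotplug-capable slots.
  -device ioh3420,id=rp1,bus=pcie.0,chassis=1,slot=1 \
  -device x3130-upstream,id=up1,bus=rp1 \
  -device xio3130-downstream,id=dp1,bus=up1,chassis=2,slot=0 \
  -device xio3130-downstream,id=dp2,bus=up1,chassis=2,slot=1

  # Multifunction root ports: up to eight functions of a single slot,
  # one bus number each, every one of them hotplug-capable.
  -device ioh3420,id=mrp0,bus=pcie.0,addr=0x2.0x0,multifunction=on,chassis=3,slot=0 \
  -device ioh3420,id=mrp1,bus=pcie.0,addr=0x2.0x1,chassis=3,slot=1 \
  -device ioh3420,id=mrp2,bus=pcie.0,addr=0x2.0x2,chassis=3,slot=2
  # ... and so on, up to addr=0x2.0x7.

In the first fragment the upstream port's bus number can never have an
endpoint on it; in the second, every bus number consumed belongs to a
port that can take a hot-plugged express device.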