On Sun, Nov 21, 2010 at 12:19:03PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 11:50:18AM +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote:
> > > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote:
> > > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> > > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > > > > > "Michael S. Tsirkin" <m...@redhat.com> writes:
> > > > > >
> > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > >> > Replace bus number with slot numbers of parent bridges up to
> > > > > > >> > the root. This works for the root bridge in a compatible way
> > > > > > >> > because the bus number there is hard-coded to 0.
> > > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > > > > > >> >
> > > > > > >> > Gleb, Markus, I think the following should be sufficient for
> > > > > > >> > PCI. What do you think? Also - do we need to update QMP/monitor
> > > > > > >> > to teach them to work with these paths?
> > > > > > >> >
> > > > > > >> > This is on top of Alex's patch, completely untested.
> > > > > > >> >
> > > > > > >> > pci: fix device path for devices behind nested bridges
> > > > > > >> >
> > > > > > >> > We were using bus number in the device path, which is clearly
> > > > > > >> > broken as this number is guest-assigned for all devices
> > > > > > >> > except the root.
> > > > > > >> >
> > > > > > >> > Fix by using a hierarchical list of slots, walking the path
> > > > > > >> > from root down to device, instead. Add :00 as bus number
> > > > > > >> > so that if there are no nested bridges, this is compatible
> > > > > > >> > with what we have now.
> > > > > > >>
> > > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > > > > > >> because a pci-to-pci bridge is a pci function.
> > > > > > >> So the format should be
> > > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > > > > > >>
> > > > > > >> thanks,
> > > > > > >
> > > > > > > Hmm, interesting. If we do this we aren't backwards compatible
> > > > > > > though, so maybe we could try using Open Firmware paths just as
> > > > > > > well.
> > > > > >
> > > > > > Whatever we do, we need to make it work for all (qdevified) devices
> > > > > > and buses.
> > > > > >
> > > > > > It should also be possible to use canonical addressing with
> > > > > > device_add & friends. I.e. permit naming a device by (a unique
> > > > > > abbreviation of) its canonical address in addition to naming it by
> > > > > > its user-defined ID. For instance, something like
> > > > > >
> > > > > > device_del /pci/@1,1
> > > > > >
> > > > > FWIW openbios allows this kind of abbreviation.
> > > > >
> > > > > > in addition to
> > > > > >
> > > > > > device_del ID
> > > > > >
> > > > > > Open Firmware is a useful source of inspiration there, but should it
> > > > > > come into conflict with usability, we should let usability win.
> > > > >
> > > > > --
> > > > > Gleb.
> > > >
> > > > I think that the domain (PCI segment group), bus, slot, function way to
> > > > address pci devices is still the most familiar and the easiest to map to
> > > Most familiar to whom?
> >
> > The guests.
>
> Which one? There are many guests. Your favorite?
>
> > For CLI, we need an easy way to map a device in guest to the
> > device in qemu and back.
>
> Then use eth0, /dev/sdb, or even C:. Your way is no less broken, since what
> you are saying is "let's use the name that the guest assigned to a device".
No, I am saying let's use the name that our ACPI tables assigned.

> > > It looks like you identify yourself with most of qemu users, but if
> > > most qemu users are like you then qemu has not enough users :) Most
> > > users that consider themselves to be "advanced" may know what eth1 or
> > > /dev/sdb means. This doesn't mean we should provide "device_del eth1"
> > > or "device_add /dev/sdb" commands though.
> > >
> > > More important is that "domain" (encoded as a number like you used to)
> > > and "bus number" have no meaning from inside qemu. So while, as I said
> > > many times, I don't care about the exact CLI syntax too much, it should
> > > make sense at least. It can use an id to specify the PCI bus in CLI like
> > > this: device_del pci.0:1.1. Or it can even use a device id too, like
> > > this: device_del pci.0:ide.0. Or it can use HW topology like in an OF
> > > device path. But doing ad-hoc device enumeration inside qemu and then
> > > using it for CLI is not it.
> > >
> > > > functionality in the guests. Qemu is buggy at the moment in that it
> > > > uses the bus addresses assigned by the guest and not the ones in ACPI,
> > > > but that can be fixed.
> > > It looks like you confused ACPI _SEG for something it isn't.
> >
> > Maybe I did. This is what linux does:
> >
> > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
> > {
> >         struct acpi_device *device = root->device;
> >         int domain = root->segment;
> >         int busnum = root->secondary.start;
> >
> > And I think this is consistent with the spec.
>
> It means that one domain may include several host bridges. At that level
> a domain is defined as something that has a unique name for each device
> inside it, thus no two buses in one segment/domain can have the same bus
> number. This is what the PCI spec tells you.

And that really is enough for CLI, because all we need is to locate the
specific slot in a unique way.

> And this further shows that using "domain" as defined by the guest is a
> very bad idea.

As defined by ACPI, really.

> > > ACPI spec says that PCI segment group is a purely software concept
> > > managed by system firmware. In fact one segment may include multiple
> > > PCI host bridges.
> >
> > It can't I think:
>
> Read the _BBN definition:
> The _BBN object is located under a PCI host bridge and must be unique for
> every host bridge within a segment since it is the PCI bus number.
>
> Clearly the above speaks about multiple host bridges within a segment.

Yes, it looks like the firmware spec allows that.

> > Multiple Host Bridges
> >
> > A platform may have multiple PCI Express or PCI-X host bridges. The base
> > address for the MMCONFIG space for these host bridges may need to be
> > allocated at different locations. In such cases, using MCFG table and
> > _CBA method as defined in this section means that each of these host
> > bridges must be in its own PCI Segment Group.
>
> This is not from the ACPI spec but the PCI Firmware Specification 3.0;
> still, without going too deep into it, the above paragraph talks about
> some particular case where each host bridge must be in its own PCI Segment
> Group, which is definite proof that in other cases multiple host bridges
> can be in one segment group.

I stand corrected. I think you are right. But note that if they are, they
must have distinct bus numbers assigned by ACPI.

> > > _SEG is not what OSPM uses to tie HW resources to ACPI resources. It
> > > uses _CRS (Current Resource Settings) for that, just like OF. No
> > > surprise there.
> >
> > OSPM uses both I think.
> >
> > All I see linux do with _CRS is get the bus number range.
>
> So let's assume that HW has two PCI host bridges and ACPI has:
> Device(PCI0) {
>     Name (_HID, EisaId ("PNP0A03"))
>     Name (_SEG, 0x00)
> }
> Device(PCI1) {
>     Name (_HID, EisaId ("PNP0A03"))
>     Name (_SEG, 0x01)
> }
> I.e. no _CRS to describe resources. How do you think OSPM knows which of
> the two pci host bridges is PCI0 and which one is PCI1?

You must be able to uniquely address any bridge using the combination of
_SEG and _BBN.

> > And the spec says, e.g.:
> >
> > the memory mapped configuration base address (always corresponds to bus
> > number 0) for the PCI Segment Group of the host bridge is provided by
> > _CBA and the bus range covered by the base address is indicated by the
> > corresponding bus range specified in _CRS.
>
> Don't see how it is relevant. And _CBA is defined only for PCI Express.
> Let's solve the problem for PCI first and then move to PCI Express. Jumping
> from one to the other distracts us from the main discussion.

I think this is what confuses us. As long as you are using cf8/cfc there's
no concept of a domain really. Thus /p...@i0cf8 is probably enough for BIOS
boot, because we'll need to make root bus numbers unique for legacy
guests/option ROMs. But this is not a hardware requirement and might become
easier to ignore with EFI.
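To illustrate what I mean, here is a minimal sketch (not qemu code) of the
legacy CONFIG_ADDRESS word written to port 0xCF8: the enable bit, bus,
device, function and register offset account for all of it, so there is
simply no field where a segment/domain number could go.

#include <stdint.h>
#include <stdio.h>

#define PCI_CONFIG_ADDR_PORT 0xCF8
#define PCI_CONFIG_DATA_PORT 0xCFC

static uint32_t pci_cf8_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    return (1u << 31)                      /* enable bit                */
         | ((uint32_t)bus << 16)           /* bus number, bits 23:16    */
         | ((uint32_t)(dev & 0x1f) << 11)  /* device (slot), bits 15:11 */
         | ((uint32_t)(fn & 0x07) << 8)    /* function, bits 10:8       */
         | (uint32_t)(reg & 0xfc);         /* dword-aligned register    */
}

int main(void)
{
    /* Example: bus 0, device 3, function 0, vendor ID register. */
    printf("CONFIG_ADDRESS = 0x%08x\n", (unsigned)pci_cf8_addr(0, 3, 0, 0));
    return 0;
}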
> > > >
> > > > That should be enough for e.g. device_del. We do have the need to
> > > > describe the topology when we interface with firmware, e.g. to describe
> > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > with), but that's probably the only case.
> > >
> > > Describing HW topology is the only way to unambiguously describe a
> > > device to something or someone outside qemu and have persistent device
> > > naming between different HW configurations.
> >
> > Not really, since ACPI is a binary blob programmed by qemu.
>
> ACPI is part of the guest, not qemu.

Yes, it runs in the guest, but it's generated by qemu. On real hardware it's
supplied by the motherboard.

> Just saying "not really" doesn't prove much. I still haven't seen any
> proposition from you that actually solves the problem. No, "let's use guest
> naming" is not it. There is no such thing as "The Guest".
>
> --
> Gleb.

I am sorry if I didn't make this clear. I think we should use the domain:bus
pair to name the root device. As these are unique and

--
MST
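A minimal sketch of how such a path could be assembled, using hypothetical
structures rather than qemu's actual qdev/PCI code: the root bridge is named
by its domain:bus pair, and each bridge hop below it contributes a
slot.function component, per Isaku's correction above.

#include <stdio.h>
#include <string.h>

struct pci_root {
    unsigned domain;                  /* PCI segment group                  */
    unsigned bus;                     /* root bus number (hard-coded 0 now) */
};

struct pci_node {
    struct pci_node *parent_bridge;   /* NULL if directly on the root bus   */
    unsigned slot, fn;                /* address on the parent bus          */
};

static void format_path(char *buf, size_t len,
                        const struct pci_root *root,
                        const struct pci_node *dev)
{
    if (dev->parent_bridge) {
        /* Emit the components closer to the root first. */
        format_path(buf, len, root, dev->parent_bridge);
        snprintf(buf + strlen(buf), len - strlen(buf), ":%02x.%x",
                 dev->slot, dev->fn);
    } else {
        snprintf(buf, len, "%04x:%02x:%02x.%x",
                 root->domain, root->bus, dev->slot, dev->fn);
    }
}

int main(void)
{
    /* Hypothetical device behind one nested bridge:
     * root bus -> bridge at 1e.0 -> device at 03.0 */
    struct pci_node bridge = { NULL, 0x1e, 0 };
    struct pci_node dev    = { &bridge, 0x03, 0 };
    struct pci_root root   = { 0x0000, 0x00 };
    char path[128] = "";

    format_path(path, sizeof(path), &root, &dev);
    printf("%s\n", path);             /* prints 0000:00:1e.0:03.0 */
    return 0;
}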