On 9/11/24 11:15 AM, Akihiko Odaki wrote:
> On 2024/09/11 22:53, Matthew Rosato wrote:
>> On 9/11/24 6:58 AM, Akihiko Odaki wrote:
>>> On 2024/09/11 18:38, Cédric Le Goater wrote:
>>>> +Matthew +Eric
>>>>
>>>> Side note for the maintainers :
>>>>
>>>> Before this change, the igb device, which is multifunction, was working
>>>> fine under Linux.
>>>>
>>>> Was there a fix in Linux since :
>>>>
>>>>     57da367b9ec4 ("s390x/pci: forbid multifunction pci device")
>>>>     6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler")
>>>>
>>>> ?
>> The timing of those particular commits predates the linux s390 kernel 
>> support of multifunction/SR-IOV.  At that time it was simply not possible on 
>> s390.
> 
> Is it OK to remove this check of multifunction now?

Yes, I think removing the check (which AFAIU was broken since 6069bcdeacee) 
will get us back to the behavior Cedric was seeing, where a device that reports 
multifunction in the config space is still allowed through but only the PF will 
be usable in the guest.

> 
> This code is not working properly with SR-IOV and misleading. It is better to 
> remove the code if it does no good.
> 
> It would be nice if anyone confirms multifunction works for s390x with the 
> code removed.

Even if you remove the check, multifunction itself won't work in the s390x 
guest without the missing CLP pieces described below.  When I have some time 
I'll hack something together to fabricate some CLP data and try it out, but it 
sounds like Cedric could use his setup right now to at least verify that the PF 
itself remains usable in the guest (current behavior).

> 
>>
>>>>
>>>> s390 PCI devices do not have extended capabilities, so the igb device
>>>> does not expose the SRIOV capability and only the PF is accessible but
>>>> it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the
>>>> default Linux config which is unexpected)
>>
>> The linux config option makes sense because the s390 kernel now supports 
>> SR-IOV/multifunction.
>>
>>>
>>> Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c 
>>> calls pci_config_size() and pci_host_config_write_common(), which means it 
>>> is exposing the whole PCI Express configuration space. Why can't s390x use 
>>> extended capabilities then?
>>>
>>
>> So, rather than poking around in config space, s390 (and thus the s390 
>> kernel) has an extra layer of 'capabilities' that it generally relies on to 
>> determine device functionality called 'CLP'.  Basically, there are pieces of 
>> CLP that are not currently generated (or forwarded from the host in the case 
>> of passthrough) by QEMU that would be needed by the guest to recognize the 
>> SRIOV/multifunction capability of a device, despite what config space has in 
>> it.  I suspect this is exactly why only the PF was available to your igb 
>> device then (missing CLP info made the device appear to not have 
>> multifunction capability as far as the s390 guest is concerned - fwiw adding 
>> CLP emulation to enable that is on our todo list).
> 
> What is expected to happen if you poke the configuration space anyway? I also 
> wonder if there is some public documentation of CLP and the relevant aspects 
> of PCI support in s390x.

On s390, the contents of config space might be what is directly reported by the 
device or may have been modified by a firmware layer in between the device and 
the LPAR (logical partition that is the closest thing to bare-metal on s390 
where you could run a linux host).  It's not that config space can't be read on 
s390 or anything like that, rather that the base s390 PCI kernel layer makes 
its decisions about multifunction based specifically on what is reported via 
CLP.  If CLP doesn't report that the device is multifunction capable (which it 
never does for an s390x QEMU guest today) then it is treated as not having 
multifunction support.
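
To make the distinction concrete, here is a minimal sketch in C.  It is not 
actual kernel or QEMU code: struct clp_report_sketch and both function names 
are hypothetical, since the real CLP layout is not public.  The only facts 
taken from the PCI spec are the Header Type register offset (0x0e) and its 
multifunction bit (0x80).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the PCI spec: Header Type register offset and its multifunction bit. */
#define HDR_TYPE_OFFSET    0x0e
#define HDR_TYPE_MULTIFUNC 0x80

/* Hypothetical stand-in for what CLP reports; the real layout is not public. */
struct clp_report_sketch {
    bool multifunction;
};

/* What config space claims: bit 7 of the Header Type register. */
static bool config_space_reports_multifunction(const uint8_t *config)
{
    assert(config != NULL);
    return (config[HDR_TYPE_OFFSET] & HDR_TYPE_MULTIFUNC) != 0;
}

/*
 * Sketch of the guest-side decision described above: only the CLP report is
 * consulted, and config space is deliberately ignored (assumed behavior,
 * simplified for illustration).
 */
static bool guest_treats_as_multifunction(const struct clp_report_sketch *clp,
                                          const uint8_t *config)
{
    assert(clp != NULL);
    (void)config;
    return clp->multifunction;
}
```

With config[0x0e] set to 0x80 and clp.multifunction left false, the first 
helper returns true and the second returns false, which is roughly the igb 
situation: config space advertises multifunction, but the guest only uses the 
PF.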

Unfortunately, the CLP details are not publicly documented.

> 
>>
>> Sounds like the short-term solution here would be to continue allowing the 
>> PF without multifunction being visible to the guest (so as to not regress 
>> prior functionality) and then aim for proper support after with the 
>> necessary CLP pieces.
> 
> I agree; we can keep the PF working.

Thanks!

Matt

