On 15/07/2020 16:16, Oliver O'Halloran wrote:
> On Wed, Jul 15, 2020 at 3:24 PM Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>
>>
>>> @@ -158,9 +157,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct
>>> pci_dev *pdev)
>>> goto disable_iov;
>>> pdev->dev.archdata.iov_data = iov;
>>>
>>> + /* FIXME: totalvfs > phb->ioda.total_pe_num is going to be a problem
>>> */
>>
>>
>> WARN_ON_ONCE() then?
>
> can't hurt
>
>>> @@ -173,50 +172,51 @@ static void pnv_pci_ioda_fixup_iov_resources(struct
>>> pci_dev *pdev)
>>> goto disable_iov;
>>> }
>>>
>>> - total_vf_bar_sz += pci_iov_resource_size(pdev,
>>> - i + PCI_IOV_RESOURCES);
>>> + vf_bar_sz = pci_iov_resource_size(pdev, i +
>>> PCI_IOV_RESOURCES);
>>>
>>> /*
>>> - * If bigger than quarter of M64 segment size, just round up
>>> - * power of two.
>>> + * Generally, one segmented M64 BAR maps one IOV BAR. However,
>>> + * if a VF BAR is too large we end up wasting a lot of space.
>>> + * If we've got a BAR that's bigger than greater than 1/4 of
>>> the
>>
>>
>> bigger, greater, huger? :)
>>
>> Also, a nit: s/got a BAR/got a VF BAR/
>
> whatever, it's just words
You are talking about these BARs and those BARs and since we want "to
help out
the next sucker^Wperson who needs to tinker with it", using precise term
is kinda essential here.
>
>>> + * default window's segment size then switch to using single
>>> PE
>>> + * windows. This limits the total number of VFs we can
>>> support.
>>
>> Just to get idea about absolute numbers here.
>>
>> On my P9:
>>
>> ./pciex@600c3c0300000/ibm,opal-m64-window
>> 00060200 00000000 00060200 00000000 00000040 00000000
>>
>> so that default window's segment size is 0x40.0000.0000/512 = 512MB?
>
> Yeah. It'll vary a bit since PHB3 and some PHB4s have 256.
>
>>> *
>>> - * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
>>> - * with other devices, IOV BAR size is expanded to be
>>> - * (total_pe * VF_BAR_size). When VF_BAR_size is half of M64
>>> - * segment size , the expanded size would equal to half of the
>>> - * whole M64 space size, which will exhaust the M64 Space and
>>> - * limit the system flexibility. This is a design decision to
>>> - * set the boundary to quarter of the M64 segment size.
>>> + * The 1/4 limit is arbitrary and can be tweaked.
>>> */
>>> - if (total_vf_bar_sz > gate) {
>>> - mul = roundup_pow_of_two(total_vfs);
>>> - dev_info(&pdev->dev,
>>> - "VF BAR Total IOV size %llx > %llx, roundup
>>> to %d VFs\n",
>>> - total_vf_bar_sz, gate, mul);
>>> - iov->m64_single_mode = true;
>>> - break;
>>> - }
>>> - }
>>> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
>>> + /*
>>> + * On PHB3, the minimum size alignment of M64 BAR in
>>> + * single mode is 32MB. If this VF BAR is smaller than
>>> + * 32MB, but still too large for a segmented window
>>> + * then we can't map it and need to disable SR-IOV for
>>> + * this device.
>>
>>
>> Why not use single PE mode for such BAR? Better than nothing.
>
> Suppose you could, but I figured VFs were mainly interesting since you
> could give each VF to a separate guest. If there's multiple VFs under
> the same single PE BAR then they'd have to be assigned to the same
True. But with one PE per VF we can still have 15 (or 14?) isolated VFs
which is not hundreds but better than 0.
> guest in order to retain the freeze/unfreeze behaviour that PAPR
> requires. I guess that's how it used to work, but it seems better just
> to disable them rather than having VFs which sort of work.
Well, realistically the segment size should be 8MB to make this matter
(or the whole window 2GB) which does not seem to happen so it does not
matter.
--
Alexey