Hi Burakov,

Please find my test case below. Thanks!

Br,
Tone

-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com> 
Sent: Tuesday, November 6, 2018 7:03 PM
To: Tone Zhang (Arm Technology China) <tone.zh...@arm.com>; dev@dpdk.org
Cc: nd <n...@arm.com>
Subject: Re: [dpdk-dev] [PATCH] pci_vfio: Support 64KB kernel page_size with 
vfio-pci driver

On 03-Nov-18 5:46 AM, Tone Zhang (Arm Technology China) wrote:
> Hi Burakov,
> 
> Thanks!
> Please check my feedback below.
> 
> Br,
> Tone
> 
> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Burakov, Anatoly
> Sent: Thursday, November 1, 2018 6:01 PM
> To: Tone Zhang (Arm Technology China) <tone.zh...@arm.com>; 
> dev@dpdk.org
> Cc: nd <n...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH] pci_vfio: Support 64KB kernel 
> page_size with vfio-pci driver
> 
> On 01-Nov-18 2:33 AM, Tone Zhang (Arm Technology China) wrote:
>> Hi Burakov,
>>
>> I'm sorry for the late response.
>>
>> Thanks a lot for your comments. Please find my response below (marked 
>> with "Tone:"). 😊
>>
>> Br,
>> Tone
>>
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.bura...@intel.com>
>> Sent: Wednesday, October 24, 2018 5:09 PM
>> To: Tone Zhang (Arm Technology China) <tone.zh...@arm.com>; 
>> dev@dpdk.org
>> Cc: nd <n...@arm.com>
>> Subject: Re: [dpdk-dev] [PATCH] pci_vfio: Support 64KB kernel 
>> page_size with vfio-pci driver
>>
>> On 24-Oct-18 3:20 AM, tone.zhang wrote:
>>> With a larger PAGE_SIZE it is possible for the MSI table to very 
>>> close to the end of the BAR s.t. when we align the MSI table to the 
>>> PAGE_SIZE, the end offset of the MSI table is out the PCI BAR 
>>> boundary.
>>>
>>> This patch addresses the issue by comparing both the start and the 
>>> end offset of the MSI table with the BAR size.
>>>
>>> The patch fixes the debug log as below:
>>> EAL: Skipping BAR0
>>>
>>> Signed-off-by: tone.zhang <tone.zh...@arm.com>
>>> Reviewed-by: Gavin Hu <gavin...@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>>> Reviewed-by: Steve Capper <steve.cap...@arm.com>
>>> ---
>>>     drivers/bus/pci/linux/pci_vfio.c | 25 +++++++++++++++++++++----
>>>     1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/bus/pci/linux/pci_vfio.c
>>> b/drivers/bus/pci/linux/pci_vfio.c
>>> index b1f0683..1373345 100644
>>> --- a/drivers/bus/pci/linux/pci_vfio.c
>>> +++ b/drivers/bus/pci/linux/pci_vfio.c
>>> @@ -445,9 +445,11 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct 
>>> mapped_pci_resource *vfio_res,
>>>             struct pci_msix_table *msix_table = &vfio_res->msix_table;
>>>             struct pci_map *bar = &vfio_res->maps[bar_index];
>>>     
>>> -   if (bar->size == 0)
>>> +   if (bar->size == 0) {
>>>                     /* Skip this BAR */
>>> +           RTE_LOG(INFO, EAL, "Skipping this BAR%d\n", bar_index);
>>>                     return 0;
>>
>> I feel like "this" is unnecessary here - just "Skipping BAR%d" should 
>> be enough :)
>>
>> Tone: Will update code and remove "this" in next version.
>>
>>> +   }
>>>     
>>>             if (msix_table->bar_index == bar_index) {
>>>                     /*
>>> @@ -457,7 +459,12 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct 
>>> mapped_pci_resource *vfio_res,
>>>                     uint32_t table_start = msix_table->offset;
>>>                     uint32_t table_end = table_start + msix_table->size;
>>>                     table_end = (table_end + ~PAGE_MASK) & PAGE_MASK;
>>> -           table_start &= PAGE_MASK;
>>> +           table_start = (table_start + ~PAGE_MASK) & PAGE_MASK;
>>
>> IMO these two additions should be replaced by RTE_ALIGN by page size.
>> Makes the purpose of the code much clearer.
>>
>> Tone: Sure, it is better! Will update code in next version. Thanks!
>>
>>> +           /* after rounding to PAGE_SIZE, it is over the bar->size,
>>> +            * fall back to the MSI-X table offset in the bar.
>>> +           */
>>> +           if (table_start >= bar->size)
>>> +                   table_start = msix_table->offset;
>>
>> If i understand things correctly, msix_table->offset value here may be 
>> unaligned, so falling back to this value may cause mapping failure, because 
>> we later use this value as a size of mapping (which needs to be page 
>> aligned). Shouldn't this be aligned using RTE_ALIGN_FLOOR by page size?
>>
>> Tone: It is a little tricky. Align msix_table->offset with RTE_ALIGN_FLOOR 
>> maybe get 0 if the offset is less than page size in the PCI bar. It will 
>> trigger mmap() error. IIRC the input parameter "size" in mmap() is not 
>> required to be aligned with page size, system will do it. But it is better 
>> if we can do it. If I was wrong, please correct me. Thanks a lot.
> 
> Apologies, you're correct - length can be misaligned (just tested it).
> 
> However, i think it's still worth aligning (and putting in an additional 
> check), because we want to make sure we *don't* attempt to map the MSI-X BAR, 
> and kernel might do that by adjusting length automatically and return mmap 
> failure that way.
> 
> Tone: Thanks a lot! I agree with you. It worth aligning the size. I will 
> update code (RTE_ALIGN_FLOOR by page size) in next version.
> I'd like to discuss one case with you. In the case, base->size is 16384, 
> msix_table->offset is 8192, page_size is 65536. After align 
> "msix_table->offset" with page_size (RTE_ALIGN_FLOOR), the value of 
> "table_start" is 0, mmap() will report error, and the memory mapping is 
> failed.
> For the case (table_start is 0 after the aglinment), may I continue falling 
> back the "table_start" to " msix_table->offset" (not aligned with page size), 
> and left system adjust the length automatically? Thanks!

Please correct me if i'm wrong, but this is a code path for when we're trying 
to mmap around the MSI-X BAR. Kernel will not allow us to do that, period, so 
whatever start/end addresses you get, they *must not* include a single byte of 
MSI-X BAR. So, in case like you described, i think we should just straight up 
refuse the map the entire BAR.

However, as i do not have a system with such properties to test on, so please 
correct me if i'm wrong here :)


Tone: I understand and agree with you.  😊

Please have a look at my test case. In my case, I tried to bind NVMe device 
with VFIO driver and the kernel page size is 64KB. Without the change, the test 
is failed. 

From the debug information, I observed that "bar->size" is 16384, 
"msix_table->offset" is 8192 and "msix_table->size" is 512. Regarding the page 
size is much bigger than the "bar->size", in the change, the code maps the 
first 8192 bytes ahead of MSI-X table. After align with the page size boundary, 
the "start" offset after the MSI-X table is over "bar->size", mmap() reports 
error. In this case, I can only map the memory before the MSI-X table. After 
fall back "table_start" to " msix_table->offset " (i.e. 8192 bytes), and NOT 
mapping the memory behind MSI-X table, the NVMe device can be bound to VFIO 
driver, and the test is passed. The kernel version in my test environment is 
4.16. 

So in the change, I do not map any byte of MSI-X table, unfortunately I cannot 
align the memory "size" in mmap() to page size boundary. From the test result, 
the change fixes the error. The case looks a little tricky. If we refuse the 
memory map here, it means we cannot bind VFIO driver with some PCI devices with 
64KB kernel page size. I hope we can support such case in DPDK. 😊

> 
> --
> Thanks,
> Anatoly
> 


--
Thanks,
Anatoly

Reply via email to