Hi Burakov, Please find my test case below. Thanks!
Br, Tone -----Original Message----- From: Burakov, Anatoly <anatoly.bura...@intel.com> Sent: Tuesday, November 6, 2018 7:03 PM To: Tone Zhang (Arm Technology China) <tone.zh...@arm.com>; dev@dpdk.org Cc: nd <n...@arm.com> Subject: Re: [dpdk-dev] [PATCH] pci_vfio: Support 64KB kernel page_size with vfio-pci driver On 03-Nov-18 5:46 AM, Tone Zhang (Arm Technology China) wrote: > Hi Burakov, > > Thanks! > Please check my feedback below. > > Br, > Tone > > -----Original Message----- > From: dev <dev-boun...@dpdk.org> On Behalf Of Burakov, Anatoly > Sent: Thursday, November 1, 2018 6:01 PM > To: Tone Zhang (Arm Technology China) <tone.zh...@arm.com>; > dev@dpdk.org > Cc: nd <n...@arm.com> > Subject: Re: [dpdk-dev] [PATCH] pci_vfio: Support 64KB kernel > page_size with vfio-pci driver > > On 01-Nov-18 2:33 AM, Tone Zhang (Arm Technology China) wrote: >> Hi Burakov, >> >> I'm sorry for the late response. >> >> Thanks a lot for your comments. Please find my response below (marked >> with "Tone:"). 😊 >> >> Br, >> Tone >> >> -----Original Message----- >> From: Burakov, Anatoly <anatoly.bura...@intel.com> >> Sent: Wednesday, October 24, 2018 5:09 PM >> To: Tone Zhang (Arm Technology China) <tone.zh...@arm.com>; >> dev@dpdk.org >> Cc: nd <n...@arm.com> >> Subject: Re: [dpdk-dev] [PATCH] pci_vfio: Support 64KB kernel >> page_size with vfio-pci driver >> >> On 24-Oct-18 3:20 AM, tone.zhang wrote: >>> With a larger PAGE_SIZE it is possible for the MSI table to very >>> close to the end of the BAR s.t. when we align the MSI table to the >>> PAGE_SIZE, the end offset of the MSI table is out the PCI BAR >>> boundary. >>> >>> This patch addresses the issue by comparing both the start and the >>> end offset of the MSI table with the BAR size. >>> >>> The patch fixes the debug log as below: >>> EAL: Skipping BAR0 >>> >>> Signed-off-by: tone.zhang <tone.zh...@arm.com> >>> Reviewed-by: Gavin Hu <gavin...@arm.com> >>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> >>> Reviewed-by: Steve Capper <steve.cap...@arm.com> >>> --- >>> drivers/bus/pci/linux/pci_vfio.c | 25 +++++++++++++++++++++---- >>> 1 file changed, 21 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/bus/pci/linux/pci_vfio.c >>> b/drivers/bus/pci/linux/pci_vfio.c >>> index b1f0683..1373345 100644 >>> --- a/drivers/bus/pci/linux/pci_vfio.c >>> +++ b/drivers/bus/pci/linux/pci_vfio.c >>> @@ -445,9 +445,11 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct >>> mapped_pci_resource *vfio_res, >>> struct pci_msix_table *msix_table = &vfio_res->msix_table; >>> struct pci_map *bar = &vfio_res->maps[bar_index]; >>> >>> - if (bar->size == 0) >>> + if (bar->size == 0) { >>> /* Skip this BAR */ >>> + RTE_LOG(INFO, EAL, "Skipping this BAR%d\n", bar_index); >>> return 0; >> >> I feel like "this" is unnecessary here - just "Skipping BAR%d" should >> be enough :) >> >> Tone: Will update code and remove "this" in next version. >> >>> + } >>> >>> if (msix_table->bar_index == bar_index) { >>> /* >>> @@ -457,7 +459,12 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct >>> mapped_pci_resource *vfio_res, >>> uint32_t table_start = msix_table->offset; >>> uint32_t table_end = table_start + msix_table->size; >>> table_end = (table_end + ~PAGE_MASK) & PAGE_MASK; >>> - table_start &= PAGE_MASK; >>> + table_start = (table_start + ~PAGE_MASK) & PAGE_MASK; >> >> IMO these two additions should be replaced by RTE_ALIGN by page size. >> Makes the purpose of the code much clearer. >> >> Tone: Sure, it is better! Will update code in next version. Thanks! >> >>> + /* after rounding to PAGE_SIZE, it is over the bar->size, >>> + * fall back to the MSI-X table offset in the bar. >>> + */ >>> + if (table_start >= bar->size) >>> + table_start = msix_table->offset; >> >> If i understand things correctly, msix_table->offset value here may be >> unaligned, so falling back to this value may cause mapping failure, because >> we later use this value as a size of mapping (which needs to be page >> aligned). Shouldn't this be aligned using RTE_ALIGN_FLOOR by page size? >> >> Tone: It is a little tricky. Align msix_table->offset with RTE_ALIGN_FLOOR >> maybe get 0 if the offset is less than page size in the PCI bar. It will >> trigger mmap() error. IIRC the input parameter "size" in mmap() is not >> required to be aligned with page size, system will do it. But it is better >> if we can do it. If I was wrong, please correct me. Thanks a lot. > > Apologies, you're correct - length can be misaligned (just tested it). > > However, i think it's still worth aligning (and putting in an additional > check), because we want to make sure we *don't* attempt to map the MSI-X BAR, > and kernel might do that by adjusting length automatically and return mmap > failure that way. > > Tone: Thanks a lot! I agree with you. It worth aligning the size. I will > update code (RTE_ALIGN_FLOOR by page size) in next version. > I'd like to discuss one case with you. In the case, base->size is 16384, > msix_table->offset is 8192, page_size is 65536. After align > "msix_table->offset" with page_size (RTE_ALIGN_FLOOR), the value of > "table_start" is 0, mmap() will report error, and the memory mapping is > failed. > For the case (table_start is 0 after the aglinment), may I continue falling > back the "table_start" to " msix_table->offset" (not aligned with page size), > and left system adjust the length automatically? Thanks! Please correct me if i'm wrong, but this is a code path for when we're trying to mmap around the MSI-X BAR. Kernel will not allow us to do that, period, so whatever start/end addresses you get, they *must not* include a single byte of MSI-X BAR. So, in case like you described, i think we should just straight up refuse the map the entire BAR. However, as i do not have a system with such properties to test on, so please correct me if i'm wrong here :) Tone: I understand and agree with you. 😊 Please have a look at my test case. In my case, I tried to bind NVMe device with VFIO driver and the kernel page size is 64KB. Without the change, the test is failed. From the debug information, I observed that "bar->size" is 16384, "msix_table->offset" is 8192 and "msix_table->size" is 512. Regarding the page size is much bigger than the "bar->size", in the change, the code maps the first 8192 bytes ahead of MSI-X table. After align with the page size boundary, the "start" offset after the MSI-X table is over "bar->size", mmap() reports error. In this case, I can only map the memory before the MSI-X table. After fall back "table_start" to " msix_table->offset " (i.e. 8192 bytes), and NOT mapping the memory behind MSI-X table, the NVMe device can be bound to VFIO driver, and the test is passed. The kernel version in my test environment is 4.16. So in the change, I do not map any byte of MSI-X table, unfortunately I cannot align the memory "size" in mmap() to page size boundary. From the test result, the change fixes the error. The case looks a little tricky. If we refuse the memory map here, it means we cannot bind VFIO driver with some PCI devices with 64KB kernel page size. I hope we can support such case in DPDK. 😊 > > -- > Thanks, > Anatoly > -- Thanks, Anatoly