On 10/18/2019 6:15 PM, Vamsi Krishna Attunuru wrote:
> 
> 
>> -----Original Message-----
>> From: dev <dev-boun...@dpdk.org> On Behalf Of Ferruh Yigit
>> Sent: Wednesday, October 16, 2019 9:44 PM
>> To: Vamsi Krishna Attunuru <vattun...@marvell.com>; Stephen Hemminger
>> <step...@networkplumber.org>; Yigit, Ferruh
>> <ferruh.yi...@linux.intel.com>
>> Cc: dev@dpdk.org; tho...@monjalon.net; Jerin Jacob Kollanukkaran
>> <jer...@marvell.com>; olivier.m...@6wind.com;
>> anatoly.bura...@intel.com; arybche...@solarflare.com; Kiran Kumar
>> Kokkilagadda <kirankum...@marvell.com>
>> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v10 4/5] kni: add IOVA=VA support
>> in KNI module
>>
>> On 10/16/2019 12:26 PM, Vamsi Krishna Attunuru wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Stephen Hemminger <step...@networkplumber.org>
>>>> Sent: Tuesday, October 15, 2019 9:16 PM
>>>> To: Yigit, Ferruh <ferruh.yi...@linux.intel.com>
>>>> Cc: Vamsi Krishna Attunuru <vattun...@marvell.com>; dev@dpdk.org;
>>>> tho...@monjalon.net; Jerin Jacob Kollanukkaran <jer...@marvell.com>;
>>>> olivier.m...@6wind.com; ferruh.yi...@intel.com;
>>>> anatoly.bura...@intel.com; arybche...@solarflare.com; Kiran Kumar
>>>> Kokkilagadda <kirankum...@marvell.com>
>>>> Subject: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA
>>>> support in KNI module
>>>>
>>>> External Email
>>>>
>>>> ----------------------------------------------------------------------
>>>> On Tue, 15 Oct 2019 16:43:08 +0100
>>>> "Yigit, Ferruh" <ferruh.yi...@linux.intel.com> wrote:
>>>>
>>>>> On 8/16/2019 7:12 AM, vattun...@marvell.com wrote:
>>>>>> From: Kiran Kumar K <kirankum...@marvell.com>
>>>>>>
>>>>>> The patch adds support for the kernel module to work in IOVA = VA
>>>>>> mode. The idea is to get the physical address from the IOVA address
>>>>>> using the iommu_iova_to_phys API, and later use the phys_to_virt API
>>>>>> to convert that physical address to a kernel virtual address.
>>>>>>
>>>>>> When compared with IOVA = PA mode, there is no performance drop
>>>>>> with this approach.
>>>>>>
>>>>>> This approach does not work with kernel versions older than
>>>>>> 4.4.0 because of API compatibility issues.
>>>>>>
>>>>>> The patch also updates these support details in the KNI documentation.
>>>>>>
>>>>>> Signed-off-by: Kiran Kumar K <kirankum...@marvell.com>
>>>>>> Signed-off-by: Vamsi Attunuru <vattun...@marvell.com>
>>>>>
>>>>> <...>
>>>>>
>>>>>> @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
>>>>>>          strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
>>>>>>
>>>>>>          /* Translate user space info into kernel space info */
>>>>>> -        kni->tx_q = phys_to_virt(dev_info.tx_phys);
>>>>>> -        kni->rx_q = phys_to_virt(dev_info.rx_phys);
>>>>>> -        kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
>>>>>> -        kni->free_q = phys_to_virt(dev_info.free_phys);
>>>>>> -
>>>>>> -        kni->req_q = phys_to_virt(dev_info.req_phys);
>>>>>> -        kni->resp_q = phys_to_virt(dev_info.resp_phys);
>>>>>> -        kni->sync_va = dev_info.sync_va;
>>>>>> -        kni->sync_kva = phys_to_virt(dev_info.sync_phys);
>>>>>> +        if (dev_info.iova_mode) {
>>>>>> +#ifdef HAVE_IOVA_AS_VA_SUPPORT
>>>>>> +                pci = pci_get_device(dev_info.vendor_id,
>>>>>> +                                     dev_info.device_id, NULL);
>>>>>> +                if (pci == NULL) {
>>>>>> +                        pr_err("pci dev does not exist\n");
>>>>>> +                        return -ENODEV;
>>>>>> +                }
>>>>>
>>>>> If there is no PCI device, KNI should still work.
>>>>
>>>> Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
>>>> With this patch that won't be possible.
>>>
>>> Hi Ferruh, Stephen,
>>>
>>> This can be fixed by forcing the IOVA mode to PA when vdevs are used
>>> for the KNI use case:
>>>
>>> rte_bus_get_iommu_class(void)
>>>  {
>>>         enum rte_iova_mode mode = RTE_IOVA_DC;
>>> +       struct rte_devargs *devargs = NULL;
>>>         bool buses_want_va = false;
>>>         bool buses_want_pa = false;
>>>         struct rte_bus *bus;
>>>
>>> +       if (rte_eal_check_module("rte_kni") == 1) {
>>> +               RTE_EAL_DEVARGS_FOREACH("vdev", devargs) {
>>> +                       return RTE_IOVA_PA;
>>> +               }
>>> +       }
>>> +
>>>         TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>                 enum rte_iova_mode bus_iova_mode;
>>>
>>> I think this will solve the various use case combinations (PA or VA
>>> mode, pdev or vdev used for KNI). With the above fix, existing use
>>> cases would not be affected by this patch series.
>>>
>>
>> Hi Vamsi,
>>
>> I think this is not a problem specific to vdev usage, so we can't
>> solve it via a vdev check alone.
>>
>> The sample I gave (KNI PMD) uses a vdev, but an application can use
>> the KNI library APIs directly to create a kni interface and have
>> kernel/user space communication without any device involved. The KNI
>> PMD (vdev) is just a wrapper to make this easy.
>>
>> Just thinking aloud:
>> KNI shares a userspace buffer with the kernel, so it basically needs
>> to translate a userspace virtual address into a kernel virtual
>> address, in a reasonably fast manner.
>>
>> iova=va breaks KNI because:
>> 1) physical memory is no longer contiguous, which breaks our address
>> translation logic
>> 2) we were using the physical address of the buffer for the address
>> translation, but we no longer have it; we now have an iova address.
>>
>> I assume 1) is resolved with 'rte_kni_pktmbuf_pool_create()', but it
>> would be helpful if you could explain how this works.
> 
> The KNI kernel module uses pa2va and va2pa calls to translate buffer
> pointers; this only works when both PA and VA are 1-to-1 contiguous.
> For that to hold, an mbuf's memory must not cross a physical page
> boundary: if it does, its iova/va will still be contiguous, but its PA
> may or may not be. The KNI pktmbuf pool API ensures the mempool is not
> populated with mbufs that cross a page boundary, so once the pool is
> created, all mbufs in it reside within page limits.

ack.
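
For readers following along, here is a minimal sketch of the invariant
described above (a hypothetical helper for illustration, not the actual
rte_kni_pktmbuf_pool_create() implementation):

#include <stdint.h>
#include <unistd.h>
#include <rte_mbuf.h>

/* Hypothetical check: returns 1 if the mbuf's buffer lies entirely
 * within one physical page. Only then is the buffer's VA <-> PA
 * mapping guaranteed to be 1-to-1 contiguous, which is what the KNI
 * module's pa2va/va2pa arithmetic relies on. */
static int
kni_mbuf_within_page(const struct rte_mbuf *m)
{
        uintptr_t start = (uintptr_t)m->buf_addr;
        uintptr_t end = start + m->buf_len - 1;
        uintptr_t pg_mask = ~((uintptr_t)sysconf(_SC_PAGESIZE) - 1);

        /* Same page iff both ends share the page-aligned base. */
        return (start & pg_mask) == (end & pg_mask);
}

A pool populated only with mbufs passing such a check never hands the
kernel a buffer whose physical backing is discontiguous.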

>>
>> For the second, a simple question: do we need PCIe device information
>> to be able to convert an iova to a kernel virtual address? Can't I get
>> this information from the iommu somehow?
>> Think about a case where "--iova-mode=va" is provided but no physical
>> device is bound to vfio-pci: can I still allocate memory? And how can
>> I use KNI in that case?
> 
> We found a way to translate an iova/uva to a kva using kernel mm
> subsystem calls: with the get_user_pages_remote() call, the KNI module
> is able to map an iova to a kva. This solution works for pdevs (without
> sharing any dev info) and also for wrapper vdev PMDs.

Good to hear this.
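
For the archive, a rough sketch of that kind of translation (illustrative
only, assuming the ~4.10+ get_user_pages_remote() signature, not the
exact code from the next revision). Unlike the earlier
iommu_iova_to_phys() approach, it needs no device information:

#include <linux/mm.h>
#include <linux/io.h>

/* Illustrative: pin the user page backing 'iova' in the given task's
 * address space and derive a kernel virtual address from it. */
static void *
iova_to_kva(struct task_struct *tsk, unsigned long iova)
{
        struct page *page = NULL;
        phys_addr_t pa;

        /* Fault in and pin exactly one page. */
        if (get_user_pages_remote(tsk, tsk->mm, iova, 1,
                                  FOLL_TOUCH, &page, NULL, NULL) < 1)
                return NULL;

        /* Physical address of the page plus the offset within it... */
        pa = page_to_phys(page) | (iova & (PAGE_SIZE - 1));
        put_page(page);

        /* ...mapped back through the kernel direct mapping. */
        return phys_to_virt(pa);
}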

> 
> I will push the next version of the patches with this solution and the
> other command line arg changes. Since the major architectural issue in
> enabling iova=va for KNI is fixed, we are planning to merge this patch
> series in the 19.11 release.
> 
