On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Tuesday, July 9, 2019 7:00 PM
To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; David Marchand
<david.march...@redhat.com>
Cc: dev <dev@dpdk.org>; Thomas Monjalon <tho...@monjalon.net>; Ben
Walker <benjamin.wal...@intel.com>
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection
On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Tuesday, July 9, 2019 5:10 PM
To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; David Marchand
<david.march...@redhat.com>
Cc: dev <dev@dpdk.org>; Thomas Monjalon <tho...@monjalon.net>;
Ben
Walker <benjamin.wal...@intel.com>
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
mode selection
________________________________________
On Mon, Jul 8, 2019 at 4:25 PM <mailto:jer...@marvell.com> wrote:
From: Jerin Jacob <mailto:jer...@marvell.com>
Existing logic fails to select IOVA mode as VA if driver request
to enable IOVA as VA.
IOVA as VA has more strict requirement than other modes, so
enabling positive logic for IOVA as VA selection.
This patch also updates the default IOVA mode as PA for PCI
devices as it has to deal with DMA engines unlike the virtual
devices that may need only IOVA as DC.
We have three cases:
- driver/hw supports IOVA as PA only
[Jerin] It is not driver cap, it is more of system cap(IOMMU vs
non IOMMU). We are already addressing that case
I don't get how this works. How does "system capability" affect
what the device itself supports? Are we to assume that *all*
hardware support IOVA as VA by default? "System capability" is more
of a bus issue than an individual device issue, is it not?
What I meant is, supporting VA vs PA is function of IOMMU(not the
device
attribute).
Ie. Device makes the bus master request, if IOMMU available and
enabled in the SYSTEM , It goes over IOMMU and translate the IOVA
to
physical address.
Another way to put is, Is there any _PCIe_ device which
need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
rte_pci_driver.drv_flags
Previously, as far as i can tell, the flag was used to indicate
support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
For example, there are multiple patches [1][2][3][4] (i'm sure i can
find more!) that added IOVA as VA support to various drivers, and
they all were worded it in this exact way
- "support for IOVA as VA mode", not "require IOVA as VA mode". As
far as i can tell, none of these drivers *require* IOVA as VA mode -
they merely use this flag to indicate support for it.
Some class of devices NEED IOVA as VA for performance reasons.
Specially the devices has HW mempool allocators. On those devices If
we don’t use IOVA as VA, Upon getting packet from device, It needs to
go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has
real performance issue.
I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work
without it, whereas in this case it's more of a "highly recommended" rather
than "need".
It is "need" as performance is horrible without it as is per packet SW
translation.
A "need" for DPDK performance perspective.
Would the driver fail to initialize if it detects running as IOVA as PA?
Now suddenly it turns out that someone somewhere "knew" that "IOVA
as
VA" flag in PCI drivers is supposed to indicate *requirement* and not
support, and it appears that this knowledge was not communicated nor
documented anywhere, and is now treated as common knowledge.
I think, the confusion here is, I was under impression that # If
device supports IOVA as VA and system runs with IOMMU then the dpdk
should run in IOVA as VA mode.
If above statement true then we don’t really need a new flag.
Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
not that it *requires* it.
Couple of points to make forward progress:
# If we think, there is a use case where device is IOVA as VA And
system runs in IOMMU mode then for some reason DPDK needs to run in
PA
mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it
can run either modes
There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses
VA or PA is a different from whether IOMMU is in use - there is no law that
states that, when using IOMMU, IOVA have to have 1:1 mapping with VA.
IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly
legal to program IOMMU to use IOVA as PA (which we currently do when we
e.g. use VFIO for some devices and igb_uio for others).
For KNI, we already submitted a patch to support IOVA as VA.
Yep, point being that it *didn't work* before, hence we may want to
account for possible future use cases like this (however admittedly
hacky they are). There are valid use cases to enforce IOVA as VA support
only (such as for performance reasons, even though it would be
technically possible to use IOVA as PA), and there could be valid use
cases to enforce IOVA as PA support only (for example, i seem to
remember that crypto/qat driver at one point didn't support VFIO driver,
effectively rendering it not supporting IOVA as VA).
I don’t know about igb_uio, if IOVA as PA, we might as well disable IOMMU.
Is igb_uio enables IOMMU? I don’t see any reference.
grep -ri "iommu" kernel/linux/igb_uio/
igb_uio can work with IOMMU with pass-through mode. When Linux is booted
up in pass-through, IOMMU is enabled and igb_uio will work, and VFIO can
use both IOVA as PA and IOVA as VA, while igb_uio can only use IOVA as
PA. So yes, igb_uio does enable IOMMU in a very limited way, but only to
set up 1:1 mapping of IOVA with PA.
Also, some other use cases will also require IOVA as PA while having
full IOMMU support. An example of this would be systems with limited
IOMMU width (such as VM's) - even though the IOMMU is technically
supported, we may not have the necessary address width to run all
devices in IOVA as VA mode, and would need to fall back to IOVA as PA.
Since we cannot *require* IOVA as VA in current codebase, any driver
that expects IOVA as VA to always be enabled will presumably not work.
Again, it is not device attribute, it is system attribute.
If it's a system attribute, why is it a device driver flag then? The
system may or may not support IOMMU, the device itself probably doesn't
care since bus address looks the same in both cases, *but the driver
might* (such as would be in your case - requiring IOVA as VA and
disallowing IOVA as PA for performance reasons).
In current KNI case it
Fall backs to PA irrespective of device capability so we don’t need any
separate flag from driver.
Even if we introduce a flag, what it is supposed to do?
The same thing you are suggesting for one of your HW mempool drivers: a
"need" to only run in IOVA as VA mode.
The logic in this case (as far as the driver is concerned, disregarding
the kernel driver issue for now) would be:
has_pa = (drv->flags & SUPPORTS_PA) != 0;
has_va = (drv->flags & SUPPORTS_VA) != 0;
if (has_va && has_pa)
return RTE_IOVA_DC; // don't care, supports both
if (has_va)
return RTE_IOVA_VA; // only supports VA - your case
return RTE_IOVA_PA; // only supports PA
Currently (again, disregarding your interpretation of how IOVA as VA
works and looking at the actual commit history), we always seem to imply
that IOVA as PA works for all devices, and we use IOVA_AS_VA flag to
indicate that the device *also* supports IOVA as VA mode.
But we don't have any way to express a *requirement* for IOVA as VA mode
- only for IOVA as PA mode. That is the purpose of the new flag. You are
stating that the IOVA_AS_VA drv flag is an expression of that
requirement, but that is not reflected in the codebase - our commit
history indicates that we don't treat IOVA as VA as hard requirement
whenever this flag is specified (and i would argue that we shouldn't).
RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA
If we're adding a flag, we might as well not create a confusion and do it
consistently. If IOVA as PA is supported, have a flag to indicate that. If IOVA
as VA is supported, have a flag to indicate that. Absence of either flag implies
So in what category i40e driver comes? By default, pci bus should return PA for
class.
If VA supported then return VA.
So how new flag will help?
We seem to be in agreement that we need *two* flags to express all three
of the above. The question is, which flags. You suggest to have
"supports IOVA as VA" and "requires IOVA as VA" as two options, while i
am suggesting to have "supports IOVA as PA" and "supports IOVA as VA" as
flags. This requires modification to all existing drivers and is perhaps
undesirable from that point of view (this isn't my decision), but it's
less confusing than having two IOVA-as-VA flags that differ slightly in
their meaning (supports VA vs. requires VA).
Going back to your i40e example, AFAIK i40e supports both IOVA as VA and
IOVA as PA - so in this case it should return RTE_IOVA_DC (i.e. use
whatever's available). If other devices also don't care, then push the
decision to the upper layers and not decide anything at the bus level [1].
[1] http://patchwork.dpdk.org/patch/54801/
inability to work in that mode. I don't see how this is less clear and self-
documenting than having two IOVA as VA-related flags that have slightly
different meaning and imply things not otherwise stated explicitly.
# With top of tree, Currently it never runs in IOVA as VA mode.
That’s a separate problem to fix. Which effect all the devices
Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device
support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
protection and/or root privilege is required to run DPDK.
What's your view on this existing problem?
My view would be to always run in IOVA as VA by default and only falling
back to IOVA as PA if there is a need to do that. Yet, it seems that
whenever i try to bring this up, the response (not necessarily from you,
so this is not directed at you specifically) seems to be that because of
hotplug, we have to start in the "safest" (from device support point of
view) mode - that is, in IOVA as PA. Seeing how, as you claim, some
devices require IOVA as VA, then IOVA as PA is no longer the "safe"
default that all devices will support. Perhaps we can use this
opportunity to finally make IOVA as VA the default :)
[1] http://patchwork.dpdk.org/patch/53206/
[2] http://patchwork.dpdk.org/patch/50274/
[3] http://patchwork.dpdk.org/patch/50991/
[4] http://patchwork.dpdk.org/patch/46134/
--
Thanks,
Anatoly
--
Thanks,
Anatoly
--
Thanks,
Anatoly