[dpdk-dev] [dpdk-announce] release candidate 16.11-rc2
A new DPDK release candidate is ready for testing:
	http://dpdk.org/browse/dpdk/tag/?id=v16.11-rc2

The release 16.11 is going to be small.
We should try to speed up testing and bug fixing in order to start
the next cycle on time.
Please shout if you are aware of an important bug.

Now, priorities are:
	- bug fixing
	- API names check (renaming is still accepted)
	- documentation and release notes

Thank you everyone
[dpdk-dev] [PATCH v2] eal: fix libabi macro for device generalization patches
Hello Ferruh,

On Wednesday 26 October 2016 07:55 PM, Ferruh Yigit wrote:
> Hi Shreyansh,
>
> On 10/26/2016 2:12 PM, Shreyansh Jain wrote:
>> On Wednesday 26 October 2016 06:30 PM, Shreyansh Jain wrote:
>>> rte_device/driver generalization patches [1] were merged without a change
>>> in the LIBABIVER macro. This patches bumps the macro of affected libs.
>>>
>>> Also, deprecation notice from 16.07 has been removed and release notes for
>>> 16.11 added.
>>>
>>> Signed-off-by: Shreyansh Jain
>>> --
>>> v2:
>>> - Mark bumped libraries in release_16_11.rst file
>>> - change code symbol names from text to code layout
>>>
>>> ---
>
> <...>
>
>>> .. code-block:: diff
>>>
>>> - libethdev.so.4
>>> + + libethdev.so.4
>>
>> Just noticed:
>> Should the '4' here reflect the current LIBABIVER number?
>> If so, I will send this patch again.
>
> Yes, as you guessed, it should be:
> - libethdev.so.4
> + + libethdev.so.5
>
> <...>
>
>>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile
>>> b/lib/librte_eal/bsdapp/eal/Makefile
>>> index a15b762..122798c 100644
>>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>>> @@ -48,7 +48,7 @@ LDLIBS += -lgcc_s
>>>
>>> EXPORT_MAP := rte_eal_version.map
>>>
>>> -LIBABIVER := 3
>>> +LIBABIVER := 4
>
> eal version seems already increased for this release, 2 => 3, in:
> d7e61ad3ae36 ("log: remove deprecated history dump")
>
> So NO need to increase it again, sorry for late notice, I just
> recognized it.
> Only librte_ether and librte_cryptodev requires the increase.

Thanks for clearing this. I will bump librte_ether and librte_cryptodev
and send across v3.

>
> <...>
>
> Thanks,
> ferruh
>
>

The LIBABI check script is really helpful. I wish I had run that for
rte_driver/device patchset. Thanks for that info, though.

- Shreyansh
[dpdk-dev] [PATCH v2] eal: fix libabi macro for device generalization patches
On Wednesday 26 October 2016 08:53 PM, Thomas Monjalon wrote:
> 2016-10-26 15:25, Ferruh Yigit:
>> eal version seems already increased for this release, 2 => 3, in:
>> d7e61ad3ae36 ("log: remove deprecated history dump")
>
> Yes thanks.
>
>> So NO need to increase it again, sorry for late notice, I just
>> recognized it.
>> Only librte_ether and librte_cryptodev requires the increase.
>
> Please could you also explain in the commit message that:
> - EAL was already bumped
> - what is the breakage in ethdev
> - what is the breakage in cryptodev

Indeed. Will do in v3

>
> Thanks
>

- Shreyansh
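
[Editorial note] The version bookkeeping discussed in this thread can be sketched as a small script. This is only an illustration, not a DPDK tool: the function name `render_abi_list` and the input dictionary are hypothetical. The values follow the discussion above (ethdev goes 4 => 5; EAL stays at 3 in this patch because an earlier commit already bumped it), and the leading plus sign mirrors the release-notes convention of marking libraries whose version was incremented.

```python
def render_abi_list(versions):
    """Render a release-notes style list of shared libraries.

    `versions` maps a library name to a tuple (old, new) of ABI versions.
    Libraries whose version changed get a leading "+ " marker; unchanged
    ones are indented by two spaces so the names stay aligned.
    """
    lines = []
    for name in sorted(versions):
        old, new = versions[name]
        if new < old:
            raise ValueError(f"{name}: ABI version went backwards ({old} -> {new})")
        marker = "+ " if new != old else "  "
        lines.append(f"{marker}{name}.so.{new}")
    return lines


# Hypothetical input built from the numbers quoted in the thread.
example = {
    "libethdev": (4, 5),   # bumped by this patch
    "librte_eal": (3, 3),  # already bumped earlier in the cycle, left alone
}

for line in render_abi_list(example):
    print(line)
```

Rejecting a version that goes backwards is a design choice of this sketch; the thread itself only discusses forgotten and duplicated bumps.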
[dpdk-dev] [PATCH] doc: remove Intel reference from multi-process support guide
Multi-process support has been verified on non-IA platforms such as ARMv8.

Signed-off-by: Jerin Jacob
---
 doc/guides/prog_guide/multi_proc_support.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst
index badd102..2a996ae 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -35,7 +35,7 @@ Multi-process Support

 In the DPDK, multi-process support is designed to allow a group of DPDK processes
 to work together in a simple transparent manner to perform packet processing,
-or other workloads, on Intel® architecture hardware.
+or other workloads.

 To support this functionality,
 a number of additions have been made to the core DPDK Environment Abstraction Layer (EAL).
-- 
2.5.5
[dpdk-dev] Solarflare PMD submission question
Hi,

we would like to include the Solarflare libefx-based PMD in DPDK 17.02
and start the upstreaming process.
The driver supports the Solarflare SFN7xxx and SFN8xxx families of 10/40
Gbps adapters.
The driver has a base driver. It is just a fresh version of the same code
that is used in FreeBSD [1], illumos [2] and some other Solarflare
drivers.

The question is how to submit the base driver, which is pretty big. The mail
size of the patch that imports it is about 2 Mb.
Further changes in the base driver will go in small patches (as is
done, for example, in FreeBSD).
The PMD itself is split into small and, I hope, readable and nice patches.

[1] https://svnweb.freebsd.org/base/head/sys/dev/sfxge/common/
[2] https://github.com/illumos/illumos-gate/tree/master/usr/src/uts/common/io/sfxge/common/

Andrew.
[dpdk-dev] [PATCH v3] eal: fix libabi macro for device generalization patches
rte_device/driver generalization patches [1] were merged without a change
in the LIBABIVER macro. This patches bumps the macro of affected libs.
(librte_eal was already bumped; libcryptodev and libetherdev have been
bumped).

Details of ABI/API changes:
- EAL (version not bumped)
 |- type field was removed from rte_driver
 |- rte_pci_device now embeds rte_device
 |- rte_pci_resource renamed to rte_mem_resource
 |- numa_node and devargs of rte_pci_driver is moved to rte_driver
 |- APIs for device hotplug (attach/detach) moved into EAL
 |- API rte_eal_pci_device_name added for PCI device naming
 |- vdev registration API introduced (rte_eal_vdrv_register,
 |  rte_eal_vdrv_unregister

- librte_crypto (v 1=>2)
 |- removed rte_cryptodev_create_unique_device_name API
 |- moved device naming to EAL

- librte_ethdev (v 4=>5)
 |- rte_eth_dev_type is removed
 |- removed dev_type from rte_eth_dev_allocate API
 |- removed API rte_eth_dev_get_device_type
 |- removed API rte_eth_dev_get_addr_by_port
 |- removed API rte_eth_dev_get_port_by_addr
 |- removed rte_cryptodev_create_unique_device_name API
 |- moved device naming to EAL

Also, deprecation notice from 16.07 has been removed and release notes for
16.11 added.

[1] http://dpdk.org/ml/archives/dev/2016-September/047087.html Signed-off-by: Shreyansh Jain -- v3: - add API/ABI change info in commit log - fix library version change notification in release note - fix erroneous change to librte_eal version in v2 --- doc/guides/rel_notes/deprecation.rst | 12 doc/guides/rel_notes/release_16_11.rst | 30 -- lib/librte_cryptodev/Makefile | 2 +- lib/librte_ether/Makefile | 2 +- 4 files changed, 30 insertions(+), 16 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index d5c1490..884a231 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -18,18 +18,6 @@ Deprecation Notices ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of segments limit to be transmitted by device for TSO/non-TSO packets. -* The ethdev hotplug API is going to be moved to EAL with a notification - mechanism added to crypto and ethdev libraries so that hotplug is now - available to both of them. This API will be stripped of the device arguments - so that it only cares about hotplugging. - -* Structures embodying pci and vdev devices are going to be reworked to - integrate new common rte_device / rte_driver objects (see - http://dpdk.org/ml/archives/dev/2016-January/031390.html). - ethdev and crypto libraries will then only handle those objects so that they - do not need to care about the kind of devices that are being used, making it - easier to add new buses later. 
- * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and ``nb_segs`` in one operation, because some platforms have an overhead if the diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst index aa0c09a..5a5485b 100644 --- a/doc/guides/rel_notes/release_16_11.rst +++ b/doc/guides/rel_notes/release_16_11.rst @@ -149,6 +149,32 @@ Resolved Issues EAL ~~~ +* **Improved device/driver heirarchy and generalized hotplugging** + + Device and driver relationship has been restructured by introducing generic + classes. This paves way for having PCI, VDEV and other device types as + just instantiated objects rather than classes in themselves. Hotplugging too + has been generalized into EAL so that ethernet or crypto devices can use the + common infrastructure. + + * removed ``pmd_type`` as way of segragation of devices + * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from +``rte_pci_driver``. These can now be used by any instantiated object of +``rte_driver``. + * added ``rte_device`` class and all PCI and VDEV devices inherit from it + * renamed devinit/devuninit handlers to probe/remove to make it more +semantically correct with respect to device<=>driver relationship + * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the +APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``. + * helpers and support macros have been renamed to make them more synonymous +with their device types +(e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``) + * Device naming functions have been generalized from ethdev and cryptodev +to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining +unique device name from PCI Domain-BDF description. + * Virtual device registration APIs have been added: ``rte_eal_vdrv_register`` +and ``rte_eal_vdrv_unregister``. 
+ Drivers ~~~ @@ -232,11 +258,11 @@ The libraries prepended with a plus sign were incremented in this version. .. code-block:: diff - libethdev.so.4 +
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
Hi Maxime,

Seems indirect desc feature is causing serious performance
degradation on Haswell platform, about 20% drop for both
mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
both iofwd and macfwd.

I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.

Could you please verify if this is true in your test?


Thanks
Zhihong

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Monday, October 17, 2016 10:15 PM
> To: Yuanhan Liu
> Cc: Wang, Zhihong; Xie, Huawei; dev at dpdk.org; vkaplans at redhat.com;
> mst at redhat.com; stephen at networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> to the TX path
>
>
>
> On 10/17/2016 03:21 PM, Yuanhan Liu wrote:
> > On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote:
> >>> On my side, I just setup 2 Windows 2016 VMs, and confirm the issue.
> >>> I'll continue the investigation early next week.
> >>
> >> The root cause is identified.
> >> When INDIRECT_DESC feature is negotiated, Windows guest uses indirect
> >> for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD &
> >> virtio-net kernel driver) use indirect only for Tx.
> >> I'll implement indirect support for the Rx path in vhost lib, but the
> >> change will be too big for -rc release.
> >> I propose in the mean time to disable INDIRECT_DESC feature in vhost
> >> lib, we can still enable it locally for testing.
> >>
> >> Yuanhan, is it ok for you?
> >
> > That's okay.
> I'll send a patch to disable it then.
>
> >
> >>
> >>> Has anyone already tested Windows guest with vhost-net, which also has
> >>> indirect descs support?
> >>
> >> I tested and confirm it works with vhost-net.
> >
> > I'm a bit confused then. IIRC, vhost-net also doesn't support indirect
> > for Rx path, right?
>
> No, it does support it actually.
> I thought it didn't support too, I misread the Kernel implementation of
> vhost-net and virtio-net. Actually, virtio-net makes use of indirect
> in Rx path when mergeable buffers is disabled.
>
> The confusion certainly comes from me, sorry about that.
>
> Maxime
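
[Editorial note] The quoted proposal to disable INDIRECT_DESC in the vhost lib relies on virtio feature negotiation: a feature is only active when both the host and the guest offer it, so a host that stops offering the bit disables the feature for every guest. The sketch below illustrates that idea in a few lines. It is not vhost code; the helper names are hypothetical, and only the two bit positions are taken from the virtio specification.

```python
# Bit positions as defined by the virtio specification.
VIRTIO_NET_F_MRG_RXBUF = 15
VIRTIO_RING_F_INDIRECT_DESC = 28


def negotiate(host_features, guest_features):
    """A feature is active only if both sides offer it."""
    return host_features & guest_features


def has_feature(features, bit):
    """Return True if feature `bit` is set in the `features` bitmask."""
    return bool(features & (1 << bit))


def mask_feature(features, bit):
    """Return `features` with `bit` cleared, i.e. no longer offered."""
    return features & ~(1 << bit)


# Hypothetical feature sets: both sides offer mergeable Rx buffers and
# indirect descriptors.
host = (1 << VIRTIO_NET_F_MRG_RXBUF) | (1 << VIRTIO_RING_F_INDIRECT_DESC)
guest = (1 << VIRTIO_NET_F_MRG_RXBUF) | (1 << VIRTIO_RING_F_INDIRECT_DESC)

# With both sides offering the bit, indirect descriptors are negotiated.
print(has_feature(negotiate(host, guest), VIRTIO_RING_F_INDIRECT_DESC))

# Once the host stops offering it, the guest can no longer use it,
# while the other negotiated features are unaffected.
host_without_indirect = mask_feature(host, VIRTIO_RING_F_INDIRECT_DESC)
negotiated = negotiate(host_without_indirect, guest)
print(has_feature(negotiated, VIRTIO_RING_F_INDIRECT_DESC))
print(has_feature(negotiated, VIRTIO_NET_F_MRG_RXBUF))
```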
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
Hi Zhihong,

On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> Hi Maxime,
>
> Seems indirect desc feature is causing serious performance
> degradation on Haswell platform, about 20% drop for both
> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> both iofwd and macfwd.
I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
platform, and didn't face such a drop.
Have you tried to pass indirect_desc=off to qemu cmdline to see if you
recover the performance?

Yuanhan, which platform did you use when you tested it with zero copy?

>
> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
>
> Could you please verify if this is true in your test?
I'll try -rc1/-rc2 on my platform, and let you know.

Thanks,
Maxime

>
>
> Thanks
> Zhihong
>
[...]
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On 10/27/2016 11:10 AM, Maxime Coquelin wrote:
> Hi Zhihong,
>
> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
>> Hi Maxime,
>>
>> Seems indirect desc feature is causing serious performance
>> degradation on Haswell platform, about 20% drop for both
>> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
>> both iofwd and macfwd.
> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
> platform, and didn't face such a drop.
> Have you tried to pass indirect_desc=off to qemu cmdline to see if you
> recover the performance?
>
> Yuanhan, which platform did you use when you tested it with zero copy?
>
>>
>> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
>>
>> Could you please verify if this is true in your test?
> I'll try -rc1/-rc2 on my platform, and let you know.
As a first test, I tried again Txonly from the guest to the host (Rxonly),
where Tx indirect descriptors are used, on my E5-2665 @2.40GHz:
v16.11-rc1: 10.81Mpps
v16.11-rc2: 10.91Mpps

-rc2 is even slightly better in my case.
Could you please run the same test on your platform?

And could you provide me more info on your fwd bench?
Do you use dpdk-pktgen on host, or do you do fwd on host with a real NIC
also?

Thanks,
Maxime
> Thanks,
> Maxime
>
>>
>>
>> Thanks
>> Zhihong
>>
[...]
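
[Editorial note] To put the two results above on the same scale as the reported "about 20% drop", the relative change between two throughput figures can be computed as below. This is a minimal sketch; `percent_change` is a hypothetical helper, and the only measured inputs are the two Mpps figures quoted in the message above.

```python
def percent_change(baseline, measured):
    """Relative change of `measured` versus `baseline`, in percent.

    Positive means an improvement in throughput, negative a regression.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline * 100.0


# Txonly (guest) to Rxonly (host) figures quoted above, in Mpps.
rc1_mpps = 10.81
rc2_mpps = 10.91

print(f"rc1 -> rc2: {percent_change(rc1_mpps, rc2_mpps):+.2f}%")
```

On these figures the change is under one percent in favour of -rc2, so this particular test does not reproduce a regression of the size reported on Haswell. It exercises a different traffic pattern on a different CPU, so it does not rule one out either.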
[dpdk-dev] [PATCH v3] eal: fix libabi macro for device generalization patches
2016-10-27 12:38, Shreyansh Jain:
> rte_device/driver generalization patches [1] were merged without a change
> in the LIBABIVER macro. This patches bumps the macro of affected libs.

It is not a macro but a Makefile variable.

> (librte_eal was already bumped; libcryptodev and libetherdev have been
> bumped).

Please provide the commit id where EAL was bumped.

> Details of ABI/API changes:
> - EAL (version not bumped)

not bumped -> already bumped

> |- type field was removed from rte_driver
> |- rte_pci_device now embeds rte_device
> |- rte_pci_resource renamed to rte_mem_resource
> |- numa_node and devargs of rte_pci_driver is moved to rte_driver
> |- APIs for device hotplug (attach/detach) moved into EAL
> |- API rte_eal_pci_device_name added for PCI device naming
> |- vdev registration API introduced (rte_eal_vdrv_register,
> | rte_eal_vdrv_unregister
>
> - librte_crypto (v 1=>2)
> |- removed rte_cryptodev_create_unique_device_name API
> |- moved device naming to EAL
>
> - librte_ethdev (v 4=>5)
> |- rte_eth_dev_type is removed
> |- removed dev_type from rte_eth_dev_allocate API
> |- removed API rte_eth_dev_get_device_type
> |- removed API rte_eth_dev_get_addr_by_port
> |- removed API rte_eth_dev_get_port_by_addr
> |- removed rte_cryptodev_create_unique_device_name API
> |- moved device naming to EAL
>
> Also, deprecation notice from 16.07 has been removed and release notes for
> 16.11 added.
>
> [1] http://dpdk.org/ml/archives/dev/2016-September/047087.html
>
> Signed-off-by: Shreyansh Jain
[...]
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -149,6 +149,32 @@ Resolved Issues

It is the "Resolved Issues" section.
Please move in the "API Changes" section.

> EAL
> ~~~
>
> +* **Improved device/driver heirarchy and generalized hotplugging**

typo: hierarchy

> + Device and driver relationship has been restructured by introducing generic
> + classes. This paves way for having PCI, VDEV and other device types as
> + just instantiated objects rather than classes in themselves. Hotplugging too
> + has been generalized into EAL so that ethernet or crypto devices can use the
> + common infrastructure.
> +
> + * removed ``pmd_type`` as way of segragation of devices
> + * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from
> +``rte_pci_driver``. These can now be used by any instantiated object of
> +``rte_driver``.
> + * added ``rte_device`` class and all PCI and VDEV devices inherit from it
> + * renamed devinit/devuninit handlers to probe/remove to make it more
> +semantically correct with respect to device<=>driver relationship
> + * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the
> +APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``.
> + * helpers and support macros have been renamed to make them more synonymous
> +with their device types
> +(e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``)

It is RTE_PMD_REGISTER_PCI

> + * Device naming functions have been generalized from ethdev and cryptodev
> +to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining
> +unique device name from PCI Domain-BDF description.
> + * Virtual device registration APIs have been added: ``rte_eal_vdrv_register``
> +and ``rte_eal_vdrv_unregister``.

Thanks
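
[Editorial note] The quoted release note describes PCI and VDEV devices as "inheriting" from a generic device, and the commit log says that rte_pci_device now embeds rte_device. In C this is plain struct embedding: the bus-specific structure contains the generic one as a member, and code holding only the generic member can get back to the enclosing structure by subtracting the member offset. The sketch below mimics that layout with Python's ctypes. The two structures are hypothetical and heavily simplified; they are not the DPDK definitions, and `pci_from_device` is an illustrative helper, not a DPDK API.

```python
import ctypes


class Device(ctypes.Structure):
    """Hypothetical stand-in for a generic device (not DPDK's rte_device)."""
    _fields_ = [("numa_node", ctypes.c_int)]


class PciDevice(ctypes.Structure):
    """Hypothetical bus-specific device that embeds the generic one."""
    _fields_ = [
        ("vendor_id", ctypes.c_uint16),
        ("device", Device),  # embedded by value, as a C struct member would be
    ]


def pci_from_device(dev):
    """Recover the enclosing PciDevice from its embedded Device.

    Same idea as a C container_of(): subtract the member offset from the
    member address. Only valid if `dev` really lives inside a PciDevice.
    """
    base = ctypes.addressof(dev) - PciDevice.device.offset
    return PciDevice.from_address(base)


pci = PciDevice(vendor_id=0x1234, device=Device(numa_node=1))
generic = pci.device  # view of the embedded member, shares memory with pci
recovered = pci_from_device(generic)
print(hex(recovered.vendor_id), recovered.device.numa_node)
```

Generic code can then work on the embedded member alone (here, `numa_node`), which is what allows fields such as the NUMA node to move out of the PCI-specific structures as described in the commit log.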
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Thursday, October 27, 2016 5:55 PM
> To: Wang, Zhihong; Yuanhan Liu; stephen at networkplumber.org;
> Pierre Pfister (ppfister)
> Cc: Xie, Huawei; dev at dpdk.org; vkaplans at redhat.com; mst at redhat.com
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> to the TX path
>
>
>
> On 10/27/2016 11:10 AM, Maxime Coquelin wrote:
> > Hi Zhihong,
> >
> > On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> >> Hi Maxime,
> >>
> >> Seems indirect desc feature is causing serious performance
> >> degradation on Haswell platform, about 20% drop for both
> >> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> >> both iofwd and macfwd.
> > I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
> > platform, and didn't face such a drop.
> > Have you tried to pass indirect_desc=off to qemu cmdline to see if you
> > recover the performance?
> >
> > Yuanhan, which platform did you use when you tested it with zero copy?
> >
> >>
> >> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
> >>
> >> Could you please verify if this is true in your test?
> > I'll try -rc1/-rc2 on my platform, and let you know.
> As a first test, I tried again Txonly from the guest to the host (Rxonly),
> where Tx indirect descriptors are used, on my E5-2665 @2.40GHz:
> v16.11-rc1: 10.81Mpps
> v16.11-rc2: 10.91Mpps
>
> -rc2 is even slightly better in my case.
> Could you please run the same test on your platform?

I mean to use rc2 as both host and guest, and compare the
perf between indirect=0 and indirect=1.

I use PVP traffic, tried both testpmd and OvS as the forwarding
engine in host, and testpmd in guest.

Thanks
Zhihong

>
> And could you provide me more info on your fwd bench?
> Do you use dpdk-pktgen on host, or do you do fwd on host with a real NIC
> also?
>
> Thanks,
> Maxime
> > Thanks,
> > Maxime
> >
> >>
> >>
> >> Thanks
> >> Zhihong
> >>
[...]
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
> Hi Zhihong,
>
> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> >Hi Maxime,
> >
> >Seems indirect desc feature is causing serious performance
> >degradation on Haswell platform, about 20% drop for both
> >mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> >both iofwd and macfwd.
> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
> platform, and didn't face such a drop.

I was actually wondering that may be the cause. I tested it with
my IvyBridge server as well, I saw no drop.

Maybe you should find a similar platform (Haswell) and have a try?

	--yliu

> Have you tried to pass indirect_desc=off to qemu cmdline to see if you
> recover the performance?
>
> Yuanhan, which platform did you use when you tested it with zero copy?
>
> >
> >I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
> >
> >Could you please verify if this is true in your test?
> I'll try -rc1/-rc2 on my platform, and let you know.
>
> Thanks,
> Maxime
>
> >
> >
> >Thanks
> >Zhihong
> >
[...]
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
> On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
>> Hi Zhihong,
>>
>> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
>>> Hi Maxime,
>>>
>>> Seems indirect desc feature is causing serious performance
>>> degradation on Haswell platform, about 20% drop for both
>>> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
>>> both iofwd and macfwd.
>> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
>> platform, and didn't face such a drop.
>
> I was actually wondering that may be the cause. I tested it with
> my IvyBridge server as well, I saw no drop.
>
> Maybe you should find a similar platform (Haswell) and have a try?
Yes, that's why I asked Zhihong whether he could test Txonly in guest to
see if the issue is reproducible like this.
It will be easier for me to find a Haswell machine if it does not have to be
connected back to back to a HW/SW packet generator.

Thanks,
Maxime

>
> --yliu
>
>> Have you tried to pass indirect_desc=off to qemu cmdline to see if you
>> recover the performance?
>>
>> Yuanhan, which platform did you use when you tested it with zero copy?
>>
>>>
>>> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
>>>
>>> Could you please verify if this is true in your test?
>> I'll try -rc1/-rc2 on my platform, and let you know.
>>
>> Thanks,
>> Maxime
>>
>>>
>>>
>>> Thanks
>>> Zhihong
>>>
[...]
[dpdk-dev] Solarflare PMD submission question
Hi,

First of all, welcome to DPDK!

2016-10-27 09:34, Andrew Rybchenko:
> Hi,
>
> we would like to include Solarflare libefx-based PMD in the DPDK 17.02
> and start the upstreaming process.
> The driver supports Solarflare SFN7xxx and SFN8xxx families of 10/40
> Gbps adapters.
> The driver has base driver. It is just fresh version of the same code
> which is used in the FreeBSD [1], illumos [2] and some other Solarflare
> drivers.

Unfortunately it is common to have some big base drivers in DPDK.
Note that some PMDs rely on their kernel counterpart for the control path.
It is a way to avoid code duplication.
As far as I understand, it is easier to share queues with DPDK from kernel
when the device supports an IOMMU.

> The question is how to submit the base driver which is pretty big. Mail
> size of the patch which imports it is about 2 Mb.

First answer is a question:
Have you thought about cooperating with the kernel driver for your PMD?

If you really cannot use this approach, then we have to maintain this
whole base driver in DPDK. It will be easier to read, understand and
reference if it is a bit split. Could you try to send it as 10 to 20
patches explaining the role of each part and giving some design details?
It would also be really appreciated if you provided design documentation
in doc/guides/nics.
Are the datasheets open? A link in the doc would help.

> Further changes in the base driver will go in small patches (as it is
> done, for example, in the FreeBSD).
> The PMD itself is split into small and, I hope, readable and nice patches.

Good to know. Thanks
Please be prepared to work on several iterations of the patch series.

PS: the mailing list puts emails exceeding 300KB into a moderation queue.
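
[Editorial note] The sizes mentioned in this exchange can be checked with a little arithmetic. The sketch below is only illustrative: it assumes that "2 Mb" means two mebibytes and that "300KB" means 300 * 1024 bytes, it ignores mail headers and encoding overhead, and the helper names `min_parts` and `oversized` are hypothetical.

```python
import math

KB = 1024


def min_parts(total_bytes, limit_bytes):
    """Smallest number of equal parts so that each part fits under the limit."""
    if total_bytes < 0 or limit_bytes <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    return max(1, math.ceil(total_bytes / limit_bytes))


def oversized(patch_sizes, limit_bytes):
    """Return the indices of patches that would exceed the limit."""
    return [i for i, size in enumerate(patch_sizes) if size > limit_bytes]


total = 2 * 1024 * KB   # assumed size of the single import patch
limit = 300 * KB        # assumed moderation threshold

print(min_parts(total, limit))              # fewest equal parts under the limit
print(total / 10 / KB, total / 20 / KB)     # average KB per patch for 10 and 20 patches
print(oversized([total], limit))            # the single big patch exceeds the limit
```

Under these assumptions a split into 10 to 20 patches gives an average of roughly 100 to 205 KB per mail, which leaves room below the threshold as long as the split is not too uneven.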
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote: > > > On 10/27/2016 12:33 PM, Yuanhan Liu wrote: > >On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote: > >>Hi Zhihong, > >> > >>On 10/27/2016 11:00 AM, Wang, Zhihong wrote: > >>>Hi Maxime, > >>> > >>>Seems indirect desc feature is causing serious performance > >>>degradation on Haswell platform, about 20% drop for both > >>>mrg=on and mrg=off (--txqflags=0xf00, non-vector version), > >>>both iofwd and macfwd. > >>I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge > >>platform, and didn't faced such a drop. > > > >I was actually wondering that may be the cause. I tested it with > >my IvyBridge server as well, I saw no drop. > > > >Maybe you should find a similar platform (Haswell) and have a try? > Yes, that's why I asked Zhihong whether he could test Txonly in guest to > see if issue is reproducible like this. I have no Haswell box, otherwise I could do a quick test for you. IIRC, he tried to disable the indirect_desc feature, then the performance recovered. So, it's likely the indirect_desc is the culprit here. > I will be easier for me to find an Haswell machine if it has not to be > connected back to back to and HW/SW packet generator. Makes sense. --yliu > > Thanks, > Maxime > > > > > --yliu > > > >>Have you tried to pass indirect_desc=off to qemu cmdline to see if you > >>recover the performance? > >> > >>Yuanhan, which platform did you use when you tested it with zero copy? > >> > >>> > >>>I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz. > >>> > >>>Could you please verify if this is true in your test? > >>I'll try -rc1/-rc2 on my platform, and let you know. 
> >> > >>Thanks, > >>Maxime > >> > >>> > >>> > >>>Thanks > >>>Zhihong > >>> > -Original Message- > From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com] > Sent: Monday, October 17, 2016 10:15 PM > To: Yuanhan Liu > Cc: Wang, Zhihong ; Xie, Huawei > ; dev at dpdk.org; vkaplans at redhat.com; > mst at redhat.com; stephen at networkplumber.org > Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support > to the TX path > > > > On 10/17/2016 03:21 PM, Yuanhan Liu wrote: > >On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: > >>>On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. > >>>I'll continue the investigation early next week. > >> > >>The root cause is identified. > >>When INDIRECT_DESC feature is negotiated, Windows guest uses indirect > >>for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & > >>virtio-net kernel driver) use indirect only for Tx. > >>I'll implement indirect support for the Rx path in vhost lib, but the > >>change will be too big for -rc release. > >>I propose in the mean time to disable INDIRECT_DESC feature in vhost > >>lib, we can still enable it locally for testing. > >> > >>Yuanhan, is it ok for you? > > > >That's okay. > I'll send a patch to disable it then. > > > > >> > >>>Has anyone already tested Windows guest with vhost-net, which also > has > >>>indirect descs support? > >> > >>I tested and confirm it works with vhost-net. > > > >I'm a bit confused then. IIRC, vhost-net also doesn't support indirect > >for Rx path, right? > > No, it does support it actually. > I thought it didn't support too, I misread the Kernel implementation of > vhost-net and virtio-net. Acutally, virtio-net makes use of indirect > in Rx path when mergeable buffers is disabled. > > The confusion certainly comes from me, sorry about that. > > Maxime
[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
On 10/27/2016 12:33 PM, Yuanhan Liu wrote: > On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote: >> Hi Zhihong, >> >> On 10/27/2016 11:00 AM, Wang, Zhihong wrote: >>> Hi Maxime, >>> >>> Seems indirect desc feature is causing serious performance >>> degradation on Haswell platform, about 20% drop for both >>> mrg=on and mrg=off (--txqflags=0xf00, non-vector version), >>> both iofwd and macfwd. >> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge >> platform, and didn't faced such a drop. > > I was actually wondering that may be the cause. I tested it with > my IvyBridge server as well, I saw no drop. Sorry, mine is a SandyBridge, not IvyBridge. > > Maybe you should find a similar platform (Haswell) and have a try? > > --yliu > >> Have you tried to pass indirect_desc=off to qemu cmdline to see if you >> recover the performance? >> >> Yuanhan, which platform did you use when you tested it with zero copy? >> >>> >>> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz. >>> >>> Could you please verify if this is true in your test? >> I'll try -rc1/-rc2 on my platform, and let you know. >> >> Thanks, >> Maxime >> >>> >>> >>> Thanks >>> Zhihong >>> -Original Message- From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com] Sent: Monday, October 17, 2016 10:15 PM To: Yuanhan Liu Cc: Wang, Zhihong ; Xie, Huawei ; dev at dpdk.org; vkaplans at redhat.com; mst at redhat.com; stephen at networkplumber.org Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path On 10/17/2016 03:21 PM, Yuanhan Liu wrote: > On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote: >>> On my side, I just setup 2 Windows 2016 VMs, and confirm the issue. >>> I'll continue the investigation early next week. >> >> The root cause is identified. 
>> When INDIRECT_DESC feature is negotiated, Windows guest uses indirect >> for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD & >> virtio-net kernel driver) use indirect only for Tx. >> I'll implement indirect support for the Rx path in vhost lib, but the >> change will be too big for -rc release. >> I propose in the mean time to disable INDIRECT_DESC feature in vhost >> lib, we can still enable it locally for testing. >> >> Yuanhan, is it ok for you? > > That's okay. I'll send a patch to disable it then. > >> >>> Has anyone already tested Windows guest with vhost-net, which also has >>> indirect descs support? >> >> I tested and confirm it works with vhost-net. > > I'm a bit confused then. IIRC, vhost-net also doesn't support indirect > for Rx path, right? No, it does support it actually. I thought it didn't support too, I misread the Kernel implementation of vhost-net and virtio-net. Acutally, virtio-net makes use of indirect in Rx path when mergeable buffers is disabled. The confusion certainly comes from me, sorry about that. Maxime
[dpdk-dev] [PATCH v3] eal: fix libabi macro for device generalization patches
Hello Thomas, On Thursday 27 October 2016 03:45 PM, Thomas Monjalon wrote: > 2016-10-27 12:38, Shreyansh Jain: >> rte_device/driver generalization patches [1] were merged without a change >> in the LIBABIVER macro. This patches bumps the macro of affected libs. > > It is not a macro but a Makefile variable. Yes, I will change that. > >> (librte_eal was already bumped; libcryptodev and libetherdev have been >> bumped). > > Please provide the commit id where EAL was bumped. Ok. Will do. > >> Details of ABI/API changes: >> - EAL (version not bumped) > > not bumped -> already bumped Ok. > >> |- type field was removed from rte_driver >> |- rte_pci_device now embeds rte_device >> |- rte_pci_resource renamed to rte_mem_resource >> |- numa_node and devargs of rte_pci_driver is moved to rte_driver >> |- APIs for device hotplug (attach/detach) moved into EAL >> |- API rte_eal_pci_device_name added for PCI device naming >> |- vdev registration API introduced (rte_eal_vdrv_register, >> | rte_eal_vdrv_unregister >> >> - librte_crypto (v 1=>2) >> |- removed rte_cryptodev_create_unique_device_name API >> |- moved device naming to EAL >> >> - librte_ethdev (v 4=>5) >> |- rte_eth_dev_type is removed >> |- removed dev_type from rte_eth_dev_allocate API >> |- removed API rte_eth_dev_get_device_type >> |- removed API rte_eth_dev_get_addr_by_port >> |- removed API rte_eth_dev_get_port_by_addr >> |- removed rte_cryptodev_create_unique_device_name API >> |- moved device naming to EAL >> >> Also, deprecation notice from 16.07 has been removed and release notes for >> 16.11 added. >> >> [1] http://dpdk.org/ml/archives/dev/2016-September/047087.html >> >> Signed-off-by: Shreyansh Jain > [...] >> --- a/doc/guides/rel_notes/release_16_11.rst >> +++ b/doc/guides/rel_notes/release_16_11.rst >> @@ -149,6 +149,32 @@ Resolved Issues > > It is the "Resolved Issues" section. > Please move in the "API Changes" section. Ok. 
> >> EAL >> ~~~ >> >> +* **Improved device/driver heirarchy and generalized hotplugging** > > typo: hierarchy Yes. > >> + Device and driver relationship has been restructured by introducing >> generic >> + classes. This paves way for having PCI, VDEV and other device types as >> + just instantiated objects rather than classes in themselves. Hotplugging >> too >> + has been generalized into EAL so that ethernet or crypto devices can use >> the >> + common infrastructure. >> + >> + * removed ``pmd_type`` as way of segragation of devices >> + * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from >> +``rte_pci_driver``. These can now be used by any instantiated object of >> +``rte_driver``. >> + * added ``rte_device`` class and all PCI and VDEV devices inherit from it >> + * renamed devinit/devuninit handlers to probe/remove to make it more >> +semantically correct with respect to device<=>driver relationship >> + * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the >> +APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``. >> + * helpers and support macros have been renamed to make them more >> synonymous >> +with their device types >> +(e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``) > > It is RTE_PMD_REGISTER_PCI It seems my Friday is earlier than usual :( I was the one who changed it and I completely forgot about it. > >> + * Device naming functions have been generalized from ethdev and cryptodev >> +to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining >> +unique device name from PCI Domain-BDF description. >> + * Virtual device registration APIs have been added: >> ``rte_eal_vdrv_register`` >> +and ``rte_eal_vdrv_unregister``. > > Thanks > I am sending v4 soon. - Shreyansh
[dpdk-dev] [PATCH v4] eal: fix lib version for device generalization patches
rte_device/driver generalization patches [1] were merged without a change in the LIBABIVER variable. This patches bumps the macro of affected libs: - libcryptodev and libetherdev have been bumped - librte_eal version changed in d7e61ad3ae36 ("log: remove deprecated history dump") Details of ABI/API changes: - EAL [version already bumped in: d7e61ad3ae36] |- type field was removed from rte_driver |- rte_pci_device now embeds rte_device |- rte_pci_resource renamed to rte_mem_resource |- numa_node and devargs of rte_pci_driver is moved to rte_driver |- APIs for device hotplug (attach/detach) moved into EAL |- API rte_eal_pci_device_name added for PCI device naming |- vdev registration API introduced (rte_eal_vdrv_register, | rte_eal_vdrv_unregister - librte_crypto (v 1=>2) |- removed rte_cryptodev_create_unique_device_name API |- moved device naming to EAL - librte_ethdev (v 4=>5) |- rte_eth_dev_type is removed |- removed dev_type from rte_eth_dev_allocate API |- removed API rte_eth_dev_get_device_type |- removed API rte_eth_dev_get_addr_by_port |- removed API rte_eth_dev_get_port_by_addr |- removed rte_cryptodev_create_unique_device_name API |- moved device naming to EAL Also, deprecation notice from 16.07 has been removed and release notes for 16.11 added. 
[1] http://dpdk.org/ml/archives/dev/2016-September/047087.html Signed-off-by: Shreyansh Jain -- v4: - fix spelling mistakes and incorrect symbol name in doc - reword commit log for EAL modification commit id v3: - add API/ABI change info in commit log - fix library version change notification in release note - fix erroneous change to librte_eal version in v2 --- doc/guides/rel_notes/deprecation.rst | 12 doc/guides/rel_notes/release_16_11.rst | 30 -- lib/librte_cryptodev/Makefile | 2 +- lib/librte_ether/Makefile | 2 +- 4 files changed, 30 insertions(+), 16 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index d5c1490..884a231 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -18,18 +18,6 @@ Deprecation Notices ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of segments limit to be transmitted by device for TSO/non-TSO packets. -* The ethdev hotplug API is going to be moved to EAL with a notification - mechanism added to crypto and ethdev libraries so that hotplug is now - available to both of them. This API will be stripped of the device arguments - so that it only cares about hotplugging. - -* Structures embodying pci and vdev devices are going to be reworked to - integrate new common rte_device / rte_driver objects (see - http://dpdk.org/ml/archives/dev/2016-January/031390.html). - ethdev and crypto libraries will then only handle those objects so that they - do not need to care about the kind of devices that are being used, making it - easier to add new buses later. 
- * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and ``nb_segs`` in one operation, because some platforms have an overhead if the diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst index aa0c09a..db20567 100644 --- a/doc/guides/rel_notes/release_16_11.rst +++ b/doc/guides/rel_notes/release_16_11.rst @@ -201,6 +201,32 @@ API Changes * The ``file_name`` data type of ``struct rte_port_source_params`` and ``struct rte_port_sink_params`` is changed from `char *`` to ``const char *``. +* **Improved device/driver hierarchy and generalized hotplugging** + + Device and driver relationship has been restructured by introducing generic + classes. This paves way for having PCI, VDEV and other device types as + just instantiated objects rather than classes in themselves. Hotplugging too + has been generalized into EAL so that ethernet or crypto devices can use the + common infrastructure. + + * removed ``pmd_type`` as way of segregation of devices + * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from +``rte_pci_driver``. These can now be used by any instantiated object of +``rte_driver``. + * added ``rte_device`` class and all PCI and VDEV devices inherit from it + * renamed devinit/devuninit handlers to probe/remove to make it more +semantically correct with respect to device<=>driver relationship + * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the +APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``. + * helpers and support macros have been renamed to make them more synonymous +with their device types +(e.g. ``PMD_REGISTER_DRIVER`` => ``RTE_PMD_REGISTER_PCI``) + * Device naming functions have been generalized from ethdev and cryptodev +to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining +un
[dpdk-dev] [PATCH v4] eal: fix lib version for device generalization patches
On Thursday 27 October 2016 04:59 PM, Shreyansh Jain wrote: > index aa0c09a..db20567 100644 > --- a/doc/guides/rel_notes/release_16_11.rst > +++ b/doc/guides/rel_notes/release_16_11.rst > @@ -201,6 +201,32 @@ API Changes > * The ``file_name`` data type of ``struct rte_port_source_params`` and >``struct rte_port_sink_params`` is changed from `char *`` to ``const char > *``. > > +* **Improved device/driver hierarchy and generalized hotplugging** > + > + Device and driver relationship has been restructured by introducing generic > + classes. This paves way for having PCI, VDEV and other device types as > + just instantiated objects rather than classes in themselves. Hotplugging > too > + has been generalized into EAL so that ethernet or crypto devices can use > the > + common infrastructure. > + > + * removed ``pmd_type`` as way of segregation of devices > + * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from > +``rte_pci_driver``. These can now be used by any instantiated object of > +``rte_driver``. > + * added ``rte_device`` class and all PCI and VDEV devices inherit from it > + * renamed devinit/devuninit handlers to probe/remove to make it more > +semantically correct with respect to device<=>driver relationship > + * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the > +APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``. > + * helpers and support macros have been renamed to make them more synonymous > +with their device types > +(e.g. ``PMD_REGISTER_DRIVER`` => ``RTE_PMD_REGISTER_PCI``) > + * Device naming functions have been generalized from ethdev and cryptodev > +to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining > +unique device name from PCI Domain-BDF description. > + * Virtual device registration APIs have been added: > ``rte_eal_vdrv_register`` > +and ``rte_eal_vdrv_unregister``. 
> + > > ABI Changes > --- Even though I have sent the v4, there is another possibility of splitting this log across API and ABI changes. Problem is that most of the changes are quite related in terms of impact on ABI and API. (some like rte_device is clear enough, though). Any suggestions? Would repetitions be OK in release notes? - Shreyansh
[dpdk-dev] [PATCH v4] eal: fix lib version for device generalization patches
2016-10-27 17:02, Shreyansh Jain: > Even though I have sent the v4, there is another possibility of > splitting this log across API and ABI changes. > Problem is that most of the changes are quite related in terms of impact > on ABI and API. (some like rte_device is clear enough, though). > Any suggestions? Would repetitions be OK in release notes? In general, API change implies ABI change. I think we must use the "ABI changes" section for cases where API is not changed. No need of repeating in both sections.
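To illustrate the distinction drawn here, a hedged sketch of an ABI change that leaves the API untouched. The two structure versions and their field names are hypothetical, not taken from DPDK: application source that reads `nb_items` compiles unchanged against either version, yet the field's offset moves, so a binary built against the first layout would read the wrong bytes from a library built with the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure as shipped in library version 1. */
struct cfg_v1 {
	uint16_t port_id;
	uint32_t nb_items;
};

/* Same structure in version 2: a field was inserted before nb_items. */
struct cfg_v2 {
	uint16_t port_id;
	uint64_t flags;    /* new field */
	uint32_t nb_items;
};

/* Number of bytes by which nb_items moved between the two layouts. */
static size_t
nb_items_offset_shift(void)
{
	size_t old_off = offsetof(struct cfg_v1, nb_items);
	size_t new_off = offsetof(struct cfg_v2, nb_items);

	assert(new_off >= old_off); /* inserting a field can only push it further */
	return new_off - old_off;
}
```

A non-zero shift is the kind of change that would belong under "ABI changes" only, since no function or field name visible to the application has changed.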
[dpdk-dev] Tcpdump
Hi,

I have a DPDK application that binds to an interface and processes packets.
For debugging purposes I want to run tcpdump on this interface.
IYO, what is my best option without hurting the performance of the application too much?

TIA,
Dror
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
On 10/26/2016 02:56 PM, Tomasz Kulasek wrote: > Added API for `rte_eth_tx_prep` > > [...] > > Signed-off-by: Tomasz Kulasek Acked-by: Olivier Matz
[dpdk-dev] [PATCH] pci: Don't call probe callback if driver already loaded.
Hello Benjamin,

On Tue, Oct 25, 2016 at 11:50 PM, Ben Walker wrote:
> If the user asks to probe multiple times, the probe
> callback should only be called on devices that don't have
> a driver already loaded.
>
> This is useful if a driver is registered after the
> execution of a program has started and the list of devices
> needs to be re-scanned.

Why not use the hotplug API, explicitly attaching one PCI device?

> Signed-off-by: Ben Walker
> ---
> lib/librte_eal/common/eal_common_pci.c | 4
> 1 file changed, 4 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_pci.c
> b/lib/librte_eal/common/eal_common_pci.c
> index 638cd86..971ad20 100644
> --- a/lib/librte_eal/common/eal_common_pci.c
> +++ b/lib/librte_eal/common/eal_common_pci.c
> @@ -289,6 +289,10 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
> if (dev == NULL)
> return -1;
>
> + /* Check if a driver is already loaded */
> + if (dev->driver != NULL)
> + return 0;
> +

This can do the trick, yes.
To be safe, I think we are missing a check in
rte_eal_pci_probe_one_driver() so that dev->driver is only set when
the probe function from the driver did succeed.

--
David Marchand
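The two points discussed above (skip devices that already have a driver, and record the driver only after a successful probe) can be sketched as follows. This is a simplified, self-contained illustration: the structures, function names, return values and the two example callbacks are hypothetical stand-ins, not the EAL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for rte_pci_driver / rte_pci_device. */
struct pci_driver {
	int (*probe)(void *dev); /* returns 0 on success, non-zero on failure */
};

struct pci_device {
	const struct pci_driver *driver; /* NULL until a probe succeeds */
};

/* Example probe callbacks, only here to exercise the sketch. */
static int probe_calls;
static int probe_ok(void *dev) { (void)dev; probe_calls++; return 0; }
static int probe_fail(void *dev) { (void)dev; probe_calls++; return -1; }

/* Probe one driver; record it on the device only if the probe succeeded. */
static int
probe_one_driver(struct pci_device *dev, const struct pci_driver *drv)
{
	int ret;

	assert(drv != NULL && drv->probe != NULL);
	ret = drv->probe(dev);
	if (ret == 0)
		dev->driver = drv;
	return ret;
}

/* Try each driver in turn, skipping devices that already have a driver. */
static int
probe_all_drivers(struct pci_device *dev,
		  const struct pci_driver *const *drivers, size_t n)
{
	size_t i;

	if (dev == NULL)
		return -1;
	if (dev->driver != NULL)
		return 0; /* already bound: do not call probe again */

	for (i = 0; i < n; i++) {
		if (probe_one_driver(dev, drivers[i]) == 0)
			return 0;
	}
	return 1; /* no driver claimed the device */
}
```

With both checks in place, calling the probe routine a second time after registering a new driver only touches devices that are still unbound.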
[dpdk-dev] [PATCH] pci: Don't call probe callback if driver already loaded.
On Thu, Oct 27, 2016 at 3:28 PM, David Marchand wrote: > On Tue, Oct 25, 2016 at 11:50 PM, Ben Walker > wrote: >> If the user asks to probe multiple times, the probe >> callback should only be called on devices that don't have >> a driver already loaded. >> >> This is useful if a driver is registered after the >> execution of a program has started and the list of devices >> needs to be re-scanned. > > Why not use the hotplug api, attaching explicitely one pci device ? Ah, scratch that. I've been too quick to reply. Ok, you are loading a new driver. -- David Marchand
[dpdk-dev] Unable to change source MAC address of packet
Hi,

I am crafting a packet in which the source MAC address as set in the Ethernet header is different than the transmit port's default MAC address. A packet capture of the packets coming out of this port however comes with source MAC address of the port's default MAC address.

Altering the destination MAC address works fine and shows up correctly in packet capture.

The underlying network interface is an i210 and some logs added to the eth_igb_xmit_pkts function show that the packets I have crafted indeed are reaching the driver with the source MAC address set in the packet code of the application.

How can I disable this automatic source MAC address setting?

Thanks,
Padam
[dpdk-dev] Unable to change source MAC address of packet
> On Oct 27, 2016, at 6:33 AM, Padam Jeet Singh
> wrote:
>
> Hi,
>
> I am crafting a packet in which the source MAC address as set in the Ethernet
> header is different than the transmit port's default MAC address. A packet
> capture of the packets coming out of this port however comes with source MAC
> address of the port's default MAC address.
>
> Altering the destination MAC address works fine and shows up correctly in
> packet capture.
>
> The underlying network interface is an i210 and some logs added to the
> eth_igb_xmit_pkts function show that the packets I have crafted indeed are
> reaching the driver with the source MAC address set in the packet code of the
> application.
>
> How can I disable this automatic source MAC address setting?

The packets sent with rte_eth_tx_burst() are not forced to a given MAC address. If you are using something on top of DPDK like Pktgen or OVS or something, then it may try to force a source MAC address.

Maybe the hardware does it, but we need to know the NIC being used and then someone may be able to answer. I do not know of any Intel NICs that do that.

Is this what you are doing?

>
> Thanks,
> Padam

Regards,
Keith
[dpdk-dev] Unable to change source MAC address of packet
> On 27-Oct-2016, at 7:37 pm, Wiles, Keith wrote:
>
>
>> On Oct 27, 2016, at 6:33 AM, Padam Jeet Singh
>> wrote:
>>
>> Hi,
>>
>> I am crafting a packet in which the source MAC address as set in the
>> Ethernet header is different than the transmit port's default MAC address. A
>> packet capture of the packets coming out of this port however comes with
>> source MAC address of the port's default MAC address.
>>
>> Altering the destination MAC address works fine and shows up correctly in
>> packet capture.
>>
>> The underlying network interface is an i210 and some logs added to the
>> eth_igb_xmit_pkts function show that the packets I have crafted indeed are
>> reaching the driver with the source MAC address set in the packet code of
>> the application.
>>
>> How can I disable this automatic source MAC address setting?
>
> The packets sent with rte_eth_tx_burst() are not forced to a give MAC
> address. If you are using something on top of DPDK like Pktgen or OVS or
> something, then it may try to force a source MAC address.

No, not using Pktgen or OVS. Plain simple code to take packets from a KNI, change the source MAC address on all received packets, and then tx_burst them to a port.

> Maybe the hardware does it, but we need to know the NIC being used and then
> someone maybe able to answer. I do not know of any Intel NICs do that.

An Intel i210 NIC (gigabit Ethernet) is being used. I have gone through the i210 documentation and can't see anything specific to setting the MAC address in hardware on the TX side. For the RX side there are validations like MAC filtering, but nothing for TX.

>
> Is this what you are doing.

I agree that rte_eth_tx_burst does not overwrite the source MAC, as I was able to trace all the way to the IGB driver that the source MAC makes it intact. There are no offload flags enabled in the mbuf. Yet the packets on the other side come out with the source MAC address of the port.

Is there any standard DPDK app which crafts packets with a different source MAC than the port's physical MAC? (I checked: the l2fwd example loads the port MAC before transmitting and then uses the same in the TX function.)

>
>>
>> Thanks,
>> Padam
>
> Regards,
> Keith
>
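For reference, the application-side step described in this thread (rewriting the source MAC in the frame before handing it to the transmit call) amounts to something like the sketch below. It works on a plain byte buffer so that it is self-contained; `struct eth_hdr`, `MAC_LEN` and `set_src_mac` are hypothetical names, and in a DPDK application the buffer would be the start of the mbuf data. It does not explain why the capture on the wire shows the port's own address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Ethernet header layout: destination, source, EtherType. */
struct eth_hdr {
	uint8_t  dst[MAC_LEN];
	uint8_t  src[MAC_LEN];
	uint16_t ether_type; /* big-endian on the wire */
};

/*
 * Hypothetical helper: overwrite the source MAC at the start of a frame.
 * Returns 0 on success, -1 if the arguments are invalid or the buffer is
 * too short to hold an Ethernet header.
 */
static int
set_src_mac(uint8_t *frame, size_t frame_len, const uint8_t mac[MAC_LEN])
{
	/* Sanity check: the source field follows the 6-byte destination. */
	assert(offsetof(struct eth_hdr, src) == MAC_LEN);

	if (frame == NULL || mac == NULL || frame_len < sizeof(struct eth_hdr))
		return -1;
	memcpy(frame + offsetof(struct eth_hdr, src), mac, MAC_LEN);
	return 0;
}
```

Only bytes 6 to 11 of the frame are modified; the destination address and EtherType are left as they were.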
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
Hi Tomasz,

This is a major new function in the API and I still have some comments.

2016-10-26 14:56, Tomasz Kulasek:
> --- a/config/common_base
> +++ b/config/common_base
> +CONFIG_RTE_ETHDEV_TX_PREP=y

We cannot enable it until it is implemented in every driver.

> struct rte_eth_dev {
> eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> + eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare
> function. */
> struct rte_eth_dev_data *data; /**< Pointer to device data */
> const struct eth_driver *driver;/**< Driver for this device */
> const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */

Could you confirm why tx_pkt_prep is not in dev_ops?
I guess we want to have several implementations?

Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?

> +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
> **tx_pkts,
> + uint16_t nb_pkts)

The word "prep" can be understood as "prepend".
Why not rte_eth_tx_prepare?

> +/**
> + * Fix pseudo header checksum
> + *
> + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> + * provided mbufs packet data.
> + *
> + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and
> set
> + * in packet data,
> + * - for TSO the IP payload length is not included in pseudo header.
> + *
> + * This function expects that used headers are in the first data segment of
> + * mbuf, are not fragmented and can be safely modified.

What happens otherwise?

> + *
> + * @param m
> + * The packet mbuf to be fixed.
> + * @return
> + * 0 if checksum is initialized properly
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)

Could we find a better name for this function?
- About the prefix, rte_ip_ ?
- About the scope, where is this phdr_cksum specified?
Isn't it an intel_phdr_cksum to match what hardware expects?
- About the verb, is it really fixing something broken?
Or just writing into a mbuf?
I would suggest rte_ip_intel_cksum_prepare.
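As background for the naming discussion, the difference between the two cases in the quoted comment (full pseudo-header sum for non-TSO, length left out for TSO) can be sketched as below. This is an illustration under stated assumptions, not the code from the patch: `ipv4_phdr_sum` and `fold16` are hypothetical names, only IPv4 is covered, and the result is returned in host byte order without being complemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: one's-complement sum over the IPv4 pseudo-header
 * (source address, destination address, zero byte, protocol, L4 length).
 * When 'tso' is non-zero the L4 length is left out of the sum.
 */
static uint16_t
ipv4_phdr_sum(const uint8_t src[4], const uint8_t dst[4], uint8_t proto,
	      uint16_t l4_len, int tso)
{
	uint32_t sum = 0;

	assert(src != NULL && dst != NULL);
	sum += ((uint32_t)src[0] << 8) | src[1];
	sum += ((uint32_t)src[2] << 8) | src[3];
	sum += ((uint32_t)dst[0] << 8) | dst[1];
	sum += ((uint32_t)dst[2] << 8) | dst[3];
	sum += proto;          /* zero byte followed by the protocol number */
	if (!tso)
		sum += l4_len; /* TCP/UDP header plus payload length */
	return fold16(sum);
}
```

For 192.168.0.1 to 192.168.0.2, TCP (protocol 6) and a 20-byte segment, the sketch yields 0x816E in the non-TSO case and 0x815A when the length is omitted.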
[dpdk-dev] [PATCH] doc: fix mlx5 features overview
Fixes: 75ef62a94301 ("net/mlx5: fix link speed capability information")
Fixes: 188408719888 ("net/mlx5: fix support for newer link speeds")

Signed-off-by: Nelio Laranjeiro
---
 doc/guides/nics/features/mlx5.ini | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index e84612f..f811e3f 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities = Y
 Link status = Y
 Link status event = Y
 Queue start/stop = Y
--
2.1.4
[dpdk-dev] [PATCH v6 00/21] Introduce SoC device/driver framework for EAL
Introduction:
=============

This patch set is direct derivative of Jan's original series [1],[2].

 - This version is based on master HEAD (ca41215)
 - In this, I am merging the series [11] back. It was initially part of
   this set but I had split considering that those changes in PCI were
   good standalone as well. But, 1) not much feedback was available
   and 2) this patchset is a use-case for those patches making it easier
   to review. Just like what Jan had intended in original series.
 - SoC support is not enabled by default. It needs the 'enable-soc' toggle
   on command line. This is primarily because this patchset is still
   experimental and we would like to keep it isolated from non-SoC ops.
   Though, it does impact the ABI.

Aim:

As of now EAL is primarily focused on PCI initialization/probing.

 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  `- rte_eal_pci_probe(): Driver<=>Device initialization

This patchset introduces SoC framework which would enable SoC drivers and
devices to be plugged into EAL, very similar to how PCI drivers/devices are
done today.

This is a stripped down version of PCI framework which allows the SoC PMDs
to implement their own routines for detecting devices and linking devices to
drivers.

1) Changes to EAL
 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- rte_eal_soc_init(): Calls PMDs->scan_fn
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  |- rte_eal_pci_probe(): Driver<=>Device initialization, PMD->devinit()
  `- rte_eal_soc_probe(): Calls PMDs->match_fn and PMDs->devinit();

2) New device/driver structures:
  - rte_soc_driver (inheriting rte_driver)
  - rte_soc_device (inheriting rte_device)
  - rte_eth_dev and eth_driver embedded rte_soc_device and rte_soc_driver,
    respectively.

3) The SoC PMDs need to:
 - define rte_soc_driver with necessary scan and match callbacks
 - Register themselves using DRIVER_REGISTER_SOC()
 - Implement respective bus scanning in the scan callbacks to add necessary
   devices to SoC device list
 - Implement necessary eth_dev_init/uninit for ethernet instances

4) Design considerations that are same as PCI:
 - SoC initialization is being done through rte_eal_init(), just after PCI
   initialization is done.
 - As in case of PCI, probe is done after rte_eal_pci_probe() to link the
   devices detected with the drivers registered.
 - Device attach/detach functions are available and have been designed on
   the lines of PCI framework.
 - PMDs register using DRIVER_REGISTER_SOC, very similar to
   DRIVER_REGISTER_PCI for PCI devices.
 - Linked list of SoC driver and devices exists independent of the other
   driver/device list, but inheriting rte_driver/rte_device, these are also
   part of a global list.

5) Design considerations that are different from PCI:
 - Each driver implements its own scan and match function. PCI uses the BDF
   format to read the device from sysfs, but this _may_not_ be a case for a
   SoC ethernet device.
   = This is an important change from initial proposal by Jan in [2]. Unlike
     his attempt to use /sys/bus/platform, this patch relies on the PMD to
     detect the devices. This is because SoC may require specific or
     additional info for device detection. Further, SoC may have embedded
     devices/MACs which require initialization which cannot be covered
     through sysfs parsing.
     `-> Point (6) below is a side note to above.
   = PCI based PMDs rely on EAL's capability to detect devices. This
     proposal puts the onus on PMD to detect devices, add to soc_device_list
     and wait for Probe. Matching, of device<=>driver is again PMD's
     callback.

6) Adding default scan and match helpers for PMDs
 - The design warrants the PMDs implement their own scan of devices on bus,
   and match routines for probe implementation.
   This patch introduces helpers which can be used by PMDs for scan of the
   platform bus and matching devices against the compatible string
   extracted from the scan.
 - Intention is to make it easier to integrate known SoC which expose
   platform bus compliant information (compat, sys/bus/platform...).
 - PMDs which have deviations from this standard model can implement and
   hook their bus scanning and probe match callbacks while registering
   driver.

Patchset Overview:
==================

 - Patches 0001~0004 are from [11] - moving some PCI specific functions
   and definitions to non-PCI area.
 - Patches 0005~0008 introduce the base infrastructure and test case
 - Patch 0009 is for command line support for no-soc, on lines of no-pci
 - Patch 0010 enables EAL to handle SoC type devices
 - Patch 0011 adds support for scan and probe callbacks and updates the test
   framework with relevant test case.
 - Patch 0012~0014 enable device argument, driver specific flags and
   interrupt handling related basic infra. Subsequent patches build up on
   them.
 - Patch 0015~0016 add suppor
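
The default match helper described in point (6) of this cover letter, matching a scanned device against a driver's list of compatible strings, could look roughly like the sketch below. All structure and function names here are hypothetical simplifications for illustration; they are not the definitions from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the structures from the patches. */
struct soc_id {
	const char *compatible; /* e.g. "vendor,device"; NULL ends a table */
};

struct soc_device {
	const char *compatible; /* string found by the driver's scan callback */
};

struct soc_driver {
	const char *name;
	const struct soc_id *id_table; /* NULL-terminated list of supported IDs */
};

/* Default match helper: 0 if the driver supports the device, 1 otherwise. */
static int
soc_match_compat(const struct soc_driver *drv, const struct soc_device *dev)
{
	const struct soc_id *id;

	assert(drv != NULL && dev != NULL);
	if (drv->id_table == NULL || dev->compatible == NULL)
		return 1;
	for (id = drv->id_table; id->compatible != NULL; id++) {
		if (strcmp(id->compatible, dev->compatible) == 0)
			return 0;
	}
	return 1;
}
```

A PMD whose devices do not follow the platform-bus model would register its own match callback instead of a helper like this one.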
[dpdk-dev] [PATCH v6 01/21] eal: generalize PCI kernel driver enum to EAL
From: Jan Viktorin Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain -- Changes since v0: - fix compilation error due to missing include --- lib/librte_eal/common/include/rte_dev.h | 12 lib/librte_eal/common/include/rte_pci.h | 9 - 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index 8840380..6975b9f 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -109,6 +109,18 @@ struct rte_mem_resource { void *addr; /**< Virtual address, NULL when not mapped. */ }; +/** + * Kernel driver passthrough type + */ +enum rte_kernel_driver { + RTE_KDRV_UNKNOWN = 0, + RTE_KDRV_IGB_UIO, + RTE_KDRV_VFIO, + RTE_KDRV_UIO_GENERIC, + RTE_KDRV_NIC_UIO, + RTE_KDRV_NONE, +}; + /** Double linked list of device drivers. */ TAILQ_HEAD(rte_driver_list, rte_driver); /** Double linked list of devices. */ diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 9ce8847..2c7046f 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -135,15 +135,6 @@ struct rte_pci_addr { struct rte_devargs; -enum rte_kernel_driver { - RTE_KDRV_UNKNOWN = 0, - RTE_KDRV_IGB_UIO, - RTE_KDRV_VFIO, - RTE_KDRV_UIO_GENERIC, - RTE_KDRV_NIC_UIO, - RTE_KDRV_NONE, -}; - /** * A structure describing a PCI device. */ -- 2.7.4
[dpdk-dev] [PATCH v6 03/21] eal/linux: generalize PCI kernel unbinding driver to EAL
From: Jan Viktorin Generalize the PCI-specific pci_unbind_kernel_driver. It is now divided into two parts. First, determination of the path and string identification of the device to be unbound. Second, the actual unbind operation which is generic. BSD implementation updated as ENOTSUP Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain -- Changes since v2: - update BSD support for unbind kernel driver --- lib/librte_eal/bsdapp/eal/eal.c | 7 +++ lib/librte_eal/bsdapp/eal/eal_pci.c | 4 ++-- lib/librte_eal/common/eal_private.h | 13 + lib/librte_eal/linuxapp/eal/eal.c | 26 ++ lib/librte_eal/linuxapp/eal/eal_pci.c | 33 + 5 files changed, 57 insertions(+), 26 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index 35e3117..5271fc2 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -633,3 +633,10 @@ rte_eal_process_type(void) { return rte_config.process_type; } + +int +rte_eal_unbind_kernel_driver(const char *devpath __rte_unused, +const char *devid __rte_unused) +{ + return -ENOTSUP; +} diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index 7ed0115..703f034 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -89,11 +89,11 @@ /* unbind kernel driver for this device */ int -pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused) +pci_unbind_kernel_driver(struct rte_pci_device *dev) { RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented " "for BSD\n"); - return -ENOTSUP; + return rte_eal_unbind_kernel_driver(dev); } /* Map pci device */ diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 9e7d8f6..b0c208a 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -256,6 +256,19 @@ int rte_eal_alarm_init(void); int rte_eal_check_module(const char *module_name); /** + * Unbind kernel driver bound to the device 
specified by the given devpath, + * and its string identification. + * + * @param devpath path to the device directory ("/sys/.../devices/") + * @param devid identification of the device () + * + * @return + * -1 unbind has failed + * 0 module has been unbound + */ +int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid); + +/** * Get cpu core_id. * * This function is private to the EAL. diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 2075282..5f6676d 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -943,3 +943,29 @@ rte_eal_check_module(const char *module_name) /* Module has been found */ return 1; } + +int +rte_eal_unbind_kernel_driver(const char *devpath, const char *devid) +{ + char filename[PATH_MAX]; + FILE *f; + + snprintf(filename, sizeof(filename), +"%s/driver/unbind", devpath); + + f = fopen(filename, "w"); + if (f == NULL) /* device was not bound */ + return 0; + + if (fwrite(devid, strlen(devid), 1, f) == 0) { + RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__, + filename); + goto error; + } + + fclose(f); + return 0; +error: + fclose(f); + return -1; +} diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index 876ba38..a03553f 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -59,38 +59,23 @@ int pci_unbind_kernel_driver(struct rte_pci_device *dev) { int n; - FILE *f; - char filename[PATH_MAX]; - char buf[BUFSIZ]; + char devpath[PATH_MAX]; + char devid[BUFSIZ]; struct rte_pci_addr *loc = &dev->addr; - /* open /sys/bus/pci/devices/:BB:CC.D/driver */ - snprintf(filename, sizeof(filename), - "%s/" PCI_PRI_FMT "/driver/unbind", pci_get_sysfs_path(), + /* devpath /sys/bus/pci/devices/:BB:CC.D */ + snprintf(devpath, sizeof(devpath), + "%s/" PCI_PRI_FMT, pci_get_sysfs_path(), loc->domain, loc->bus, loc->devid, loc->function); - f = fopen(filename, "w"); - if (f == 
NULL) /* device was not bound */ - return 0; - - n = snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n", + n = snprintf(devid, sizeof(devid), PCI_PRI_FMT "\n", loc->domain, loc->bus, loc->devid, loc->function); - if ((n < 0) || (n >= (int)sizeof(buf))) { + if ((n < 0) || (n >= (int)sizeof(devid))) { RTE_LOG(ERR, EAL, "%s(): snprintf failed\n", __func__); - goto error; - } - if (fwrit
[dpdk-dev] [PATCH v6 04/21] eal/linux: generalize PCI kernel driver extraction to EAL
From: Jan Viktorin Generalize the PCI-specific pci_get_kernel_driver_by_path. The function is general enough, so it is simply moved to eal.c, given the rte_eal prefix and provided privately to other parts of EAL. Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain --- lib/librte_eal/bsdapp/eal/eal.c | 7 +++ lib/librte_eal/common/eal_private.h | 14 ++ lib/librte_eal/linuxapp/eal/eal.c | 29 + lib/librte_eal/linuxapp/eal/eal_pci.c | 31 +-- 4 files changed, 51 insertions(+), 30 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index 5271fc2..9b93da3 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -640,3 +640,10 @@ rte_eal_unbind_kernel_driver(const char *devpath __rte_unused, { return -ENOTSUP; } + +int +rte_eal_get_kernel_driver_by_path(const char *filename __rte_unused, + char *dri_name __rte_unused) +{ + return -ENOTSUP; +} diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index b0c208a..c8c2131 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -269,6 +269,20 @@ int rte_eal_check_module(const char *module_name); int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid); /** + * Extract the kernel driver name from the absolute path to the driver. + * + * @param filename path to the driver ("/driver") + * @param dri_name target buffer where to place the driver name + * (should be at least PATH_MAX long) + * + * @return + * -1 on failure + * 0 when successful + * 1 when there is no such driver + */ +int rte_eal_get_kernel_driver_by_path(const char *filename, char *dri_name); + +/** * Get cpu core_id. * * This function is private to the EAL. 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 5f6676d..00af21c 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -969,3 +969,32 @@ error: fclose(f); return -1; } + +int +rte_eal_get_kernel_driver_by_path(const char *filename, char *dri_name) +{ + int count; + char path[PATH_MAX]; + char *name; + + if (!filename || !dri_name) + return -1; + + count = readlink(filename, path, PATH_MAX); + if (count >= PATH_MAX) + return -1; + + /* For device does not have a driver */ + if (count < 0) + return 1; + + path[count] = '\0'; + + name = strrchr(path, '/'); + if (name) { + strncpy(dri_name, name + 1, strlen(name + 1) + 1); + return 0; + } + + return -1; +} diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index a03553f..e1cf9e8 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -78,35 +78,6 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev) return rte_eal_unbind_kernel_driver(devpath, devid); } -static int -pci_get_kernel_driver_by_path(const char *filename, char *dri_name) -{ - int count; - char path[PATH_MAX]; - char *name; - - if (!filename || !dri_name) - return -1; - - count = readlink(filename, path, PATH_MAX); - if (count >= PATH_MAX) - return -1; - - /* For device does not have a driver */ - if (count < 0) - return 1; - - path[count] = '\0'; - - name = strrchr(path, '/'); - if (name) { - strncpy(dri_name, name + 1, strlen(name + 1) + 1); - return 0; - } - - return -1; -} - /* Map pci device */ int rte_eal_pci_map_device(struct rte_pci_device *dev) @@ -354,7 +325,7 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus, /* parse driver */ snprintf(filename, sizeof(filename), "%s/driver", dirname); - ret = pci_get_kernel_driver_by_path(filename, driver); + ret = rte_eal_get_kernel_driver_by_path(filename, driver); if (ret < 0) { RTE_LOG(ERR, EAL, "Fail to get kernel driver\n"); 
free(dev); -- 2.7.4
[dpdk-dev] [PATCH v6 02/21] eal: generalize PCI map/unmap resource to EAL
From: Jan Viktorin The functions pci_map_resource, pci_unmap_resource are generic so the pci_* prefix can be omitted. The functions are moved to the eal_common_dev.c so they can be reused by other infrastructure. Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain --- lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 ++ lib/librte_eal/common/eal_common_dev.c | 39 + lib/librte_eal/common/eal_common_pci.c | 39 - lib/librte_eal/common/eal_common_pci_uio.c | 16 +- lib/librte_eal/common/include/rte_dev.h | 32 lib/librte_eal/common/include/rte_pci.h | 32 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 2 +- lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 5 ++-- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++ 10 files changed, 89 insertions(+), 82 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index 8b3ed88..7ed0115 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -228,7 +228,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx, /* if matching map is found, then use it */ offset = res_idx * pagesz; - mapaddr = pci_map_resource(NULL, fd, (off_t)offset, + mapaddr = rte_eal_map_resource(NULL, fd, (off_t)offset, (size_t)dev->mem_resource[res_idx].len, 0); close(fd); if (mapaddr == MAP_FAILED) diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 2f81f7c..11d9f59 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -170,6 +170,8 @@ DPDK_16.11 { rte_delay_us_callback_register; rte_eal_dev_attach; rte_eal_dev_detach; + rte_eal_map_resource; + rte_eal_unmap_resource; rte_eal_vdrv_register; rte_eal_vdrv_unregister; diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index 4f3b493..457d227 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ 
b/lib/librte_eal/common/eal_common_dev.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include @@ -151,3 +152,41 @@ err: RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n", name); return -EINVAL; } + +/* map a particular resource from a file */ +void * +rte_eal_map_resource(void *requested_addr, int fd, off_t offset, size_t size, +int additional_flags) +{ + void *mapaddr; + + /* Map the Memory resource of device */ + mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE, + MAP_SHARED | additional_flags, fd, offset); + if (mapaddr == MAP_FAILED) { + RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s" + " (%p)\n", __func__, fd, requested_addr, + (unsigned long)size, (unsigned long)offset, + strerror(errno), mapaddr); + } else + RTE_LOG(DEBUG, EAL, " Device memory mapped at %p\n", mapaddr); + + return mapaddr; +} + +/* unmap a particular resource */ +void +rte_eal_unmap_resource(void *requested_addr, size_t size) +{ + if (requested_addr == NULL) + return; + + /* Unmap the Memory resource of device */ + if (munmap(requested_addr, size)) { + RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n", + __func__, requested_addr, (unsigned long)size, + strerror(errno)); + } else + RTE_LOG(DEBUG, EAL, " Device memory unmapped at %p\n", + requested_addr); +} diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c index 638cd86..464acc1 100644 --- a/lib/librte_eal/common/eal_common_pci.c +++ b/lib/librte_eal/common/eal_common_pci.c @@ -67,7 +67,6 @@ #include #include #include -#include #include #include @@ -114,44 +113,6 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev) return NULL; } -/* map a particular resource from a file */ -void * -pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size, -int additional_flags) -{ - void *mapaddr; - - /* Map the PCI memory resource of device */ - mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE, - 
MAP_SHARED | additional_flags, fd, offset); - if (mapaddr == MAP_FAILED) { - RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n", - __func__, fd, requested_addr, - (unsigned long)size, (unsigned long)offset,
[dpdk-dev] [PATCH v6 05/21] eal: define container macro
From: Jan Viktorin Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain --- lib/librte_eal/common/include/rte_common.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h index db5ac91..8152bd9 100644 --- a/lib/librte_eal/common/include/rte_common.h +++ b/lib/librte_eal/common/include/rte_common.h @@ -331,6 +331,24 @@ rte_bsf32(uint32_t v) #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER) #endif +/** + * Return pointer to the wrapping struct instance. + * Example: + * + * struct wrapper { + * ... + * struct child c; + * ... + * }; + * + * struct child *x = obtain(...); + * struct wrapper *w = container_of(x, struct wrapper, c); + */ +#ifndef container_of +#define container_of(p, type, member) \ + ((type *) (((char *) (p)) - offsetof(type, member))) +#endif + #define _RTE_STR(x) #x /** Take a macro value and get a string version of it */ #define RTE_STR(x) _RTE_STR(x) -- 2.7.4
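The doc comment in the patch elides the surrounding types, so here is a compilable version of the same example. The struct layouts and the wrapper_from_child name are invented purely for illustration; only the macro body is taken from the patch.

```c
#include <stddef.h>

#ifndef container_of
#define container_of(p, type, member) \
	((type *) (((char *) (p)) - offsetof(type, member)))
#endif

/* Hypothetical types, for illustration only. */
struct child {
	int id;
};

struct wrapper {
	int flags;
	struct child c; /* embedded member, not a pointer */
};

/* Given a pointer to the embedded member, step back to its wrapper. */
static struct wrapper *
wrapper_from_child(struct child *x)
{
	return container_of(x, struct wrapper, c);
}
```

This is the pattern a bus layer would use to get from a generic embedded struct back to the bus-specific one that contains it. Note the macro does no type checking on p, so passing a pointer to the wrong member compiles silently.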
[dpdk-dev] [PATCH v6 06/21] eal/soc: introduce very essential SoC infra definitions
From: Jan Viktorin Define initial structures and functions for the SoC infrastructure. This patch supports only very minimal functionality for now. More features will be added in the following commits. Includes rte_device/rte_driver inheritance of rte_soc_device/rte_soc_driver. Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- app/test/Makefile | 1 + app/test/test_soc.c | 90 + lib/librte_eal/common/Makefile | 2 +- lib/librte_eal/common/eal_private.h | 4 + lib/librte_eal/common/include/rte_soc.h | 138 5 files changed, 234 insertions(+), 1 deletion(-) create mode 100644 app/test/test_soc.c create mode 100644 lib/librte_eal/common/include/rte_soc.h diff --git a/app/test/Makefile b/app/test/Makefile index 5be023a..30295af 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -77,6 +77,7 @@ APP = test # SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := commands.c SRCS-y += test.c +SRCS-y += test_soc.c SRCS-y += resource.c SRCS-y += test_resource.c test_resource.res: test_resource.c diff --git a/app/test/test_soc.c b/app/test/test_soc.c new file mode 100644 index 000..916a863 --- /dev/null +++ b/app/test/test_soc.c @@ -0,0 +1,90 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 RehiveTech. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of RehiveTech nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include + +#include +#include +#include + +#include "test.h" + +static char *safe_strdup(const char *s) +{ + char *c = strdup(s); + + if (c == NULL) + rte_panic("failed to strdup '%s'\n", s); + + return c; +} + +static int test_compare_addr(void) +{ + struct rte_soc_addr a0; + struct rte_soc_addr a1; + struct rte_soc_addr a2; + + a0.name = safe_strdup("ethernet0"); + a0.fdt_path = NULL; + + a1.name = safe_strdup("ethernet0"); + a1.fdt_path = NULL; + + a2.name = safe_strdup("ethernet1"); + a2.fdt_path = NULL; + + TEST_ASSERT(!rte_eal_compare_soc_addr(&a0, &a1), + "Failed to compare two soc addresses that equal"); + TEST_ASSERT(rte_eal_compare_soc_addr(&a0, &a2), + "Failed to compare two soc addresses that differs"); + + free(a2.name); + free(a1.name); + free(a0.name); + return 0; +} + +static int +test_soc(void) +{ + if (test_compare_addr()) + return -1; + + return 0; +} + +REGISTER_TEST_COMMAND(soc_autotest, test_soc); diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile index dfd64aa..b414008 100644 --- a/lib/librte_eal/common/Makefile +++ b/lib/librte_eal/common/Makefile @@ -33,7 +33,7 @@ include 
$(RTE_SDK)/mk/rte.vars.mk INC := rte_branch_prediction.h rte_common.h INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h -INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h +INC += rte_log.h rte_memory.h rte_memzone.h rte_soc.h rte_pci.h INC += rte_per_lcore.h rte_random.h INC += rte_tailq.h rte_interrupts.h rte_alarm.h INC += rte_string_fns.h rte_version.h diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index c8c2131..0e8d6f7 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -36,6 +36,7 @@ #inclu
[dpdk-dev] [PATCH v6 07/21] eal/soc: add SoC PMD register/unregister logic
From: Jan Viktorin Registration of a SoC driver is done through a helper, RTE_PMD_REGISTER_SOC (on the lines of RTE_PMD_REGISTER_PCI). soc_driver_list stores all the registered drivers. A test case has been introduced to verify the registration and deregistration. Signed-off-by: Jan Viktorin [Shreyansh: update PMD registration method] Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- app/test/test_soc.c | 111 lib/librte_eal/bsdapp/eal/rte_eal_version.map | 3 + lib/librte_eal/common/eal_common_soc.c | 56 lib/librte_eal/common/include/rte_soc.h | 26 ++ lib/librte_eal/linuxapp/eal/Makefile| 1 + lib/librte_eal/linuxapp/eal/rte_eal_version.map | 3 + 6 files changed, 200 insertions(+) create mode 100644 lib/librte_eal/common/eal_common_soc.c diff --git a/app/test/test_soc.c b/app/test/test_soc.c index 916a863..ac03e64 100644 --- a/app/test/test_soc.c +++ b/app/test/test_soc.c @@ -75,6 +75,108 @@ static int test_compare_addr(void) free(a2.name); free(a1.name); free(a0.name); + + return 0; +} + +/** + * Empty PMD driver based on the SoC infra. + * + * The rte_soc_device is usually wrapped in some higher-level struct + * (eth_driver). We simulate such a wrapper with an anonymous struct here. 
+ */ +struct test_wrapper { + struct rte_soc_driver soc_drv; +}; + +struct test_wrapper empty_pmd0 = { + .soc_drv = { + .driver = { + .name = "empty_pmd0" + }, + }, +}; + +struct test_wrapper empty_pmd1 = { + .soc_drv = { + .driver = { + .name = "empty_pmd1" + }, + }, +}; + +static int +count_registered_socdrvs(void) +{ + int i; + struct rte_soc_driver *drv; + + i = 0; + TAILQ_FOREACH(drv, &soc_driver_list, next) + i += 1; + + return i; +} + +static int +test_register_unregister(void) +{ + struct rte_soc_driver *drv; + int count; + + rte_eal_soc_register(&empty_pmd0.soc_drv); + + TEST_ASSERT(!TAILQ_EMPTY(&soc_driver_list), + "No PMD is present but the empty_pmd0 should be there"); + drv = TAILQ_FIRST(&soc_driver_list); + TEST_ASSERT(!strcmp(drv->driver.name, "empty_pmd0"), + "The registered PMD is not empty_pmd0 but '%s'", + drv->driver.name); + + rte_eal_soc_register(&empty_pmd1.soc_drv); + + count = count_registered_socdrvs(); + TEST_ASSERT_EQUAL(count, 2, "Expected 2 PMDs but detected %d", count); + + rte_eal_soc_unregister(&empty_pmd0.soc_drv); + count = count_registered_socdrvs(); + TEST_ASSERT_EQUAL(count, 1, "Expected 1 PMDs but detected %d", count); + + rte_eal_soc_unregister(&empty_pmd1.soc_drv); + + printf("%s has been successful\n", __func__); + return 0; +} + +/* save real devices and drivers until the tests finishes */ +struct soc_driver_list real_soc_driver_list = + TAILQ_HEAD_INITIALIZER(real_soc_driver_list); + +static int test_soc_setup(void) +{ + struct rte_soc_driver *drv; + + /* no real drivers for the test */ + while (!TAILQ_EMPTY(&soc_driver_list)) { + drv = TAILQ_FIRST(&soc_driver_list); + rte_eal_soc_unregister(drv); + TAILQ_INSERT_TAIL(&real_soc_driver_list, drv, next); + } + + return 0; +} + +static int test_soc_cleanup(void) +{ + struct rte_soc_driver *drv; + + /* bring back real drivers after the test */ + while (!TAILQ_EMPTY(&real_soc_driver_list)) { + drv = TAILQ_FIRST(&real_soc_driver_list); + TAILQ_REMOVE(&real_soc_driver_list, drv, 
next); + rte_eal_soc_register(drv); + } + return 0; } @@ -84,6 +186,15 @@ test_soc(void) if (test_compare_addr()) return -1; + if (test_soc_setup()) + return -1; + + if (test_register_unregister()) + return -1; + + if (test_soc_cleanup()) + return -1; + return 0; } diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 11d9f59..cf6fb8e 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -171,8 +171,11 @@ DPDK_16.11 { rte_eal_dev_attach; rte_eal_dev_detach; rte_eal_map_resource; + rte_eal_soc_register; + rte_eal_soc_unregister; rte_eal_unmap_resource; rte_eal_vdrv_register; rte_eal_vdrv_unregister; + soc_driver_list; } DPDK_16.07; diff --git a/lib/librte_eal/common/eal_common_soc.c b/lib/librte_eal/common/eal_common_soc.c new file mode 100644 index 000..56135ed --- /dev/null +++ b/lib/librte_eal/common/eal_common_soc.c @@ -0,0 +1,56 @@ +/*- + * BSD LICENSE + * + * Co
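The body of eal_common_soc.c is cut off in this archive copy, so what follows is only a guess at the shape of the list handling, written against plain sys/queue.h. Everything here is a stand-in: struct soc_driver replaces rte_soc_driver, and tail insertion is an assumption (the test above only requires that a driver registered into an empty list ends up first).

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_soc_driver. */
struct soc_driver {
	TAILQ_ENTRY(soc_driver) next; /* linkage into the driver list */
	const char *name;
};

TAILQ_HEAD(soc_driver_list, soc_driver);

/* Global list of registered drivers, initially empty. */
static struct soc_driver_list driver_list =
	TAILQ_HEAD_INITIALIZER(driver_list);

/* Register a driver; assumed here to append at the tail. */
static void
soc_register(struct soc_driver *drv)
{
	TAILQ_INSERT_TAIL(&driver_list, drv, next);
}

/* Unregister a driver that is currently on the list. */
static void
soc_unregister(struct soc_driver *drv)
{
	TAILQ_REMOVE(&driver_list, drv, next);
}

/* Count registered drivers, like count_registered_socdrvs() in the test. */
static int
count_registered(void)
{
	struct soc_driver *drv;
	int n = 0;

	TAILQ_FOREACH(drv, &driver_list, next)
		n++;

	return n;
}
```

One thing worth keeping in mind with this layout: the list only links caller-owned objects, so a driver must stay allocated for as long as it is registered, and unregistering an entry that is not on the list is undefined.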
[dpdk-dev] [PATCH v6 08/21] eal/soc: implement SoC device list and dump
From: Jan Viktorin SoC devices would be linked in a separate list (from PCI). This is used for probe function. A helper for dumping the device list is added. Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 ++ lib/librte_eal/common/eal_common_soc.c | 34 + lib/librte_eal/common/include/rte_soc.h | 9 +++ lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++ 4 files changed, 47 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index cf6fb8e..86e3cfd 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -171,11 +171,13 @@ DPDK_16.11 { rte_eal_dev_attach; rte_eal_dev_detach; rte_eal_map_resource; + rte_eal_soc_dump; rte_eal_soc_register; rte_eal_soc_unregister; rte_eal_unmap_resource; rte_eal_vdrv_register; rte_eal_vdrv_unregister; + soc_device_list; soc_driver_list; } DPDK_16.07; diff --git a/lib/librte_eal/common/eal_common_soc.c b/lib/librte_eal/common/eal_common_soc.c index 56135ed..5dcddc5 100644 --- a/lib/librte_eal/common/eal_common_soc.c +++ b/lib/librte_eal/common/eal_common_soc.c @@ -31,6 +31,8 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include +#include #include #include @@ -40,6 +42,38 @@ /* Global SoC driver list */ struct soc_driver_list soc_driver_list = TAILQ_HEAD_INITIALIZER(soc_driver_list); +struct soc_device_list soc_device_list = + TAILQ_HEAD_INITIALIZER(soc_device_list); + +/* dump one device */ +static int +soc_dump_one_device(FILE *f, struct rte_soc_device *dev) +{ + int i; + + fprintf(f, "%s", dev->addr.name); + fprintf(f, " - fdt_path: %s\n", + dev->addr.fdt_path ? 
dev->addr.fdt_path : "(none)"); + + for (i = 0; dev->id && dev->id[i].compatible; ++i) + fprintf(f, " %s\n", dev->id[i].compatible); + + return 0; +} + +/* dump devices on the bus to an output stream */ +void +rte_eal_soc_dump(FILE *f) +{ + struct rte_soc_device *dev = NULL; + + if (!f) + return; + + TAILQ_FOREACH(dev, &soc_device_list, next) { + soc_dump_one_device(f, dev); + } +} /* register a driver */ void diff --git a/lib/librte_eal/common/include/rte_soc.h b/lib/librte_eal/common/include/rte_soc.h index 23b06a9..347e611 100644 --- a/lib/librte_eal/common/include/rte_soc.h +++ b/lib/librte_eal/common/include/rte_soc.h @@ -56,8 +56,12 @@ extern "C" { extern struct soc_driver_list soc_driver_list; /**< Global list of SoC Drivers */ +extern struct soc_device_list soc_device_list; +/**< Global list of SoC Devices */ TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC drivers in D-linked Q. */ +TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. */ + struct rte_soc_id { const char *compatible; /**< OF compatible specification */ @@ -142,6 +146,11 @@ rte_eal_compare_soc_addr(const struct rte_soc_addr *a0, } /** + * Dump discovered SoC devices. + */ +void rte_eal_soc_dump(FILE *f); + +/** * Register a SoC driver. */ void rte_eal_soc_register(struct rte_soc_driver *driver); diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map index ab6b985..0155025 100644 --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map @@ -175,11 +175,13 @@ DPDK_16.11 { rte_eal_dev_attach; rte_eal_dev_detach; rte_eal_map_resource; + rte_eal_soc_dump; rte_eal_soc_register; rte_eal_soc_unregister; rte_eal_unmap_resource; rte_eal_vdrv_register; rte_eal_vdrv_unregister; + soc_device_list; soc_driver_list; } DPDK_16.07; -- 2.7.4
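The dump helper relies on the compatible table being terminated by an entry whose string is NULL. A small self-contained sketch of that walk follows, with flattened stand-in structs (struct soc_device here is not rte_soc_device, and the name field replaces addr.name). Returning the number of devices printed is my addition, to make the sketch easy to check.

```c
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for rte_soc_id and rte_soc_device. */
struct soc_id {
	const char *compatible; /* NULL marks the end of the table */
};

struct soc_device {
	TAILQ_ENTRY(soc_device) next;
	const char *name;
	const char *fdt_path;    /* may be NULL */
	const struct soc_id *id; /* may be NULL, else sentinel-terminated */
};

TAILQ_HEAD(soc_device_list, soc_device);

/*
 * Print every device on the list, one header line per device followed
 * by one indented line per compatible string.
 * Returns the number of devices printed (0 if f or list is NULL).
 */
static int
soc_dump(FILE *f, struct soc_device_list *list)
{
	struct soc_device *dev;
	int i;
	int count = 0;

	if (f == NULL || list == NULL)
		return 0;

	TAILQ_FOREACH(dev, list, next) {
		fprintf(f, "%s - fdt_path: %s\n", dev->name,
			dev->fdt_path ? dev->fdt_path : "(none)");

		for (i = 0; dev->id != NULL && dev->id[i].compatible != NULL; i++)
			fprintf(f, "  %s\n", dev->id[i].compatible);

		count++;
	}

	return count;
}
```

The sentinel convention means a PMD can declare its id table as a static array without carrying a separate length, at the cost that a forgotten terminator reads past the end of the array.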
[dpdk-dev] [PATCH v6 09/21] eal: introduce command line enable SoC option
From: Jan Viktorin Support --enable-soc. SoC support is disabled by default. Signed-off-by: Jan Viktorin [Shreyansh: Change --no-soc to --enable-soc; disabled by default] Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- doc/guides/testpmd_app_ug/run_app.rst | 4 lib/librte_eal/common/eal_common_options.c | 5 + lib/librte_eal/common/eal_internal_cfg.h | 1 + lib/librte_eal/common/eal_options.h| 2 ++ 4 files changed, 12 insertions(+) diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst index d7c5120..4dafe5f 100644 --- a/doc/guides/testpmd_app_ug/run_app.rst +++ b/doc/guides/testpmd_app_ug/run_app.rst @@ -156,6 +156,10 @@ See the DPDK Getting Started Guides for more information on these options. Use malloc instead of hugetlbfs. +* ``--enable-soc`` + +Enable SoC framework support + Testpmd Command-line Options diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 6ca8af1..2156ab3 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -75,6 +75,7 @@ const struct option eal_long_options[] = { {OPT_BASE_VIRTADDR, 1, NULL, OPT_BASE_VIRTADDR_NUM}, {OPT_CREATE_UIO_DEV,0, NULL, OPT_CREATE_UIO_DEV_NUM }, + {OPT_ENABLE_SOC,0, NULL, OPT_ENABLE_SOC_NUM }, {OPT_FILE_PREFIX, 1, NULL, OPT_FILE_PREFIX_NUM }, {OPT_HELP, 0, NULL, OPT_HELP_NUM }, {OPT_HUGE_DIR, 1, NULL, OPT_HUGE_DIR_NUM }, @@ -843,6 +844,10 @@ eal_parse_common_option(int opt, const char *optarg, break; /* long options */ + case OPT_ENABLE_SOC_NUM: + conf->enable_soc = 1; + break; + case OPT_HUGE_UNLINK_NUM: conf->hugepage_unlink = 1; break; diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index 5f1367e..2a6e3ea 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ -67,6 +67,7 @@ struct internal_config { unsigned hugepage_unlink; /**< true to unlink backing files */ 
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/ volatile unsigned no_pci; /**< true to disable PCI */ + volatile unsigned enable_soc; /**< true to enable SoC */ volatile unsigned no_hpet;/**< true to disable HPET */ volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping * instead of native TSC */ diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h index a881c62..6e679c3 100644 --- a/lib/librte_eal/common/eal_options.h +++ b/lib/librte_eal/common/eal_options.h @@ -49,6 +49,8 @@ enum { OPT_BASE_VIRTADDR_NUM, #define OPT_CREATE_UIO_DEV"create-uio-dev" OPT_CREATE_UIO_DEV_NUM, +#define OPT_ENABLE_SOC"enable-soc" + OPT_ENABLE_SOC_NUM, #define OPT_FILE_PREFIX "file-prefix" OPT_FILE_PREFIX_NUM, #define OPT_HUGE_DIR "huge-dir" -- 2.7.4
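For anyone who wants to see the opt-in behaviour in isolation, the option wiring reduces to a flag-less long option handled by getopt_long(). The sketch below is standalone: struct config is a cut-down stand-in for internal_config, parse_args is a made-up name, and starting the enum at 256 is a choice made for this sketch so long-only options cannot collide with short option characters.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct internal_config. */
struct config {
	unsigned no_pci;     /* true to disable PCI */
	unsigned enable_soc; /* true to enable SoC */
};

enum {
	/* Values above the char range cannot clash with short options. */
	OPT_LONG_MIN_NUM = 256,
	OPT_ENABLE_SOC_NUM,
	OPT_NO_PCI_NUM,
};

static const struct option long_options[] = {
	{ "enable-soc", no_argument, NULL, OPT_ENABLE_SOC_NUM },
	{ "no-pci",     no_argument, NULL, OPT_NO_PCI_NUM },
	{ NULL, 0, NULL, 0 },
};

/* Parse long options into conf. Returns 0 on success, -1 on a bad option. */
static int
parse_args(int argc, char **argv, struct config *conf)
{
	int opt;

	conf->no_pci = 0;
	conf->enable_soc = 0; /* SoC support stays off unless requested */

	optind = 1; /* restart scanning so the function can be called again */
	while ((opt = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
		switch (opt) {
		case OPT_ENABLE_SOC_NUM:
			conf->enable_soc = 1;
			break;
		case OPT_NO_PCI_NUM:
			conf->no_pci = 1;
			break;
		default:
			return -1;
		}
	}

	return 0;
}
```

Making the flag opt-in rather than opt-out means existing command lines behave exactly as before the series, which is presumably why --no-soc was changed to --enable-soc.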
[dpdk-dev] [PATCH v6 10/21] eal/soc: init SoC infra from EAL
From: Jan Viktorin Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- lib/librte_eal/bsdapp/eal/Makefile| 1 + lib/librte_eal/bsdapp/eal/eal.c | 4 +++ lib/librte_eal/bsdapp/eal/eal_soc.c | 46 lib/librte_eal/common/eal_private.h | 10 +++ lib/librte_eal/linuxapp/eal/Makefile | 1 + lib/librte_eal/linuxapp/eal/eal.c | 3 ++ lib/librte_eal/linuxapp/eal/eal_soc.c | 56 +++ 7 files changed, 121 insertions(+) create mode 100644 lib/librte_eal/bsdapp/eal/eal_soc.c create mode 100644 lib/librte_eal/linuxapp/eal/eal_soc.c diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index a15b762..42b3a2b 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -56,6 +56,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_memory.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_hugepage_info.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_thread.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_pci.c +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_soc.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_debug.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index 9b93da3..2d62b9d 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -64,6 +64,7 @@ #include #include #include +#include #include #include #include @@ -564,6 +565,9 @@ rte_eal_init(int argc, char **argv) if (rte_eal_pci_init() < 0) rte_panic("Cannot init PCI\n"); + if (rte_eal_soc_init() < 0) + rte_panic("Cannot init SoC\n"); + eal_check_mem_on_local_socket(); if (eal_plugins_init() < 0) diff --git a/lib/librte_eal/bsdapp/eal/eal_soc.c b/lib/librte_eal/bsdapp/eal/eal_soc.c new file mode 100644 index 000..cb297ff --- /dev/null +++ b/lib/librte_eal/bsdapp/eal/eal_soc.c @@ -0,0 +1,46 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 RehiveTech. All rights reserved. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of RehiveTech nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include + +#include + +/* Init the SoC EAL subsystem */ +int +rte_eal_soc_init(void) +{ + return 0; +} diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 0e8d6f7..d810f9f 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -122,6 +122,16 @@ int rte_eal_pci_init(void); struct rte_soc_driver; struct rte_soc_device; +/** + * Init the SoC infra. 
+ * + * This function is private to EAL. + * + * @return + * 0 on success, negative on error + */ +int rte_eal_soc_init(void); + struct rte_pci_driver; struct rte_pci_device; diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index a520477..59e30fa 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -65,6 +65,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vfio_mp_sync.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_uio.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_vfio.c +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_soc.c SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_debug.c SRCS-$(C
[dpdk-dev] [PATCH v6 12/21] eal/soc: extend and utilize devargs
From: Jan Viktorin It is assumed that SoC Devices provided on command line are prefixed with "soc:". This patch adds parse and attach support for such devices. Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- lib/librte_eal/common/eal_common_dev.c | 27 + lib/librte_eal/common/eal_common_devargs.c | 17 lib/librte_eal/common/eal_common_soc.c | 61 - lib/librte_eal/common/include/rte_devargs.h | 8 lib/librte_eal/common/include/rte_soc.h | 24 5 files changed, 120 insertions(+), 17 deletions(-) diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index 457d227..ebbcf47 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -107,17 +107,23 @@ rte_eal_dev_init(void) int rte_eal_dev_attach(const char *name, const char *devargs) { - struct rte_pci_addr addr; + struct rte_soc_addr soc_addr; + struct rte_pci_addr pci_addr; if (name == NULL || devargs == NULL) { RTE_LOG(ERR, EAL, "Invalid device or arguments provided\n"); return -EINVAL; } - if (eal_parse_pci_DomBDF(name, &addr) == 0) { - if (rte_eal_pci_probe_one(&addr) < 0) + memset(&soc_addr, 0, sizeof(soc_addr)); + if (rte_eal_parse_soc_spec(name, &soc_addr) == 0) { + if (rte_eal_soc_probe_one(&soc_addr) < 0) { + free(soc_addr.name); + goto err; + } + } else if (eal_parse_pci_DomBDF(name, &pci_addr) == 0) { + if (rte_eal_pci_probe_one(&pci_addr) < 0) goto err; - } else { if (rte_eal_vdev_init(name, devargs)) goto err; @@ -132,15 +138,22 @@ err: int rte_eal_dev_detach(const char *name) { - struct rte_pci_addr addr; + struct rte_soc_addr soc_addr; + struct rte_pci_addr pci_addr; if (name == NULL) { RTE_LOG(ERR, EAL, "Invalid device provided.\n"); return -EINVAL; } - if (eal_parse_pci_DomBDF(name, &addr) == 0) { - if (rte_eal_pci_detach(&addr) < 0) + memset(&soc_addr, 0, sizeof(soc_addr)); + if (rte_eal_parse_soc_spec(name, &soc_addr) == 0) { + if (rte_eal_soc_detach(&soc_addr) < 0) { + 
free(soc_addr.name); + goto err; + } + } else if (eal_parse_pci_DomBDF(name, &pci_addr) == 0) { + if (rte_eal_pci_detach(&pci_addr) < 0) goto err; } else { if (rte_eal_vdev_uninit(name)) diff --git a/lib/librte_eal/common/eal_common_devargs.c b/lib/librte_eal/common/eal_common_devargs.c index e403717..e1dae1a 100644 --- a/lib/librte_eal/common/eal_common_devargs.c +++ b/lib/librte_eal/common/eal_common_devargs.c @@ -41,6 +41,7 @@ #include #include +#include #include #include "eal_private.h" @@ -105,6 +106,14 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str) goto fail; break; + + case RTE_DEVTYPE_WHITELISTED_SOC: + case RTE_DEVTYPE_BLACKLISTED_SOC: + /* try to parse soc device with prefix "soc:" */ + if (rte_eal_parse_soc_spec(buf, &devargs->soc.addr) != 0) + goto fail; + break; + case RTE_DEVTYPE_VIRTUAL: /* save driver name */ ret = snprintf(devargs->virt.drv_name, @@ -166,6 +175,14 @@ rte_eal_devargs_dump(FILE *f) devargs->pci.addr.devid, devargs->pci.addr.function, devargs->args); + else if (devargs->type == RTE_DEVTYPE_WHITELISTED_SOC) + fprintf(f, " SoC whitelist %s %s\n", + devargs->soc.addr.name, + devargs->soc.addr.fdt_path); + else if (devargs->type == RTE_DEVTYPE_BLACKLISTED_SOC) + fprintf(f, " SoC blacklist %s %s\n", + devargs->soc.addr.name, + devargs->soc.addr.fdt_path); else if (devargs->type == RTE_DEVTYPE_VIRTUAL) fprintf(f, " VIRTUAL %s %s\n", devargs->virt.drv_name, diff --git a/lib/librte_eal/common/eal_common_soc.c b/lib/librte_eal/common/eal_common_soc.c index 256cef8..44f5559 100644 --- a/lib/librte_eal/common/eal_common_soc.c +++ b/lib/librte_eal/common/eal_common_soc.c @@ -37,6 +37,8 @@ #include #include +#include +#include #include #include "eal_private.h" @@ -70,6 +72,21 @@ rte_eal_soc_match_compat(struct rte_soc_driver *drv, return 1; } +static struct rte_devargs *soc_devargs_loo
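The commit message above says SoC devices given on the command line carry a "soc:" prefix. A minimal sketch of that prefix check follows. It is not the patch's rte_eal_parse_soc_spec(): the struct and function names are hypothetical, and copying the name to the heap is an assumption suggested only by the free(soc_addr.name) calls in the attach and detach paths.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rte_soc_addr. */
struct soc_addr_sketch {
	char *name; /* heap-allocated device name, freed by the caller */
};

/*
 * Return 0 and fill addr->name when spec starts with "soc:" and names a
 * device. Return -1 otherwise, so the caller can fall back to PCI or
 * virtual device parsing.
 */
static int
parse_soc_spec_sketch(const char *spec, struct soc_addr_sketch *addr)
{
	static const char prefix[] = "soc:";
	const size_t plen = sizeof(prefix) - 1;
	size_t nlen;

	assert(addr != NULL);

	if (spec == NULL || strncmp(spec, prefix, plen) != 0)
		return -1;
	if (spec[plen] == '\0')
		return -1; /* "soc:" with no device name */

	nlen = strlen(spec + plen);
	addr->name = malloc(nlen + 1);
	if (addr->name == NULL)
		return -1;
	memcpy(addr->name, spec + plen, nlen + 1); /* includes the NUL */
	return 0;
}
```

Because a PCI address such as 0000:01:00.0 never starts with "soc:", trying the SoC parser first and the PCI parser second, as the attach path does, cannot misclassify a PCI device.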
[dpdk-dev] [PATCH v6 13/21] eal/soc: add drv_flags
From: Jan Viktorin

The flags are copied from the PCI ones. They should be refactored into a
general set of flags in the future.

Signed-off-by: Jan Viktorin
Signed-off-by: Shreyansh Jain
Signed-off-by: Hemant Agrawal
---
 lib/librte_eal/common/include/rte_soc.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_soc.h b/lib/librte_eal/common/include/rte_soc.h
index fb5ea7b..40490b9 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -123,8 +123,18 @@ struct rte_soc_driver {
 	soc_scan_t *scan_fn;               /**< Callback for scanning SoC bus*/
 	soc_match_t *match_fn;             /**< Callback to match dev<->drv */
 	const struct rte_soc_id *id_table; /**< ID table, NULL terminated */
+	uint32_t drv_flags;                /**< Control handling of device */
 };
 
+/** Device needs to map its resources by EAL */
+#define RTE_SOC_DRV_NEED_MAPPING 0x0001
+/** Device needs to be unbound even if no module is provided */
+#define RTE_SOC_DRV_FORCE_UNBIND 0x0004
+/** Device driver supports link state interrupt */
+#define RTE_SOC_DRV_INTR_LSC     0x0008
+/** Device driver supports detaching capability */
+#define RTE_SOC_DRV_DETACHABLE   0x0010
+
 /**
  * Utility function to write a SoC device name, this device name can later be
  * used to retrieve the corresponding rte_soc_addr using above functions.
-- 
2.7.4
[dpdk-dev] [PATCH v6 14/21] eal/soc: add intr_handle
From: Jan Viktorin

Signed-off-by: Jan Viktorin
Signed-off-by: Shreyansh Jain
Signed-off-by: Hemant Agrawal
---
 lib/librte_eal/common/include/rte_soc.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_soc.h b/lib/librte_eal/common/include/rte_soc.h
index 40490b9..38f897d 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -53,6 +53,7 @@ extern "C" {
 #include
 #include
 #include
+#include
 
 extern struct soc_driver_list soc_driver_list;
 /**< Global list of SoC Drivers */
@@ -80,6 +81,7 @@ struct rte_soc_device {
 	struct rte_device device;           /**< Inherit code device */
 	struct rte_soc_addr addr;           /**< SoC device Location */
 	struct rte_soc_id *id;              /**< SoC device ID list */
+	struct rte_intr_handle intr_handle; /**< Interrupt handle */
 	struct rte_soc_driver *driver;      /**< Associated driver */
 };
 
-- 
2.7.4
[dpdk-dev] [PATCH v6 15/21] eal/soc: add default scan for Soc devices
From: Jan Viktorin

Default implementation which scans the sysfs platform devices hierarchy.
For each device, extract the uevent and convert it into an rte_soc_device.
The information populated can then be used in probe to match against
the registered drivers.

Signed-off-by: Jan Viktorin
[Shreyansh: restructure commit to be an optional implementation]
Signed-off-by: Shreyansh Jain
--
v5:
 - Update rte_eal_soc_scan to rte_eal_soc_scan_platform_bus
 - Fix comments over scan and match functions
---
 lib/librte_eal/common/include/rte_soc.h |  16 +-
 lib/librte_eal/linuxapp/eal/eal_soc.c   | 315
 2 files changed, 329 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_soc.h b/lib/librte_eal/common/include/rte_soc.h
index 38f897d..8be3db7 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -64,7 +64,10 @@ TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC drivers in D-linked Q. */
 TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. */
 
 struct rte_soc_id {
-	const char *compatible; /**< OF compatible specification */
+	union {
+		const char *compatible; /**< OF compatible specification */
+		char *_compatible;
+	};
 	uint64_t priv_data;     /**< SoC Driver specific data */
 };
 
@@ -200,7 +203,16 @@ rte_eal_parse_soc_spec(const char *spec, struct rte_soc_addr *addr)
 }
 
 /**
- * Default function for matching the Soc driver with device. Each driver can
+ * Helper function for scanning for new SoC devices on platform bus.
+ *
+ * @return
+ *   0 on success
+ *   !0 on failure to scan
+ */
+int rte_eal_soc_scan_platform_bus(void);
+
+/**
+ * Helper function for matching the Soc driver with device. Each driver can
  * either use this function or define their own soc matching function.
  * This function relies on the compatible string extracted from sysfs. But,
  * a SoC might have different way of identifying its devices.
Such SoC can diff --git a/lib/librte_eal/linuxapp/eal/eal_soc.c b/lib/librte_eal/linuxapp/eal/eal_soc.c index 3929a76..d8dfe97 100644 --- a/lib/librte_eal/linuxapp/eal/eal_soc.c +++ b/lib/librte_eal/linuxapp/eal/eal_soc.c @@ -48,6 +48,321 @@ #include #include +/** Pathname of SoC devices directory. */ +#define SYSFS_SOC_DEVICES "/sys/bus/platform/devices" + +static const char * +soc_get_sysfs_path(void) +{ + const char *path = NULL; + + path = getenv("SYSFS_SOC_DEVICES"); + if (path == NULL) + return SYSFS_SOC_DEVICES; + + return path; +} + +static char * +dev_read_uevent(const char *dirname) +{ + char filename[PATH_MAX]; + struct stat st; + char *buf; + ssize_t total = 0; + int fd; + + snprintf(filename, sizeof(filename), "%s/uevent", dirname); + fd = open(filename, O_RDONLY); + if (fd < 0) { + RTE_LOG(WARNING, EAL, "Failed to open file %s\n", filename); + return strdup(""); + } + + if (fstat(fd, &st) < 0) { + RTE_LOG(ERR, EAL, "Failed to stat file %s\n", filename); + close(fd); + return NULL; + } + + if (st.st_size == 0) { + close(fd); + return strdup(""); + } + + buf = malloc(st.st_size + 1); + if (buf == NULL) { + RTE_LOG(ERR, EAL, "Failed to alloc memory to read %s\n", + filename); + close(fd); + return NULL; + } + + while (total < st.st_size) { + ssize_t rlen = read(fd, buf + total, st.st_size - total); + if (rlen < 0) { + if (errno == EINTR) + continue; + + RTE_LOG(ERR, EAL, "Failed to read file %s\n", filename); + + free(buf); + close(fd); + return NULL; + } + if (rlen == 0) /* EOF */ + break; + + total += rlen; + } + + buf[total] = '\0'; + close(fd); + + return buf; +} + +static const char * +dev_uevent_find(const char *uevent, const char *key) +{ + const size_t keylen = strlen(key); + const size_t total = strlen(uevent); + const char *p = uevent; + + /* check whether it is the first key */ + if (!strncmp(uevent, key, keylen)) + return uevent + keylen; + + /* check 2nd key or further... 
*/ + do { + p = strstr(p, key); + if (p == NULL) + break; + + if (p[-1] == '\n') /* check we are at a new line */ + return p + keylen; + + p += keylen; /* skip this one */ + } while (p - uevent < (ptrdiff_t) total); + + return NULL; +} + +static char * +strdup_until_nl(const char *p) +{ + cons
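The uevent lookup in the patch has to accept a key only when it starts a line, so that a key embedded in a longer name does not match. As a hedged illustration of that rule, here is a line-by-line variant. It is not the patch's dev_uevent_find(), which searches with strstr() and then checks the preceding character. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the text that follows 'key' when 'key' starts a
 * line of 'uevent', or NULL if no line starts with it. The returned
 * value runs to the end of the buffer, not just to the end of the line.
 */
static const char *
uevent_find_sketch(const char *uevent, const char *key)
{
	size_t keylen;
	const char *line = uevent;

	assert(uevent != NULL && key != NULL);
	keylen = strlen(key);

	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, keylen) == 0)
			return line + keylen;
		line = strchr(line, '\n');
		if (line != NULL)
			line++; /* step past the newline to the next line */
	}
	return NULL;
}
```

The caller still has to cut the value at the next newline, which is the job the patch gives to strdup_until_nl().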
[dpdk-dev] [PATCH v6 16/21] eal/soc: additional features for SoC
From: Jan Viktorin Additional features introduced: - Find kernel driver through sysfs bindings - Dummy implementation for mapping to kernel driver - DMA coherency value from sysfs - Numa node number from sysfs - Support for updating device during probe if already registered Signed-off-by: Jan Viktorin [Shreyansh: merge multiple patches into single set] Signed-off-by: Shreyansh Jain --- lib/librte_eal/common/eal_common_soc.c | 30 lib/librte_eal/common/eal_private.h | 23 ++ lib/librte_eal/common/include/rte_soc.h | 28 +++ lib/librte_eal/linuxapp/eal/eal_soc.c | 129 4 files changed, 210 insertions(+) diff --git a/lib/librte_eal/common/eal_common_soc.c b/lib/librte_eal/common/eal_common_soc.c index 44f5559..29c38e0 100644 --- a/lib/librte_eal/common/eal_common_soc.c +++ b/lib/librte_eal/common/eal_common_soc.c @@ -114,6 +114,26 @@ rte_eal_soc_probe_one_driver(struct rte_soc_driver *drv, return ret; } + if (!dev->is_dma_coherent) { + if (!(drv->drv_flags & RTE_SOC_DRV_ACCEPT_NONCC)) { + RTE_LOG(DEBUG, EAL, + " device is not DMA coherent, skipping\n"); + return 1; + } + } + + if (drv->drv_flags & RTE_SOC_DRV_NEED_MAPPING) { + /* map resources */ + ret = rte_eal_soc_map_device(dev); + if (ret) + return ret; + } else if (drv->drv_flags & RTE_SOC_DRV_FORCE_UNBIND + && rte_eal_process_type() == RTE_PROC_PRIMARY) { + /* unbind */ + if (soc_unbind_kernel_driver(dev) < 0) + return -1; + } + dev->driver = drv; RTE_VERIFY(drv->probe != NULL); return drv->probe(drv, dev); @@ -166,6 +186,10 @@ rte_eal_soc_detach_dev(struct rte_soc_driver *drv, if (drv->remove && (drv->remove(dev) < 0)) return -1; /* negative value is an error */ + if (drv->drv_flags & RTE_SOC_DRV_NEED_MAPPING) + /* unmap resources for devices */ + rte_eal_soc_unmap_device(dev); + /* clear driver structure */ dev->driver = NULL; @@ -241,6 +265,12 @@ rte_eal_soc_probe_one(const struct rte_soc_addr *addr) if (addr == NULL) return -1; + /* update current SoC device in global list, kernel bindings might have +* changed 
since last time we looked at it. +*/ + if (soc_update_device(addr) < 0) + goto err_return; + TAILQ_FOREACH(dev, &soc_device_list, next) { if (rte_eal_compare_soc_addr(&dev->addr, addr)) continue; diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index d810f9f..30c648d 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -159,6 +159,29 @@ int pci_update_device(const struct rte_pci_addr *addr); int pci_unbind_kernel_driver(struct rte_pci_device *dev); /** + * Update a soc device object by asking the kernel for the latest information. + * + * This function is private to EAL. + * + * @param addr + * The SoC address to look for + * @return + * - 0 on success. + * - negative on error. + */ +int soc_update_device(const struct rte_soc_addr *addr); + +/** + * Unbind kernel driver for this device + * + * This function is private to EAL. + * + * @return + * 0 on success, negative on error + */ +int soc_unbind_kernel_driver(struct rte_soc_device *dev); + +/** * Map the PCI resource of a PCI device in virtual memory * * This function is private to EAL. diff --git a/lib/librte_eal/common/include/rte_soc.h b/lib/librte_eal/common/include/rte_soc.h index 8be3db7..d7f7ec8 100644 --- a/lib/librte_eal/common/include/rte_soc.h +++ b/lib/librte_eal/common/include/rte_soc.h @@ -46,9 +46,11 @@ extern "C" { #include #include +#include #include #include #include +#include #include #include @@ -63,6 +65,14 @@ extern struct soc_device_list soc_device_list; TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC drivers in D-linked Q. */ TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. 
*/ +#define SOC_MAX_RESOURCE 6 + +struct rte_soc_resource { + uint64_t phys_addr; + uint64_t len; + void *addr; +}; + struct rte_soc_id { union { const char *compatible; /**< OF compatible specification */ @@ -84,8 +94,12 @@ struct rte_soc_device { struct rte_device device; /**< Inherit code device */ struct rte_soc_addr addr; /**< SoC device Location */ struct rte_soc_id *id; /**< SoC device ID list */ + struct rte_soc_resource mem_resource[SOC_MAX_RESOURCE]; struct rte_intr_handle intr_handle; /**< Interrupt handle */ struct rte_soc_driver *driver; /
[dpdk-dev] [PATCH v6 17/21] ether: utilize container_of for pci_drv
From: Jan Viktorin

It is no longer necessary to place the rte_pci_driver at the beginning
of the eth_driver struct, as we use the container_of macro to get the
parent pointer.

Signed-off-by: Jan Viktorin
Signed-off-by: Shreyansh Jain
Signed-off-by: Hemant Agrawal
---
 lib/librte_ether/rte_ethdev.c | 4 ++--
 lib/librte_ether/rte_ethdev.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fde8112..347c230 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -241,7 +241,7 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
 
 	int diag;
 
-	eth_drv = (struct eth_driver *)pci_drv;
+	eth_drv = container_of(pci_drv, struct eth_driver, pci_drv);
 
 	rte_eal_pci_device_name(&pci_dev->addr, ethdev_name,
 			sizeof(ethdev_name));
@@ -302,7 +302,7 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
 	if (eth_dev == NULL)
 		return -ENODEV;
 
-	eth_drv = (const struct eth_driver *)pci_dev->driver;
+	eth_drv = container_of(pci_dev->driver, struct eth_driver, pci_drv);
 
 	/* Invoke PMD device uninit function */
 	if (*eth_drv->eth_dev_uninit) {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..f893fe0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1850,7 +1850,7 @@ typedef int (*eth_dev_uninit_t)(struct rte_eth_dev *eth_dev);
  * Each Ethernet driver acts as a PCI driver and is represented by a generic
  * *eth_driver* structure that holds:
  *
- * - An *rte_pci_driver* structure (which must be the first field).
+ * - An *rte_pci_driver* structure.
  *
  * - The *eth_dev_init* function invoked for each matching PCI device.
  *
-- 
2.7.4
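To see why the cast only worked with pci_drv as the first field while container_of works at any offset, consider the sketch below. The macro is a local stand-in built on offsetof, not DPDK's own definition, and both structs are hypothetical, cut down to the fields needed for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for container_of: subtract the member's offset. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down versions of the real structures. */
struct pci_driver_sketch {
	const char *name;
};

struct eth_driver_sketch {
	unsigned int dev_private_size;    /* deliberately placed first */
	struct pci_driver_sketch pci_drv; /* no longer the first field */
};

/* Recover the enclosing eth driver from a pointer to its pci_drv member. */
static struct eth_driver_sketch *
eth_drv_from_pci_drv(struct pci_driver_sketch *pci_drv)
{
	assert(pci_drv != NULL);
	return container_of_sketch(pci_drv, struct eth_driver_sketch, pci_drv);
}
```

With this layout a plain cast of the pci_drv pointer to struct eth_driver_sketch * would point past the start of the object, whereas the offset subtraction lands on it. That is also what lets a second embedded driver type, such as the soc_drv member used later in this series, coexist in the same struct.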
[dpdk-dev] [PATCH v6 18/21] ether: verify we copy info from a PCI device
From: Jan Viktorin

Now that different types of ethdev exist, check for the presence of a PCI
device while copying out the info. The same will be done for SoC.

Signed-off-by: Jan Viktorin
Signed-off-by: Shreyansh Jain
Signed-off-by: Hemant Agrawal
---
 lib/librte_ether/rte_ethdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 347c230..a1e3aaf 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3206,6 +3206,8 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev, struct rte_pci_device *pci_de
 		return;
 	}
 
+	RTE_VERIFY(eth_dev->pci_dev != NULL);
+
 	eth_dev->data->dev_flags = 0;
 	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
-- 
2.7.4
[dpdk-dev] [PATCH v6 19/21] ether: extract function eth_dev_get_intr_handle
From: Jan Viktorin

We abstract access to the intr_handle here as we want to get it either
from the pci_dev or soc_dev.

Signed-off-by: Jan Viktorin
Signed-off-by: Shreyansh Jain
Signed-off-by: Hemant Agrawal
---
 lib/librte_ether/rte_ethdev.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a1e3aaf..4c61246 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2532,6 +2532,16 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+static inline
+struct rte_intr_handle *eth_dev_get_intr_handle(struct rte_eth_dev *dev)
+{
+	if (dev->pci_dev)
+		return &dev->pci_dev->intr_handle;
+
+	RTE_ASSERT(0);
+	return NULL;
+}
+
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -2544,7 +2554,7 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
 
 	dev = &rte_eth_devices[port_id];
-	intr_handle = &dev->pci_dev->intr_handle;
+	intr_handle = eth_dev_get_intr_handle(dev);
 	if (!intr_handle->intr_vec) {
 		RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
 		return -EPERM;
@@ -2604,7 +2614,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 		return -EINVAL;
 	}
 
-	intr_handle = &dev->pci_dev->intr_handle;
+	intr_handle = eth_dev_get_intr_handle(dev);
 	if (!intr_handle->intr_vec) {
 		RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
 		return -EPERM;
-- 
2.7.4
[dpdk-dev] [PATCH v6 20/21] ether: introduce ethernet dev probe remove
From: Jan Viktorin Signed-off-by: Jan Viktorin Signed-off-by: Shreyansh Jain Signed-off-by: Hemant Agrawal --- lib/librte_ether/rte_ethdev.c | 148 +- lib/librte_ether/rte_ethdev.h | 31 + 2 files changed, 177 insertions(+), 2 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 4c61246..972e916 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -325,6 +325,101 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev) } int +rte_eth_dev_soc_probe(struct rte_soc_driver *soc_drv, + struct rte_soc_device *soc_dev) +{ + struct eth_driver*eth_drv; + struct rte_eth_dev *eth_dev; + char ethdev_name[RTE_ETH_NAME_MAX_LEN]; + + int diag; + + eth_drv = container_of(soc_drv, struct eth_driver, soc_drv); + + rte_eal_soc_device_name(&soc_dev->addr, ethdev_name, + sizeof(ethdev_name)); + + eth_dev = rte_eth_dev_allocate(ethdev_name); + if (eth_dev == NULL) + return -ENOMEM; + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + eth_dev->data->dev_private = rte_zmalloc( + "ethdev private structure", + eth_drv->dev_private_size, + RTE_CACHE_LINE_SIZE); + if (eth_dev->data->dev_private == NULL) + rte_panic("Cannot allocate memzone for private port " + "data\n"); + } + eth_dev->soc_dev = soc_dev; + eth_dev->driver = eth_drv; + eth_dev->data->rx_mbuf_alloc_failed = 0; + + /* init user callbacks */ + TAILQ_INIT(&(eth_dev->link_intr_cbs)); + + /* +* Set the default MTU. 
+*/ + eth_dev->data->mtu = ETHER_MTU; + + /* Invoke PMD device initialization function */ + diag = (*eth_drv->eth_dev_init)(eth_dev); + if (diag == 0) + return 0; + + RTE_PMD_DEBUG_TRACE("driver %s: eth_dev_init(%s) failed\n", + soc_drv->driver.name, + soc_dev->addr.name); + if (rte_eal_process_type() == RTE_PROC_PRIMARY) + rte_free(eth_dev->data->dev_private); + rte_eth_dev_release_port(eth_dev); + return diag; +} + +int +rte_eth_dev_soc_remove(struct rte_soc_device *soc_dev) +{ + const struct eth_driver *eth_drv; + struct rte_eth_dev *eth_dev; + char ethdev_name[RTE_ETH_NAME_MAX_LEN]; + int ret; + + if (soc_dev == NULL) + return -EINVAL; + + rte_eal_soc_device_name(&soc_dev->addr, ethdev_name, + sizeof(ethdev_name)); + + eth_dev = rte_eth_dev_allocated(ethdev_name); + if (eth_dev == NULL) + return -ENODEV; + + eth_drv = container_of(soc_dev->driver, struct eth_driver, soc_drv); + + /* Invoke PMD device uninit function */ + if (*eth_drv->eth_dev_uninit) { + ret = (*eth_drv->eth_dev_uninit)(eth_dev); + if (ret) + return ret; + } + + /* free ether device */ + rte_eth_dev_release_port(eth_dev); + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) + rte_free(eth_dev->data->dev_private); + + eth_dev->soc_dev = NULL; + eth_dev->driver = NULL; + eth_dev->data = NULL; + + return 0; +} + + +int rte_eth_dev_is_valid_port(uint8_t port_id) { if (port_id >= RTE_MAX_ETHPORTS || @@ -1557,6 +1652,7 @@ rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info) RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get); (*dev->dev_ops->dev_infos_get)(dev, dev_info); dev_info->pci_dev = dev->pci_dev; + dev_info->soc_dev = dev->soc_dev; dev_info->driver_name = dev->data->drv_name; dev_info->nb_rx_queues = dev->data->nb_rx_queues; dev_info->nb_tx_queues = dev->data->nb_tx_queues; @@ -2535,8 +2631,15 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev, static inline struct rte_intr_handle *eth_dev_get_intr_handle(struct rte_eth_dev *dev) { - if (dev->pci_dev) + if 
(dev->pci_dev) { + RTE_ASSERT(dev->soc_dev == NULL); return &dev->pci_dev->intr_handle; + } + + if (dev->soc_dev) { + RTE_ASSERT(dev->pci_dev == NULL); + return &dev->soc_dev->intr_handle; + } RTE_ASSERT(0); return NULL; @@ -2573,6 +2676,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data) return 0; } +static inline +const char *eth_dev_get_driver_name(const struct rte_eth_dev *dev) +{ + if (dev->pci_dev) { + RTE_ASSERT(dev->soc_dev == NULL); + return dev->driver->pci_drv.driver.name; + } + + if (dev->soc_dev) { + RTE_ASSERT(dev->pci_dev
[dpdk-dev] [PATCH v6 21/21] eal/crypto: Support rte_soc_driver/device for cryptodev
- rte_cryptodev_driver/rte_cryptodev_dev embeds rte_soc_driver/device for linking SoC PMDs to crypto devices. - Add probe and remove functions linked Signed-off-by: Hemant Agrawal Signed-off-by: Shreyansh Jain --- lib/librte_cryptodev/rte_cryptodev.c | 122 - lib/librte_cryptodev/rte_cryptodev.h | 3 + lib/librte_cryptodev/rte_cryptodev_pmd.h | 18 +++- lib/librte_cryptodev/rte_cryptodev_version.map | 2 + 4 files changed, 140 insertions(+), 5 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index 127e8d0..77ec9fe 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -422,7 +422,8 @@ rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv, int retval; - cryptodrv = (struct rte_cryptodev_driver *)pci_drv; + cryptodrv = container_of(pci_drv, struct rte_cryptodev_driver, +pci_drv); if (cryptodrv == NULL) return -ENODEV; @@ -489,7 +490,8 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev) if (cryptodev == NULL) return -ENODEV; - cryptodrv = (const struct rte_cryptodev_driver *)pci_dev->driver; + cryptodrv = container_of(pci_dev->driver, struct rte_cryptodev_driver, +pci_drv); if (cryptodrv == NULL) return -ENODEV; @@ -513,6 +515,111 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev) return 0; } + +int +rte_cryptodev_soc_probe(struct rte_soc_driver *soc_drv, + struct rte_soc_device *soc_dev) +{ + struct rte_cryptodev_driver *cryptodrv; + struct rte_cryptodev *cryptodev; + + char cryptodev_name[RTE_CRYPTODEV_NAME_MAX_LEN]; + + int retval; + + cryptodrv = container_of(soc_drv, struct rte_cryptodev_driver, +soc_drv); + + rte_eal_soc_device_name(&soc_dev->addr, cryptodev_name, + sizeof(cryptodev_name)); + + cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, + rte_socket_id()); + if (cryptodev == NULL) + return -ENOMEM; + + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + cryptodev->data->dev_private = + rte_zmalloc_socket( + "cryptodev private structure", + 
cryptodrv->dev_private_size, + RTE_CACHE_LINE_SIZE, + rte_socket_id()); + + if (cryptodev->data->dev_private == NULL) + rte_panic("Cannot allocate memzone for private " + "device data"); + } + + cryptodev->soc_dev = soc_dev; + cryptodev->driver = cryptodrv; + + /* init user callbacks */ + TAILQ_INIT(&(cryptodev->link_intr_cbs)); + + /* Invoke PMD device initialization function */ + retval = (*cryptodrv->cryptodev_init)(cryptodrv, cryptodev); + if (retval == 0) + return 0; + + CDEV_LOG_ERR("driver %s: cryptodev_init(%s) failed\n", + soc_drv->driver.name, + soc_dev->addr.name); + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) + rte_free(cryptodev->data->dev_private); + + cryptodev->attached = RTE_CRYPTODEV_DETACHED; + cryptodev_globals.nb_devs--; + + return -ENXIO; +} + +int +rte_cryptodev_soc_remove(struct rte_soc_device *soc_dev) +{ + const struct rte_cryptodev_driver *cryptodrv; + struct rte_cryptodev *cryptodev; + char cryptodev_name[RTE_CRYPTODEV_NAME_MAX_LEN]; + int ret; + + if (soc_dev == NULL) + return -EINVAL; + + rte_eal_soc_device_name(&soc_dev->addr, cryptodev_name, + sizeof(cryptodev_name)); + + cryptodev = rte_cryptodev_pmd_get_named_dev(cryptodev_name); + if (cryptodev == NULL) + return -ENODEV; + + cryptodrv = container_of(soc_dev->driver, + struct rte_cryptodev_driver, soc_drv); + if (cryptodrv == NULL) + return -ENODEV; + + /* Invoke PMD device uninit function */ + if (*cryptodrv->cryptodev_uninit) { + ret = (*cryptodrv->cryptodev_uninit)(cryptodrv, cryptodev); + if (ret) + return ret; + } + + /* free crypto device */ + rte_cryptodev_pmd_release_device(cryptodev); + + if (rte_eal_process_type() == RTE_PROC_PRIMARY) + rte_free(cryptodev->data->dev_private); + + cryptodev->pci_dev = NULL; + cryptodev->soc_dev = NULL; + cryptodev->driver = NULL; + cryptodev->data = NULL; + + return 0; +} + uint16_
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
>
> Hi Tomasz,
>
> This is a major new function in the API and I still have some comments.
>
> 2016-10-26 14:56, Tomasz Kulasek:
> > --- a/config/common_base
> > +++ b/config/common_base
> > +CONFIG_RTE_ETHDEV_TX_PREP=y
>
> We cannot enable it until it is implemented in every driver.

Not sure why?
If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
Right now it is not mandatory for the PMD to implement it.

>
> >  struct rte_eth_dev {
> >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> >  	const struct eth_driver *driver;/**< Driver for this device */
> >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
>
> Could you confirm why tx_pkt_prep is not in dev_ops?
> I guess we want to have several implementations?

Yes, it depends on configuration options, same as tx_pkt_burst.

>
> Shouldn't we have a const struct control_dev_ops and a struct
> datapath_dev_ops?

That's probably a good idea, but I suppose it is out of scope for that patch.
Konstantin
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
2016-10-27 15:52, Ananyev, Konstantin:
> >
> > Hi Tomasz,
> >
> > This is a major new function in the API and I still have some comments.
> >
> > 2016-10-26 14:56, Tomasz Kulasek:
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> >
> > We cannot enable it until it is implemented in every driver.
>
> Not sure why?
> If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> Right now it is not mandatory for the PMD to implement it.

If it is not implemented, the application must do the preparation by itself.
From patch 6:
"
Removed pseudo header calculation for udp/tcp/tso packets from
application and used Tx preparation API for packet preparation and
verification.
"
So how does it behave with other drivers?

> > >  struct rte_eth_dev {
> > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > >  	const struct eth_driver *driver;/**< Driver for this device */
> > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> >
> > Could you confirm why tx_pkt_prep is not in dev_ops?
> > I guess we want to have several implementations?
>
> Yes, it depends on configuration options, same as tx_pkt_burst.
>
> >
> > Shouldn't we have a const struct control_dev_ops and a struct
> > datapath_dev_ops?
>
> That's probably a good idea, but I suppose it is out of scope for that patch.

No, it's not out of scope.
It answers the question "why is it added in this structure and not dev_ops".
We won't make this change when nothing else is changed in the struct.
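For readers following the thread, the behaviour Konstantin describes (a NULL tx_pkt_prep turns the call into a no-op) can be sketched as below. This is not the code from Tomasz's patch: every type and function name is hypothetical, and the callback signature is an assumption modelled loosely on a burst-style function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkt_sketch; /* opaque packet type, hypothetical */

/* Assumed signature of a per-device Tx preparation callback. */
typedef uint16_t (*tx_prep_sketch_t)(void *txq, struct pkt_sketch **pkts,
				     uint16_t nb_pkts);

struct eth_dev_sketch {
	tx_prep_sketch_t tx_pkt_prep; /* NULL when the PMD has no prep step */
};

/*
 * Return the number of packets ready to transmit. When the driver
 * supplies no callback, every packet is reported as ready (a no-op).
 */
static uint16_t
tx_prep_sketch(const struct eth_dev_sketch *dev, void *txq,
	       struct pkt_sketch **pkts, uint16_t nb_pkts)
{
	assert(dev != NULL);
	if (dev->tx_pkt_prep == NULL)
		return nb_pkts;
	return dev->tx_pkt_prep(txq, pkts, nb_pkts);
}

/* Sample callback that accepts only the first packet of a burst. */
static uint16_t
prep_first_only(void *txq, struct pkt_sketch **pkts, uint16_t nb_pkts)
{
	(void)txq;
	(void)pkts;
	return nb_pkts > 0 ? 1 : 0;
}
```

The sketch also makes Thomas's objection concrete: in the NULL branch nothing touches the packets, so any work the application used to do itself, such as the pseudo-header calculation mentioned in patch 6, simply does not happen for that driver.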
[dpdk-dev] PCIe Hot Insert/Remove Support
Hi Benjamin,

2016-10-24 18:16, Walker, Benjamin:
> Hi all,
>
> My name is Ben Walker and I'm the technical lead for SPDK (it's like DPDK,
> but for storage devices). SPDK relies on DPDK only for the base
> functionality in the EAL - memory management, the rings, and the PCI
> scanning code. A key feature for storage devices is support for hot insert
> and remove, so we're currently working through how best to implement this
> for a user space driver. While doing this work, we've run into a few issues
> with the current DPDK PCI/device/driver framework that I'd like to discuss
> with this list. I'm not entirely ramped up on all of the current activity in
> this area or what the future plans are, so please educate me if something is
> coming that will address our current issues. I'm working off of the latest
> commit on the master branch as of today.
>
> Today, there appears to be two lists - one of PCI devices and one of
> drivers. To update the list of PCI devices, you call rte_eal_pci_scan(),
> which scans the PCI bus. That call does not attempt to load any drivers. One
> scan is automatically performed when the eal is first initialized. To add or
> remove drivers from the driver list you call
> rte_eal_driver_register/unregister. To match drivers in the driver list to
> devices in the device list, you call rte_eal_pci_probe.
>
> There are a few problems with how the code works for us. First,
> rte_eal_pci_scan's algorithm will not correctly detect devices that are in
> its internal list but weren't found by the most recent PCI bus scan (i.e.
> they were hot removed). DPDK's scan doesn't seem to comprehend hot remove in
> any way. Fortunately there is a public API to remove devices from the device
> list - rte_eal_pci_detach. That function will automatically unload any
> drivers associated with the device and then remove it from the list.
> There is a similar call for adding a device to the list -
> rte_eal_pci_probe_one, which will add a device to the device list and then
> automatically match it to drivers. I think if rte_eal_pci_scan is going to
> be a public interface (and it is), it needs to correctly comprehend the
> removal of PCI devices. Otherwise, make it a private API that is only called
> in response to rte_eal_init and only expose the public probe_one/detach
> calls for modifying the list of devices. My preference is for the former,
> not the latter.
>
> Second, rte_eal_pci_probe will call the driver initialization functions each
> time a probe happens, even if the driver has already been successfully
> loaded. This tends to crash a lot of the PMDs. It seems to me like
> rte_eal_pci_probe is not safe to call more than once during the lifetime of
> the program, which is a real challenge when you have multiple users of the
> PCI framework. For instance, an application may manage both storage devices
> using the rte_eal_pci framework and NICs, and the initialization routine may
> go something like:
>
> register NIC drivers
> rte_eal_probe()
> ...
> register SSD drivers
> rte_eal_probe()
>
> This is almost certainly how any real code is going to function because the
> code dealing with NICs is unrelated and probably unaware of the code dealing
> with the SSDs. It should be fairly trivial to simply not call the probe()
> callback for a device if the driver has already been loaded. Is this a
> reasonable modification to make?

Yes, it seems to be a reasonable fix.
However, we could improve this design a lot.
I'll try to describe the big picture around PCI hotplugging below.

PCI was too much mixed in DPDK code, so we are sorting out what is generic
to all buses and devices, and what is specific to PCI.
Then we want to manage PCI devices as any other device (real or virtual).
From a generic perspective, we should be able to manage startup and hotplug
with a few API functions:
	- scan a bus or all of them (called at startup)
		note: scanning for vdev is an argument parsing
	- receive a notification (callback registered by the application)
		note: whitelist/blacklist can be handled here at the application level
	- match the device with a driver
	- initialize the device in the matching driver
		note: some drivers require binding with a kernel module,
		and that must be implemented in EAL

The last missing part for a true hotplug is receiving hardware events
so that we don't need to manually scan anymore.

Hope it will answer your question.
We are waiting for contributions to progress in this direction.
Thanks
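The flow listed in this reply (scan, notify, match, initialize), together with the fix of not re-probing a device that already has a driver, could look roughly like the sketch below. It assumes a deliberately simplified model: every `toy_*` name is hypothetical, matching is reduced to a vendor/device ID comparison, and none of this is the EAL API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; NOT the DPDK EAL API. */
struct toy_driver;

struct toy_device {
	uint16_t vendor_id;
	uint16_t device_id;
	const struct toy_driver *driver; /* NULL until a driver is bound */
};

struct toy_driver {
	uint16_t vendor_id;
	uint16_t device_id;
	int (*init)(struct toy_device *dev);
};

/* Application callback: return false to skip (blacklist) a device. */
typedef bool (*toy_notify_t)(const struct toy_device *dev);

/* Probe the scanned devices; returns how many were newly initialized. */
static int
toy_probe_all(struct toy_device *devs, size_t nb_devs,
	      const struct toy_driver *drvs, size_t nb_drvs,
	      toy_notify_t notify)
{
	int bound = 0;

	for (size_t i = 0; i < nb_devs; i++) {
		struct toy_device *dev = &devs[i];

		if (dev->driver != NULL)
			continue; /* already initialized: never call init() twice */
		if (notify != NULL && !notify(dev))
			continue; /* rejected by the application */
		for (size_t j = 0; j < nb_drvs; j++) {
			const struct toy_driver *drv = &drvs[j];

			if (drv->vendor_id != dev->vendor_id ||
			    drv->device_id != dev->device_id)
				continue;
			if (drv->init(dev) == 0) {
				dev->driver = drv;
				bound++;
			}
			break; /* first matching driver wins in this sketch */
		}
	}
	return bound;
}

/* Example driver init (assumed) that counts how often it is called. */
static int toy_init_calls;

static int
toy_nic_init(struct toy_device *dev)
{
	(void)dev;
	toy_init_calls++;
	return 0;
}
```

Calling `toy_probe_all()` a second time after registering more drivers leaves already bound devices alone, which is the behaviour requested for repeated probes.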
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, October 27, 2016 5:02 PM
> To: Ananyev, Konstantin
> Cc: Kulasek, TomaszX; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
>
> 2016-10-27 15:52, Ananyev, Konstantin:
> >
> > >
> > > Hi Tomasz,
> > >
> > > This is a major new function in the API and I still have some comments.
> > >
> > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > --- a/config/common_base
> > > > +++ b/config/common_base
> > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > >
> > > We cannot enable it until it is implemented in every drivers.
> >
> > Not sure why?
> > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > Right now it is not mandatory for the PMD to implement it.
>
> If it is not implemented, the application must do the preparation by itself.
> From patch 6:
> "
> Removed pseudo header calculation for udp/tcp/tso packets from
> application and used Tx preparation API for packet preparation and
> verification.
> "
> So how does it behave with other drivers?

Hmm so it seems that we broke testpmd csumonly mode for non-intel drivers..
My bad, missed that part completely.
Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
Probably a new fwd mode or just extra parameter for the existing one?
Any other suggestions?

>
> > > > struct rte_eth_dev {
> > > > 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > > 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > > > 	struct rte_eth_dev_data *data; /**< Pointer to device data */
> > > > 	const struct eth_driver *driver;/**< Driver for this device */
> > > > 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > >
> > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > I guess we want to have several implementations?
> >
> > Yes, it depends on configuration options, same as tx_pkt_burst.
> >
> > > Shouldn't we have a const struct control_dev_ops and a struct
> > > datapath_dev_ops?
> >
> > That's probably a good idea, but I suppose it is out of scope for that
> > patch.
>
> No it's not out of scope.
> It answers to the question "why is it added in this structure and not
> dev_ops".
> We won't do this change when nothing else is changed in the struct.

Not sure I understood you here:
Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced as part of that patch?
But that's a lot of changes all over rte_ethdev.[h,c].
It is definitely worth a separate patch (might be some discussion) for me.
Konstantin
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, October 27, 2016 17:01
> To: Kulasek, TomaszX
> Cc: dev at dpdk.org; olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
>
> Hi Tomasz,
>
> This is a major new function in the API and I still have some comments.
>
> 2016-10-26 14:56, Tomasz Kulasek:
> > --- a/config/common_base
> > +++ b/config/common_base
> > +CONFIG_RTE_ETHDEV_TX_PREP=y
>
> We cannot enable it until it is implemented in every drivers.
>

For most drivers it is safe to enable it by default: if this feature is not supported, no checks or modifications are done. In that sense the processing path is the same as without using Tx preparation.

Introducing this macro was discussed in the threads:

http://dpdk.org/ml/archives/dev/2016-September/046437.html
http://dpdk.org/dev/patchwork/patch/15770/

Short conclusion:

Jerin Jacob pointed out that it can have a significant impact on some architectures (such as low-end ARMv7 and ARMv8 targets, which may not have PCIE-RC support and have only an integrated NIC controller), even if this feature is not implemented.

We've added this macro to provide the ability to use a NOOP operation and to allow turning this feature off if it has an adverse effect on a specific configuration or hardware.

> > struct rte_eth_dev {
> > 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > 	struct rte_eth_dev_data *data; /**< Pointer to device data */
> > 	const struct eth_driver *driver;/**< Driver for this device */
> > 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
>
> Could you confirm why tx_pkt_prep is not in dev_ops?
> I guess we want to have several implementations?
>

Yes, the implementation may vary depending on the selected tx_burst path (e.g.
vector implementation, simple implementation, full featured, and so on), and each path can have other requirements, such as implemented features and performance requirements. The path is chosen transparently based on the application requirements, and we have a pair of callbacks: tx_burst and the corresponding callback (which depends directly on the tx_burst path).

> Shouldn't we have a const struct control_dev_ops and a struct
> datapath_dev_ops?
>
> > +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
> > +		uint16_t nb_pkts)
>
> The word "prep" can be understood as "prepend".
> Why not rte_eth_tx_prepare?
>

I do not mind.

> > +/**
> > + * Fix pseudo header checksum
> > + *
> > + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> > + * provided mbufs packet data.
> > + *
> > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
> > + *   in packet data,
> > + * - for TSO the IP payload length is not included in pseudo header.
> > + *
> > + * This function expects that used headers are in the first data segment of
> > + * mbuf, are not fragmented and can be safely modified.
>
> What happens otherwise?
>

These are requirements of this helper function. In the Tx preparation callback we check them, and if the check fails, the -ENOTSUP errno is returned.

> > + *
> > + * @param m
> > + *   The packet mbuf to be fixed.
> > + * @return
> > + *   0 if checksum is initialized properly
> > + */
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m)
>
> Could we find a better name for this function?
> - About the prefix, rte_ip_ ?
> - About the scope, where this phdr_cksum is specified?
>   Isn't it an intel_phdr_cksum to match what hardware expects?
> - About the verb, is it really fixing something broken?
>   Or just writing into a mbuf?
> I would suggest rte_ip_intel_cksum_prepare.

"Fix" was meant in the sense of the requirements for offloads, which state e.g.
that to use a specific Tx offload we should fill the checksums in a proper way; if not, the settings are not valid and should be fixed. But you're right, prepare is a better word.

About the function name, maybe rte_net_intel_chksum_prepare would be better, since it also prepares tcp/udp headers and is placed in rte_net.h?

Tomasz
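To make the quoted doc comment more concrete, here is a minimal sketch of an IPv4 pseudo-header sum in which the L4 length is included for non-TSO packets and left out for TSO. It is an assumption-laden illustration: `toy_ipv4_phdr_cksum` is a hypothetical name, all inputs are plain host-order numbers, and the result is the folded, non-complemented 16-bit sum, which may differ from what the helper in the patch actually writes into the mbuf.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the function from the patch.
 * Sums the IPv4 pseudo header (source address, destination address,
 * protocol and, unless tso is set, the L4 length) as 16-bit words and
 * folds the carries back in. The result is NOT complemented.
 */
static uint16_t
toy_ipv4_phdr_cksum(uint32_t src_ip, uint32_t dst_ip, uint8_t proto,
		    uint16_t l4_len, int tso)
{
	uint32_t sum = 0;

	sum += src_ip >> 16;
	sum += src_ip & 0xffff;
	sum += dst_ip >> 16;
	sum += dst_ip & 0xffff;
	sum += proto;
	if (!tso)
		sum += l4_len; /* for TSO the payload length is left out */

	/* Fold the 32-bit accumulator into 16 bits (one's complement add). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

For example, with source 192.168.0.1, destination 192.168.0.199, protocol 6 (TCP) and an L4 length of 20, the sketch yields 0x8233 for the non-TSO case and 0x821F when the length is omitted.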
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
2016-10-27 16:24, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > Hi Tomasz,
> > > >
> > > > This is a major new function in the API and I still have some comments.
> > > >
> > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > >
> > > > We cannot enable it until it is implemented in every drivers.
> > >
> > > Not sure why?
> > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > Right now it is not mandatory for the PMD to implement it.
> >
> > If it is not implemented, the application must do the preparation by itself.
> > From patch 6:
> > "
> > Removed pseudo header calculation for udp/tcp/tso packets from
> > application and used Tx preparation API for packet preparation and
> > verification.
> > "
> > So how does it behave with other drivers?
>
> Hmm so it seems that we broke testpmd csumonly mode for non-intel drivers..
> My bad, missed that part completely.
> Yes, then I suppose for now we'll need to support both (with and without)
> code paths for testpmd.
> Probably a new fwd mode or just extra parameter for the existing one?
> Any other suggestions?

Please think about how we can use it in every application.
It is not ready.
Either we introduce the API without enabling it, or we implement it
in every driver.

> > > > > struct rte_eth_dev {
> > > > > 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > > > 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function.
> > > > > */
> > > > > 	struct rte_eth_dev_data *data; /**< Pointer to device data */
> > > > > 	const struct eth_driver *driver;/**< Driver for this device */
> > > > > 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > > >
> > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > I guess we want to have several implementations?
> > >
> > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > >
> > > > Shouldn't we have a const struct control_dev_ops and a struct
> > > > datapath_dev_ops?
> > >
> > > That's probably a good idea, but I suppose it is out of scope for that
> > > patch.
> >
> > No it's not out of scope.
> > It answers to the question "why is it added in this structure and not
> > dev_ops".
> > We won't do this change when nothing else is changed in the struct.
>
> Not sure I understood you here:
> Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced as
> part of that patch?
> But that's a lot of changes all over rte_ethdev.[h,c].
> It definitely worse a separate patch (might be some discussion) for me.

Yes, it could be a separate patch in the same patchset.
[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
Hi

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, October 27, 2016 18:24
> To: Thomas Monjalon
> Cc: Kulasek, TomaszX; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
>
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Thursday, October 27, 2016 5:02 PM
> > To: Ananyev, Konstantin
> > Cc: Kulasek, TomaszX; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> > 2016-10-27 15:52, Ananyev, Konstantin:
> > >
> > > >
> > > > Hi Tomasz,
> > > >
> > > > This is a major new function in the API and I still have some comments.
> > > >
> > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > >
> > > > We cannot enable it until it is implemented in every drivers.
> > >
> > > Not sure why?
> > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > Right now it is not mandatory for the PMD to implement it.
> >
> > If it is not implemented, the application must do the preparation by itself.
> > From patch 6:
> > "
> > Removed pseudo header calculation for udp/tcp/tso packets from
> > application and used Tx preparation API for packet preparation and
> > verification.
> > "
> > So how does it behave with other drivers?
>
> Hmm so it seems that we broke testpmd csumonly mode for non-intel drivers..
> My bad, missed that part completely.
> Yes, then I suppose for now we'll need to support both (with and without)
> code paths for testpmd.
> Probably a new fwd mode or just extra parameter for the existing one?
> Any other suggestions?
>

I sent a txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), but I'm open to suggestions. If you like it, I can resend it in place of the csumonly modification.
Tomasz

> >
> > > > > struct rte_eth_dev {
> > > > > 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > > > 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > > > > 	struct rte_eth_dev_data *data; /**< Pointer to device data */
> > > > > 	const struct eth_driver *driver;/**< Driver for this device */
> > > > > 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > > >
> > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > I guess we want to have several implementations?
> > >
> > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > >
> > > > Shouldn't we have a const struct control_dev_ops and a struct
> > > > datapath_dev_ops?
> > >
> > > That's probably a good idea, but I suppose it is out of scope for that
> > > patch.
> >
> > No it's not out of scope.
> > It answers to the question "why is it added in this structure and not
> > dev_ops".
> > We won't do this change when nothing else is changed in the struct.
>
> Not sure I understood you here:
> Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> as part of that patch?
> But that's a lot of changes all over rte_ethdev.[h,c].
> It definitely worse a separate patch (might be some discussion) for me.
> Konstantin
>
[dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library
On Fri, 28 Oct 2016 09:04:30 +0800
Remy Horton wrote:

> +
> +struct rte_stats_bitrate_s {
> +	uint64_t last_ibytes;
> +	uint64_t last_obytes;
> +	uint64_t peak_ibits;
> +	uint64_t peak_obits;
> +	uint64_t ewma_ibits;
> +	uint64_t ewma_obits;
> +};
> +

Read/write access of 64 bit values is not safe on 32 bit platforms.
I think you need to add a generation counter (see Linux kernel syncp)
to handle 32 bit architecture. If done correctly, it would be a nop
on 64 bit platforms.
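The generation-counter idea suggested here can be sketched with C11 atomics as follows. This is a single-writer illustration only: the `toy_counter` type and its functions are hypothetical, the 64-bit value is split into two 32-bit halves so that no 64-bit atomic access is needed, and it is neither the Linux kernel implementation nor code from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-writer counter; names are illustrative only. */
struct toy_counter {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	atomic_uint_least32_t lo; /* low 32 bits of the value */
	atomic_uint_least32_t hi; /* high 32 bits of the value */
};

static void
toy_counter_init(struct toy_counter *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->lo, 0);
	atomic_init(&c->hi, 0);
}

/* Writer side: must only ever be called from one thread. */
static void
toy_counter_write(struct toy_counter *c, uint64_t value)
{
	unsigned int seq = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)value, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(value >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/* Reader side: retries until it sees the same even generation twice. */
static uint64_t
toy_counter_read(struct toy_counter *c)
{
	unsigned int before, after;
	uint32_t lo, hi;

	do {
		before = atomic_load_explicit(&c->seq, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((before & 1u) != 0 || before != after);

	return ((uint64_t)hi << 32) | lo;
}
```

On a 64-bit target the counter and retry loop could be compiled out in favour of a plain 64-bit load and store, which is what makes the scheme effectively free there; that conditional is left out of the sketch.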
[dpdk-dev] [PATCH] net/qede: fix gcc compiler option checks
From: Rasesh Mody

Using GCC_VERSION to check gcc version and decide whether to include
that compiler option.

Fixes: ec94dbc57362 ("qede: add base driver")
Fixes: ecc7a5a27ffe ("net/qede/base: fix 32-bit build")

Signed-off-by: Rasesh Mody
---
 drivers/net/qede/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qede/Makefile b/drivers/net/qede/Makefile
index 39751e4..29b443d 100644
--- a/drivers/net/qede/Makefile
+++ b/drivers/net/qede/Makefile
@@ -46,11 +46,11 @@ endif
 endif
 
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
-ifeq ($(shell gcc -Wno-unused-but-set-variable -Werror -E - < /dev/null > /dev/null 2>&1; echo $$?),0)
+ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1)
 CFLAGS_BASE_DRIVER += -Wno-unused-but-set-variable
 endif
 CFLAGS_BASE_DRIVER += -Wno-missing-declarations
-ifeq ($(shell gcc -Wno-maybe-uninitialized -Werror -E - < /dev/null > /dev/null 2>&1; echo $$?),0)
+ifeq ($(shell test $(GCC_VERSION) -ge 46 && echo 1), 1)
 CFLAGS_BASE_DRIVER += -Wno-maybe-uninitialized
 endif
 CFLAGS_BASE_DRIVER += -Wno-strict-prototypes
--
1.8.3.1
[dpdk-dev] [PATCH] net/qede: fix advertising link speed capability
From: Harish Patil

Fix to advertise device's link speed capability based on current
link speed rather than returning driver supported speeds.

Fixes: 95e67b479506 ("net/qede: add 100G link speed capability")

Signed-off-by: Harish Patil
---
 drivers/net/qede/qede_ethdev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index b91b478..4c4c669 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -646,6 +646,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 {
 	struct qede_dev *qdev = eth_dev->data->dev_private;
 	struct ecore_dev *edev = &qdev->edev;
+	struct qed_link_output link;
 
 	PMD_INIT_FUNC_TRACE(edev);
 
@@ -678,8 +679,9 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 			DEV_TX_OFFLOAD_UDP_CKSUM |
 			DEV_TX_OFFLOAD_TCP_CKSUM);
 
-	dev_info->speed_capa = ETH_LINK_SPEED_25G | ETH_LINK_SPEED_40G |
-			ETH_LINK_SPEED_100G;
+	memset(&link, 0, sizeof(struct qed_link_output));
+	qdev->ops->common->get_link(edev, &link);
+	dev_info->speed_capa = rte_eth_speed_bitflag(link.speed, 0);
 }
 
 /* return 0 means link status changed, -1 means not changed */
--
1.8.3.1