Re: [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table
On 09/05/2018 06:28 AM, Tiwei Bie wrote: This series consists of some fixes and improvements for virtio-user's memory table preparation. This series supersedes below patches: https://patches.dpdk.org/patch/43807/ https://patches.dpdk.org/patch/43918/ The second patch in this series depends on the below patch set: http://patches.dpdk.org/project/dpdk/list/?series=1177 Tiwei Bie (3): net/virtio-user: fix deadlock in memory events callback net/virtio-user: avoid parsing process mappings net/virtio-user: fix memory hotplug support in vhost-kernel drivers/net/virtio/virtio_user/vhost_kernel.c | 50 +++-- drivers/net/virtio/virtio_user/vhost_user.c | 211 -- .../net/virtio/virtio_user/virtio_user_dev.c | 19 ++ 3 files changed, 135 insertions(+), 145 deletions(-) Applied to dpdk-next-virtio/master. Thanks, Maxime
Re: [dpdk-dev] [PATCH v4 1/8] net/mvneta: add neta PMD skeleton
Thank you Stephen for taking a look at these patches. On 19.09.2018 18:19, Stephen Hemminger wrote: > On Wed, 19 Sep 2018 17:01:27 +0200 > Andrzej Ostruszka wrote: [...] >> + git clone >> https://github.com/MarvellEmbeddedProcessors/linux-marvell.git -b >> linux-4.4.52-armada-17.10 >> + > > In general the rule for DPDK is that drivers in upstream DPDK must have their > OS support > components upstream. You are relying on an old Linux kernel which is maybe > the way embedded > works, but it really needs to be upstream first. If this rule holds then there are already some exceptions, e.g. mvpp2 (which in the current release uses the same kernel version), so accepting this driver will not break this rule any more than it is broken now. AFAIK Marvell is using some parts that are not upstreamable, so it will always be a custom kernel - but I'm not competent enough to claim that authoritatively. In mvpp2 we have just recently switched to a newer kernel (just applied to next-net) and with enough testing mvneta will switch to it too. So I hope this is not a blocker. Best regards Andrzej
Re: [dpdk-dev] [PATCH v4 1/8] net/mvneta: add neta PMD skeleton
On 19.09.2018 18:28, Stephen Hemminger wrote: > On Wed, 19 Sep 2018 17:01:27 +0200 > Andrzej Ostruszka wrote: > >> +/** >> + * Create private device structure. >> + * >> + * @param dev_name >> + * Pointer to the port name passed in the initialization parameters. >> + * >> + * @return >> + * Pointer to the newly allocated private device structure. >> + */ >> +static struct mvneta_priv * >> +mvneta_priv_create(const char *dev_name) >> +{ >> +struct mvneta_priv *priv; >> + >> +priv = rte_zmalloc_socket(dev_name, sizeof(*priv), 0, rte_socket_id()); >> +if (!priv) >> +return NULL; >> + >> +return priv; >> +} > > Why make this a function, it really doesn't add anything over just doing it > inline. True. Removed it. >> +static int >> +mvneta_eth_dev_create(struct rte_vdev_device *vdev, const char *name) >> +{ >> +int ret, fd = socket(AF_INET, SOCK_DGRAM, 0); >> +struct rte_eth_dev *eth_dev; >> +struct mvneta_priv *priv; >> +struct ifreq req; >> + >> +eth_dev = rte_eth_dev_allocate(name); >> +if (!eth_dev) >> +return -ENOMEM; >> + >> +priv = mvneta_priv_create(name); >> + >> +if (!priv) { > > nit: no blank line needed. > >> +ret = -ENOMEM; >> +goto out_free_dev; > > You have error goto's backwards. > >> +} >> + >> +eth_dev->data->mac_addrs = >> +rte_zmalloc("mac_addrs", >> +ETHER_ADDR_LEN * MVNETA_MAC_ADDRS_MAX, 0); >> +if (!eth_dev->data->mac_addrs) { >> +MVNETA_LOG(ERR, "Failed to allocate space for eth addrs"); >> +ret = -ENOMEM; >> +goto out_free_priv; >> +} >> + >> +memset(&req, 0, sizeof(req)); >> +strcpy(req.ifr_name, name); > > > >> +out_free_mac: >> +rte_free(eth_dev->data->mac_addrs); >> +out_free_dev: >> +rte_eth_dev_release_port(eth_dev); >> +out_free_priv: >> +rte_free(priv); > > These are backwards: out_free_priv is called if ioctl fails and will > leak eth_dev port. Good catch - 'out_free_priv' should go before 'out_free_dev'! Thank you. The problem is not for the case of ioctl failure but I guess that is just a wording lapsus. Best regards Andrzej
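The label ordering discussed above can be illustrated with a minimal sketch. It is not the mvneta code: demo_dev, demo_alloc, demo_free and the fail_at parameter are hypothetical stand-ins, and an allocation counter is added only so that leaks become visible. The point it shows is that cleanup labels should release resources in the reverse order of acquisition, so that each goto target frees exactly what had been acquired by that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the port with its private data and MAC table. */
struct demo_dev {
	void *priv;
	void *mac_addrs;
};

static int demo_live_allocs; /* outstanding allocations, to expose leaks */

static void *demo_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/*
 * fail_at injects a failure: 1 = private data allocation, 2 = MAC table
 * allocation, 3 = a late step (think of the ioctl), 0 = no failure.
 */
static int demo_dev_create(int fail_at, struct demo_dev **out)
{
	struct demo_dev *dev;
	int ret;

	assert(out != NULL);

	dev = demo_alloc(sizeof(*dev));		/* acquired first */
	if (!dev)
		return -ENOMEM;

	dev->priv = fail_at == 1 ? NULL : demo_alloc(64);	/* second */
	if (!dev->priv) {
		ret = -ENOMEM;
		goto out_free_dev;
	}

	dev->mac_addrs = fail_at == 2 ? NULL : demo_alloc(6);	/* third */
	if (!dev->mac_addrs) {
		ret = -ENOMEM;
		goto out_free_priv;
	}

	if (fail_at == 3) {
		ret = -EIO;
		goto out_free_mac;
	}

	*out = dev;
	return 0;

	/* Labels release in reverse order of acquisition and fall through. */
out_free_mac:
	demo_free(dev->mac_addrs);
out_free_priv:
	demo_free(dev->priv);
out_free_dev:
	demo_free(dev);
	return ret;
}

static void demo_dev_destroy(struct demo_dev *dev)
{
	demo_free(dev->mac_addrs);
	demo_free(dev->priv);
	demo_free(dev);
}
```

With the labels in this order, a failure at any step leaves the counter at zero. Swapping out_free_priv and out_free_dev, as in the reviewed patch, would make the MAC table failure path skip the release of the device.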
[dpdk-dev] [PATCH] test: disable alarm autotest in FreeBSD
Disabled the alarm_autotest UT in FreeBSD Interrupts are not supported in FreeBSD. Alarm API depends on interrupts, so disabled alarm test on FreeBSD. Signed-off-by: Pallantla Poornima --- test/test/test_alarm.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/test/test/test_alarm.c b/test/test/test_alarm.c index f566947f2..d1284b379 100644 --- a/test/test/test_alarm.c +++ b/test/test/test_alarm.c @@ -178,7 +178,10 @@ static int test_alarm(void) { int count = 0; - +#ifdef RTE_EXEC_ENV_BSDAPP + printf("The alarm API is not supported on FreeBSD\n"); + return 0; +#endif /* check if the callback will be called */ printf("check if the callback will be called\n"); flag = 0; -- 2.16.3
Re: [dpdk-dev] [PATCH v2 1/3] config: use one single config option for C11 memory model
-Original Message- > Date: Wed, 19 Sep 2018 21:42:38 +0800 > From: Phil Yang > To: dev@dpdk.org > CC: n...@arm.com, jerin.ja...@caviumnetworks.com, > kkokkilaga...@caviumnetworks.com, honnappa.nagaraha...@arm.com, > gavin...@arm.com > Subject: [PATCH v2 1/3] config: use one single config option for C11 memory > model > X-Mailer: git-send-email 2.7.4 > > External Email > > Keep only single config option RTE_USE_C11_MEM_MODEL for C11 memory > model, so all modules can leverage C11 atomic extension by enable this > option. > > Fixes: 39368eb ("ring: introduce C11 memory model barrier option") IMO, Fixes is not required as you are not fixing anything in the existing code. Other than that, it looks good. With above change: Acked-by: Jerin Jacob > Signed-off-by: Phil Yang > Reviewed-by: Honnappa Nagarahalli > Reviewed-by: Gavin Hu > --- > config/arm/meson.build | 2 +- > config/common_armv8a_linuxapp| 2 +- > config/common_base | 2 +- > config/defconfig_arm64-thunderx-linuxapp-gcc | 2 +- > lib/librte_ring/rte_ring.h | 4 ++-- > 5 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/config/arm/meson.build b/config/arm/meson.build > index 94cca49..4b23b39 100644 > --- a/config/arm/meson.build > +++ b/config/arm/meson.build > @@ -53,7 +53,7 @@ flags_cavium = [ > ['RTE_MAX_NUMA_NODES', 2], > ['RTE_MAX_LCORE', 96], > ['RTE_MAX_VFIO_GROUPS', 128], > - ['RTE_RING_USE_C11_MEM_MODEL', false]] > + ['RTE_USE_C11_MEM_MODEL', false]] > flags_dpaa = [ > ['RTE_MACHINE', '"dpaa"'], > ['RTE_CACHE_LINE_SIZE', 64], > diff --git a/config/common_armv8a_linuxapp b/config/common_armv8a_linuxapp > index 111c005..54e6987 100644 > --- a/config/common_armv8a_linuxapp > +++ b/config/common_armv8a_linuxapp > @@ -29,7 +29,7 @@ CONFIG_RTE_ARCH_ARM64_MEMCPY=n > #CONFIG_RTE_ARM64_MEMCPY_ALIGN_MASK=0xF > #CONFIG_RTE_ARM64_MEMCPY_STRICT_ALIGN=n > > -CONFIG_RTE_RING_USE_C11_MEM_MODEL=y > +CONFIG_RTE_USE_C11_MEM_MODEL=y > > CONFIG_RTE_LIBRTE_FM10K_PMD=n > CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n > diff --git 
a/config/common_base b/config/common_base > index 155c7d4..ccd2670 100644 > --- a/config/common_base > +++ b/config/common_base > @@ -661,7 +661,7 @@ CONFIG_RTE_LIBRTE_PMD_IFPGA_RAWDEV=y > # Compile librte_ring > # > CONFIG_RTE_LIBRTE_RING=y > -CONFIG_RTE_RING_USE_C11_MEM_MODEL=n > +CONFIG_RTE_USE_C11_MEM_MODEL=n > > # > # Compile librte_mempool > diff --git a/config/defconfig_arm64-thunderx-linuxapp-gcc > b/config/defconfig_arm64-thunderx-linuxapp-gcc > index 2bed66c..f11e758 100644 > --- a/config/defconfig_arm64-thunderx-linuxapp-gcc > +++ b/config/defconfig_arm64-thunderx-linuxapp-gcc > @@ -10,7 +10,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=128 > CONFIG_RTE_MAX_NUMA_NODES=2 > CONFIG_RTE_MAX_LCORE=96 > CONFIG_RTE_MAX_VFIO_GROUPS=128 > -CONFIG_RTE_RING_USE_C11_MEM_MODEL=n > +CONFIG_RTE_USE_C11_MEM_MODEL=n > > # > # Compile PMD for octeontx sso event device > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > index 7a731d0..af5444a 100644 > --- a/lib/librte_ring/rte_ring.h > +++ b/lib/librte_ring/rte_ring.h > @@ -303,11 +303,11 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r); > * There are 2 choices for the users > * 1.use rmb() memory barrier > * 2.use one-direcion load_acquire/store_release barrier,defined by > - * CONFIG_RTE_RING_USE_C11_MEM_MODEL=y > + * CONFIG_RTE_USE_C11_MEM_MODEL=y > * It depends on performance test results. > * By default, move common functions to rte_ring_generic.h > */ > -#ifdef RTE_RING_USE_C11_MEM_MODEL > +#ifdef RTE_USE_C11_MEM_MODEL > #include "rte_ring_c11_mem.h" > #else > #include "rte_ring_generic.h" > -- > 2.7.4 >
Re: [dpdk-dev] [PATCH v2 2/3] kni: fix kni fifo synchronization
-Original Message- > Date: Wed, 19 Sep 2018 21:42:39 +0800 > From: Phil Yang > To: dev@dpdk.org > CC: n...@arm.com, jerin.ja...@caviumnetworks.com, > kkokkilaga...@caviumnetworks.com, honnappa.nagaraha...@arm.com, > gavin...@arm.com > Subject: [PATCH v2 2/3] kni: fix kni fifo synchronization > X-Mailer: git-send-email 2.7.4 > + Ferruh Yigit > > With existing code in kni_fifo_put, rx_q values are not being updated > before updating fifo_write. While reading rx_q in kni_net_rx_normal, > This is causing the sync issue on other core. The same situation happens > in kni_fifo_get as well. > > So syncing the values by adding C11 atomic memory barriers to make sure > the values being synced before updating fifo_write and fifo_read. > > Fixes: 3fc5ca2 ("kni: initial import") > Signed-off-by: Phil Yang > Reviewed-by: Honnappa Nagarahalli > Reviewed-by: Gavin Hu > --- > .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 > lib/librte_kni/rte_kni_fifo.h | 30 > +- > 2 files changed, 34 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > index cfa9448..1fd713b 100644 > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > @@ -54,8 +54,13 @@ struct rte_kni_request { > * Writing should never overwrite the read position > */ > struct rte_kni_fifo { > +#ifndef RTE_USE_C11_MEM_MODEL > volatile unsigned write; /**< Next position to be written*/ > volatile unsigned read; /**< Next position to be read */ > +#else > + unsigned write; /**< Next position to be written*/ > + unsigned read; /**< Next position to be read */ > +#endif > unsigned len;/**< Circular buffer length */ > unsigned elem_size; /**< Pointer size - for 32/64 bit OS */ > void *volatile buffer[]; /**< The buffer contains mbuf pointers */ > diff --git a/lib/librte_kni/rte_kni_fifo.h b/lib/librte_kni/rte_kni_fifo.h 
> index ac26a8c..f4171a1 100644 > --- a/lib/librte_kni/rte_kni_fifo.h > +++ b/lib/librte_kni/rte_kni_fifo.h > @@ -28,8 +28,13 @@ kni_fifo_put(struct rte_kni_fifo *fifo, void **data, > unsigned num) > { > unsigned i = 0; > unsigned fifo_write = fifo->write; > - unsigned fifo_read = fifo->read; > unsigned new_write = fifo_write; > +#ifdef RTE_USE_C11_MEM_MODEL > + unsigned fifo_read = __atomic_load_n(&fifo->read, > +__ATOMIC_ACQUIRE); > +#else > + unsigned fifo_read = fifo->read; > +#endif Correct. > > for (i = 0; i < num; i++) { > new_write = (new_write + 1) & (fifo->len - 1); > @@ -39,7 +44,12 @@ kni_fifo_put(struct rte_kni_fifo *fifo, void **data, > unsigned num) > fifo->buffer[fifo_write] = data[i]; > fifo_write = new_write; > } > +#ifdef RTE_USE_C11_MEM_MODEL > + __atomic_store_n(&fifo->write, fifo_write, __ATOMIC_RELEASE); > +#else > + rte_smp_wmb(); > fifo->write = fifo_write; > +#endif Correct. > return i; > } > > @@ -51,7 +61,12 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, > unsigned num) > { > unsigned i = 0; > unsigned new_read = fifo->read; > +#ifdef RTE_USE_C11_MEM_MODEL > + unsigned fifo_write = __atomic_load_n(&fifo->write, __ATOMIC_ACQUIRE); > +#else > unsigned fifo_write = fifo->write; > +#endif Correct. > + > for (i = 0; i < num; i++) { > if (new_read == fifo_write) > break; > @@ -59,7 +74,12 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, > unsigned num) > data[i] = fifo->buffer[new_read]; > new_read = (new_read + 1) & (fifo->len - 1); > } > +#ifdef RTE_USE_C11_MEM_MODEL > + __atomic_store_n(&fifo->read, new_read, __ATOMIC_RELEASE); > +#else > + rte_smp_wmb(); > fifo->read = new_read; > +#endif Correct. 
> return i; > } > > @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, > unsigned num) > static inline uint32_t > kni_fifo_count(struct rte_kni_fifo *fifo) > { > +#ifdef RTE_USE_C11_MEM_MODEL > + unsigned fifo_write = __atomic_load_n(&fifo->write, > + __ATOMIC_ACQUIRE); > + unsigned fifo_read = __atomic_load_n(&fifo->read, > +__ATOMIC_ACQUIRE); Isn't too heavy to have two __ATOMIC_ACQUIREs? a simple rte_smp_rmb() would be enough here. Right? or Do we need __ATOMIC_ACQUIRE for fifo_write case? Other than that, I prefer to avoid ifdef clutter by introducing two separate file just like ring C11 implementation. I don't have strong opinion on this t
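The acquire/release pairing reviewed in this patch can be sketched outside of DPDK as follows. This is an illustration under assumptions, not the KNI code: demo_fifo and its fixed length are hypothetical, and only the GCC/Clang __atomic builtins that the patch itself uses are relied on. The producer publishes its slot writes with a release store to write; the consumer observes them through an acquire load of write, and the same pairing applies to read in the opposite direction.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_LEN 8u /* hypothetical size, must be a power of two */

/* Single-producer, single-consumer ring; one slot always stays empty. */
struct demo_fifo {
	unsigned write;	/* next position to be written (producer owned) */
	unsigned read;	/* next position to be read (consumer owned) */
	unsigned len;
	void *buffer[DEMO_FIFO_LEN];
};

static void demo_fifo_init(struct demo_fifo *f)
{
	assert(f != NULL);
	f->write = 0;
	f->read = 0;
	f->len = DEMO_FIFO_LEN;
}

static unsigned demo_fifo_put(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned fifo_write = f->write;	/* only the producer modifies it */
	unsigned new_write = fifo_write;
	/* Acquire: slots the consumer has freed are safe to overwrite. */
	unsigned fifo_read = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		new_write = (new_write + 1) & (f->len - 1);
		if (new_write == fifo_read)
			break;	/* full */
		f->buffer[fifo_write] = data[i];
		fifo_write = new_write;
	}
	/* Release: slot contents become visible before the new index. */
	__atomic_store_n(&f->write, fifo_write, __ATOMIC_RELEASE);
	return i;
}

static unsigned demo_fifo_get(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned new_read = f->read;	/* only the consumer modifies it */
	/* Acquire: pairs with the release store in demo_fifo_put(). */
	unsigned fifo_write = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		if (new_read == fifo_write)
			break;	/* empty */
		data[i] = f->buffer[new_read];
		new_read = (new_read + 1) & (f->len - 1);
	}
	/* Release: slot reads complete before the slots are handed back. */
	__atomic_store_n(&f->read, new_read, __ATOMIC_RELEASE);
	return i;
}
```

Each index is loaded with acquire semantics only on the side that does not own it, which is why the owner's plain read of its own index is enough in this single-producer, single-consumer sketch.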
[dpdk-dev] How to replace rte_eth_dev_attach with rte_eal_hotplug_add
Hello, Since the DPDK 18.08 release, rte_eth_dev_attach and rte_eth_dev_detach have been deprecated APIs, and it is recommended to replace them with rte_eal_hotplug_add and rte_eal_hotplug_remove. My program uses the above-mentioned deprecated APIs and has to replace them. Note that my program uses attach to attach vhost and pcap PMDs. My question is whether it is correct to replace them as follows: find the rte_eth_dev_attach function in rte_ethdev.c and migrate its content into my program, e.g. lib/librte_ethdev/rte_ethdev.c lines 643-686 for attach and lib/librte_ethdev/rte_ethdev.c lines 690-720 for detach. Your advice/guidance is much appreciated. Thanks! BR, Hideyuki Yamashita - Hideyuki Yamashita NTT TechnoCross -
[dpdk-dev] [PATCH v5 0/8] Add Marvell NETA PMD
This patch series introduces a new PMD for Marvell NETA adapters (MVNETA). See the documentation for more info. It is split for easier reviewing. v5: * fixed wrong order of cleanup in mvneta_eth_dev_create() * inlined one auxiliary function (mvneta_priv_create()) v4: * rebased on top of next-net (DEV_RX_OFFLOAD_CRC_STRIP removed) * Rx/Tx functionality moved to new mvneta_rxtx.c file * removed eth_mvneta alias (and docs updated accordingly) * fixed additional review comments v3: No changes against v2, just resubmitting again to have clean patch set after faulty format-patch. My apologies for the noise. v2: * fixed a couple of checkpatch warnings * removed '\n' from MVNETA_LOG invocations (appended by the macro) * removed unused MVNETA_MUSDK_DMA_MEMSIZE define * changed one printf into MVNETA_LOG * removed __func__ from one MVNETA_LOG invocation (inserted automatically by the macro) * minor grammar/spelling correction in comments * removed license text from file with SPDX tag (mvneta.rst) * removed misleading part of comment for mvneta_shadow_txq * changed authorship of the patches to the original author Natalie Samsonov (1): net/mvneta: add reset statistics callback Zyta Szpak (7): net/mvneta: add neta PMD skeleton net/mvneta: add Rx/Tx support net/mvneta: support for setting of MTU net/mvneta: add link update net/mvneta: support for promiscuous net/mvneta: add MAC filtering net/mvneta: add support for basic stats MAINTAINERS |8 + config/common_base|5 + devtools/test-build.sh|2 + doc/guides/nics/features/mvneta.ini | 19 + doc/guides/nics/mvneta.rst| 161 doc/guides/rel_notes/release_18_11.rst|4 + drivers/common/Makefile |4 +- drivers/common/mvep/rte_mvep_common.h |1 + drivers/net/Makefile |1 + drivers/net/meson.build |1 + drivers/net/mvneta/Makefile | 42 + drivers/net/mvneta/meson.build| 28 + drivers/net/mvneta/mvneta_ethdev.c| 1019 + drivers/net/mvneta/mvneta_ethdev.h| 80 ++ drivers/net/mvneta/mvneta_rxtx.c | 850 + drivers/net/mvneta/mvneta_rxtx.h | 168
drivers/net/mvneta/rte_pmd_mvneta_version.map |3 + mk/rte.app.mk |7 +- 18 files changed, 2400 insertions(+), 3 deletions(-) create mode 100644 doc/guides/nics/features/mvneta.ini create mode 100644 doc/guides/nics/mvneta.rst create mode 100644 drivers/net/mvneta/Makefile create mode 100644 drivers/net/mvneta/meson.build create mode 100644 drivers/net/mvneta/mvneta_ethdev.c create mode 100644 drivers/net/mvneta/mvneta_ethdev.h create mode 100644 drivers/net/mvneta/mvneta_rxtx.c create mode 100644 drivers/net/mvneta/mvneta_rxtx.h create mode 100644 drivers/net/mvneta/rte_pmd_mvneta_version.map -- 2.7.4
[dpdk-dev] [PATCH v5 1/8] net/mvneta: add neta PMD skeleton
From: Zyta Szpak Add neta pmd driver skeleton providing base for the further development. Signed-off-by: Natalie Samsonov Signed-off-by: Yelena Krivosheev Signed-off-by: Dmitri Epshtein Signed-off-by: Zyta Szpak Signed-off-by: Andrzej Ostruszka --- MAINTAINERS | 8 + config/common_base| 5 + devtools/test-build.sh| 2 + doc/guides/nics/features/mvneta.ini | 11 + doc/guides/nics/mvneta.rst| 152 +++ doc/guides/rel_notes/release_18_11.rst| 4 + drivers/common/Makefile | 4 +- drivers/common/mvep/rte_mvep_common.h | 1 + drivers/net/Makefile | 1 + drivers/net/meson.build | 1 + drivers/net/mvneta/Makefile | 42 ++ drivers/net/mvneta/meson.build| 27 ++ drivers/net/mvneta/mvneta_ethdev.c| 629 ++ drivers/net/mvneta/mvneta_ethdev.h| 75 +++ drivers/net/mvneta/rte_pmd_mvneta_version.map | 3 + mk/rte.app.mk | 7 +- 16 files changed, 969 insertions(+), 3 deletions(-) create mode 100644 doc/guides/nics/features/mvneta.ini create mode 100644 doc/guides/nics/mvneta.rst create mode 100644 drivers/net/mvneta/Makefile create mode 100644 drivers/net/mvneta/meson.build create mode 100644 drivers/net/mvneta/mvneta_ethdev.c create mode 100644 drivers/net/mvneta/mvneta_ethdev.h create mode 100644 drivers/net/mvneta/rte_pmd_mvneta_version.map diff --git a/MAINTAINERS b/MAINTAINERS index 5967c1d..bbc4d40 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -586,6 +586,14 @@ F: drivers/net/mvpp2/ F: doc/guides/nics/mvpp2.rst F: doc/guides/nics/features/mvpp2.ini +Marvell mvneta +M: Zyta Szpak +M: Dmitri Epshtein +M: Natalie Samsonov +F: drivers/net/mvneta/ +F: doc/guides/nics/mvneta.rst +F: doc/guides/nics/features/mvneta.ini + Mellanox mlx4 M: Matan Azrad M: Shahaf Shuler diff --git a/config/common_base b/config/common_base index 155c7d4..1f8410b 100644 --- a/config/common_base +++ b/config/common_base @@ -400,6 +400,11 @@ CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y CONFIG_RTE_LIBRTE_MVPP2_PMD=n # +# Compile Marvell MVNETA PMD driver +# +CONFIG_RTE_LIBRTE_MVNETA_PMD=n + +# # Compile support for VMBus library # 
CONFIG_RTE_LIBRTE_VMBUS=n diff --git a/devtools/test-build.sh b/devtools/test-build.sh index 1eee241..2990978 100755 --- a/devtools/test-build.sh +++ b/devtools/test-build.sh @@ -182,6 +182,8 @@ config () # sed -ri's,(PMD_MVSAM_CRYPTO=)n,\1y,' $1/.config test -z "$LIBMUSDK_PATH" || \ sed -ri 's,(MVPP2_PMD=)n,\1y,' $1/.config + test -z "$LIBMUSDK_PATH" || \ + sed -ri 's,(MVNETA_PMD=)n,\1y,' $1/.config build_config_hook $1 $2 $3 # Explicit enabler/disabler (uppercase) diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini new file mode 100644 index 000..ba6fe4b --- /dev/null +++ b/doc/guides/nics/features/mvneta.ini @@ -0,0 +1,11 @@ +; +; Supported features of the 'mvneta' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Speed capabilities = Y +Jumbo frame = Y +CRC offload = Y +ARMv8= Y +Usage doc= Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst new file mode 100644 index 000..bf08417 --- /dev/null +++ b/doc/guides/nics/mvneta.rst @@ -0,0 +1,152 @@ +.. SPDX-License-Identifier: BSD-3-Clause +Copyright(c) 2018 Marvell International Ltd. +Copyright(c) 2018 Semihalf. +All rights reserved. + +MVNETA Poll Mode Driver +=== + +The MVNETA PMD (librte_pmd_mvneta) provides poll mode driver support +for the Marvell NETA 1/2.5 Gbps adapter. + +Detailed information about SoCs that use PPv2 can be obtained here: + +* https://www.marvell.com/embedded-processors/armada-3700/ + +.. Note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting relevant configuration option manually. + Please refer to `Config File Options`_ section for further details. + + +Features + + +Features of the MVNETA PMD are: + +- Start/stop +- tx/rx_queue_setup +- Speed capabilities +- Jumbo frame +- CRC offload + + +Limitations +--- + +- Flushing vlans added for filtering is not possible due to MUSDK missing + functionality. 
Current workaround is to reset board so that NETA has a + chance to start in a sane state. + +Prerequisites +- + +- Custom Linux Kernel sources + + .. code-block:: console + + git clone https://github.com/MarvellEmbeddedProcessors/linux-marvell.git -b linux-4.4.52-armada-17.10 + + +- MUSDK (Marvell User-Space SDK) sources + + .. code-block:: console +
[dpdk-dev] [PATCH v5 2/8] net/mvneta: add Rx/Tx support
From: Zyta Szpak Add part of PMD for actual reception/transmission. Signed-off-by: Yelena Krivosheev Signed-off-by: Dmitri Epshtein Signed-off-by: Zyta Szpak --- doc/guides/nics/features/mvneta.ini | 3 + doc/guides/nics/mvneta.rst | 4 + drivers/net/mvneta/Makefile | 2 +- drivers/net/mvneta/meson.build | 3 +- drivers/net/mvneta/mvneta_ethdev.c | 51 ++- drivers/net/mvneta/mvneta_ethdev.h | 4 + drivers/net/mvneta/mvneta_rxtx.c| 850 drivers/net/mvneta/mvneta_rxtx.h| 168 +++ 8 files changed, 1080 insertions(+), 5 deletions(-) create mode 100644 drivers/net/mvneta/mvneta_rxtx.c create mode 100644 drivers/net/mvneta/mvneta_rxtx.h diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini index ba6fe4b..0a89e2f 100644 --- a/doc/guides/nics/features/mvneta.ini +++ b/doc/guides/nics/features/mvneta.ini @@ -7,5 +7,8 @@ Speed capabilities = Y Jumbo frame = Y CRC offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y ARMv8= Y Usage doc= Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst index bf08417..9d25c40 100644 --- a/doc/guides/nics/mvneta.rst +++ b/doc/guides/nics/mvneta.rst @@ -27,9 +27,13 @@ Features of the MVNETA PMD are: - Start/stop - tx/rx_queue_setup +- tx/rx_burst - Speed capabilities - Jumbo frame - CRC offload +- L3 checksum offload +- L4 checksum offload +- Packet type parsing Limitations diff --git a/drivers/net/mvneta/Makefile b/drivers/net/mvneta/Makefile index 149992e..349f550 100644 --- a/drivers/net/mvneta/Makefile +++ b/drivers/net/mvneta/Makefile @@ -37,6 +37,6 @@ LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_cfgfile LDLIBS += -lrte_bus_vdev # library source files -SRCS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta_ethdev.c +SRCS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta_ethdev.c mvneta_rxtx.c include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/mvneta/meson.build b/drivers/net/mvneta/meson.build index 2f31954..c0b1bce 100644 --- a/drivers/net/mvneta/meson.build +++ 
b/drivers/net/mvneta/meson.build @@ -21,7 +21,8 @@ else endif sources = files( - 'mvneta_ethdev.c' + 'mvneta_ethdev.c', + 'mvneta_rxtx.c' ) deps += ['cfgfile', 'common_mvep'] diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index 9ee197a..331cd1d 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -6,8 +6,6 @@ #include #include -#include -#include #include #include @@ -23,7 +21,7 @@ #include -#include "mvneta_ethdev.h" +#include "mvneta_rxtx.h" #define MVNETA_IFACE_NAME_ARG "iface" @@ -308,6 +306,18 @@ mvneta_dev_start(struct rte_eth_dev *dev) priv->uc_mc_flushed = 1; } + /* Allocate buffers */ + for (i = 0; i < dev->data->nb_rx_queues; i++) { + struct mvneta_rxq *rxq = dev->data->rx_queues[i]; + int num = rxq->size; + + ret = mvneta_buffs_alloc(priv, rxq, &num); + if (ret || num != rxq->size) { + rte_free(rxq); + return ret; + } + } + ret = mvneta_dev_set_link_up(dev); if (ret) { MVNETA_LOG(ERR, "Failed to set link up"); @@ -318,6 +328,8 @@ mvneta_dev_start(struct rte_eth_dev *dev) for (i = 0; i < dev->data->nb_tx_queues; i++) dev->data->tx_queue_state[i] = RTE_ETH_QUEUE_STATE_STARTED; + mvneta_set_tx_function(dev); + return 0; out: @@ -336,11 +348,25 @@ static void mvneta_dev_stop(struct rte_eth_dev *dev) { struct mvneta_priv *priv = dev->data->dev_private; + int i; if (!priv->ppio) return; mvneta_dev_set_link_down(dev); + MVNETA_LOG(INFO, "Flushing rx queues"); + for (i = 0; i < dev->data->nb_rx_queues; i++) { + struct mvneta_rxq *rxq = dev->data->rx_queues[i]; + + mvneta_rx_queue_flush(rxq); + } + + MVNETA_LOG(INFO, "Flushing tx queues"); + for (i = 0; i < dev->data->nb_tx_queues; i++) { + struct mvneta_txq *txq = dev->data->tx_queues[i]; + + mvneta_tx_queue_flush(txq); + } neta_ppio_deinit(priv->ppio); @@ -357,9 +383,20 @@ static void mvneta_dev_close(struct rte_eth_dev *dev) { struct mvneta_priv *priv = dev->data->dev_private; + int i; if (priv->ppio) mvneta_dev_stop(dev); + + for (i 
= 0; i < dev->data->nb_rx_queues; i++) { + mvneta_rx_queue_release(dev->data->rx_queues[i]); + dev->data->rx_queues[i] = NULL; + } + + for (i = 0; i < dev->data->nb_tx_queues; i++) { + mvneta_tx_queue_release(dev->data->tx_queues[i]); + dev->data->tx_queues[i] = NULL
[dpdk-dev] [PATCH v5 3/8] net/mvneta: support for setting of MTU
From: Zyta Szpak Add callback for setting of MTU. Signed-off-by: Natalie Samsonov Signed-off-by: Zyta Szpak --- doc/guides/nics/features/mvneta.ini | 1 + doc/guides/nics/mvneta.rst | 1 + drivers/net/mvneta/mvneta_ethdev.c | 78 + 3 files changed, 80 insertions(+) diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini index 0a89e2f..bc4e400 100644 --- a/doc/guides/nics/features/mvneta.ini +++ b/doc/guides/nics/features/mvneta.ini @@ -5,6 +5,7 @@ ; [Features] Speed capabilities = Y +MTU update = Y Jumbo frame = Y CRC offload = Y L3 checksum offload = Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst index 9d25c40..55ffe57 100644 --- a/doc/guides/nics/mvneta.rst +++ b/doc/guides/nics/mvneta.rst @@ -30,6 +30,7 @@ Features of the MVNETA PMD are: - tx/rx_burst - Speed capabilities - Jumbo frame +- MTU update - CRC offload - L3 checksum offload - L4 checksum offload diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index 331cd1d..1b26c87 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -221,6 +221,77 @@ mvneta_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused) } /** + * DPDK callback to change the MTU. + * + * Setting the MTU affects hardware MRU (packets larger than the MRU + * will be dropped). + * + * @param dev + * Pointer to Ethernet device structure. + * @param mtu + * New MTU. + * + * @return + * 0 on success, negative error value otherwise. 
+ */ +static int +mvneta_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) +{ + struct mvneta_priv *priv = dev->data->dev_private; + uint16_t mbuf_data_size = 0; /* SW buffer size */ + uint16_t mru; + int ret; + + mru = MRVL_NETA_MTU_TO_MRU(mtu); + /* +* min_rx_buf_size is equal to mbuf data size +* if pmd didn't set it differently +*/ + mbuf_data_size = dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM; + /* Prevent PMD from: +* - setting mru greater than the mbuf size resulting in +* hw and sw buffer size mismatch +* - setting mtu that requires the support of scattered packets +* when this feature has not been enabled/supported so far. +*/ + if (!dev->data->scattered_rx && + (mru + MRVL_NETA_PKT_OFFS > mbuf_data_size)) { + mru = mbuf_data_size - MRVL_NETA_PKT_OFFS; + mtu = MRVL_NETA_MRU_TO_MTU(mru); + MVNETA_LOG(WARNING, "MTU too big, max MTU possible limitted by" + " current mbuf size: %u. Set MTU to %u, MRU to %u", + mbuf_data_size, mtu, mru); + } + + if (mtu < ETHER_MIN_MTU || mru > MVNETA_PKT_SIZE_MAX) { + MVNETA_LOG(ERR, "Invalid MTU [%u] or MRU [%u]", mtu, mru); + return -EINVAL; + } + + dev->data->mtu = mtu; + dev->data->dev_conf.rxmode.max_rx_pkt_len = mru - MV_MH_SIZE; + + if (!priv->ppio) + /* It is OK. New MTU will be set later on mvneta_dev_start */ + return 0; + + ret = neta_ppio_set_mru(priv->ppio, mru); + if (ret) { + MVNETA_LOG(ERR, "Failed to change MRU"); + return ret; + } + + ret = neta_ppio_set_mtu(priv->ppio, mtu); + if (ret) { + MVNETA_LOG(ERR, "Failed to change MTU"); + return ret; + } + MVNETA_LOG(INFO, "MTU changed to %u, MRU = %u", mtu, mru); + + return 0; +} + +/** * DPDK callback to bring the link up. 
* * @param dev @@ -318,6 +389,12 @@ mvneta_dev_start(struct rte_eth_dev *dev) } } + ret = mvneta_mtu_set(dev, dev->data->mtu); + if (ret) { + MVNETA_LOG(ERR, "Failed to set MTU %d", dev->data->mtu); + goto out; + } + ret = mvneta_dev_set_link_up(dev); if (ret) { MVNETA_LOG(ERR, "Failed to set link up"); @@ -433,6 +510,7 @@ static const struct eth_dev_ops mvneta_ops = { .dev_set_link_down = mvneta_dev_set_link_down, .dev_close = mvneta_dev_close, .mac_addr_set = mvneta_mac_addr_set, + .mtu_set = mvneta_mtu_set, .dev_infos_get = mvneta_dev_infos_get, .dev_supported_ptypes_get = mvneta_dev_supported_ptypes_get, .rxq_info_get = mvneta_rxq_info_get, -- 2.7.4
[dpdk-dev] [PATCH v5 4/8] net/mvneta: add link update
From: Zyta Szpak Add callback for updating information about link status/info. Signed-off-by: Natalie Samsonov Signed-off-by: Zyta Szpak --- doc/guides/nics/features/mvneta.ini | 1 + doc/guides/nics/mvneta.rst | 1 + drivers/net/mvneta/mvneta_ethdev.c | 71 + 3 files changed, 73 insertions(+) diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini index bc4e400..581ed31 100644 --- a/doc/guides/nics/features/mvneta.ini +++ b/doc/guides/nics/features/mvneta.ini @@ -5,6 +5,7 @@ ; [Features] Speed capabilities = Y +Link status = Y MTU update = Y Jumbo frame = Y CRC offload = Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst index 55ffe57..85cdd1d 100644 --- a/doc/guides/nics/mvneta.rst +++ b/doc/guides/nics/mvneta.rst @@ -31,6 +31,7 @@ Features of the MVNETA PMD are: - Speed capabilities - Jumbo frame - MTU update +- Link status - CRC offload - L3 checksum offload - L4 checksum offload diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index 1b26c87..e9cbff7 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -477,6 +477,76 @@ mvneta_dev_close(struct rte_eth_dev *dev) } /** + * DPDK callback to retrieve physical link information. + * + * @param dev + * Pointer to Ethernet device structure. + * @param wait_to_complete + * Wait for request completion (ignored). + * + * @return + * 0 on success, negative error value otherwise. 
+ */ +static int +mvneta_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused) +{ + /* +* TODO +* once MUSDK provides necessary API use it here +*/ + struct mvneta_priv *priv = dev->data->dev_private; + struct ethtool_cmd edata; + struct ifreq req; + int ret, fd, link_up; + + if (!priv->ppio) + return -EPERM; + + edata.cmd = ETHTOOL_GSET; + + strcpy(req.ifr_name, dev->data->name); + req.ifr_data = (void *)&edata; + + fd = socket(AF_INET, SOCK_DGRAM, 0); + if (fd == -1) + return -EFAULT; + ret = ioctl(fd, SIOCETHTOOL, &req); + if (ret == -1) { + close(fd); + return -EFAULT; + } + + close(fd); + + switch (ethtool_cmd_speed(&edata)) { + case SPEED_10: + dev->data->dev_link.link_speed = ETH_SPEED_NUM_10M; + break; + case SPEED_100: + dev->data->dev_link.link_speed = ETH_SPEED_NUM_100M; + break; + case SPEED_1000: + dev->data->dev_link.link_speed = ETH_SPEED_NUM_1G; + break; + case SPEED_2500: + dev->data->dev_link.link_speed = ETH_SPEED_NUM_2_5G; + break; + default: + dev->data->dev_link.link_speed = ETH_SPEED_NUM_NONE; + } + + dev->data->dev_link.link_duplex = edata.duplex ? ETH_LINK_FULL_DUPLEX : +ETH_LINK_HALF_DUPLEX; + dev->data->dev_link.link_autoneg = edata.autoneg ? ETH_LINK_AUTONEG : + ETH_LINK_FIXED; + + neta_ppio_get_link_state(priv->ppio, &link_up); + dev->data->dev_link.link_status = link_up ? ETH_LINK_UP : ETH_LINK_DOWN; + + return 0; +} + +/** * DPDK callback to set the primary MAC address. * * @param dev @@ -509,6 +579,7 @@ static const struct eth_dev_ops mvneta_ops = { .dev_set_link_up = mvneta_dev_set_link_up, .dev_set_link_down = mvneta_dev_set_link_down, .dev_close = mvneta_dev_close, + .link_update = mvneta_link_update, .mac_addr_set = mvneta_mac_addr_set, .mtu_set = mvneta_mtu_set, .dev_infos_get = mvneta_dev_infos_get, -- 2.7.4
[dpdk-dev] [PATCH v5 5/8] net/mvneta: support for promiscuous
From: Zyta Szpak Add callbacks for enabling/disabling of promiscuous mode. Signed-off-by: Yelena Krivosheev Signed-off-by: Zyta Szpak --- doc/guides/nics/features/mvneta.ini | 1 + doc/guides/nics/mvneta.rst | 1 + drivers/net/mvneta/mvneta_ethdev.c | 54 + 3 files changed, 56 insertions(+) diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini index 581ed31..6a140a3 100644 --- a/doc/guides/nics/features/mvneta.ini +++ b/doc/guides/nics/features/mvneta.ini @@ -8,6 +8,7 @@ Speed capabilities = Y Link status = Y MTU update = Y Jumbo frame = Y +Promiscuous mode = Y CRC offload = Y L3 checksum offload = Y L4 checksum offload = Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst index 85cdd1d..1912c3e 100644 --- a/doc/guides/nics/mvneta.rst +++ b/doc/guides/nics/mvneta.rst @@ -31,6 +31,7 @@ Features of the MVNETA PMD are: - Speed capabilities - Jumbo frame - MTU update +- Promiscuous mode - Link status - CRC offload - L3 checksum offload diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index e9cbff7..04881d5 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -547,6 +547,58 @@ mvneta_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused) } /** + * DPDK callback to enable promiscuous mode. + * + * @param dev + * Pointer to Ethernet device structure. + */ +static void +mvneta_promiscuous_enable(struct rte_eth_dev *dev) +{ + struct mvneta_priv *priv = dev->data->dev_private; + int ret, en; + + if (!priv->ppio) + return; + + neta_ppio_get_promisc(priv->ppio, &en); + if (en) { + MVNETA_LOG(INFO, "Promiscuous already enabled"); + return; + } + + ret = neta_ppio_set_promisc(priv->ppio, 1); + if (ret) + MVNETA_LOG(ERR, "Failed to enable promiscuous mode"); +} + +/** + * DPDK callback to disable promiscuous mode. + * + * @param dev + * Pointer to Ethernet device structure. 
+ */ +static void +mvneta_promiscuous_disable(struct rte_eth_dev *dev) +{ + struct mvneta_priv *priv = dev->data->dev_private; + int ret, en; + + if (!priv->ppio) + return; + + neta_ppio_get_promisc(priv->ppio, &en); + if (!en) { + MVNETA_LOG(INFO, "Promiscuous already disabled"); + return; + } + + ret = neta_ppio_set_promisc(priv->ppio, 0); + if (ret) + MVNETA_LOG(ERR, "Failed to disable promiscuous mode"); +} + +/** * DPDK callback to set the primary MAC address. * * @param dev @@ -580,6 +632,8 @@ static const struct eth_dev_ops mvneta_ops = { .dev_set_link_down = mvneta_dev_set_link_down, .dev_close = mvneta_dev_close, .link_update = mvneta_link_update, + .promiscuous_enable = mvneta_promiscuous_enable, + .promiscuous_disable = mvneta_promiscuous_disable, .mac_addr_set = mvneta_mac_addr_set, .mtu_set = mvneta_mtu_set, .dev_infos_get = mvneta_dev_infos_get, -- 2.7.4
[dpdk-dev] [PATCH v5 7/8] net/mvneta: add support for basic stats
From: Zyta Szpak Add support for getting of basic statistics for the driver. Signed-off-by: Yelena Krivosheev Signed-off-by: Natalie Samsonov Signed-off-by: Zyta Szpak --- doc/guides/nics/features/mvneta.ini | 1 + doc/guides/nics/mvneta.rst | 1 + drivers/net/mvneta/mvneta_ethdev.c | 47 + 3 files changed, 49 insertions(+) diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini index 59c9c36..701eb03 100644 --- a/doc/guides/nics/features/mvneta.ini +++ b/doc/guides/nics/features/mvneta.ini @@ -14,5 +14,6 @@ CRC offload = Y L3 checksum offload = Y L4 checksum offload = Y Packet type parsing = Y +Basic stats = Y ARMv8= Y Usage doc= Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst index b21581b..5bcf3b6 100644 --- a/doc/guides/nics/mvneta.rst +++ b/doc/guides/nics/mvneta.rst @@ -38,6 +38,7 @@ Features of the MVNETA PMD are: - L3 checksum offload - L4 checksum offload - Packet type parsing +- Basic stats Limitations diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index 09b98de..c4ba42b 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -691,6 +691,52 @@ mvneta_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr) return 0; } +/** + * DPDK callback to get device statistics. + * + * @param dev + * Pointer to Ethernet device structure. + * @param stats + * Stats structure output buffer. + * + * @return + * 0 on success, negative error value otherwise. 
+ */ +static int +mvneta_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + struct mvneta_priv *priv = dev->data->dev_private; + struct neta_ppio_statistics ppio_stats; + unsigned int ret; + + if (!priv->ppio) + return -EPERM; + + ret = neta_ppio_get_statistics(priv->ppio, &ppio_stats); + if (unlikely(ret)) { + MVNETA_LOG(ERR, "Failed to update port statistics"); + return ret; + } + + stats->ipackets += ppio_stats.rx_packets + + ppio_stats.rx_broadcast_packets + + ppio_stats.rx_multicast_packets; + stats->opackets += ppio_stats.tx_packets + + ppio_stats.tx_broadcast_packets + + ppio_stats.tx_multicast_packets; + stats->ibytes += ppio_stats.rx_bytes; + stats->obytes += ppio_stats.tx_bytes; + stats->imissed += ppio_stats.rx_discard + + ppio_stats.rx_overrun; + + stats->ierrors = ppio_stats.rx_packets_err + + ppio_stats.rx_errors + + ppio_stats.rx_crc_error; + stats->oerrors = ppio_stats.tx_errors; + + return 0; +} + static const struct eth_dev_ops mvneta_ops = { .dev_configure = mvneta_dev_configure, .dev_start = mvneta_dev_start, @@ -705,6 +751,7 @@ static const struct eth_dev_ops mvneta_ops = { .mac_addr_add = mvneta_mac_addr_add, .mac_addr_set = mvneta_mac_addr_set, .mtu_set = mvneta_mtu_set, + .stats_get = mvneta_stats_get, .dev_infos_get = mvneta_dev_infos_get, .dev_supported_ptypes_get = mvneta_dev_supported_ptypes_get, .rxq_info_get = mvneta_rxq_info_get, -- 2.7.4
[dpdk-dev] [PATCH v5 6/8] net/mvneta: add MAC filtering
From: Zyta Szpak Add callbacks for adding/removing MAC addresses. Signed-off-by: Yelena Krivosheev Signed-off-by: Natalie Samsonov Signed-off-by: Zyta Szpak --- doc/guides/nics/features/mvneta.ini | 1 + doc/guides/nics/mvneta.rst | 1 + drivers/net/mvneta/mvneta_ethdev.c | 69 + 3 files changed, 71 insertions(+) diff --git a/doc/guides/nics/features/mvneta.ini b/doc/guides/nics/features/mvneta.ini index 6a140a3..59c9c36 100644 --- a/doc/guides/nics/features/mvneta.ini +++ b/doc/guides/nics/features/mvneta.ini @@ -9,6 +9,7 @@ Link status = Y MTU update = Y Jumbo frame = Y Promiscuous mode = Y +Unicast MAC filter = Y CRC offload = Y L3 checksum offload = Y L4 checksum offload = Y diff --git a/doc/guides/nics/mvneta.rst b/doc/guides/nics/mvneta.rst index 1912c3e..b21581b 100644 --- a/doc/guides/nics/mvneta.rst +++ b/doc/guides/nics/mvneta.rst @@ -32,6 +32,7 @@ Features of the MVNETA PMD are: - Jumbo frame - MTU update - Promiscuous mode +- Unicast MAC filter - Link status - CRC offload - L3 checksum offload diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index 04881d5..09b98de 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -599,6 +599,73 @@ mvneta_promiscuous_disable(struct rte_eth_dev *dev) } /** + * DPDK callback to remove a MAC address. + * + * @param dev + * Pointer to Ethernet device structure. + * @param index + * MAC address index. + */ +static void +mvneta_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index) +{ + struct mvneta_priv *priv = dev->data->dev_private; + char buf[ETHER_ADDR_FMT_SIZE]; + int ret; + + if (!priv->ppio) + return; + + ret = neta_ppio_remove_mac_addr(priv->ppio, + dev->data->mac_addrs[index].addr_bytes); + if (ret) { + ether_format_addr(buf, sizeof(buf), + &dev->data->mac_addrs[index]); + MVNETA_LOG(ERR, "Failed to remove mac %s", buf); + } +} + +/** + * DPDK callback to add a MAC address. + * + * @param dev + * Pointer to Ethernet device structure. 
+ * @param mac_addr + * MAC address to register. + * @param index + * MAC address index. + * @param vmdq + * VMDq pool index to associate address with (unused). + * + * @return + * 0 on success, negative error value otherwise. + */ +static int +mvneta_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr, + uint32_t index, uint32_t vmdq __rte_unused) +{ + struct mvneta_priv *priv = dev->data->dev_private; + char buf[ETHER_ADDR_FMT_SIZE]; + int ret; + + if (index == 0) + /* For setting index 0, mvneta_mac_addr_set() should be used. */ + return -1; + + if (!priv->ppio) + return 0; + + ret = neta_ppio_add_mac_addr(priv->ppio, mac_addr->addr_bytes); + if (ret) { + ether_format_addr(buf, sizeof(buf), mac_addr); + MVNETA_LOG(ERR, "Failed to add mac %s", buf); + return -1; + } + + return 0; +} + +/** * DPDK callback to set the primary MAC address. * * @param dev @@ -634,6 +701,8 @@ static const struct eth_dev_ops mvneta_ops = { .link_update = mvneta_link_update, .promiscuous_enable = mvneta_promiscuous_enable, .promiscuous_disable = mvneta_promiscuous_disable, + .mac_addr_remove = mvneta_mac_addr_remove, + .mac_addr_add = mvneta_mac_addr_add, .mac_addr_set = mvneta_mac_addr_set, .mtu_set = mvneta_mtu_set, .dev_infos_get = mvneta_dev_infos_get, -- 2.7.4
[dpdk-dev] [PATCH v5 8/8] net/mvneta: add reset statistics callback
From: Natalie Samsonov Add support for resetting of driver statistics. Signed-off-by: Natalie Samsonov --- drivers/net/mvneta/mvneta_ethdev.c | 40 +++--- drivers/net/mvneta/mvneta_ethdev.h | 1 + 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/drivers/net/mvneta/mvneta_ethdev.c b/drivers/net/mvneta/mvneta_ethdev.c index c4ba42b..c5d190d 100644 --- a/drivers/net/mvneta/mvneta_ethdev.c +++ b/drivers/net/mvneta/mvneta_ethdev.c @@ -720,23 +720,48 @@ mvneta_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) stats->ipackets += ppio_stats.rx_packets + ppio_stats.rx_broadcast_packets + - ppio_stats.rx_multicast_packets; + ppio_stats.rx_multicast_packets - + priv->prev_stats.ipackets; stats->opackets += ppio_stats.tx_packets + ppio_stats.tx_broadcast_packets + - ppio_stats.tx_multicast_packets; - stats->ibytes += ppio_stats.rx_bytes; - stats->obytes += ppio_stats.tx_bytes; + ppio_stats.tx_multicast_packets - + priv->prev_stats.opackets; + stats->ibytes += ppio_stats.rx_bytes - priv->prev_stats.ibytes; + stats->obytes += ppio_stats.tx_bytes - priv->prev_stats.obytes; stats->imissed += ppio_stats.rx_discard + - ppio_stats.rx_overrun; + ppio_stats.rx_overrun - + priv->prev_stats.imissed; stats->ierrors = ppio_stats.rx_packets_err + ppio_stats.rx_errors + - ppio_stats.rx_crc_error; - stats->oerrors = ppio_stats.tx_errors; + ppio_stats.rx_crc_error - + priv->prev_stats.ierrors; + stats->oerrors = ppio_stats.tx_errors - priv->prev_stats.oerrors; return 0; } +/** + * DPDK callback to clear device statistics. + * + * @param dev + * Pointer to Ethernet device structure. 
+ */ +static void +mvneta_stats_reset(struct rte_eth_dev *dev) +{ + struct mvneta_priv *priv = dev->data->dev_private; + unsigned int ret; + + if (!priv->ppio) + return; + + ret = mvneta_stats_get(dev, &priv->prev_stats); + if (unlikely(ret)) + RTE_LOG(ERR, PMD, "Failed to reset port statistics"); +} + + static const struct eth_dev_ops mvneta_ops = { .dev_configure = mvneta_dev_configure, .dev_start = mvneta_dev_start, @@ -752,6 +777,7 @@ static const struct eth_dev_ops mvneta_ops = { .mac_addr_set = mvneta_mac_addr_set, .mtu_set = mvneta_mtu_set, .stats_get = mvneta_stats_get, + .stats_reset = mvneta_stats_reset, .dev_infos_get = mvneta_dev_infos_get, .dev_supported_ptypes_get = mvneta_dev_supported_ptypes_get, .rxq_info_get = mvneta_rxq_info_get, diff --git a/drivers/net/mvneta/mvneta_ethdev.h b/drivers/net/mvneta/mvneta_ethdev.h index 1a78a41..eeea31a 100644 --- a/drivers/net/mvneta/mvneta_ethdev.h +++ b/drivers/net/mvneta/mvneta_ethdev.h @@ -67,6 +67,7 @@ struct mvneta_priv { uint16_t nb_rx_queues; uint64_t rate_max; + struct rte_eth_stats prev_stats; }; /** Current log type. */ -- 2.7.4
Re: [dpdk-dev] How to replace rte_eth_dev_attach with rte_eal_hotplug_add
On Thu, Sep 20, 2018 at 05:46:37PM +0900, Hideyuki Yamashita wrote: > Hello, > > From dpdk 18.08 release rte_eth_dev_attach and > rte_eth_dev_detach became deprecated APIs and > it is recommended to replace them with rte_eal_hotplug_add > and rte_eal_hotplug_remove. > > My program uses above mentioned deprecated APIs > and has to replace those. > Note that my program uses attach to attach vhost, pcap pmd. > > My question is whether it is correct to replace those as follows: > find rte_eth_dev_attach function in rte_ethdev.c and > migrate that content into my program. > > e.g. > lib/librte_ethdev/rte_ethdev.c line 643-686 for attach > lib/librte_ethdev/rte_ethdev.c line 690-720 for detach > > Your advice/guidance is much appreciated. > Thanks! > > BR, > Hideyuki Yamashita > - > Hideyuki Yamashita > NTT TechnoCross > - > > Hello Hideyuki, You could use this code for guidance, while leaving out the ethdev specifics, such as the eth_dev_count_total() check. The hotplug function would already return an error if the PMD was not able to create the necessary devices. The main issue might be to find the port_id of your new port. You won't be able to use eth_dev_last_created_port, so you would have to iterate over the ethdevs using RTE_ETH_FOREACH_DEV and find the one matching your parameters (you might for example match the rte_device name with the name you used in hotplug_add, as there is no standard naming scheme at the ethdev level). A possible issue with the deprecation planned for those two functions is that the hotplug API is also meant to evolve [1] in this release (not in a big way however, it would mostly simplify your usage of it). [1]: https://mails.dpdk.org/archives/dev/2018-September/42.html Best, -- Gaëtan Rivet 6WIND
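The port lookup described in the reply can be sketched roughly as below. This is a standalone illustration, not DPDK code: struct port_entry and find_port_by_devname() are hypothetical stand-ins, and in a real application the loop would be RTE_ETH_FOREACH_DEV with each port's device name obtained from the ethdev layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one ethdev port and its device name. */
struct port_entry {
	uint16_t port_id;
	int in_use;           /* 0 if the slot is unused */
	const char *dev_name; /* name that was passed to the hotplug call */
};

/*
 * Return the port_id of the first used port whose device name matches,
 * or -1 if none does. In an application this loop would be replaced by
 * RTE_ETH_FOREACH_DEV and a lookup of each port's device name.
 */
static int
find_port_by_devname(const struct port_entry *ports, size_t n,
		const char *dev_name)
{
	size_t i;

	assert(ports != NULL && dev_name != NULL);
	for (i = 0; i < n; i++) {
		if (!ports[i].in_use)
			continue;
		if (strcmp(ports[i].dev_name, dev_name) == 0)
			return ports[i].port_id;
	}
	return -1;
}
```

Returning -1 for "not found" keeps the helper usable right after a hotplug call: a negative result means no port with that device name showed up.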
Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
On 9/19/18 4:56 PM, Anatoly Burakov wrote: When we allocate and use DPDK memory, we need to be able to differentiate between DPDK hugepage segments and segments that were made part of DPDK but are externally allocated. Add such a property to memseg lists. This breaks the ABI, so bump the EAL library ABI version and document the change in release notes. All current calls for memseg walk functions were adjusted to ignore external segments where it made sense. Mempools is a special case, because we may be asked to allocate a mempool on a specific socket, and we need to ignore all page sizes on other heaps or other sockets. Previously, this assumption of knowing all page sizes was not a problem, but it will be now, so we have to match socket ID with page size when calculating minimum page size for a mempool. Signed-off-by: Anatoly Burakov A couple of minor questions/suggestions below, but it is OK to go as is even if rejected. Acked-by: Andrew Rybchenko <...> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 03e6b5f73..d61c77da3 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size) return new_obj_size * RTE_MEMPOOL_ALIGN; } +struct pagesz_walk_arg { + int socket_id; + size_t min; +}; + static int find_min_pagesz(const struct rte_memseg_list *msl, void *arg) { - size_t *min = arg; + struct pagesz_walk_arg *wa = arg; + bool valid; - if (msl->page_sz < *min) - *min = msl->page_sz; + valid = msl->socket_id == wa->socket_id; Is it intended that we accept externally allocated segment if it is on requested socket? If so, it would be good to add comment to explain why. 
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0; + + if (!valid) + return 0; + + if (msl->page_sz < wa->min) + wa->min = msl->page_sz; I'd suggest to keep single return (it is just a bit shorter) if (valid && msl->page_sz < wa->min) wa->min = msl->page_sz; <...>
[dpdk-dev] [PATCH] eal/bsd: fix compile issue due to unused variables
Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd") Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index 06afbcc99..a5847f0bd 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -49,19 +49,21 @@ eal_memalloc_sync_with_primary(void) } int -eal_memalloc_get_seg_fd(int list_idx, int seg_idx) +eal_memalloc_get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused) { return -ENOTSUP; } int -eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) +eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused, + int fd __rte_unused) { return -ENOTSUP; } int -eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset) +eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused, + int seg_idx __rte_unused, size_t *offset __rte_unused) { return -ENOTSUP; } -- 2.17.1
Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
On 20-Sep-18 10:30 AM, Andrew Rybchenko wrote: On 9/19/18 4:56 PM, Anatoly Burakov wrote: When we allocate and use DPDK memory, we need to be able to differentiate between DPDK hugepage segments and segments that were made part of DPDK but are externally allocated. Add such a property to memseg lists. This breaks the ABI, so bump the EAL library ABI version and document the change in release notes. All current calls for memseg walk functions were adjusted to ignore external segments where it made sense. Mempools is a special case, because we may be asked to allocate a mempool on a specific socket, and we need to ignore all page sizes on other heaps or other sockets. Previously, this assumption of knowing all page sizes was not a problem, but it will be now, so we have to match socket ID with page size when calculating minimum page size for a mempool. Signed-off-by: Anatoly Burakov A couple of minor questions/suggestions below, but it is OK to go as is even if rejected. Acked-by: Andrew Rybchenko <...> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 03e6b5f73..d61c77da3 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size) return new_obj_size * RTE_MEMPOOL_ALIGN; } +struct pagesz_walk_arg { + int socket_id; + size_t min; +}; + static int find_min_pagesz(const struct rte_memseg_list *msl, void *arg) { - size_t *min = arg; + struct pagesz_walk_arg *wa = arg; + bool valid; - if (msl->page_sz < *min) - *min = msl->page_sz; + valid = msl->socket_id == wa->socket_id; Is it intended that we accept externally allocated segment if it is on requested socket? If so, it would be good to add comment to explain why. Accepting externally allocated segments is precisely the point here - we want to find page size of underlying memory, regardless of whether it's internal or external. 
We use socket ID to identify valid page sizes for a particular heap (since socket ID is technically a heap identifier, as far as external code is concerned), but within that heap there can be multiple segment lists corresponding to that socket ID, each with its own page size. + valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0; + + if (!valid) + return 0; + + if (msl->page_sz < wa->min) + wa->min = msl->page_sz; I'd suggest to keep single return (it is just a bit shorter) if (valid && msl->page_sz < wa->min) wa->min = msl->page_sz; Sure. If there will be other comments that warrant a v3 respin, i'll incorporate this feedback :) Thanks for the review! <...> -- Thanks, Anatoly
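For readers following the thread, here is a standalone sketch of the walk callback with the single-condition form suggested in the review. It is an illustration under assumptions, not the mempool code: struct seg_list is a trimmed-down hypothetical stand-in for rte_memseg_list, ANY_SOCKET stands in for SOCKET_ID_ANY, and min_pagesz() is a hypothetical driver loop replacing the memseg list walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ANY_SOCKET (-1) /* stand-in for SOCKET_ID_ANY */

/* Hypothetical, trimmed-down stand-in for a memseg list. */
struct seg_list {
	int socket_id;
	size_t page_sz;
	bool external;
};

struct pagesz_walk_arg {
	int socket_id;
	size_t min;
};

/* Walk callback: fold one list into the running minimum if it qualifies. */
static int
find_min_pagesz(const struct seg_list *msl, void *arg)
{
	struct pagesz_walk_arg *wa = arg;
	bool valid;

	/* An exact socket match accepts internal and external lists alike. */
	valid = msl->socket_id == wa->socket_id;
	/* With no socket preference, only internal lists count. */
	valid |= wa->socket_id == ANY_SOCKET && !msl->external;

	if (valid && msl->page_sz < wa->min)
		wa->min = msl->page_sz;
	return 0;
}

/* Apply the callback to every list and return the smallest page size. */
static size_t
min_pagesz(const struct seg_list *lists, size_t n, int socket_id)
{
	struct pagesz_walk_arg wa = { socket_id, (size_t)-1 };
	size_t i;

	assert(lists != NULL);
	for (i = 0; i < n; i++)
		find_min_pagesz(&lists[i], &wa);
	return wa.min;
}
```

If no list qualifies, the minimum stays at its initial maximum value, so a caller can tell "nothing matched" apart from a real page size.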
Re: [dpdk-dev] [PATCH v2 05/13] ethdev: add private generic device iterator
On 9/19/18 7:03 PM, Gaetan Rivet wrote: This iterator can be customized with a comparison function that will trigger a stopping condition. It can be leveraged to write several different iterators that have similar but non-identical purposes. It is private to librte_ethdev. Signed-off-by: Gaetan Rivet Acked-by: Andrew Rybchenko
Re: [dpdk-dev] [PATCH] net/fm10k: add imissed stats
Sorry for the delay. False alarm, I thought that imissed was also counted in ipackets, but it's not the case :) So you can integrate this patch. Thanks in advance! -- Julien Meunier > -Original Message- > From: Meunier, Julien (Nokia - FR/Paris-Saclay) > Sent: Tuesday, September 11, 2018 10:22 AM > To: Wang, Xiao W ; Zhang, Qi Z > > Cc: dev@dpdk.org > Subject: RE: [PATCH] net/fm10k: add imissed stats > > Hi, > > Please, do not merge this patch. I need to check and readapt this patch. A > version 2 will be sent later. > > Thanks, > > -- > Julien Meunier > > > -Original Message- > > From: Wang, Xiao W > > Sent: Tuesday, September 11, 2018 3:52 AM > > To: Meunier, Julien (Nokia - FR/Paris-Saclay) > > ; Zhang, Qi Z > > Cc: dev@dpdk.org > > Subject: RE: [PATCH] net/fm10k: add imissed stats > > > > Hi, > > > > -Original Message- > > From: Julien Meunier [mailto:julien.meun...@nokia.com] > > Sent: Monday, September 10, 2018 11:51 PM > > To: Zhang, Qi Z ; Wang, Xiao W > > > > Cc: dev@dpdk.org > > Subject: [PATCH] net/fm10k: add imissed stats > > > > Add support of imissed and q_errors statistics, reported by PCIE_QPRDC > > register (see datasheet, section 11.27.2.60), which exposes the number > > of receive packets dropped for a queue. 
> > > > Signed-off-by: Julien Meunier > > --- > > drivers/net/fm10k/fm10k_ethdev.c | 7 +-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/fm10k/fm10k_ethdev.c > > b/drivers/net/fm10k/fm10k_ethdev.c > > index 541a49b..a9af6c2 100644 > > --- a/drivers/net/fm10k/fm10k_ethdev.c > > +++ b/drivers/net/fm10k/fm10k_ethdev.c > > @@ -1325,7 +1325,7 @@ fm10k_xstats_get(struct rte_eth_dev *dev, struct > > rte_eth_xstat *xstats, static int fm10k_stats_get(struct rte_eth_dev > > *dev, struct rte_eth_stats *stats) { > > - uint64_t ipackets, opackets, ibytes, obytes; > > + uint64_t ipackets, opackets, ibytes, obytes, imissed; > > struct fm10k_hw *hw = > > FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > struct fm10k_hw_stats *hw_stats = > > @@ -1336,22 +1336,25 @@ fm10k_stats_get(struct rte_eth_dev *dev, > > struct rte_eth_stats *stats) > > > > fm10k_update_hw_stats(hw, hw_stats); > > > > - ipackets = opackets = ibytes = obytes = 0; > > + ipackets = opackets = ibytes = obytes = imissed = 0; > > for (i = 0; (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) && > > (i < hw->mac.max_queues); ++i) { > > stats->q_ipackets[i] = hw_stats->q[i].rx_packets.count; > > stats->q_opackets[i] = hw_stats->q[i].tx_packets.count; > > stats->q_ibytes[i] = hw_stats->q[i].rx_bytes.count; > > stats->q_obytes[i] = hw_stats->q[i].tx_bytes.count; > > + stats->q_errors[i] = hw_stats->q[i].rx_drops.count; > > ipackets += stats->q_ipackets[i]; > > opackets += stats->q_opackets[i]; > > ibytes += stats->q_ibytes[i]; > > obytes += stats->q_obytes[i]; > > + imissed += stats->q_errors[i]; > > } > > stats->ipackets = ipackets; > > stats->opackets = opackets; > > stats->ibytes = ibytes; > > stats->obytes = obytes; > > + stats->imissed = imissed; > > return 0; > > } > > > > Acked-by: Xiao Wang > > > > > > -- > > 2.10.2
Re: [dpdk-dev] [PATCH v2 12/13] ethdev: process declarative eth devargs
On 9/19/18 7:03 PM, Gaetan Rivet wrote: Process the eth parameters of a devargs. For each parameter that has a setter implemented, the relevant field in rte_eth_dev is written. Currently only "name" is implemented. Signed-off-by: Gaetan Rivet Acked-by: Andrew Rybchenko
Re: [dpdk-dev] [PATCH] latency: clear mbuf timestamp after latency calculation
Hi, > -Original Message- > From: long...@viettel.com.vn [mailto:long...@viettel.com.vn] > Sent: Wednesday, September 19, 2018 9:23 AM > To: Pattan, Reshma > Cc: dev@dpdk.org; Bao-Long Tran ; > sta...@dpdk.org > Subject: [PATCH] latency: clear mbuf timestamp after latency calculation > > The timestamp of a mbuf should be cleared after that mbuf was used for > latency calculation, otherwise future packets which reuse the same mbuf > would inherit that previous timestamp. The latencystats library looks for > mbuf with non-zero timestamp, thus incorrectly inherited value would result > in incorrect latency measurement. > > Cc: sta...@dpdk.org > > Signed-off-by: Bao-Long Tran You need to add the Fixes line just before CC: in the commit message. Original commit that introduced the bug was 5cd3cac9ed. So fixes should be added like below Fixes: 5cd3cac9ed ("latency: added new library for latency stats"). You can send v2 with fixes line and my ack. Other than that Acked-by: Reshma Pattan
[dpdk-dev] [PATCH v2] latency: clear mbuf timestamp after latency calculation
The timestamp of a mbuf should be cleared after that mbuf was used for latency calculation, otherwise future packets which reuse the same mbuf would inherit that previous timestamp. The latencystats library looks for mbuf with non-zero timestamp, thus incorrectly inherited value would result in incorrect latency measurement. Fixes: 5cd3cac9ed ("latency: added new library for latency stats"). Cc: sta...@dpdk.org Signed-off-by: Bao-Long Tran Acked-by: Reshma Pattan --- lib/librte_latencystats/rte_latencystats.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/librte_latencystats/rte_latencystats.c b/lib/librte_latencystats/rte_latencystats.c index 1fdec68..2d5384e 100644 --- a/lib/librte_latencystats/rte_latencystats.c +++ b/lib/librte_latencystats/rte_latencystats.c @@ -156,8 +156,10 @@ calc_latency(uint16_t pid __rte_unused, now = rte_rdtsc(); for (i = 0; i < nb_pkts; i++) { - if (pkts[i]->timestamp) + if (pkts[i]->timestamp) { latency[cnt++] = now - pkts[i]->timestamp; + pkts[i]->timestamp = 0; + } } for (i = 0; i < cnt; i++) { -- 2.7.4
[dpdk-dev] [PATCH] eal: fix unused parameter errors on FreeBSD
When compiling with clang on FreeBSD, lots of warnings/errors are thrown for unused parameter. Fix these by marking the parameters as unused in the code. Fixes: 3a44687139eb ("mem: allow querying offset into segment fd") Fixes: 046aa5c4477b ("mem: add memalloc init stage") Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd") Signed-off-by: Bruce Richardson --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index 06afbcc99..524bc0593 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -49,19 +49,21 @@ eal_memalloc_sync_with_primary(void) } int -eal_memalloc_get_seg_fd(int list_idx, int seg_idx) +eal_memalloc_get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused) { return -ENOTSUP; } int -eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) +eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused, + int fd __rte_unused) { return -ENOTSUP; } int -eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset) +eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused, + int seg_idx __rte_unused, size_t *offset __rte_unused) { return -ENOTSUP; } -- 2.15.1
Re: [dpdk-dev] [PATCH] devtools: use a common prefix for temporary files
On Wed, Sep 19, 2018 at 07:16:29PM +0200, Thomas Monjalon wrote: > Some temporary files were generated in /tmp, others in the current > directory, and none was "dpdk prefixed". > > All these files have a common path prefix now: $TMPDIR/dpdk. > TMPDIR is /tmp by default. > > Note: the previous use of mktemp, with a template but without -t, > was generating a file in the current directory. > > Signed-off-by: Thomas Monjalon > --- Acked-by: Bruce Richardson
[dpdk-dev] [PATCH v2 1/3] event: add function for reading unlink in progress
This commit introduces a new function in the eventdev API, which allows applications to read the number of unlink requests in progress on a particular port of an eventdev instance. This information allows applications to verify when no more packets from a particular queue (or any queue) will arrive at a port. The application could decide to stop polling, or put the core into a sleep state if it wishes, as it is ensured that no new packets will arrive at a particular port anymore if all queues are unlinked. Suggested-by: Matias Elo Signed-off-by: Harry van Haaren Acked-by: Jerin Jacob dev_ops->port_unlinks_in_progress, 0); + + return (*dev->dev_ops->port_unlinks_in_progress)(dev, + dev->data->ports[port_id]); +} + int rte_event_port_links_get(uint8_t dev_id, uint8_t port_id, uint8_t queues[], uint8_t priorities[]) diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h index b6fd6ee7f..a24213ea7 100644 --- a/lib/librte_eventdev/rte_eventdev.h +++ b/lib/librte_eventdev/rte_eventdev.h @@ -1656,12 +1656,13 @@ rte_event_port_link(uint8_t dev_id, uint8_t port_id, * event port designated by its *port_id* on the event device designated * by its *dev_id*. * - * The unlink establishment shall disable the event port *port_id* from - * receiving events from the specified event queue *queue_id* - * + * The unlink call issues an async request to disable the event port *port_id* + * from receiving events from the specified event queue *queue_id*. * Event queue(s) to event port unlink establishment can be changed at runtime * without re-configuring the device. * + * @see rte_event_port_unlinks_in_progress() to poll for completed unlinks. + * * @param dev_id * The identifier of the device. * @@ -1679,21 +1680,47 @@ rte_event_port_link(uint8_t dev_id, uint8_t port_id, * NULL. * * @return - * The number of unlinks actually established. The return value can be less + * The number of unlinks successfully requested. 
The return value can be less * than the value of the *nb_unlinks* parameter when the implementation has the * limitation on specific queue to port unlink establishment or * if invalid parameters are specified. * If the return value is less than *nb_unlinks*, the remaining queues at the - * end of queues[] are not established, and the caller has to take care of them. + * end of queues[] are not unlinked, and the caller has to take care of them. * If return value is less than *nb_unlinks* then implementation shall update * the rte_errno accordingly, Possible rte_errno values are * (-EINVAL) Invalid parameter - * */ int rte_event_port_unlink(uint8_t dev_id, uint8_t port_id, uint8_t queues[], uint16_t nb_unlinks); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Returns the number of unlinks in progress. + * + * This function provides the application with a method to detect when an + * unlink has been completed by the implementation. + * + * @see rte_event_port_unlink() to issue unlink requests. + * + * @param dev_id + * The identifier of the device. + * + * @param port_id + * Event port identifier to select port to check for unlinks in progress. + * + * @return + * The number of unlinks that are in progress. A return of zero indicates that + * there are no outstanding unlink requests. A positive return value indicates + * the number of unlinks that are in progress, but are not yet complete. + * A negative return value indicates an error, -EINVAL indicates an invalid + * parameter passed for *dev_id* or *port_id*. 
+ */ +int __rte_experimental +rte_event_port_unlinks_in_progress(uint8_t dev_id, uint8_t port_id); + /** * Retrieve the list of source event queues and its associated service priority * linked to the destination event port designated by its *port_id* diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h index 3fbb4d2b2..65645730a 100644 --- a/lib/librte_eventdev/rte_eventdev_pmd.h +++ b/lib/librte_eventdev/rte_eventdev_pmd.h @@ -332,6 +332,23 @@ typedef int (*eventdev_port_link_t)(struct rte_eventdev *dev, void *port, typedef int (*eventdev_port_unlink_t)(struct rte_eventdev *dev, void *port, uint8_t queues[], uint16_t nb_unlinks); +/** + * Unlinks in progress. Returns number of unlinks that the PMD is currently + * performing, but have not yet been completed. + * + * @param dev + * Event device pointer + * + * @param port + * Event port pointer + * + * @return + * Returns the number of in-progress unlinks. Zero is returned if none are + * in progress. + */ +typedef int (*eventdev_port_unlinks_in_progress_t)(struct rte_eventdev *dev, + void *port); + /** * Converts nanoseconds to *timeout_ticks* value for rte_event_dequeue() * @@ -815,6 +832,8 @@ struct rte_eventdev_ops { /**< Link event queues to an event port.
[dpdk-dev] [PATCH v2 2/3] event/sw: implement unlinks in progress function
This commit adds a counter to each port, which counts the number of
unlinks that have been performed. When the scheduler thread starts its
scheduling routine, it "acks" all unlinks that have been requested, and
the application is guaranteed that no more events will be scheduled to
the port from the unlinked queue.

Signed-off-by: Harry van Haaren
---
v2:
- Fix unused "dev" variable (Jerin)
---
 drivers/event/sw/sw_evdev.c           | 12
 drivers/event/sw/sw_evdev.h           | 8
 drivers/event/sw/sw_evdev_scheduler.c | 7 ++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index a6bb91388..9e1412537 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -113,9 +113,20 @@ sw_port_unlink(struct rte_eventdev *dev, void *port, uint8_t queues[],
 			}
 		}
 	}
+
+	p->unlinks_in_progress += unlinked;
+	rte_smp_mb();
+
 	return unlinked;
 }
 
+static int
+sw_port_unlinks_in_progress(struct rte_eventdev *dev, void *port)
+{
+	struct sw_port *p = port;
+	return p->unlinks_in_progress;
+}
+
 static int
 sw_port_setup(struct rte_eventdev *dev, uint8_t port_id,
 		const struct rte_event_port_conf *conf)
@@ -925,6 +936,7 @@ sw_probe(struct rte_vdev_device *vdev)
 			.port_release = sw_port_release,
 			.port_link = sw_port_link,
 			.port_unlink = sw_port_unlink,
+			.port_unlinks_in_progress = sw_port_unlinks_in_progress,
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
index d90b96d4b..7c77b2495 100644
--- a/drivers/event/sw/sw_evdev.h
+++ b/drivers/event/sw/sw_evdev.h
@@ -148,6 +148,14 @@ struct sw_port {
 	/* A numeric ID for the port */
 	uint8_t id;
 
+	/* An atomic counter for when the port has been unlinked, and the
+	 * scheduler has not yet acked this unlink - hence there may still be
+	 * events in the buffers going to the port. When the unlinks in
+	 * progress is read by the scheduler, no more events will be pushed to
+	 * the port - hence the scheduler core can just assign zero.
+	 */
+	uint8_t unlinks_in_progress;
+
 	int16_t is_directed; /** Takes from a single directed QID */
 	/**
 	 * For loadbalanced we can optimise pulling packets from
diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c
index e3a41e02f..9b54d5ce7 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -517,13 +517,18 @@ sw_event_schedule(struct rte_eventdev *dev)
 		/* Pull from rx_ring for ports */
 		do {
 			in_pkts = 0;
-			for (i = 0; i < sw->port_count; i++)
+			for (i = 0; i < sw->port_count; i++) {
+				/* ack the unlinks in progress as done */
+				if (sw->ports[i].unlinks_in_progress)
+					sw->ports[i].unlinks_in_progress = 0;
+
 				if (sw->ports[i].is_directed)
 					in_pkts += sw_schedule_pull_port_dir(sw, i);
 				else if (sw->ports[i].num_ordered_qids > 0)
 					in_pkts += sw_schedule_pull_port_lb(sw, i);
 				else
 					in_pkts += sw_schedule_pull_port_no_reorder(sw, i);
+			}
 
 			/* QID scan for re-ordered */
 			in_pkts += sw_schedule_reorder(sw, 0,
-- 
2.17.1
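
For readers following the handshake described in this commit message, here is a minimal single-threaded sketch of the counter protocol. It is not the PMD code: every `model_*` name is hypothetical, no DPDK headers are used, and the memory barrier that the real patch issues after updating the counter is left out because the model has only one thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sw PMD port; only the counter is modelled. */
struct model_port {
	uint8_t unlinks_in_progress;
};

/* Application side: record that nb_unlinked queue links were just removed. */
static int
model_port_unlink(struct model_port *p, uint8_t nb_unlinked)
{
	p->unlinks_in_progress += nb_unlinked;
	return nb_unlinked;
}

/* Application side: how many unlinks has the scheduler not yet seen? */
static int
model_unlinks_in_progress(const struct model_port *p)
{
	return p->unlinks_in_progress;
}

/* Scheduler side: the start of a scheduling pass acks all pending unlinks. */
static void
model_schedule(struct model_port *ports, unsigned int nb_ports)
{
	unsigned int i;

	for (i = 0; i < nb_ports; i++) {
		if (ports[i].unlinks_in_progress)
			ports[i].unlinks_in_progress = 0;
	}
}

/* Unlink three queues, check the count, run the scheduler, check again.
 * Returns -1 if the intermediate count is wrong, else the final count.
 */
static int
model_unlink_demo(void)
{
	struct model_port port = { 0 };

	model_port_unlink(&port, 3);
	if (model_unlinks_in_progress(&port) != 3)
		return -1;
	model_schedule(&port, 1);
	return model_unlinks_in_progress(&port);
}
```

An application would typically poll the in-progress count after an unlink and only reuse or free per-queue resources once it reads zero.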
[dpdk-dev] [PATCH v2 3/3] event/sw: add unit test for unlinks in progress
This commit adds a unit test that checks the behaviour of the unlinks_in_progress() function, ensuring that the returned values are the number of unlinks requested, until the scheduler runs and "acks" the requests, after which the count should be zero again. Signed-off-by: Harry van Haaren --- v2: - Add print before running unlink test (Harry) --- drivers/event/sw/sw_evdev.c | 1 + drivers/event/sw/sw_evdev_selftest.c | 77 2 files changed, 78 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 9e1412537..1175d6cdb 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -123,6 +123,7 @@ sw_port_unlink(struct rte_eventdev *dev, void *port, uint8_t queues[], static int sw_port_unlinks_in_progress(struct rte_eventdev *dev, void *port) { + RTE_SET_USED(dev); struct sw_port *p = port; return p->unlinks_in_progress; } diff --git a/drivers/event/sw/sw_evdev_selftest.c b/drivers/event/sw/sw_evdev_selftest.c index c40912db5..d00d5de61 100644 --- a/drivers/event/sw/sw_evdev_selftest.c +++ b/drivers/event/sw/sw_evdev_selftest.c @@ -1903,6 +1903,77 @@ qid_priorities(struct test *t) return 0; } +static int +unlink_in_progress(struct test *t) +{ + /* Test unlinking API, in particular that when an unlink request has +* not yet been seen by the scheduler thread, that the +* unlink_in_progress() function returns the number of unlinks. 
+*/ + unsigned int i; + /* Create instance with 1 ports, and 3 qids */ + if (init(t, 3, 1) < 0 || + create_ports(t, 1) < 0) { + printf("%d: Error initializing device\n", __LINE__); + return -1; + } + + for (i = 0; i < 3; i++) { + /* Create QID */ + const struct rte_event_queue_conf conf = { + .schedule_type = RTE_SCHED_TYPE_ATOMIC, + /* increase priority (0 == highest), as we go */ + .priority = RTE_EVENT_DEV_PRIORITY_NORMAL - i, + .nb_atomic_flows = 1024, + .nb_atomic_order_sequences = 1024, + }; + + if (rte_event_queue_setup(evdev, i, &conf) < 0) { + printf("%d: error creating qid %d\n", __LINE__, i); + return -1; + } + t->qid[i] = i; + } + t->nb_qids = i; + /* map all QIDs to port */ + rte_event_port_link(evdev, t->port[0], NULL, NULL, 0); + + if (rte_event_dev_start(evdev) < 0) { + printf("%d: Error with start call\n", __LINE__); + return -1; + } + + /* unlink all ports to have outstanding unlink requests */ + int ret = rte_event_port_unlink(evdev, t->port[0], NULL, 0); + if (ret < 0) { + printf("%d: Failed to unlink queues\n", __LINE__); + return -1; + } + + /* get active unlinks here, expect 3 */ + int unlinks_in_progress = + rte_event_port_unlinks_in_progress(evdev, t->port[0]); + if (unlinks_in_progress != 3) { + printf("%d: Expected num unlinks in progress == 3, got %d\n", + __LINE__, unlinks_in_progress); + return -1; + } + + /* run scheduler service on this thread to ack the unlinks */ + rte_service_run_iter_on_app_lcore(t->service_id, 1); + + /* active unlinks expected as 0 as scheduler thread has acked */ + unlinks_in_progress = + rte_event_port_unlinks_in_progress(evdev, t->port[0]); + if (unlinks_in_progress != 0) { + printf("%d: Expected num unlinks in progress == 0, got %d\n", + __LINE__, unlinks_in_progress); + } + + cleanup(t); + return 0; +} + static int load_balancing(struct test *t) { @@ -3260,6 +3331,12 @@ test_sw_eventdev(void) printf("ERROR - QID Priority test FAILED.\n"); goto test_fail; } + printf("*** Running Unlink-in-progress 
test...\n"); + ret = unlink_in_progress(t); + if (ret != 0) { + printf("ERROR - Unlink in progress test FAILED.\n"); + goto test_fail; + } printf("*** Running Ordered Reconfigure test...\n"); ret = ordered_reconfigure(t); if (ret != 0) { -- 2.17.1
Re: [dpdk-dev] [PATCH] eal/bsd: fix compile issue due to unused variables
On Thu, Sep 20, 2018 at 10:34:46AM +0100, Anatoly Burakov wrote:
> Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd")
>
> Signed-off-by: Anatoly Burakov
> ---

Sorry, I missed your patch by about 1 hour - hadn't got through my mail
fully when I sent mine. I think you could do with a couple of other Fixes
tags on it, just take them from my patch. Otherwise:

Acked-by: Bruce Richardson

>  lib/librte_eal/bsdapp/eal/eal_memalloc.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
> index 06afbcc99..a5847f0bd 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
> @@ -49,19 +49,21 @@ eal_memalloc_sync_with_primary(void)
>  }
>
>  int
> -eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
> +eal_memalloc_get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused)
>  {
>  	return -ENOTSUP;
>  }
>
>  int
> -eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
> +eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused,
> +		int fd __rte_unused)
>  {
>  	return -ENOTSUP;
>  }
>
>  int
> -eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
> +eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused,
> +		int seg_idx __rte_unused, size_t *offset __rte_unused)
>  {
>  	return -ENOTSUP;
>  }
> --
> 2.17.1
Re: [dpdk-dev] [PATCH] eal/bsd: fix compile issue due to unused variables
On 20-Sep-18 12:25 PM, Bruce Richardson wrote:
> On Thu, Sep 20, 2018 at 10:34:46AM +0100, Anatoly Burakov wrote:
>> Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd")
>>
>> Signed-off-by: Anatoly Burakov
>> ---
>
> Sorry, I missed your patch by about 1 hour - hadn't got through my mail
> fully when I sent mine. I think you could do with a couple of other Fixes
> tags on it, just take them from my patch. Otherwise:
>
> Acked-by: Bruce Richardson

My patch had one too few Fixes tags, yours had one too many :) I'll
resubmit a v2 with new extra tags and probably changed commit message.

>>  lib/librte_eal/bsdapp/eal/eal_memalloc.c | 8 +---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> index 06afbcc99..a5847f0bd 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> @@ -49,19 +49,21 @@ eal_memalloc_sync_with_primary(void)
>>  }
>>
>>  int
>> -eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
>> +eal_memalloc_get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused)
>>  {
>>  	return -ENOTSUP;
>>  }
>>
>>  int
>> -eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
>> +eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused,
>> +		int fd __rte_unused)
>>  {
>>  	return -ENOTSUP;
>>  }
>>
>>  int
>> -eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
>> +eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused,
>> +		int seg_idx __rte_unused, size_t *offset __rte_unused)
>>  {
>>  	return -ENOTSUP;
>>  }
>> --
>> 2.17.1

-- 
Thanks,
Anatoly
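
As an aside on the fix discussed in this thread, the sketch below shows the general pattern of marking stub parameters as unused so that a build with unused-parameter warnings enabled stays clean. It assumes a GCC- or Clang-compatible compiler; `MODEL_UNUSED` and the `model_*` functions are hypothetical names used only for illustration, standing in for `__rte_unused` and the BSD stubs.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical local macro; the patch itself uses DPDK's __rte_unused. */
#define MODEL_UNUSED __attribute__((unused))

/* Stub for a platform without segment fd support: arguments are ignored. */
static int
model_get_seg_fd(int list_idx MODEL_UNUSED, int seg_idx MODEL_UNUSED)
{
	return -ENOTSUP;
}

/* Same pattern with a pointer parameter that the stub never dereferences. */
static int
model_get_seg_fd_offset(int list_idx MODEL_UNUSED, int seg_idx MODEL_UNUSED,
		size_t *offset MODEL_UNUSED)
{
	return -ENOTSUP;
}
```

Annotating the parameter keeps the function signature identical to the Linux implementation, which is why it is preferable here to dropping the parameter names.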
[dpdk-dev] [PATCH v3 00/20] Support externally allocated memory in DPDK
This is a proposal to enable using externally allocated memory in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA node
  itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
  allocator to decide from which heap (internal or external) to allocate
  requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses of
  externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is on
the shoulders of the user - there is no checking done with regards to
validity of the memory (nor could there be...).

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first be
attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a manual
process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*, because not only memory
space needs to be preallocated, but it also needs to be attached in each
process to allow other processes to access the page table. The attach
API call may or may not succeed, depending on memory layout, for reasons
similar to other multiprocess failures.
This is treated as a "known issue" for this release.

Creating and destroying heaps is currently restricted to primary
processes, because we need to keep track of all socket ID's we've ever
used to prevent their reuse, and obviously different processes would
have kept different socket ID counters, and it isn't important enough to
put into shared memory. This means that secondary processes will not be
able to create new heaps. If this use case is important enough, we can
put the max socket ID into shared memory, or allow socket ID reuse
(which I do not think is a good idea because it has the potential to
make things harder to debug).

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (20):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support
  examples: add external memory example app
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide
  doc: add external memory sample application guide

 config/common_base                            | 1 +
 config/rte_config.h                           | 1 +
 .../prog_guide/env_abstraction_layer.rst      | 38 ++
 doc/guides/rel_notes/deprecation.rst          | 15 -
 doc/guides/rel_notes/release_18_11.rst        | 24 +-
 doc/guides/sample_app_ug/external_mem.rst     | 115 +
 doc/guides/sample_app_ug/index.rst            | 1 +
 drivers/bus/fslmc/fslmc_vfio.c                | 7 +-
 drivers/bus/pci/linux/pci.c                   | 2 +-
 drivers/net/mlx4/mlx4_mr.c                    | 3 +
 drivers/net/mlx5/mlx5.c                       | 5 +-
 drivers/net/mlx5/mlx5_mr.c                    | 3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
 examples/external_mem/Makefile                | 62 +++
 examples/external_mem/extmem.c
[dpdk-dev] [PATCH v3 05/20] flow_classify: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heaps, and malloc
will now be able to verify if a supplied socket ID is in fact a valid
one, rendering parameter checks for sockets obsolete.

Signed-off-by: Anatoly Burakov
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1
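
To make the effect of the relaxed check concrete, here is a small sketch comparing the old and new conditions. The `model_*` names are hypothetical, the limit of 8 mirrors the `CONFIG_RTE_MAX_NUMA_NODES=8` default, and the value 256 in the usage note is an arbitrary example of an ID outside the NUMA range, not a value taken from the series.

```c
#include <stdbool.h>

#define MODEL_MAX_NUMA_NODES 8 /* mirrors CONFIG_RTE_MAX_NUMA_NODES=8 */

/* Old check: rejects anything outside the NUMA node range. */
static bool
model_socket_ok_old(int socket_id)
{
	return socket_id >= 0 && socket_id < MODEL_MAX_NUMA_NODES;
}

/* New check: only negative IDs are rejected here; whether a non-negative
 * ID names a real heap is left for the allocator to decide.
 */
static bool
model_socket_ok_new(int socket_id)
{
	return socket_id >= 0;
}
```

With these definitions, an ID such as 256 fails the old check but passes the new one, which is what lets a library be created on an external heap.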
[dpdk-dev] [PATCH v3 11/20] malloc: allow destroying heaps
Add an API to destroy specified heap. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 23 + lib/librte_eal/common/malloc_heap.c| 22 lib/librte_eal/common/malloc_heap.h| 3 ++ lib/librte_eal/common/rte_malloc.c | 58 ++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 107 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 182afab1c..8a8cc1e6d 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket, int __rte_experimental rte_malloc_heap_create(const char *heap_name); +/** + * Destroys a previously created malloc heap with specified name. + * + * @note This function will return a failure result if not all memory allocated + * from the heap has been freed back to the heap + * + * @note This function will return a failure result if not all memory segments + * were removed from the heap prior to its destruction + * + * @param heap_name + * Name of the heap to create. + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``heap_name`` was NULL, empty or too long + * ENOENT - heap by the name of ``heap_name`` was not found + * EPERM - attempting to destroy reserved heap + * EBUSY - heap still contains data + */ +int __rte_experimental +rte_malloc_heap_destroy(const char *heap_name); + /** * Find socket ID corresponding to a named heap. 
* diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 1dd4ffcf9..e98f720cb 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1045,6 +1045,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name) return 0; } +int +malloc_heap_destroy(struct malloc_heap *heap) +{ + if (heap->alloc_count != 0) { + RTE_LOG(ERR, EAL, "Heap is still in use\n"); + rte_errno = EBUSY; + return -1; + } + if (heap->first != NULL || heap->last != NULL) { + RTE_LOG(ERR, EAL, "Heap still contains memory segments\n"); + rte_errno = EBUSY; + return -1; + } + if (heap->total_size != 0) + RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n"); + + /* after this, the lock will be dropped */ + memset(heap, 0, sizeof(*heap)); + + return 0; +} + int rte_eal_malloc_heap_init(void) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index eebee16dc..75278da3c 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags, int malloc_heap_create(struct malloc_heap *heap, const char *heap_name); +int +malloc_heap_destroy(struct malloc_heap *heap); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 39875fe69..6734b0d09 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -288,6 +288,21 @@ rte_malloc_virt2iova(const void *addr) return ms->iova + RTE_PTR_DIFF(addr, ms->addr); } +static struct malloc_heap * +find_named_heap(const char *name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int i; + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + + if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN)) + return heap; + } + return 
NULL; +} + int rte_malloc_heap_create(const char *heap_name) { @@ -338,3 +353,46 @@ rte_malloc_heap_create(const char *heap_name) return ret; } + +int +rte_malloc_heap_destroy(const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + int ret; + + if (heap_name == NULL || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + /* start from non-socket heaps */ + heap = find_named_heap(heap_name); + if (heap == NULL) { + RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name); + rte_errno = ENOENT; + ret = -1; + goto unlock; + } + /* we shouldn't be able to destroy internal heaps */ + if (heap->socket
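
The error cases listed in the API comment of this patch can be summarised with a small self-contained model. This is only a sketch under simplifying assumptions: all `model_*` names and `MODEL_NAME_MAX` are hypothetical, there is a single heap rather than a table, no locking, and the function returns a negative errno value directly, whereas the API in the patch returns -1 and sets rte_errno.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NAME_MAX 32 /* hypothetical; stands in for RTE_HEAP_NAME_MAX_LEN */

struct model_named_heap {
	char name[MODEL_NAME_MAX];
	unsigned int alloc_count; /* allocations not yet freed */
	unsigned int nb_segs;     /* memory segments still attached */
};

/* Returns 0 on success or a negative errno value describing the refusal. */
static int
model_heap_destroy(struct model_named_heap *heap, const char *name)
{
	const char *end;

	if (name == NULL)
		return -EINVAL;
	/* reject empty names and names with no terminator within the limit */
	end = memchr(name, '\0', MODEL_NAME_MAX);
	if (end == NULL || end == name)
		return -EINVAL;
	if (strncmp(name, heap->name, MODEL_NAME_MAX) != 0)
		return -ENOENT;
	/* refuse while allocations are live or segments remain attached */
	if (heap->alloc_count != 0 || heap->nb_segs != 0)
		return -EBUSY;
	memset(heap, 0, sizeof(*heap));
	return 0;
}

/* Build a heap called "ext_heap" in the given state and try to destroy it. */
static int
model_destroy_demo(unsigned int alloc_count, unsigned int nb_segs,
		const char *name)
{
	struct model_named_heap heap = { .name = "ext_heap" };

	heap.alloc_count = alloc_count;
	heap.nb_segs = nb_segs;
	return model_heap_destroy(&heap, name);
}
```

The ordering of the checks follows the patch: argument validation first, then lookup, then the busy checks, so a caller can tell a bad name apart from a heap that merely needs to be emptied.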
[dpdk-dev] [PATCH v3 03/20] malloc: index heaps using heap ID rather than NUMA node
Switch over all parts of EAL to use heap ID instead of NUMA node ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA node's index within the detected NUMA node list. Heap ID for external heaps will be order of their creation. Signed-off-by: Anatoly Burakov --- config/common_base| 1 + config/rte_config.h | 1 + .../common/include/rte_eal_memconfig.h| 4 +- .../common/include/rte_malloc_heap.h | 1 + lib/librte_eal/common/malloc_heap.c | 98 +-- lib/librte_eal/common/malloc_heap.h | 3 + lib/librte_eal/common/rte_malloc.c| 41 +--- 7 files changed, 106 insertions(+), 43 deletions(-) diff --git a/config/common_base b/config/common_base index 155c7d40e..b52770b27 100644 --- a/config/common_base +++ b/config/common_base @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64 CONFIG_RTE_LIBRTE_EAL=y CONFIG_RTE_MAX_LCORE=128 CONFIG_RTE_MAX_NUMA_NODES=8 +CONFIG_RTE_MAX_HEAPS=32 CONFIG_RTE_MAX_MEMSEG_LISTS=64 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller diff --git a/config/rte_config.h b/config/rte_config.h index 567051b9c..5dd2ac1ad 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -24,6 +24,7 @@ #define RTE_BUILD_SHARED_LIB /* EAL defines */ +#define RTE_MAX_HEAPS 32 #define RTE_MAX_MEMSEG_LISTS 128 #define RTE_MAX_MEMSEG_PER_LIST 8192 #define RTE_MAX_MEM_MB_PER_LIST 32768 diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index 6baa6854f..d7920a4e0 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -72,8 +72,8 @@ struct rte_mem_config { struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */ - /* Heaps of Malloc per socket */ - struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES]; + /* Heaps of Malloc */ + struct malloc_heap malloc_heaps[RTE_MAX_HEAPS]; /* address of mem_config in primary process. 
used to map shared config into * exact same address the primary process maps it. diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h index d43fa9097..e7ac32d42 100644 --- a/lib/librte_eal/common/include/rte_malloc_heap.h +++ b/lib/librte_eal/common/include/rte_malloc_heap.h @@ -27,6 +27,7 @@ struct malloc_heap { unsigned alloc_count; size_t total_size; + unsigned int socket_id; } __rte_cache_aligned; #endif /* _RTE_MALLOC_HEAP_H_ */ diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 3c8e2063b..1d1e35708 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz) return check_flag & flags; } +int +malloc_socket_to_heap_id(unsigned int socket_id) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int i; + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + + if (heap->socket_id == socket_id) + return i; + } + return -1; +} + /* * Expand the heap with a memory area. 
*/ @@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl, struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct rte_memseg_list *found_msl; struct malloc_heap *heap; - int msl_idx; + int msl_idx, heap_idx; if (msl->external) return 0; - heap = &mcfg->malloc_heaps[msl->socket_id]; + heap_idx = malloc_socket_to_heap_id(msl->socket_id); + heap = &mcfg->malloc_heaps[heap_idx]; /* msl is const, so find it */ msl_idx = msl - mcfg->memsegs; @@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl, malloc_heap_add_memory(heap, found_msl, ms->addr, len); heap->total_size += len; + heap->socket_id = msl->socket_id; RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20, msl->socket_id); @@ -561,12 +578,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket, /* this will try lower page sizes first */ static void * -heap_alloc_on_socket(const char *type, size_t size, int socket, - unsigned int flags, size_t align, size_t bound, bool contig) +malloc_heap_alloc_on_heap_id(const char *type, size_t size, + unsigned int heap_id, unsigned int flags, size_t align, + size_t bound, bool contig) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - struct malloc_heap *heap = &mcfg->malloc_hea
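
The socket-to-heap lookup introduced by this patch can be illustrated with a standalone sketch. It is a simplified model, not the EAL code: the `model_*` names are hypothetical, the heap table is a local array instead of the shared memory config, and the socket IDs in the demo (0, 2 and 256) are arbitrary example values.

```c
#define MODEL_MAX_HEAPS 32 /* same value as RTE_MAX_HEAPS in the patch */

struct model_heap {
	unsigned int socket_id;
};

/* Return the index of the heap serving socket_id, or -1 if there is none. */
static int
model_socket_to_heap_id(const struct model_heap *heaps, unsigned int socket_id)
{
	int i;

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		if (heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Look up a socket ID in a small example table. */
static int
model_lookup_demo(unsigned int socket_id)
{
	/* heaps 0 and 1 serve NUMA nodes 0 and 2; heap 2 is an external heap;
	 * the remaining slots are zero-initialized
	 */
	struct model_heap heaps[MODEL_MAX_HEAPS] = {
		{ .socket_id = 0 }, { .socket_id = 2 }, { .socket_id = 256 },
	};

	return model_socket_to_heap_id(heaps, socket_id);
}
```

The point of the indirection is visible in the demo table: NUMA node 2 lives at heap index 1, so the socket ID can no longer be used directly as an array index.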
[dpdk-dev] [PATCH v3 07/20] sched: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heaps, and malloc
will now be able to verify if a supplied socket ID is in fact a valid
one, rendering parameter checks for sockets obsolete.

Signed-off-by: Anatoly Burakov
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1
[dpdk-dev] [PATCH v3 18/20] doc: add external memory feature to the release notes
Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov
---
 doc/guides/rel_notes/release_18_11.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 63bbb1b51..9a05c9980 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,11 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1
[dpdk-dev] [PATCH v3 19/20] doc: add external memory feature to programmer's guide
Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov
---
 .../prog_guide/env_abstraction_layer.rst | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..37de8d63d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,44 @@ Normally, these options do not need to be changed.
   can later be mapped into that preallocated VA space (if dynamic memory mode
   is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap API's. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket ID's
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation API's such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether memory area is available or valid,
+this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+    * If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable
+    * Any DMA mappings for the external area are responsibility of the user
+    * Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+    * Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+    * Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1
[dpdk-dev] [PATCH v3 13/20] malloc: allow removing memory from named heaps
Add an API to remove memory from specified heaps. This will first check if all elements within the region are free, and that the region is the original region that was added to the heap (by comparing its length to length of memory addressed by the underlying memseg list). Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 27 +++ lib/librte_eal/common/malloc_heap.c| 54 ++ lib/librte_eal/common/malloc_heap.h| 4 ++ lib/librte_eal/common/rte_malloc.c | 39 lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 125 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 47f867a05..9bbe8e3af 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -302,6 +302,33 @@ int __rte_experimental rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); +/** + * Remove memory chunk from heap with specified name. + * + * @note Memory chunk being removed must be the same as one that was added; + * partially removing memory chunks is not supported + * + * @note Memory area must not contain any allocated elements to allow its + * removal from the heap + * + * @param heap_name + * Name of the heap to remove memory from + * @param va_addr + * Virtual address to remove from the heap + * @param len + * Length of virtual area to remove from the heap + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to remove memory from a reserved heap + * ENOENT - heap or memory chunk was not found + * EBUSY - memory chunk still contains data + */ +int __rte_experimental +rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. 
* diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 2f6946f65..3ac3b06de 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1019,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +static int +destroy_seg(struct malloc_elem *elem, size_t len) +{ + struct malloc_heap *heap = elem->heap; + struct rte_memseg_list *msl; + + msl = elem->msl; + + /* this element can be removed */ + malloc_elem_free_list_remove(elem); + malloc_elem_hide_region(elem, elem, len); + + heap->total_size -= len; + + memset(elem, 0, sizeof(*elem)); + + /* destroy the fbarray backing this memory */ + if (rte_fbarray_destroy(&msl->memseg_arr) < 0) + return -1; + + /* reset the memseg list */ + memset(msl, 0, sizeof(*msl)); + + return 0; +} + int malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz) @@ -1093,6 +1119,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, return 0; } +int +malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr, + size_t len) +{ + struct malloc_elem *elem = heap->first; + + /* find element with specified va address */ + while (elem != NULL && elem != va_addr) { + elem = elem->next; + /* stop if we've blown past our VA */ + if (elem > (struct malloc_elem *)va_addr) { + rte_errno = ENOENT; + return -1; + } + } + /* check if element was found */ + if (elem == NULL || elem->msl->len != len) { + rte_errno = ENOENT; + return -1; + } + /* if element's size is not equal to segment len, segment is busy */ + if (elem->state == ELEM_BUSY || elem->size != len) { + rte_errno = EBUSY; + return -1; + } + return destroy_seg(elem, len); +} + int malloc_heap_create(struct malloc_heap *heap, const char *heap_name) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 237ce9dc2..e48996d52 
100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -43,6 +43,10 @@ int malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); +int +malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr, + size_t len); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c in
[dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to differentiate between DPDK hugepage segments and segments that were made part of DPDK but are externally allocated. Add such a property to memseg lists. This breaks the ABI, so bump the EAL library ABI version and document the change in release notes. All current calls for memseg walk functions were adjusted to ignore external segments where it made sense. Mempools is a special case, because we may be asked to allocate a mempool on a specific socket, and we need to ignore all page sizes on other heaps or other sockets. Previously, this assumption of knowing all page sizes was not a problem, but it will be now, so we have to match socket ID with page size when calculating minimum page size for a mempool. Signed-off-by: Anatoly Burakov Acked-by: Andrew Rybchenko --- Notes: v3: - Add comment to explain the process of picking up minimum page sizes for mempool v2: - Add documentation changes and ABI break v1: - Adjust all calls to memseg walk functions to ignore external segments where it made sense to do so doc/guides/rel_notes/deprecation.rst | 15 doc/guides/rel_notes/release_18_11.rst| 12 ++- drivers/bus/fslmc/fslmc_vfio.c| 7 ++-- drivers/net/mlx4/mlx4_mr.c| 3 ++ drivers/net/mlx5/mlx5.c | 5 ++- drivers/net/mlx5/mlx5_mr.c| 3 ++ drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++- lib/librte_eal/bsdapp/eal/Makefile| 2 +- lib/librte_eal/bsdapp/eal/eal.c | 3 ++ lib/librte_eal/bsdapp/eal/eal_memory.c| 7 ++-- lib/librte_eal/common/eal_common_memory.c | 3 ++ .../common/include/rte_eal_memconfig.h| 1 + lib/librte_eal/common/include/rte_memory.h| 9 + lib/librte_eal/common/malloc_heap.c | 9 +++-- lib/librte_eal/linuxapp/eal/Makefile | 2 +- lib/librte_eal/linuxapp/eal/eal.c | 10 +- lib/librte_eal/linuxapp/eal/eal_memalloc.c| 9 + lib/librte_eal/linuxapp/eal/eal_vfio.c| 17 ++--- lib/librte_eal/meson.build| 2 +- lib/librte_mempool/rte_mempool.c | 35 ++- test/test/test_malloc.c | 3 ++ test/test/test_memzone.c | 3 ++ 22 
files changed, 125 insertions(+), 40 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 138335dfb..d2aec64d1 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here. Deprecation Notices --- -* eal: certain structures will change in EAL on account of upcoming external - memory support. Aside from internal changes leading to an ABI break, the - following externally visible changes will also be implemented: - - - ``rte_memseg_list`` will change to include a boolean flag indicating -whether a particular memseg list is externally allocated. This will have -implications for any users of memseg-walk-related functions, as they will -now have to skip externally allocated segments in most cases if the intent -is to only iterate over internal DPDK memory. - - ``socket_id`` parameter across the entire DPDK will gain additional meaning, -as some socket ID's will now be representing externally allocated memory. No -changes will be required for existing code as backwards compatibility will -be kept, and those who do not use this feature will not see these extra -socket ID's. - * eal: both declaring and identifying devices will be streamlined in v18.11. New functions will appear to query a specific port from buses, classes of device and device drivers. Device declaration will be made coherent with the diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index bc9b74ec4..e96ec9b43 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -91,6 +91,13 @@ API Changes flag the MAC can be properly configured in any case. This is particularly important for bonding. +* eal: The following API changes were made in 18.11: + + - ``rte_memseg_list`` structure now has an additional flag indicating whether +the memseg list is externally allocated. 
This will have implications for any +users of memseg-walk-related functions, as they will now have to skip +externally allocated segments in most cases if the intent is to only iterate +over internal DPDK memory. ABI Changes --- @@ -107,6 +114,9 @@ ABI Changes = +* eal: EAL library ABI version was changed due to previously announced work on + supporting external memory in DPDK. + Removed Items - @@ -152,7
[dpdk-dev] [PATCH v3 06/20] pipeline: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1
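For readers skimming the thread, the effect of the hunk above can be sketched in isolation. This is a standalone illustration, not DPDK code: `SKETCH_MAX_NUMA_NODES` and both function names are hypothetical stand-ins, and the only assumption is the one stated in the commit message, namely that external heaps may be given socket IDs that the old upper-bound check would reject.

```c
#include <stdbool.h>

/* Hypothetical stand-in for RTE_MAX_NUMA_NODES; the value is illustrative. */
#define SKETCH_MAX_NUMA_NODES 8

/* Check as it was before the patch: only physical NUMA node IDs pass. */
static bool
sketch_socket_ok_old(int socket_id)
{
	return socket_id >= 0 && socket_id < SKETCH_MAX_NUMA_NODES;
}

/*
 * Check as it is after the patch: any non-negative ID passes here, and
 * deciding whether the ID names an existing heap is left to malloc.
 */
static bool
sketch_socket_ok_new(int socket_id)
{
	return socket_id >= 0;
}
```

With the old check, an ID at or above the NUMA node limit is refused before malloc ever sees it; with the new check it goes through and is validated later.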
[dpdk-dev] [PATCH v3 14/20] malloc: allow attaching to external memory chunks
In order to use external memory in multiple processes, we need to attach to primary process's memseg lists, so add a new API to do that. It is the responsibility of the user to ensure that memory is accessible and that it has been previously added to the malloc heap by another process. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 48 +-- lib/librte_eal/common/rte_malloc.c | 93 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 135 insertions(+), 7 deletions(-) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 9bbe8e3af..440496cd9 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket, * * @note Multiple memory chunks can be added to the same heap * + * @note Before accessing this memory in other processes, it needs to be + * attached in each of those processes by calling + * ``rte_malloc_heap_memory_attach`` in each other process. + * * @note Memory must be previously allocated for DPDK to be able to use it as a * malloc heap. Failing to do so will result in undefined behavior, up to and * including segmentation faults. @@ -329,21 +333,48 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, int __rte_experimental rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); +/** + * Attach to an already existing chunk of external memory in another process. + * + * @note This function must be called before any attempt is made to use an + * already existing external memory chunk. This function does *not* need to + * be called if a call to ``rte_malloc_heap_memory_add`` was made in the + * current process. 
+ * + * @param heap_name + * Heap name to which this chunk of memory belongs + * @param va_addr + * Start address of memory chunk to attach to + * @param len + * Length of memory chunk to attach to + * @return + * 0 on successful attach + * -1 on unsuccessful attach, with rte_errno set to indicate cause for error: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to attach memory to a reserved heap + * ENOENT - heap or memory chunk was not found + */ +int __rte_experimental +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. * * @note Heaps created via this call will automatically get assigned a unique * socket ID, which can be found using ``rte_malloc_heap_get_socket()`` * + * @note This function can only be called in primary process. + * * @param heap_name * Name of the heap to create. * * @return * - 0 on successful creation * - -1 in case of error, with rte_errno set to one of the following: - * EINVAL - ``heap_name`` was NULL, empty or too long - * EEXIST - heap by name of ``heap_name`` already exists - * ENOSPC - no more space in internal config to store a new heap + * EINVAL - ``heap_name`` was NULL, empty or too long + * EEXIST - heap by name of ``heap_name`` already exists + * ENOSPC - no more space in internal config to store a new heap + * E_RTE_SECONDARY - attempted to create a heap in secondary process */ int __rte_experimental rte_malloc_heap_create(const char *heap_name); @@ -357,16 +388,19 @@ rte_malloc_heap_create(const char *heap_name); * @note This function will return a failure result if not all memory segments * were removed from the heap prior to its destruction * + * @note This function can only be called in primary process. + * * @param heap_name * Name of the heap to create. 
* * @return * - 0 on success * - -1 in case of error, with rte_errno set to one of the following: - * EINVAL - ``heap_name`` was NULL, empty or too long - * ENOENT - heap by the name of ``heap_name`` was not found - * EPERM - attempting to destroy reserved heap - * EBUSY - heap still contains data + * EINVAL - ``heap_name`` was NULL, empty or too long + * ENOENT - heap by the name of ``heap_name`` was not found + * EPERM - attempting to destroy reserved heap + * EBUSY - heap still contains data + * E_RTE_SECONDARY - attempted to destroy a heap in secondary process */ int __rte_experimental rte_malloc_heap_destroy(const char *heap_name); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index aed066882..bc22d21e4 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -393,6 +393,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len) return ret; } +struct sync_mem_walk_arg { + void *va_addr; + size_t len; + int result; +}
[dpdk-dev] [PATCH v3 16/20] test: add unit tests for external memory support
Add simple unit tests to test external memory support. The tests are pretty basic and mostly consist of checking if invalid API calls are handled correctly, plus a simple allocation/deallocation test for malloc and memzone. Signed-off-by: Anatoly Burakov --- test/test/Makefile| 1 + test/test/autotest_data.py| 14 +- test/test/meson.build | 1 + test/test/test_external_mem.c | 389 ++ 4 files changed, 401 insertions(+), 4 deletions(-) create mode 100644 test/test/test_external_mem.c diff --git a/test/test/Makefile b/test/test/Makefile index e6967bab6..074ac6e03 100644 --- a/test/test/Makefile +++ b/test/test/Makefile @@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c SRCS-y += test_reciprocal_division.c SRCS-y += test_reciprocal_division_perf.c SRCS-y += test_fbarray.c +SRCS-y += test_external_mem.c SRCS-y += test_ring.c SRCS-y += test_ring_perf.c diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py index f68d9b111..51f8e1689 100644 --- a/test/test/autotest_data.py +++ b/test/test/autotest_data.py @@ -477,10 +477,16 @@ "Report": None, }, { -"Name":"Fbarray autotest", -"Command": "fbarray_autotest", -"Func":default_autotest, -"Report": None, + "Name":"Fbarray autotest", + "Command": "fbarray_autotest", + "Func":default_autotest, + "Report": None, +}, +{ + "Name":"External memory autotest", + "Command": "external_mem_autotest", + "Func":default_autotest, + "Report": None, }, # #Please always keep all dump tests at the end and together! 
diff --git a/test/test/meson.build b/test/test/meson.build index b1dd6eca2..3abf02b71 100644 --- a/test/test/meson.build +++ b/test/test/meson.build @@ -155,6 +155,7 @@ test_names = [ 'eventdev_common_autotest', 'eventdev_octeontx_autotest', 'eventdev_sw_autotest', + 'external_mem_autotest', 'func_reentrancy_autotest', 'flow_classify_autotest', 'hash_scaling_autotest', diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c new file mode 100644 index 0..d0837aa35 --- /dev/null +++ b/test/test/test_external_mem.c @@ -0,0 +1,389 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "test.h" + +#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */ + +static int +test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, + int n_pages) +{ + static const char * const names[] = { + NULL, /* NULL name */ + "", /* empty name */ + "this heap name is definitely way too long to be valid" + }; + const char *valid_name = "valid heap name"; + unsigned int i; + + /* check invalid name handling */ + for (i = 0; i < RTE_DIM(names); i++) { + const char *name = names[i]; + + /* these calls may fail for other reasons, so check errno */ + if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Created heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Destroyed heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_get_socket(name) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Found socket for heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_add(name, addr, len, + NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added 
memory to heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Removed memory from heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Attached memory to heap with inv
[dpdk-dev] [PATCH v3 20/20] doc: add external memory sample application guide
Add a guide for external memory sample application. The application is identical to Basic Forwarding example in everything except parts of initialization code, so the bits that are identical will not be described. It is also not necessary to describe how external memory is being allocated due to the expectation being that user will have their own mechanisms to allocate memory outside of DPDK, and will only be interested in how to integrate said memory into DPDK. Signed-off-by: Anatoly Burakov --- doc/guides/sample_app_ug/external_mem.rst | 115 ++ doc/guides/sample_app_ug/index.rst| 1 + 2 files changed, 116 insertions(+) create mode 100644 doc/guides/sample_app_ug/external_mem.rst diff --git a/doc/guides/sample_app_ug/external_mem.rst b/doc/guides/sample_app_ug/external_mem.rst new file mode 100644 index 0..594c3397a --- /dev/null +++ b/doc/guides/sample_app_ug/external_mem.rst @@ -0,0 +1,115 @@ +.. SPDX-License-Identifier: BSD-3-Clause +Copyright(c) 2015-2018 Intel Corporation. + +External Memory Sample Application +== + +The External Memory sample application is a simple *skeleton* example of a +forwarding application using externally allocated memory. + +It is intended as a demonstration of the basic workflow of using externally +allocated memory in DPDK. This application is based on Basic Forwarding sample +application, and differs only in its initialization path. For more detailed +explanation of port initialization and packet forwarding code, please refer to +*Basic Forwarding sample application user guide*. + +Compiling the Application +- + +To compile the sample application see :doc:`compiling`. + +The application is located in the ``external_mem`` sub-directory. + +Running the Application +--- + +To run the example in a ``linuxapp`` environment: + +.. code-block:: console + +./build/extmem -l 1 -n 4 + +Refer to *DPDK Getting Started Guide* for general information on running +applications and the Environment Abstraction Layer (EAL) options. 
+ + +Explanation +--- + +For general overview of the code and explanation of the main components of this +application, please refer to *Basic Forwarding sample application user guide*. +This guide will only explain sections of the code relevant to using external +memory in DPDK. + +All DPDK library functions used in the sample code are prefixed with ``rte_`` +and are explained in detail in the *DPDK API Documentation*. + + +External Memory Initialization +~~ + +The ``main()`` function performs the initialization and calls the execution +threads for each lcore. + +After initializing the Environment Abstraction Layer, the application also +initializes external memory (in this case, it's allocating a chunk of memory +using anonymous hugepages) inside the ``setup_extmem()`` local function. + +The first step in this process is to create an external heap: + +.. code-block:: c + +ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME); +if (ret < 0) { +printf("Cannot create heap\n"); +return -1; +} + +Once the heap is created, ``create_extmem`` function is called to create the +actual external memory area the application will be using. While the details of +that process will not be described as they are not pertinent to the external +memory API (it is expected that the user will have their own procedures to +create external memory), there are a few important things to note. + +In order to add an externally allocated memory area to the newly created heap, +the application needs the following pieces of information: + +* Pointer to start address of external memory area +* Length of this area +* Page size of memory backing this memory area +* Optionally, a per-page IOVA table + +All of this information is to be provided by the user. Additionally, if VFIO is +in use and if application intends to do DMA using the memory area, VFIO DMA +mapping must also be performed using ``rte_vfio_dma_map`` function. 
+ +Once the external memory is created and mapped for DMA, the application also has +to add this memory to the heap that was created earlier: + +.. code-block:: c + +ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME, +param.addr, param.len, param.iova_table, +param.iova_table_len, param.pgsz); + +If return value indicates success, the memory area has been successfully added +to the heap. The next step is to retrieve the socket ID of this heap: + +.. code-block:: c + +socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME); +if (socket_id < 0) +rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n"); + +After that, the socket ID has to be supplied to the mempool creation function: + +.. code-block:: c + +mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", +nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0, +mbuf_sz, socket_id); +
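The four inputs listed in the guide above (start address, length, page size, optional IOVA table) can be sanity-checked before they are handed to ``rte_malloc_heap_memory_add``. The sketch below is standalone and purely illustrative: the function name and the specific checks are assumptions, not what DPDK itself enforces, and `uint64_t` stands in for `rte_iova_t`. The one grounded detail is that the page count is ignored when no IOVA table is supplied, as the API documentation in this series states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical pre-flight check for the arguments an application would
 * pass when adding external memory to a heap. The rules below are
 * assumptions made for illustration only.
 */
static bool
sketch_extmem_args_ok(const void *va_addr, size_t len, size_t page_sz,
		const uint64_t *iova_addrs, unsigned int n_pages)
{
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* the area is expected to consist of whole pages */
	if (len % page_sz != 0)
		return false;
	/* an IOVA table, if given, needs one entry per page;
	 * n_pages is ignored when there is no table */
	if (iova_addrs != NULL && n_pages != len / page_sz)
		return false;
	return true;
}
```

An application would typically run such a check right after its own allocation step, so that a mismatch between length and page size is caught before any DPDK call is made.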
[dpdk-dev] [PATCH v3 09/20] malloc: add function to query socket ID of named heap
External heaps will have their own "fake" socket IDs, so add a function that maps a heap name to its socket ID. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 14 lib/librte_eal/common/rte_malloc.c | 37 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 52 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index a9fb7e452..8870732a6 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,20 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Find socket ID corresponding to a named heap. + * + * @param name + * Heap name to find socket ID for + * @return + * Socket ID in case of success (a non-negative number) + * -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``name`` was NULL + * ENOENT - heap identified by the name ``name`` was not found + */ +int __rte_experimental +rte_malloc_heap_get_socket(const char *name); + /** * Dump statistics. 
* diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 0515d47f3..ce18ac79c 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f) rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); } +int +rte_malloc_heap_get_socket(const char *name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + unsigned int idx; + int ret; + + if (name == NULL || + strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + for (idx = 0; idx < RTE_MAX_HEAPS; idx++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[idx]; + + if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) { + heap = tmp; + break; + } + } + + if (heap != NULL) { + ret = heap->socket_id; + } else { + rte_errno = ENOENT; + ret = -1; + } + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* * Print stats on memory type. If type is NULL, info on all types is printed */ diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 73282bbb0..d8f9665b8 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -318,6 +318,7 @@ EXPERIMENTAL { rte_fbarray_set_used; rte_log_register_type_and_pick_level; rte_malloc_dump_heaps; + rte_malloc_heap_get_socket; rte_mem_alloc_validator_register; rte_mem_alloc_validator_unregister; rte_mem_event_callback_register; -- 2.17.1
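The lookup added by this patch can be modelled outside DPDK to see its error handling at a glance. The following is a minimal standalone sketch, not the patch itself: the `sketch_` names, the table contents and the socket ID 256 are made up for illustration, `errno` stands in for `rte_errno`, and the hotplug read lock taken by the real function is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for RTE_HEAP_NAME_MAX_LEN and RTE_MAX_HEAPS. */
#define SKETCH_HEAP_NAME_MAX_LEN 32
#define SKETCH_MAX_HEAPS 3

struct sketch_heap {
	char name[SKETCH_HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* Illustrative table: two NUMA heaps and one external heap (made-up ID). */
static const struct sketch_heap sketch_heaps[SKETCH_MAX_HEAPS] = {
	{ "socket_0", 0 },
	{ "socket_1", 1 },
	{ "extmem", 256 },
};

/* Map a heap name to its socket ID; -1 with errno set on failure. */
static int
sketch_heap_get_socket(const char *name)
{
	size_t len = 0;
	unsigned int idx;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	/* bounded length scan, equivalent to the strnlen() checks in the patch */
	while (len < SKETCH_HEAP_NAME_MAX_LEN && name[len] != '\0')
		len++;
	if (len == 0 || len == SKETCH_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (idx = 0; idx < SKETCH_MAX_HEAPS; idx++) {
		if (strncmp(name, sketch_heaps[idx].name,
				SKETCH_HEAP_NAME_MAX_LEN) == 0)
			return sketch_heaps[idx].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

Note that, as in the patch, an empty or over-long name is reported as EINVAL rather than ENOENT, so callers can tell a malformed name from a missing heap.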
[dpdk-dev] [PATCH v3 17/20] examples: add external memory example app
Introduce an example application demonstrating the use of external memory support. This is a simple application based on skeleton app, but instead of using internal DPDK memory, it is using externally allocated memory. The RX/TX and init paths are a carbon copy of the skeleton app, with no modifications whatsoever. The only difference is an additional init stage to allocate memory and create a heap for it, and the socket ID supplied to the mempool initialization function. The memory used by this app is hugepage memory allocated anonymously. Anonymous hugepage memory will not be allocated in a NUMA-aware fashion, so there is a chance of performance degradation when using this app, but given that kernel usually gives hugepages on local socket first, this should not be a problem in most cases. Signed-off-by: Anatoly Burakov --- examples/external_mem/Makefile| 62 examples/external_mem/extmem.c| 461 ++ examples/external_mem/meson.build | 12 + 3 files changed, 535 insertions(+) create mode 100644 examples/external_mem/Makefile create mode 100644 examples/external_mem/extmem.c create mode 100644 examples/external_mem/meson.build diff --git a/examples/external_mem/Makefile b/examples/external_mem/Makefile new file mode 100644 index 0..3b6ab3b2f --- /dev/null +++ b/examples/external_mem/Makefile @@ -0,0 +1,62 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2010-2018 Intel Corporation + +# binary name +APP = extmem + +# all source are stored in SRCS-y +SRCS-y := extmem.c + +# Build using pkg-config variables if possible +$(shell pkg-config --exists libdpdk) +ifeq ($(.SHELLSTATUS),0) + +all: shared +.PHONY: shared static +shared: build/$(APP)-shared + ln -sf $(APP)-shared build/$(APP) +static: build/$(APP)-static + ln -sf $(APP)-static build/$(APP) + +PC_FILE := $(shell pkg-config --path libdpdk) +CFLAGS += -O3 $(shell pkg-config --cflags libdpdk) +CFLAGS += -DALLOW_EXPERIMENTAL_API +LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk) +LDFLAGS_STATIC = -Wl,-Bstatic 
$(shell pkg-config --static --libs libdpdk) + +build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build + $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) + +build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build + $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC) + +build: + @mkdir -p $@ + +.PHONY: clean +clean: + rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared + rmdir --ignore-fail-on-non-empty build + +else # Build using legacy build system + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overridden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +CFLAGS += $(WERROR_FLAGS) +CFLAGS += -DALLOW_EXPERIMENTAL_API + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +include $(RTE_SDK)/mk/rte.extapp.mk +endif diff --git a/examples/external_mem/extmem.c b/examples/external_mem/extmem.c new file mode 100644 index 0..818a02171 --- /dev/null +++ b/examples/external_mem/extmem.c @@ -0,0 +1,461 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2018 Intel Corporation + */ + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#define RX_RING_SIZE 1024 +#define TX_RING_SIZE 1024 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 +#define EXTMEM_HEAP_NAME "extmem" + +static const struct rte_eth_conf port_conf_default = { + .rxmode = { + .max_rx_pkt_len = ETHER_MAX_LEN, + }, +}; + +/* extmem.c: Basic DPDK skeleton forwarding example using external memory. */ + +/* + * Initializes a given port using global settings and with the RX buffers + * coming from the mbuf_pool passed as a parameter. 
+ */ +static inline int +port_init(uint16_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + uint16_t nb_rxd = RX_RING_SIZE; + uint16_t nb_txd = TX_RING_SIZE; + int retval; + uint16_t q; + struct rte_eth_dev_info dev_info; + struct rte_eth_txconf txconf; + + if (!rte_eth_dev_is_valid_port(port)) + return -1; + + rte_eth_dev_info_get(port, &dev_info); + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) + port_conf.txmode.offloads |= + DEV_TX_OFFLOAD_MBUF_FAST_FREE; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + retval = rte_eth_dev_adjust_nb_rx_tx_desc
[dpdk-dev] [PATCH v3 01/20] mem: add length to memseg list
Previously, to calculate length of memory area covered by a memseg list, we would've needed to multiply page size by length of fbarray backing that memseg list. This is not obvious and unnecessarily low level, so store length in the memseg list itself. Signed-off-by: Anatoly Burakov --- drivers/bus/pci/linux/pci.c | 2 +- lib/librte_eal/bsdapp/eal/eal_memory.c| 2 ++ lib/librte_eal/common/eal_common_memory.c | 5 ++--- lib/librte_eal/common/include/rte_eal_memconfig.h | 1 + lib/librte_eal/linuxapp/eal/eal_memalloc.c| 3 ++- lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++- 6 files changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 04648ac93..d6e1027ab 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev) static int find_max_end_va(const struct rte_memseg_list *msl, void *arg) { - size_t sz = msl->memseg_arr.len * msl->page_sz; + size_t sz = msl->len; void *end_va = RTE_PTR_ADD(msl->base_va, sz); void **max_va = arg; diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c index 16d2bc7c3..65ea670f9 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memory.c +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c @@ -79,6 +79,7 @@ rte_eal_hugepage_init(void) } msl->base_va = addr; msl->page_sz = page_sz; + msl->len = internal_config.memory; msl->socket_id = 0; /* populate memsegs. 
each memseg is 1 page long */ @@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl) return -1; } msl->base_va = addr; + msl->len = mem_sz; return 0; } diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 0b69804ff..30d018209 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl) /* a memseg list was specified, check if it's the right one */ start = msl->base_va; - end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len); + end = RTE_PTR_ADD(start, msl->len); if (addr < start || addr >= end) return NULL; @@ -194,8 +194,7 @@ virt2memseg_list(const void *addr) msl = &mcfg->memsegs[msl_idx]; start = msl->base_va; - end = RTE_PTR_ADD(start, - (size_t)msl->page_sz * msl->memseg_arr.len); + end = RTE_PTR_ADD(start, msl->len); if (addr >= start && addr < end) break; } diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index aff0688dd..1d8b0a6fe 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -30,6 +30,7 @@ struct rte_memseg_list { uint64_t addr_64; /**< Makes sure addr is always 64-bits */ }; + size_t len; /**< Length of memory area covered by this memseg list. */ int socket_id; /**< Socket ID for all memsegs in this list. */ uint64_t page_sz; /**< Page size for all memsegs in this list. */ volatile uint32_t version; /**< version number for multiprocess sync. 
*/ diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index b2e2a9599..71a6e0fd9 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg) int msl_idx, seg_idx, ret, dir_fd = -1; start_addr = (uintptr_t) msl->base_va; - end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz; + end_addr = start_addr + msl->len; if ((uintptr_t)wa->ms->addr < start_addr || (uintptr_t)wa->ms->addr >= end_addr) @@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl, return -1; } local_msl->base_va = primary_msl->base_va; + local_msl->len = primary_msl->len; return 0; } diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index e3ac24815..897d94179 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl) return -1; } msl->base_va = addr; + msl->len = mem_sz; return 0; } @@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void) msl->base_va = addr;
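The change in this patch boils down to replacing a derived length with a stored one in every address-range check. A small standalone sketch of both forms follows; the function names are hypothetical and plain integers stand in for the memseg list fields (`base_va`, `len`, `page_sz`, `memseg_arr.len`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length as derived before the patch: page size times fbarray entries. */
static size_t
sketch_len_from_pages(uint64_t page_sz, unsigned int n_entries)
{
	return (size_t)page_sz * n_entries;
}

/*
 * Half-open range check [base, base + len), the same comparison the
 * patched lookup code performs once the length is stored in the list.
 */
static bool
sketch_range_contains(uintptr_t base, size_t len, uintptr_t addr)
{
	return addr >= base && addr < base + len;
}
```

Callers that previously had to know about the backing fbarray to compute the covered area now only need the base address and the stored length.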
[dpdk-dev] [PATCH v3 12/20] malloc: allow adding memory to named heaps
Add an API to add externally allocated memory to malloc heap. The memory will be stored in memseg lists like regular DPDK memory. Multiple segments are allowed within a heap. If IOVA table is not provided, IOVA addresses are filled in with RTE_BAD_IOVA. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 39 lib/librte_eal/common/malloc_heap.c| 74 ++ lib/librte_eal/common/malloc_heap.h| 4 ++ lib/librte_eal/common/rte_malloc.c | 51 +++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 169 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 8a8cc1e6d..47f867a05 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,45 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Add memory chunk to a heap with specified name. + * + * @note Multiple memory chunks can be added to the same heap + * + * @note Memory must be previously allocated for DPDK to be able to use it as a + * malloc heap. Failing to do so will result in undefined behavior, up to and + * including segmentation faults. + * + * @note Calling this function will erase any contents already present at the + * supplied memory address. + * + * @param heap_name + * Name of the heap to add memory chunk to + * @param va_addr + * Start of virtual area to add to the heap + * @param len + * Length of virtual area to add to the heap + * @param iova_addrs + * Array of page IOVA addresses corresponding to each page in this memory + * area. Can be NULL, in which case page IOVA addresses will be set to + * RTE_BAD_IOVA. + * @param n_pages + * Number of elements in the iova_addrs array. Ignored if ``iova_addrs`` + * is NULL. 
+ * @param page_sz + * Page size of the underlying memory + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to add memory to a reserved heap + * ENOSPC - no more space in internal config to store a new memory chunk + */ +int __rte_experimental +rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); + /** * Creates a new empty malloc heap with a specified name. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index e98f720cb..2f6946f65 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1019,6 +1019,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +int +malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + char fbarray_name[RTE_FBARRAY_NAME_LEN]; + struct rte_memseg_list *msl = NULL; + struct rte_fbarray *arr; + size_t seg_len = n_pages * page_sz; + unsigned int i; + + /* first, find a free memseg list */ + for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { + struct rte_memseg_list *tmp = &mcfg->memsegs[i]; + if (tmp->base_va == NULL) { + msl = tmp; + break; + } + } + if (msl == NULL) { + RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n"); + rte_errno = ENOSPC; + return -1; + } + + snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p", + heap->name, va_addr); + + /* create the backing fbarray */ + if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages, + sizeof(struct rte_memseg)) < 0) { + RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n"); + return -1; + } + arr = &msl->memseg_arr; + + /* fbarray created, fill it up */ + for (i = 0; i < n_pages; 
i++) { + struct rte_memseg *ms; + + rte_fbarray_set_used(arr, i); + ms = rte_fbarray_get(arr, i); + ms->addr = RTE_PTR_ADD(va_addr, i * page_sz); + ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i]; + ms->hugepage_sz = page_sz; + ms->len = page_sz; + ms->nchannel = rte_memory_get_nchannel(); + ms->nrank = rte_memory_get_nrank(); + ms->socket_id = heap->socket_id; + } + + /* set up the memseg list */ + msl->base_va = va_addr; + msl->page_sz = page_sz; + msl->socket_id = heap->socket_id; + msl->l
[dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. This changes the semantics of what we understand by "socket ID", so document the change in the release notes. Signed-off-by: Anatoly Burakov --- doc/guides/rel_notes/release_18_11.rst | 7 +++ lib/librte_eal/common/eal_common_memzone.c | 8 +--- lib/librte_eal/common/malloc_heap.c| 2 +- lib/librte_eal/common/rte_malloc.c | 4 4 files changed, 13 insertions(+), 8 deletions(-) diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index e96ec9b43..63bbb1b51 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -98,6 +98,13 @@ API Changes users of memseg-walk-related functions, as they will now have to skip externally allocated segments in most cases if the intent is to only iterate over internal DPDK memory. + - ``socket_id`` parameter across the entire DPDK has gained additional +meaning, as some socket ID's will now be representing externally allocated +memory. No changes will be required for existing code as backwards +compatibility will be kept, and those who do not use this feature will not +see these extra socket ID's. Any new API's must not check socket ID +parameters themselves, and must instead leave it to the memory subsystem to +decide whether socket ID is a valid one. 
ABI Changes --- diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c index 7300fe05d..b7081afbf 100644 --- a/lib/librte_eal/common/eal_common_memzone.c +++ b/lib/librte_eal/common/eal_common_memzone.c @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, return NULL; } - if ((socket_id != SOCKET_ID_ANY) && - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) { + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) { rte_errno = EINVAL; return NULL; } - if (!rte_eal_has_hugepages()) + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an +* external heap. +*/ + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES) socket_id = SOCKET_ID_ANY; contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0; diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 1d1e35708..73e478076 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg, if (size == 0 || (align && !rte_is_power_of_2(align))) return NULL; - if (!rte_eal_has_hugepages()) + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES) socket_arg = SOCKET_ID_ANY; if (socket_arg == SOCKET_ID_ANY) diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index dfcdf380a..458c44ba6 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align, if (!rte_eal_has_hugepages()) socket_arg = SOCKET_ID_ANY; - /* Check socket parameter */ - if (socket_arg >= RTE_MAX_NUMA_NODES) - return NULL; - return malloc_heap_alloc(type, size, socket_arg, 0, align == 0 ? 1 : align, 0, false); } -- 2.17.1
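To illustrate the relaxed socket ID semantics this patch describes, here is a minimal standalone sketch. It is not DPDK code: `DEMO_MAX_NUMA_NODES` is a stand-in for `RTE_MAX_NUMA_NODES` (the value 8 is an assumption), and the helper names `demo_socket_id_param_ok` and `demo_effective_socket` are hypothetical. It only mirrors the two conditions changed in `memzone_reserve_aligned_thread_unsafe()` and `malloc_heap_alloc()`.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SOCKET_ID_ANY  (-1) /* assumed to match DPDK's SOCKET_ID_ANY */
#define DEMO_MAX_NUMA_NODES 8    /* stand-in for RTE_MAX_NUMA_NODES (assumed value) */

/* Up-front parameter check after the patch: only negative IDs other than
 * "any" are rejected. IDs at or above the NUMA node limit are passed on,
 * because they may refer to external heaps. */
static bool demo_socket_id_param_ok(int socket_id)
{
	return socket_id == DEMO_SOCKET_ID_ANY || socket_id >= 0;
}

/* Fallback used when hugepages are disabled: regular NUMA socket IDs
 * collapse to "any", while external heap IDs are left untouched. */
static int demo_effective_socket(int socket_id, bool has_hugepages)
{
	/* callers are expected to have validated the parameter first */
	assert(demo_socket_id_param_ok(socket_id));

	if (!has_hugepages && socket_id < DEMO_MAX_NUMA_NODES)
		return DEMO_SOCKET_ID_ANY;
	return socket_id;
}
```

With these definitions, socket 0 without hugepages maps to "any", while an ID such as 256 is kept as-is so that the memory subsystem can decide whether it names a valid heap.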
[dpdk-dev] [PATCH v3 10/20] malloc: allow creating malloc heaps
Add API to allow creating new malloc heaps. They will be created with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing with internal heaps. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 19 lib/librte_eal/common/malloc_heap.c| 30 + lib/librte_eal/common/malloc_heap.h| 3 ++ lib/librte_eal/common/rte_malloc.c | 52 ++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 105 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 8870732a6..182afab1c 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,25 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Creates a new empty malloc heap with a specified name. + * + * @note Heaps created via this call will automatically get assigned a unique + * socket ID, which can be found using ``rte_malloc_heap_get_socket()`` + * + * @param heap_name + * Name of the heap to create. + * + * @return + * - 0 on successful creation + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``heap_name`` was NULL, empty or too long + * EEXIST - heap by name of ``heap_name`` already exists + * ENOSPC - no more space in internal config to store a new heap + */ +int __rte_experimental +rte_malloc_heap_create(const char *heap_name); + /** * Find socket ID corresponding to a named heap. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 2a5d2a381..1dd4ffcf9 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -29,6 +29,10 @@ #include "malloc_heap.h" #include "malloc_mp.h" +/* start external socket ID's at a very high number */ +#define CONST_MAX(a, b) (a > b ? 
a : b) /* RTE_MAX is not a constant */ +#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES)) + static unsigned check_hugepage_sz(unsigned flags, uint64_t hugepage_sz) { @@ -1015,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +int +malloc_heap_create(struct malloc_heap *heap, const char *heap_name) +{ + static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID; + + /* prevent overflow. did you really create 2 billion heaps??? */ + if (next_socket_id > INT32_MAX) { + RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n"); + rte_errno = ENOSPC; + return -1; + } + + /* initialize empty heap */ + heap->alloc_count = 0; + heap->first = NULL; + heap->last = NULL; + LIST_INIT(heap->free_head); + rte_spinlock_init(&heap->lock); + heap->total_size = 0; + heap->socket_id = next_socket_id++; + + /* set up name */ + strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN); + return 0; +} + int rte_eal_malloc_heap_init(void) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 61b844b6f..eebee16dc 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -33,6 +33,9 @@ void * malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags, size_t align, bool contig); +int +malloc_heap_create(struct malloc_heap *heap, const char *heap_name); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index ce18ac79c..39875fe69 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -286,3 +287,54 @@ rte_malloc_virt2iova(const void *addr) return ms->iova + RTE_PTR_DIFF(addr, ms->addr); } + +int +rte_malloc_heap_create(const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct 
malloc_heap *heap = NULL; + int i, ret; + + if (heap_name == NULL || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + /* check if there is space in the heap list, or if heap with this name +* already exists. +*/ + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[i]; + /* existing heap */ + if (strncmp(heap_name, tmp->name, + RTE_HEAP_NAME_MAX_LEN) == 0) { +
[dpdk-dev] [PATCH v3 08/20] malloc: add name to malloc heaps
We will need to refer to external heaps in some way. While we use heap ID's internally, for external API use it has to be something more user-friendly. So, we will be using a string to uniquely identify a heap. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++ lib/librte_eal/common/malloc_heap.c | 15 ++- lib/librte_eal/common/rte_malloc.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h index e7ac32d42..1c08ef3e0 100644 --- a/lib/librte_eal/common/include/rte_malloc_heap.h +++ b/lib/librte_eal/common/include/rte_malloc_heap.h @@ -12,6 +12,7 @@ /* Number of free lists per heap, grouped by size. */ #define RTE_HEAP_NUM_FREELISTS 13 +#define RTE_HEAP_NAME_MAX_LEN 32 /* dummy definition, for pointers */ struct malloc_elem; @@ -28,6 +29,7 @@ struct malloc_heap { unsigned alloc_count; size_t total_size; unsigned int socket_id; + char name[RTE_HEAP_NAME_MAX_LEN]; } __rte_cache_aligned; #endif /* _RTE_MALLOC_HEAP_H_ */ diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 73e478076..2a5d2a381 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl, malloc_heap_add_memory(heap, found_msl, ms->addr, len); heap->total_size += len; - heap->socket_id = msl->socket_id; RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20, msl->socket_id); @@ -1020,6 +1019,20 @@ int rte_eal_malloc_heap_init(void) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int i; + + /* assign names to default DPDK heaps */ + for (i = 0; i < rte_socket_count(); i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + char heap_name[RTE_HEAP_NAME_MAX_LEN]; + int socket_id = rte_socket_id_by_idx(i); + + snprintf(heap_name, sizeof(heap_name) - 1, + "socket_%i", 
socket_id); + strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN); + heap->socket_id = socket_id; + } + if (register_mp_requests()) { RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n"); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 458c44ba6..0515d47f3 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type) malloc_heap_get_stats(heap, &sock_stats); fprintf(f, "Heap id:%u\n", heap_id); + fprintf(f, "\tHeap name:%s\n", heap->name); fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes); fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes); fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes); -- 2.17.1
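The naming scheme above can be sketched outside of DPDK as follows. This is only an illustration under stated assumptions: `struct demo_heap`, `demo_name_default_heaps` and `demo_find_heap` are hypothetical names, and the sketch writes the name with a single `snprintf()` call instead of the `snprintf()` plus `strlcpy()` pair used in the patch. Only the `socket_%i` format and the 32-byte name limit are taken from the patch itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN in the patch */

/* Hypothetical, reduced stand-in for struct malloc_heap. */
struct demo_heap {
	int socket_id;
	char name[DEMO_HEAP_NAME_MAX_LEN];
};

/* Give each default heap a name of the form "socket_<id>". */
static void demo_name_default_heaps(struct demo_heap *heaps,
		const int *socket_ids, unsigned int n)
{
	unsigned int i;

	assert(heaps != NULL && socket_ids != NULL);
	for (i = 0; i < n; i++) {
		/* snprintf always NUL-terminates within the given size */
		snprintf(heaps[i].name, sizeof(heaps[i].name),
				"socket_%i", socket_ids[i]);
		heaps[i].socket_id = socket_ids[i];
	}
}

/* Look a heap up by its name; returns NULL if no heap matches. */
static const struct demo_heap *demo_find_heap(const struct demo_heap *heaps,
		unsigned int n, const char *name)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (strncmp(name, heaps[i].name, DEMO_HEAP_NAME_MAX_LEN) == 0)
			return &heaps[i];
	}
	return NULL;
}
```

A string key keeps the user-facing API independent of the internal heap index, which is the motivation given in the commit message.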
[dpdk-dev] [PATCH v3 15/20] malloc: allow detaching from external memory
Add API to detach from existing chunk of external memory in a process. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 27 ++ lib/librte_eal/common/rte_malloc.c | 27 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 50 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 440496cd9..d2236c421 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, * @note Memory area must not contain any allocated elements to allow its * removal from the heap * + * @note All other processes must detach from the memory chunk prior to it being + * removed from the heap. + * * @param heap_name * Name of the heap to remove memory from * @param va_addr @@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); int __rte_experimental rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len); +/** + * Detach from a chunk of external memory in secondary process. + * + * @note This function must be called in before any attempt is made to remove + * external memory from the heap in another process. This function does *not* + * need to be called if a call to ``rte_malloc_heap_memory_remove`` will be + * called in current process. 
+ * + * @param heap_name + * Heap name to which this chunk of memory belongs + * @param va_addr + * Start address of memory chunk to attach to + * @param len + * Length of memory chunk to attach to + * @return + * 0 on successful detach + * -1 on unsuccessful detach, with rte_errno set to indicate cause for error: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to detach memory from a reserved heap + * ENOENT - heap or memory chunk was not found + */ +int __rte_experimental +rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. * diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index bc22d21e4..e9be179d5 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -397,10 +397,11 @@ struct sync_mem_walk_arg { void *va_addr; size_t len; int result; + bool attach; }; static int -attach_mem_walk(const struct rte_memseg_list *msl, void *arg) +sync_mem_walk(const struct rte_memseg_list *msl, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct sync_mem_walk_arg *wa = arg; @@ -415,7 +416,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg) msl_idx = msl - mcfg->memsegs; found_msl = &mcfg->memsegs[msl_idx]; - ret = rte_fbarray_attach(&found_msl->memseg_arr); + if (wa->attach) + ret = rte_fbarray_attach(&found_msl->memseg_arr); + else + ret = rte_fbarray_detach(&found_msl->memseg_arr); if (ret < 0) wa->result = -rte_errno; @@ -426,8 +430,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg) return 0; } -int -rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +static int +sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct malloc_heap *heap = NULL; @@ -461,9 +465,10 @@ rte_malloc_heap_memory_attach(const 
char *heap_name, void *va_addr, size_t len) wa.va_addr = va_addr; wa.len = len; wa.result = -ENOENT; /* fail unless explicitly told to succeed */ + wa.attach = attach; /* we're already holding a read lock */ - rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa); + rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa); if (wa.result < 0) { rte_errno = -wa.result; @@ -476,6 +481,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) return ret; } +int +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +{ + return sync_memory(heap_name, va_addr, len, true); +} + +int +rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len) +{ + return sync_memory(heap_name, va_addr, len, false); +} + int rte_malloc_heap_create(const char *heap_name) { diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 0a2e46767..a535c4da8 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -323,6 +323,7 @@ EXPERIMENTAL
Re: [dpdk-dev] [PATCH v2 2/2] net/failsafe: support multicast address list set
On 9/19/18 6:12 PM, Gaëtan Rivet wrote: On Wed, Sep 19, 2018 at 04:50:57PM +0200, Gaëtan Rivet wrote: Hi, Sorry about the delay on this, overall it looks ok; I have an issue however, see inline. On Mon, Sep 03, 2018 at 07:55:22AM +0100, Andrew Rybchenko wrote: From: Evgeny Im Signed-off-by: Evgeny Im Signed-off-by: Andrew Rybchenko --- doc/guides/nics/features/failsafe.ini | 1 + doc/guides/rel_notes/release_18_11.rst | 6 drivers/net/failsafe/failsafe.c | 1 + drivers/net/failsafe/failsafe_ether.c | 17 + drivers/net/failsafe/failsafe_ops.c | 48 + drivers/net/failsafe/failsafe_private.h | 2 ++ 6 files changed, 75 insertions(+) diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini index 83cc99d19..39ee57965 100644 --- a/doc/guides/nics/features/failsafe.ini +++ b/doc/guides/nics/features/failsafe.ini @@ -12,6 +12,7 @@ Jumbo frame = Y Promiscuous mode = Y Allmulticast mode= Y Unicast MAC filter = Y +Multicast MAC filter = Y VLAN filter = Y Flow control = Y Flow API = Y diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 24204e67b..54e0e4ee4 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -54,6 +54,12 @@ New Features Also, make sure to start the actual text at the margin. = +* **Updated failsafe driver.** + + Updated the failsafe driver including the following changes: + + * Support multicast MAC address set. 
+ API Changes --- diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c index 657919f93..c3999f026 100644 --- a/drivers/net/failsafe/failsafe.c +++ b/drivers/net/failsafe/failsafe.c @@ -304,6 +304,7 @@ fs_rte_eth_free(const char *name) ret = pthread_mutex_destroy(&PRIV(dev)->hotplug_mutex); if (ret) ERROR("Error while destroying hotplug mutex"); + rte_free(PRIV(dev)->mcast_addrs); rte_free(PRIV(dev)); rte_eth_dev_release_port(dev); return ret; diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c index 5b5cb3b49..5078feabe 100644 --- a/drivers/net/failsafe/failsafe_ether.c +++ b/drivers/net/failsafe/failsafe_ether.c @@ -424,6 +424,23 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev) ret = dev->dev_ops->dev_start(dev); if (ret) goto err_remove; + /* +* Propagate multicast MAC addresses to sub-devices, +* if non zero number of addresses is set. +* The condition is required to avoid breakage of failsafe +* for sub-devices which do not support the operation +* if the feature is really not used. +*/ + if (PRIV(dev)->nb_mcast_addr > 0) { + ret = dev->dev_ops->set_mc_addr_list(dev, +PRIV(dev)->mcast_addrs, +PRIV(dev)->nb_mcast_addr); + if (ret) { + ERROR("Could not set list of multicast addresses to sub_device %d", + i); + goto err_remove; + } + } Using here the dev-ops instead of calling the rte_eth_* API as is done for the other configuration items, is unorthodox and I believe could lead to issues. Sorry I forgot the mention it, but it seems that this could be done in fs_eth_dev_conf_apply() instead, which explains why I would consider using the dev-ops being unorthodox. Yes, I've overlooked it on my review before submission. Why didn't you call rte_eth_dev_set_mc_addr_list on the new port only instead, the same way it is done for the other configuration item? 
Using the dev-ops, you are making the other sub-device re-apply the same configuration periodically (in case of repeated hotplug error), twice per sub-device upkeep cycle. This is unnecessary and seems to foster instability for no clear benefit. Can you justify it?

If it was necessary to call this after the dev_start, I think it would be better to restrict the configuration to inactive sub-devices, in any case.

In theory, multicast addresses list is not listed in configuration items retained across stop/start, so, could be wrong to set before start. I hope it is just incomplete documentation in ethdev and we should just fix it.

Thanks a lot for review,
Andrew.
Re: [dpdk-dev] [PATCH] latency: clear mbuf timestamp after latency calculation
Hi,

Thanks, I have sent a v2.

Any comment on the problem of dropped mbuf that I described in the cover letter? In our application the max_latency_ns metric is useless since after running for a while it would always take on an obviously incorrect value (up to a few minutes). I suspect the impact on avg_latency_ns is much less severe but significant nonetheless.

> -Original Message-
> From: reshma.pat...@intel.com [mailto:reshma.pat...@intel.com]
> Sent: Thursday, September 20, 2018 5:25 PM
> To: long...@viettel.com.vn
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH] latency: clear mbuf timestamp after latency calculation
>
> Hi,
>
> > -Original Message-
> > From: long...@viettel.com.vn [mailto:long...@viettel.com.vn]
> > Sent: Wednesday, September 19, 2018 9:23 AM
> > To: Pattan, Reshma
> > Cc: dev@dpdk.org; Bao-Long Tran; sta...@dpdk.org
> > Subject: [PATCH] latency: clear mbuf timestamp after latency
> > calculation
> >
> > The timestamp of a mbuf should be cleared after that mbuf was used for
> > latency calculation, otherwise future packets which reuse the same
> > mbuf would inherit that previous timestamp. The latencystats library
> > looks for mbuf with non-zero timestamp, thus incorrectly inherited
> > value would result in incorrect latency measurement.
> >
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Bao-Long Tran
>
> You need to add the Fixes line just before CC: in the commit message.
>
> Original commit that introduced the bug was 5cd3cac9ed. So fixes should be
> added like below
> Fixes: 5cd3cac9ed ("latency: added new library for latency stats").
>
> You can send v2 with fixes line and my ack. Other than that
>
> Acked-by: Reshma Pattan
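The effect of a stale timestamp can be shown with a small standalone sketch. The actual latencystats code is not quoted in this thread, so everything here is an assumption for illustration: `struct demo_mbuf`, `struct demo_stats` and `demo_record_latency` are hypothetical, and only the idea (treat a non-zero timestamp as "sampled", then clear it after use) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the timestamp field of an mbuf. */
struct demo_mbuf {
	uint64_t timestamp; /* 0 means "not marked for latency sampling" */
};

/* Hypothetical, reduced set of latency counters. */
struct demo_stats {
	uint64_t samples;
	uint64_t max_latency;
};

/* Account for one buffer at time 'now'. */
static void demo_record_latency(struct demo_mbuf *m, uint64_t now,
		struct demo_stats *st)
{
	uint64_t latency;

	assert(m != NULL && st != NULL);
	if (m->timestamp == 0)
		return; /* buffer was not sampled */

	latency = now - m->timestamp;
	st->samples++;
	if (latency > st->max_latency)
		st->max_latency = latency;

	/* The fix discussed above: clear the timestamp so that a later
	 * packet reusing this buffer does not inherit the old value. */
	m->timestamp = 0;
}
```

Without the final assignment, a buffer that is reused much later and not re-stamped would be counted again with a latency measured from the old timestamp, which would be consistent with the inflated max_latency_ns reported above.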
[dpdk-dev] [PATCH v2] eal/bsd: fix unused parameters compile error
When compiling on FreeBSD, lots of warnings/errors are thrown for
unused parameter. Fix these by marking the parameters as unused in
the code.

Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")

Signed-off-by: Anatoly Burakov
Acked-by: Bruce Richardson
---

Notes:
    v2:
    - Added missing Fixes tag
    - Reworded commit message

 lib/librte_eal/bsdapp/eal/eal_memalloc.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index 06afbcc99..a5847f0bd 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -49,19 +49,21 @@ eal_memalloc_sync_with_primary(void)
 }
 
 int
-eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+eal_memalloc_get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused)
 {
 	return -ENOTSUP;
 }
 
 int
-eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused,
+		int fd __rte_unused)
 {
 	return -ENOTSUP;
 }
 
 int
-eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
+eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused,
+		int seg_idx __rte_unused, size_t *offset __rte_unused)
 {
 	return -ENOTSUP;
 }
-- 
2.17.1
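For readers unfamiliar with the annotation, a minimal standalone sketch of the same pattern follows. `DEMO_UNUSED` is a hypothetical local macro; the assumption here is that `__rte_unused` expands to a comparable GCC/Clang attribute. The function name `demo_get_seg_fd` is likewise made up and only mimics the shape of the stubs in the patch.

```c
#include <assert.h> /* only needed for the example check mentioned below */
#include <errno.h>

/* Hypothetical macro; assumed to behave like DPDK's __rte_unused.
 * Requires a compiler that supports GNU-style attributes (GCC, Clang). */
#define DEMO_UNUSED __attribute__((__unused__))

/* Stub for a platform where the operation is not supported. The parameters
 * are part of the common prototype but are intentionally ignored, so they
 * are annotated to silence -Wunused-parameter. */
static int demo_get_seg_fd(int list_idx DEMO_UNUSED, int seg_idx DEMO_UNUSED)
{
	return -ENOTSUP;
}
```

Calling `demo_get_seg_fd(0, 0)` returns `-ENOTSUP` regardless of its arguments. Annotating the parameters keeps the prototype identical across platforms, rather than dropping parameter names or disabling the warning globally.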
Re: [dpdk-dev] [PATCH] net/bonding: ensure fairness among slaves
On Thu, Sep 20, 2018 at 2:28 AM Matan Azrad wrote:
>
> Hi Chas
> Please see small comments.
>
> > From: Chas Williams
> > Some PMDs, especially ones with vector receives, require a minimum
> > number of receive buffers in order to receive any packets. If the first
> > slave read leaves less than this number available, a read from the next
> > slave may return 0 implying that the slave doesn't have any packets which
> > results in skipping over that slave as the next active slave.
>
> It is true not only in case of 0.
> It makes sense that the first polling slave gets the majority part of the
> burst while the others just get smaller part
> I suggest to rephrase to the general issue .

It doesn't happen for the 802.3ad burst routines in general. If you run out of buffers then you don't advance to the next slave and that slave picks up where you left off during the next rx burst. If some slave is attempting to do this, it will consume all the buffers and you will be at the next slave for the next rx and all is well.

There are just some odd corner cases, where you read just slow (or fast?) enough that the first slave leaves just a few buffers. But reading the next slave results in a 0 (because of the vector RX), and you don't loop back around to the first slave. So next time around you start back at the troublesome slave.

The fix for the other RX burst routines is just completeness.

> >
> > To fix this, implement round robin for the slaves during receive that is
> > only advanced to the next slave at the end of each receive burst.
> > This should also provide some additional fairness in processing in
> > bond_ethdev_rx_burst as well.
> >
> > Fixes: 2efb58cbab6e ("bond: new link bonding library")
>
> If it is a fix, why not to use a fix title?
> Maybe
> net/bonding: fix the slaves Rx fairness

I can use the word fix I suppose.
> > > Cc: sta...@dpdk.org > > > > Signed-off-by: Chas Williams > Besides that: > Acked-by: Matan Azrad > > > --- > > drivers/net/bonding/rte_eth_bond_pmd.c | 50 > > ++ > > 1 file changed, 32 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c > > b/drivers/net/bonding/rte_eth_bond_pmd.c > > index b84f32263..f25faa75c 100644 > > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > > @@ -58,28 +58,33 @@ bond_ethdev_rx_burst(void *queue, struct > > rte_mbuf **bufs, uint16_t nb_pkts) { > > struct bond_dev_private *internals; > > > > - uint16_t num_rx_slave = 0; > > uint16_t num_rx_total = 0; > > - > > + uint16_t slave_count; > > + uint16_t active_slave; > > int i; > > > > /* Cast to structure, containing bonded device's port id and queue id > > */ > > struct bond_rx_queue *bd_rx_q = (struct bond_rx_queue *)queue; > > - > > internals = bd_rx_q->dev_private; > > + slave_count = internals->active_slave_count; > > + active_slave = internals->active_slave; > > > > + for (i = 0; i < slave_count && nb_pkts; i++) { > > + uint16_t num_rx_slave; > > > > - for (i = 0; i < internals->active_slave_count && nb_pkts; i++) { > > /* Offset of pointer to *bufs increases as packets are > > received > >* from other slaves */ > > - num_rx_slave = rte_eth_rx_burst(internals- > > >active_slaves[i], > > + num_rx_slave = rte_eth_rx_burst( > > + internals->active_slaves[active_slave], > > bd_rx_q->queue_id, bufs + num_rx_total, > > nb_pkts); > > - if (num_rx_slave) { > > - num_rx_total += num_rx_slave; > > - nb_pkts -= num_rx_slave; > > - } > > + num_rx_total += num_rx_slave; > > + nb_pkts -= num_rx_slave; > > + if (++active_slave == slave_count) > > + active_slave = 0; > > } > > > > + if (++internals->active_slave == slave_count) > > + internals->active_slave = 0; > > return num_rx_total; > > } > > > > @@ -258,25 +263,32 @@ bond_ethdev_rx_burst_8023ad_fast_queue(void > > *queue, struct rte_mbuf **bufs, > > 
uint16_t num_rx_total = 0; /* Total number of received packets > > */ > > uint16_t slaves[RTE_MAX_ETHPORTS]; > > uint16_t slave_count; > > - > > - uint16_t i, idx; > > + uint16_t active_slave; > > + uint16_t i; > > > > /* Copy slave list to protect against slave up/down changes during tx > >* bursting */ > > slave_count = internals->active_slave_count; > > + active_slave = internals->active_slave; > > memcpy(slaves, internals->active_slaves, > > sizeof(internals->active_slaves[0]) * slave_count); > > > > - for (i = 0, idx = internals->active_slave; > > - i < slave_count && num_rx_total < nb_pkts; i++, > > idx++) { > > - idx
[dpdk-dev] [PATCH v2] mem: fix undefined behavior in NUMA code
When NUMA-aware hugepages config option is set, we rely on
libnuma to tell the kernel to allocate hugepages on a specific
NUMA node. However, we allocate node mask before we check if
NUMA is available in the first place, which, according to
the manpage [1], causes undefined behaviour.

Fix by only using nodemask when we have NUMA available.

[1] https://linux.die.net/man/3/numa_alloc_onnode

Bugzilla ID: 20
Fixes: 1b72605d2416 ("mem: balanced allocation of hugepages")
Cc: i.maxim...@samsung.com
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..1a2a84a65 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -263,7 +263,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
 	int node_id = -1;
 	int essential_prev = 0;
 	int oldpolicy;
-	struct bitmask *oldmask = numa_allocate_nodemask();
+	struct bitmask *oldmask = NULL;
 	bool have_numa = true;
 	unsigned long maxnode = 0;
 
@@ -275,6 +275,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
 
 	if (have_numa) {
 		RTE_LOG(DEBUG, EAL, "Trying to obtain current memory policy.\n");
+		oldmask = numa_allocate_nodemask();
 		if (get_mempolicy(&oldpolicy, oldmask->maskp,
 				oldmask->size + 1, 0, 0) < 0) {
 			RTE_LOG(ERR, EAL,
@@ -401,8 +402,8 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
 				strerror(errno));
 			numa_set_localalloc();
 		}
+		numa_free_cpumask(oldmask);
 	}
-	numa_free_cpumask(oldmask);
 #endif
 	return i;
 }
-- 
2.17.1
[dpdk-dev] [PATCH v2] net/bonding: fix RX slave fairness
From: Chas Williams Some PMDs, especially ones with vector receives, require a minimum number of receive buffers in order to receive any packets. If the first slave read leaves less than this number available, a read from the next slave may return 0 implying that the slave doesn't have any packets which results in skipping over that slave as the next active slave. To fix this, implement round robin for the slaves during receive that is only advanced to the next slave at the end of each receive burst. This is also done to provide some additional fairness in processing in other bonding RX burst routines as well. Fixes: 2efb58cbab6e ("bond: new link bonding library") Cc: sta...@dpdk.org Signed-off-by: Chas Williams Acked-by: Luca Boccassi Acked-by: Matan Azrad --- v2: - Reworded title and commit message - Fix checkpatch issue drivers/net/bonding/rte_eth_bond_pmd.c | 53 ++ 1 file changed, 34 insertions(+), 19 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index b84f32263..5efd046a1 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -58,28 +58,34 @@ bond_ethdev_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) { struct bond_dev_private *internals; - uint16_t num_rx_slave = 0; uint16_t num_rx_total = 0; - + uint16_t slave_count; + uint16_t active_slave; int i; /* Cast to structure, containing bonded device's port id and queue id */ struct bond_rx_queue *bd_rx_q = (struct bond_rx_queue *)queue; - internals = bd_rx_q->dev_private; + slave_count = internals->active_slave_count; + active_slave = internals->active_slave; + for (i = 0; i < slave_count && nb_pkts; i++) { + uint16_t num_rx_slave; - for (i = 0; i < internals->active_slave_count && nb_pkts; i++) { /* Offset of pointer to *bufs increases as packets are received * from other slaves */ - num_rx_slave = rte_eth_rx_burst(internals->active_slaves[i], - bd_rx_q->queue_id, bufs + num_rx_total, 
nb_pkts); - if (num_rx_slave) { - num_rx_total += num_rx_slave; - nb_pkts -= num_rx_slave; - } + num_rx_slave = + rte_eth_rx_burst(internals->active_slaves[active_slave], +bd_rx_q->queue_id, +bufs + num_rx_total, nb_pkts); + num_rx_total += num_rx_slave; + nb_pkts -= num_rx_slave; + if (++active_slave == slave_count) + active_slave = 0; } + if (++internals->active_slave == slave_count) + internals->active_slave = 0; return num_rx_total; } @@ -258,25 +264,32 @@ bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, uint16_t num_rx_total = 0; /* Total number of received packets */ uint16_t slaves[RTE_MAX_ETHPORTS]; uint16_t slave_count; - - uint16_t i, idx; + uint16_t active_slave; + uint16_t i; /* Copy slave list to protect against slave up/down changes during tx * bursting */ slave_count = internals->active_slave_count; + active_slave = internals->active_slave; memcpy(slaves, internals->active_slaves, sizeof(internals->active_slaves[0]) * slave_count); - for (i = 0, idx = internals->active_slave; - i < slave_count && num_rx_total < nb_pkts; i++, idx++) { - idx = idx % slave_count; + for (i = 0; i < slave_count && nb_pkts; i++) { + uint16_t num_rx_slave; /* Read packets from this slave */ - num_rx_total += rte_eth_rx_burst(slaves[idx], bd_rx_q->queue_id, - &bufs[num_rx_total], nb_pkts - num_rx_total); + num_rx_slave = rte_eth_rx_burst(slaves[active_slave], + bd_rx_q->queue_id, + bufs + num_rx_total, nb_pkts); + num_rx_total += num_rx_slave; + nb_pkts -= num_rx_slave; + + if (++active_slave == slave_count) + active_slave = 0; } - internals->active_slave = idx; + if (++internals->active_slave == slave_count) + internals->active_slave = 0; return num_rx_total; } @@ -459,7 +472,9 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs, idx = 0; } - internals->active_slave = idx; + if (++internals->active_slave == slave_count) + internals->active_slave = 0; + return num_rx
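The round-robin scheme described in the commit message above can be illustrated without DPDK. The following is only a minimal sketch: `mock_slave`, `mock_bond`, `mock_rx_burst` and `bond_rx_burst_rr` are hypothetical names invented for this example, and the "returns 0 below a minimum burst" behaviour is a simplified stand-in for a vector RX path. It starts each burst at a rotating slave and advances the starting slave once per burst.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLAVES 8

/* Hypothetical stand-in for a slave port (not a DPDK type). */
struct mock_slave {
	uint16_t pending;   /* packets waiting on this slave */
	uint16_t min_burst; /* smallest request the RX path will serve */
};

/* Hypothetical stand-in for the bonded device's private data. */
struct mock_bond {
	struct mock_slave slaves[MAX_SLAVES];
	uint16_t active_slave_count;
	uint16_t active_slave; /* slave polled first on the next burst */
};

/* Mimics a PMD that returns 0 when asked for fewer than min_burst packets. */
static uint16_t
mock_rx_burst(struct mock_slave *s, uint16_t nb_pkts)
{
	uint16_t n;

	if (nb_pkts < s->min_burst)
		return 0;
	n = s->pending < nb_pkts ? s->pending : nb_pkts;
	s->pending -= n;
	return n;
}

/* Poll every slave at most once, starting from the rotating index. */
static uint16_t
bond_rx_burst_rr(struct mock_bond *b, uint16_t nb_pkts)
{
	uint16_t slave_count = b->active_slave_count;
	uint16_t idx = b->active_slave;
	uint16_t total = 0;
	uint16_t i;

	assert(slave_count <= MAX_SLAVES);
	if (slave_count == 0)
		return 0;
	/* Guard against a stale index, e.g. after a slave was removed. */
	if (idx >= slave_count)
		idx = 0;

	for (i = 0; i < slave_count && nb_pkts; i++) {
		uint16_t n = mock_rx_burst(&b->slaves[idx], nb_pkts);

		total += n;
		nb_pkts -= n;
		if (++idx == slave_count)
			idx = 0;
	}

	/* Advance the starting slave once per burst, not once per slave. */
	if (++b->active_slave >= slave_count)
		b->active_slave = 0;

	return total;
}
```

With two slaves that each need at least 4 free buffers, a request for 6 packets drains the first slave and gets 0 from the second; the next burst starts at the second slave, so it is no longer starved. The sketch uses `>=` when wrapping the stored index, a defensive choice of this example rather than something taken from the patch.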
Re: [dpdk-dev] [PATCH v2] eal/bsd: fix unused parameters compile error
20/09/2018 14:26, Anatoly Burakov: > When compiling on FreeBSD, lots of warnings/errors are thrown for > unused parameter. Fix these by marking the parameters as unused > in the code. > > Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd") > Fixes: 3a44687139eb ("mem: allow querying offset into segment fd") > > Signed-off-by: Anatoly Burakov > Acked-by: Bruce Richardson Applied, thanks
Re: [dpdk-dev] How to replace rte_eth_dev_attach with rte_eal_hotplug_add
20/09/2018 11:09, Gaëtan Rivet: > On Thu, Sep 20, 2018 at 05:46:37PM +0900, Hideyuki Yamashita wrote: > > Hello, > > > > From dpdk 18.08 release rte_eth_dev_attach and > > rte_eth_dev_detach became deprecated APIs and > > it is recommended to replace with rte_eal_hotplug_add > > and rte_eal_hotplug_remove. > > > > My program uses above mentioned deprecated APIs > > and has to replace those. > > Note that my program uses attach to attach vhost, pcap pmd. > > > > My question is whether it is correct to replace those as follows: > > find rte_eth_dev_attach function in rte_ethdev.c and > > migrate that content into my program. > > > > e.g. > > lib/librte_ethdev/rte_ethdev.c line 643-686 for attach > > lib/librte_ethdev/rte_ethdev.c line 690-720 for detach > > > > Your advice/guidance are much appreciated. > > Thanks! > > Hello Hideyuki, > > You could use this code for guidance, while leaving the ethdev > specificities such as verifying the eth_dev_count_total(). The hotplug > function would already return an error if the PMD was not able to create > the necessary devices. > > The main issue might be to find the port_id of your new port. > You won't be able to use eth_dev_last_created_port, so you would have to > iterate over the ethdev using RTE_ETH_FOREACH_DEV and find the one > matching your parameters (you might for example match the rte_device > name with the name you used in hotplug_add, as there is no standard > naming scheme at the ethdev level). It is recommended to register a callback to receive the notifications of new ethdev ports. So it may be a change of programming style: sync vs async. > A possible issue with the deprecation planned for those two functions is > that the hotplug API is also meant to evolve [1] this release (not in a big > way however, it would mostly simplify your usage of it). > > [1]: https://mails.dpdk.org/archives/dev/2018-September/42.html I will probably not change the existing functions. 
A v2 will be sent soon, with new simple functions.
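The "iterate and match by device name" approach suggested above for finding the new port_id can be sketched as follows. This is a standalone illustration, not DPDK code: `mock_port` and `find_port_by_devname` are hypothetical, and the array scan only stands in for what an application would do inside an RTE_ETH_FOREACH_DEV loop against the real ethdev data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical port table entry; real code would query ethdev instead. */
struct mock_port {
	int attached;            /* non-zero when the slot holds a live port */
	const char *device_name; /* name of the underlying device */
};

/*
 * Scan all attached ports and return the id of the first one whose
 * device name equals devname, or -1 if none matches.
 */
static int
find_port_by_devname(const struct mock_port *ports, uint16_t nb_ports,
		     const char *devname)
{
	uint16_t i;

	assert(ports != NULL && devname != NULL);

	for (i = 0; i < nb_ports; i++) {
		if (!ports[i].attached)
			continue; /* skip unused slots, as a FOREACH macro would */
		if (strcmp(ports[i].device_name, devname) == 0)
			return i;
	}
	return -1;
}
```

The name passed in would be the same string the application gave to the hotplug call. The callback-based (asynchronous) style recommended in the reply avoids this scan entirely, at the cost of restructuring the application.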
Re: [dpdk-dev] Incorrect latencystats implementation
> -----Original Message----- > From: long...@viettel.com.vn [mailto:long...@viettel.com.vn] > Sent: Wednesday, September 19, 2018 9:17 AM > To: Pattan, Reshma > Cc: dev@dpdk.org; Bao-Long Tran > Subject: Incorrect latencystats implementation > > > I have submitted a patch to implement the trivial fix. For the drop case I can > think of 2 options. We can either clear timestamp when putting mbufs back > to their pool, or change lib latencystats implementation to perform packet > selection at TX callback and let RX callback add timestamp to every packet. > Both options could affect performance but I think the second option is less > aggressive. What happens when applications drop the packets? Do they free the mbuf? In such a case, can the application set the timestamp to 0 before freeing the mbuf, instead of making these changes in the latency library? Regards, Reshma
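The suggestion above (the application clears the timestamp before freeing a dropped mbuf) can be pictured with a toy buffer pool. Everything here is hypothetical: `mock_mbuf`, `mock_pool` and `app_drop_packet` are invented for illustration and are not the DPDK mbuf or mempool API. The point is only that a recycled buffer should not carry a stale timestamp into a later latency measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer with a timestamp field (not struct rte_mbuf). */
struct mock_mbuf {
	uint64_t timestamp;          /* 0 means "not selected for latency" */
	struct mock_mbuf *next_free; /* free-list link */
};

/* Hypothetical LIFO pool of free buffers. */
struct mock_pool {
	struct mock_mbuf *free_list;
};

static void
mock_pool_put(struct mock_pool *p, struct mock_mbuf *m)
{
	assert(p != NULL && m != NULL);
	m->next_free = p->free_list;
	p->free_list = m;
}

static struct mock_mbuf *
mock_pool_get(struct mock_pool *p)
{
	struct mock_mbuf *m = p->free_list;

	if (m != NULL)
		p->free_list = m->next_free;
	return m;
}

/* Application-side drop: clear the marker, then return the buffer. */
static void
app_drop_packet(struct mock_pool *p, struct mock_mbuf *m)
{
	m->timestamp = 0;
	mock_pool_put(p, m);
}
```

Whether this belongs in the application or in the library is exactly the open question in the thread; the sketch only shows the application-side variant.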
[dpdk-dev] [PATCH v3] pci/vfio: allow mapping MSI-X BARs if kernel allows it
Currently, DPDK will skip mapping some areas (or even an entire BAR) if MSI-X table happens to be in them but is smaller than page size. Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this as a capability flag. Capability flags themselves are also only supported since kernel 4.6 [2]. This commit will introduce support for checking VFIO capabilities, and will use it to check if we are allowed to map BARs with MSI-X tables in them, along with backwards compatibility for older kernels, including a workaround for a variable rename in VFIO region info structure [3]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/ linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/ linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279 [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/ linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4 Signed-off-by: Anatoly Burakov --- Notes: v3->v2: - Fix potential uninitialized value access as per Takeshi's comments - Fix potential memory leak on failed memory reallocation v2->v1: - Fix pointer in pci_vfio_get_region_info - Fix commit message drivers/bus/pci/linux/pci_vfio.c | 132 --- lib/librte_eal/common/include/rte_vfio.h | 26 + 2 files changed, 145 insertions(+), 13 deletions(-) diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c index 686386d6a..d112b4b54 100644 --- a/drivers/bus/pci/linux/pci_vfio.c +++ b/drivers/bus/pci/linux/pci_vfio.c @@ -415,6 +415,93 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res, return 0; } +/* + * region info may contain capability headers, so we need to keep reallocating + * the memory until we match allocated memory size with argsz. 
+ */ +static int +pci_vfio_get_region_info(int vfio_dev_fd, struct vfio_region_info **info, + int region) +{ + struct vfio_region_info *ri; + size_t argsz = sizeof(*ri); + int ret; + + ri = malloc(sizeof(*ri)); + if (ri == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate memory for region info\n"); + return -1; + } +again: + memset(ri, 0, argsz); + ri->argsz = argsz; + ri->index = region; + + ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, ri); + if (ret < 0) { + free(ri); + return ret; + } + if (ri->argsz != argsz) { + struct vfio_region_info *tmp; + + argsz = ri->argsz; + tmp = realloc(ri, argsz); + + if (tmp == NULL) { + /* realloc failed but the ri is still there */ + free(ri); + RTE_LOG(ERR, EAL, "Cannot reallocate memory for region info\n"); + return -1; + } + ri = tmp; + goto again; + } + *info = ri; + + return 0; +} + +static struct vfio_info_cap_header * +pci_vfio_info_cap(struct vfio_region_info *info, int cap) +{ + struct vfio_info_cap_header *h; + size_t offset; + + if ((info->flags & RTE_VFIO_INFO_FLAG_CAPS) == 0) { + /* VFIO info does not advertise capabilities */ + return NULL; + } + + offset = VFIO_CAP_OFFSET(info); + while (offset != 0) { + h = RTE_PTR_ADD(info, offset); + if (h->id == cap) + return h; + offset = h->next; + } + return NULL; +} + +static int +pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region) +{ + struct vfio_region_info *info; + int ret; + + ret = pci_vfio_get_region_info(vfio_dev_fd, &info, msix_region); + if (ret < 0) + return -1; + + ret = pci_vfio_info_cap(info, RTE_VFIO_CAP_MSIX_MAPPABLE) != NULL; + + /* cleanup */ + free(info); + + return ret; +} + + static int pci_vfio_map_resource_primary(struct rte_pci_device *dev) { @@ -464,56 +551,75 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev) if (ret < 0) { RTE_LOG(ERR, EAL, " %s cannot get MSI-X BAR number!\n", pci_addr); - goto err_vfio_dev_fd; + goto err_vfio_res; + } + /* if we found our MSI-X BAR region, check if we can mmap it */ + if 
(vfio_res->msix_table.bar_index != -1) { + int ret = pci_vfio_msix_is_mappable(vfio_dev_fd, + vfio_res->msix_table.bar_index); + if (ret < 0) { + RTE_LOG(ERR, EAL, "Couldn't check if MSI-X BAR is mappable\n"); + goto err_vfio_res; + } else if (ret != 0) { + /* we can map it, so we don't care where it is */ +
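The capability lookup in the patch above walks a chain of headers linked by byte offsets from the start of the region info buffer. A self-contained sketch of that walk is shown below. The structures and the flag value are mock definitions made up for this example (they are not the kernel's VFIO definitions), and the bounds check against `argsz` is an addition of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_FLAG_CAPS 0x8u /* hypothetical "capability chain present" flag */

/* Hypothetical capability header; next is an offset from the info start. */
struct mock_cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next; /* 0 terminates the chain */
};

/* Hypothetical region info header placed at the start of the buffer. */
struct mock_region_info {
	uint32_t argsz;      /* total size of the buffer, headers included */
	uint32_t flags;
	uint32_t cap_offset; /* offset of the first capability, 0 if none */
};

/*
 * Look for capability cap_id. Returns 1 and copies the header to *out
 * when found, 0 otherwise. A cyclic chain is not detected here.
 */
static int
mock_find_cap(const struct mock_region_info *info, uint16_t cap_id,
	      struct mock_cap_header *out)
{
	uint32_t offset;

	assert(info != NULL && out != NULL);

	if ((info->flags & MOCK_FLAG_CAPS) == 0)
		return 0; /* no capabilities advertised */

	offset = info->cap_offset;
	while (offset != 0) {
		struct mock_cap_header h;

		/* Refuse offsets that would read past the buffer. */
		if (info->argsz < sizeof(h) || offset > info->argsz - sizeof(h))
			return 0;

		/* memcpy avoids alignment and aliasing assumptions. */
		memcpy(&h, (const uint8_t *)info + offset, sizeof(h));
		if (h.id == cap_id) {
			*out = h;
			return 1;
		}
		offset = h.next;
	}
	return 0;
}
```

In the patch, a helper of this kind is what decides whether the MSI-X BAR may be mapped: the region is considered mappable when the corresponding capability is present in the chain.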
Re: [dpdk-dev] [PATCH v2] doc: support building HTML guides with meson
On Wed, 19 Sep 2018 14:48:27 +0100 Luca Boccassi wrote: > From: Bruce Richardson > > Signed-off-by: Bruce Richardson > Signed-off-by: Luca Boccassi > --- > v2: send on behalf of Bruce on request. > - tell sphinx to create .doctrees working files in the > parent of the target directory so that they don't get > installed > - change the output directory so that it matches the legacy > makefiles directory structure (on install) > - add install_dir to fix ninja install > - add post-install calls to delete .buildinfo sphinx > temporary file and to install custom.css as the makefiles > do > The installed directory has been verified to be bit-by-bit > identical to the legacy makefiles with diffoscope. > > doc/api/meson.build| 5 ++--- > doc/guides/meson.build | 27 +++ > doc/meson.build| 11 +++ > 3 files changed, 40 insertions(+), 3 deletions(-) > create mode 100644 doc/guides/meson.build > > diff --git a/doc/api/meson.build b/doc/api/meson.build > index 13fcbb8cd7..30bdc573b5 100644 > --- a/doc/api/meson.build > +++ b/doc/api/meson.build > @@ -50,7 +50,6 @@ if doxygen.found() > install_dir: htmldir, > build_by_default: false) > > - run_target('doc', command: 'true', depends: doxy_build) > -else > - run_target('doc', command: ['echo', 'doxygen', 'not', 'found']) > + doc_targets += doxy_build > + doc_target_names += 'Doxygen_API' > endif > diff --git a/doc/guides/meson.build b/doc/guides/meson.build > new file mode 100644 > index 00..24f5316c5c > --- /dev/null > +++ b/doc/guides/meson.build > @@ -0,0 +1,27 @@ > +# SPDX-License-Identifier: BSD-3-Clause > +# Copyright(c) 2018 Intel Corporation > + > +sphinx = find_program('sphinx-build', required: get_option('enable_docs')) > + > +if sphinx.found() > + htmldir = join_paths('share', 'doc', 'dpdk') > + html_guides_build = custom_target('html_guides_build', > + input: meson.current_source_dir(), > + output: 'guides', > + command: [sphinx, '-b', 'html', '@INPUT@', > meson.current_build_dir() + '/guides', > + '-d', 
meson.current_build_dir() + '/.doctrees'], This on RHEL 7.5 doesn't work, since old versions of sphinx (I have v1.1.3) need all options before the paths. I tried to move "'-d', meson.current_build_dir() + '/.doctrees'" before "'@INPUT@'" and it worked. > + build_by_default: false, > + install: get_option('enable_docs'), > + install_dir: htmldir) > + > + doc_targets += html_guides_build > + doc_target_names += 'HTML_Guides' > + > + # sphinx leaves a .buildinfo in the target directory, which we don't > + # want to install. Note that sh -c has to be used, otherwise the > + # env var does not get expanded if calling rm/install directly. > + meson.add_install_script('sh', '-c', > + 'rm -f > $MESON_INSTALL_DESTDIR_PREFIX/share/doc/dpdk/guides/.buildinfo') > + meson.add_install_script('sh', '-c', > + 'install -D -m0644 $MESON_SOURCE_ROOT/doc/guides/custom.css > $MESON_INSTALL_DESTDIR_PREFIX/share/doc/dpdk/guides/_static/css/') This on RHEL 7.5 doesn't work. I had to append custom.css to DEST ($MESON_INSTALL_DESTDIR_PREFIX/share/doc/dpdk/guides/_static/css/custom.css) to make it work. > +endif > diff --git a/doc/meson.build b/doc/meson.build > index afca2e7133..c5410d85d6 100644 > --- a/doc/meson.build > +++ b/doc/meson.build > @@ -1,4 +1,15 @@ > # SPDX-License-Identifier: BSD-3-Clause > # Copyright(c) 2018 Luca Boccassi > > +doc_targets = [] > +doc_target_names = [] > subdir('api') > +subdir('guides') > + > +if doc_targets.length() == 0 > + message = 'No docs targets found' > +else > + message = 'Building docs:' > +endif > +run_target('doc', command: ['echo', message, doc_target_names], > + depends: doc_targets)
[dpdk-dev] [PATCH] test: restructure and cleanup ring PMD test
From: Chaitanya Babu Talluri Divided the main test into smaller logical tests. Registered with UT framework. Added cleanup of the resources, otherwise ring creation fails during consecutive test runs. Freed the allocated mempool, rings and uninitialized the drivers. Signed-off-by: Chaitanya Babu Talluri --- test/test/test_pmd_ring.c | 312 -- 1 file changed, 188 insertions(+), 124 deletions(-) diff --git a/test/test/test_pmd_ring.c b/test/test/test_pmd_ring.c index 19d7d20a0..f7d46d834 100644 --- a/test/test/test_pmd_ring.c +++ b/test/test/test_pmd_ring.c @@ -2,20 +2,22 @@ * Copyright(c) 2010-2015 Intel Corporation */ #include "test.h" +#include #include #include #include - -static struct rte_mempool *mp; -static int tx_porta, rx_portb, rxtx_portc, rxtx_portd, rxtx_porte; +#include #define SOCKET0 0 #define RING_SIZE 256 #define NUM_RINGS 2 #define NB_MBUF 512 +static struct rte_mempool *mp; +struct rte_ring *rxtx[NUM_RINGS]; +static int tx_porta, rx_portb, rxtx_portc, rxtx_portd, rxtx_porte; static int test_ethdev_configure_port(int port) @@ -71,21 +73,21 @@ test_send_basic_packets(void) if (rte_eth_tx_burst(tx_porta, 0, pbufs, RING_SIZE/2) < RING_SIZE/2) { printf("Failed to transmit packet burst port %d\n", tx_porta); - return -1; + return TEST_FAILED; } if (rte_eth_rx_burst(rx_portb, 0, pbufs, RING_SIZE) != RING_SIZE/2) { printf("Failed to receive packet burst on port %d\n", rx_portb); - return -1; + return TEST_FAILED; } for (i = 0; i < RING_SIZE/2; i++) if (pbufs[i] != &bufs[i]) { printf("Error: received data does not match that transmitted\n"); - return -1; + return TEST_FAILED; } - return 0; + return TEST_SUCCESS; } static int @@ -212,7 +214,7 @@ test_stats_reset(int port) } static int -test_pmd_ring_pair_create_attach(int portd, int porte) +test_pmd_ring_pair_create_attach(void) { struct rte_eth_stats stats, stats2; struct rte_mbuf buf, *pbuf = &buf; @@ -220,185 +222,217 @@ test_pmd_ring_pair_create_attach(int portd, int porte) memset(&null_conf, 0, sizeof(struct 
rte_eth_conf)); - if ((rte_eth_dev_configure(portd, 1, 1, &null_conf) < 0) - || (rte_eth_dev_configure(porte, 1, 1, &null_conf) < 0)) { + if ((rte_eth_dev_configure(rxtx_portd, 1, 1, &null_conf) < 0) + || (rte_eth_dev_configure(rxtx_porte, 1, 1, &null_conf) < 0)) { printf("Configure failed for port\n"); - return -1; + return TEST_FAILED; } - if ((rte_eth_tx_queue_setup(portd, 0, RING_SIZE, SOCKET0, NULL) < 0) - || (rte_eth_tx_queue_setup(porte, 0, RING_SIZE, SOCKET0, NULL) < 0)) { + if ((rte_eth_tx_queue_setup(rxtx_portd, 0, RING_SIZE, + SOCKET0, NULL) < 0) + || (rte_eth_tx_queue_setup(rxtx_porte, 0, RING_SIZE, + SOCKET0, NULL) < 0)) { printf("TX queue setup failed\n"); - return -1; + return TEST_FAILED; } - if ((rte_eth_rx_queue_setup(portd, 0, RING_SIZE, SOCKET0, NULL, mp) < 0) - || (rte_eth_rx_queue_setup(porte, 0, RING_SIZE, SOCKET0, NULL, mp) < 0)) { + if ((rte_eth_rx_queue_setup(rxtx_portd, 0, RING_SIZE, + SOCKET0, NULL, mp) < 0) + || (rte_eth_rx_queue_setup(rxtx_porte, 0, RING_SIZE, + SOCKET0, NULL, mp) < 0)) { printf("RX queue setup failed\n"); - return -1; + return TEST_FAILED; } - if ((rte_eth_dev_start(portd) < 0) - || (rte_eth_dev_start(porte) < 0)) { + if ((rte_eth_dev_start(rxtx_portd) < 0) + || (rte_eth_dev_start(rxtx_porte) < 0)) { printf("Error starting port\n"); - return -1; + return TEST_FAILED; } - rte_eth_stats_reset(portd); + rte_eth_stats_reset(rxtx_portd); /* check stats of port, should all be zero */ - rte_eth_stats_get(portd, &stats); + rte_eth_stats_get(rxtx_portd, &stats); if (stats.ipackets != 0 || stats.opackets != 0 || stats.ibytes != 0 || stats.obytes != 0 || stats.ierrors != 0 || stats.oerrors != 0) { - printf("Error: port %d stats are not zero\n", portd); - return -1; + printf("Error: port %d stats are not zero\n", rxtx_portd); + return TEST_FAILED; } - rte_eth_stats_reset(porte); + rte_eth_stats_reset(rxtx_porte); /* check stats of port, should all be zero */ - rte_eth_stats_get(
Re: [dpdk-dev] [PATCH 1/4] app/testpmd: add queue deferred start switch
On 9/3/18 7:56 PM, Ferruh Yigit wrote: On 8/29/2018 8:16 AM, Andrew Rybchenko wrote: From: Ian Dolzhansky Signed-off-by: Ian Dolzhansky Signed-off-by: Andrew Rybchenko --- app/test-pmd/cmdline.c | 91 ++ doc/guides/rel_notes/release_18_11.rst | 6 ++ 2 files changed, 97 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 589121d69..f47ec99f1 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -883,6 +883,10 @@ static void cmd_help_long_parsed(void *parsed_result, "Start/stop a rx/tx queue of port X. Only take effect" " when port X is started\n\n" + "port (port_id) (rxq|txq) (queue_id) deferred_start (on|off)\n" + "Switch on/off a deferred start of port X rx/tx queue. Only" + " take effect when port X is stopped.\n\n" Overall looks good to me. But testpmd doc needs to be updated to add new command. doc/guides/testpmd_app_ug/testpmd_funcs.rst Will fix in v2 Also is there a way to see the current deferred_start status of a port, either with a specific command or as part of other command like "show port info"? If not does it make sense to have a way to see it? The following command does it: show rxq|txq info Thanks, Andrew.
[dpdk-dev] [PATCH v3] doc: support building HTML guides with meson
From: Bruce Richardson Signed-off-by: Bruce Richardson Signed-off-by: Luca Boccassi --- v2: send on behalf of Bruce on request. - tell sphinx to create .doctrees working files in the parent of the target directory so that they don't get installed - change the output directory so that it matches the legacy makefiles directory structure (on install) - add install_dir to fix ninja install - add post-install calls to delete .buildinfo sphinx temporary file and to install custom.css as the makefiles do The installed directory has been verified to be bit-by-bit identical to the legacy makefiles with diffoscope. v3: re-arranged sphinx parameters to make it work on older versions (RHEL 7), and specify DEST filename on the install call, to make it work on older version (RHEL 7) doc/api/meson.build| 5 ++--- doc/guides/meson.build | 28 doc/meson.build| 11 +++ 3 files changed, 41 insertions(+), 3 deletions(-) create mode 100644 doc/guides/meson.build diff --git a/doc/api/meson.build b/doc/api/meson.build index 13fcbb8cd7..30bdc573b5 100644 --- a/doc/api/meson.build +++ b/doc/api/meson.build @@ -50,7 +50,6 @@ if doxygen.found() install_dir: htmldir, build_by_default: false) - run_target('doc', command: 'true', depends: doxy_build) -else - run_target('doc', command: ['echo', 'doxygen', 'not', 'found']) + doc_targets += doxy_build + doc_target_names += 'Doxygen_API' endif diff --git a/doc/guides/meson.build b/doc/guides/meson.build new file mode 100644 index 00..06f14882bb --- /dev/null +++ b/doc/guides/meson.build @@ -0,0 +1,28 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +sphinx = find_program('sphinx-build', required: get_option('enable_docs')) + +if sphinx.found() + htmldir = join_paths('share', 'doc', 'dpdk') + html_guides_build = custom_target('html_guides_build', + input: meson.current_source_dir(), + output: 'guides', + command: [sphinx, '-b', 'html', + '-d', meson.current_build_dir() + '/.doctrees', + '@INPUT@', 
meson.current_build_dir() + '/guides'], + build_by_default: false, + install: get_option('enable_docs'), + install_dir: htmldir) + + doc_targets += html_guides_build + doc_target_names += 'HTML_Guides' + + # sphinx leaves a .buildinfo in the target directory, which we don't + # want to install. Note that sh -c has to be used, otherwise the + # env var does not get expanded if calling rm/install directly. + meson.add_install_script('sh', '-c', + 'rm -f $MESON_INSTALL_DESTDIR_PREFIX/share/doc/dpdk/guides/.buildinfo') + meson.add_install_script('sh', '-c', + 'install -D -m0644 $MESON_SOURCE_ROOT/doc/guides/custom.css $MESON_INSTALL_DESTDIR_PREFIX/share/doc/dpdk/guides/_static/css/custom.css') +endif diff --git a/doc/meson.build b/doc/meson.build index afca2e7133..c5410d85d6 100644 --- a/doc/meson.build +++ b/doc/meson.build @@ -1,4 +1,15 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2018 Luca Boccassi +doc_targets = [] +doc_target_names = [] subdir('api') +subdir('guides') + +if doc_targets.length() == 0 + message = 'No docs targets found' +else + message = 'Building docs:' +endif +run_target('doc', command: ['echo', message, doc_target_names], + depends: doc_targets) -- 2.18.0
Re: [dpdk-dev] [PATCH v3] doc: support building HTML guides with meson
On Thu, 20 Sep 2018 14:22:08 +0100 Luca Boccassi wrote: > From: Bruce Richardson > > Signed-off-by: Bruce Richardson > Signed-off-by: Luca Boccassi Tested on Fedora 28 (sphinx 1.7.5) and RHEL 7.5 (sphinx 1.1.3) Tested-by: Timothy Redaelli
[dpdk-dev] [PATCH v2 3/4] net/failsafe: add Rx queue start and stop functions
From: Ian Dolzhansky Support Rx queue deferred start. Signed-off-by: Ian Dolzhansky Signed-off-by: Andrew Rybchenko Acked-by: Gaetan Rivet --- doc/guides/nics/features/failsafe.ini | 1 + doc/guides/rel_notes/release_18_11.rst | 7 ++ drivers/net/failsafe/failsafe_ether.c | 44 drivers/net/failsafe/failsafe_ops.c| 96 -- 4 files changed, 143 insertions(+), 5 deletions(-) diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini index 39ee57965..712c0b7f7 100644 --- a/doc/guides/nics/features/failsafe.ini +++ b/doc/guides/nics/features/failsafe.ini @@ -7,6 +7,7 @@ Link status = Y Link status event= Y Rx interrupt = Y +Queue start/stop = P MTU update = Y Jumbo frame = Y Promiscuous mode = Y diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index e9f3a415c..d6af403b1 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -72,6 +72,13 @@ New Features choose the latest vector path that the platform supported. For example, users can use AVX2 vector path on BDW/HSW to get better performance. +* **Updated failsafe driver.** + + Updated the failsafe driver including the following changes: + + * Support for Rx queues start and stop. + * Support for Rx queues deferred start. 
+ * **Added ability to switch queue deferred start flag on testpmd app.** Added a console command to testpmd app, giving ability to switch diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c index 5b5cb3b49..305deed63 100644 --- a/drivers/net/failsafe/failsafe_ether.c +++ b/drivers/net/failsafe/failsafe_ether.c @@ -366,6 +366,47 @@ failsafe_dev_remove(struct rte_eth_dev *dev) } } +static int +failsafe_eth_dev_rx_queues_sync(struct rte_eth_dev *dev) +{ + struct rxq *rxq; + int ret; + uint16_t i; + + for (i = 0; i < dev->data->nb_rx_queues; i++) { + rxq = dev->data->rx_queues[i]; + + if (rxq->info.conf.rx_deferred_start && + dev->data->rx_queue_state[i] == + RTE_ETH_QUEUE_STATE_STARTED) { + /* +* The subdevice Rx queue does not launch on device +* start if deferred start flag is set. It needs to be +* started manually in case an appropriate failsafe Rx +* queue has been started earlier. +*/ + ret = dev->dev_ops->rx_queue_start(dev, i); + if (ret) { + ERROR("Could not synchronize Rx queue %d", i); + return ret; + } + } else if (dev->data->rx_queue_state[i] == + RTE_ETH_QUEUE_STATE_STOPPED) { + /* +* The subdevice Rx queue needs to be stopped manually +* in case an appropriate failsafe Rx queue has been +* stopped earlier. 
+*/ + ret = dev->dev_ops->rx_queue_stop(dev, i); + if (ret) { + ERROR("Could not synchronize Rx queue %d", i); + return ret; + } + } + } + return 0; +} + int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev) { @@ -422,6 +463,9 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev) if (PRIV(dev)->state < DEV_STARTED) return 0; ret = dev->dev_ops->dev_start(dev); + if (ret) + goto err_remove; + ret = failsafe_eth_dev_rx_queues_sync(dev); if (ret) goto err_remove; return 0; diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c index 9608d09cb..402c5b62c 100644 --- a/drivers/net/failsafe/failsafe_ops.c +++ b/drivers/net/failsafe/failsafe_ops.c @@ -168,6 +168,20 @@ fs_dev_configure(struct rte_eth_dev *dev) return 0; } +static void +fs_set_queues_state_start(struct rte_eth_dev *dev) +{ + struct rxq *rxq; + uint16_t i; + + for (i = 0; i < dev->data->nb_rx_queues; i++) { + rxq = dev->data->rx_queues[i]; + if (!rxq->info.conf.rx_deferred_start) + dev->data->rx_queue_state[i] = + RTE_ETH_QUEUE_STATE_STARTED; + } +} + static int fs_dev_start(struct rte_eth_dev *dev) { @@ -202,13 +216,24 @@ fs_dev_start(struct rte_eth_dev *dev) } sdev->state = DEV_STARTED; } - if (PRIV(dev)->state < DEV_STARTED) + if (PRIV(dev)->state < DEV_STARTED)
[dpdk-dev] [PATCH v2 1/4] app/testpmd: add queue deferred start switch
From: Ian Dolzhansky Signed-off-by: Ian Dolzhansky Signed-off-by: Andrew Rybchenko Acked-by: Gaetan Rivet --- app/test-pmd/cmdline.c | 91 + doc/guides/rel_notes/release_18_11.rst | 8 ++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 ++ 3 files changed, 106 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 0cbd340c1..0c5399dc4 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -883,6 +883,10 @@ static void cmd_help_long_parsed(void *parsed_result, "Start/stop a rx/tx queue of port X. Only take effect" " when port X is started\n\n" + "port (port_id) (rxq|txq) (queue_id) deferred_start (on|off)\n" + "Switch on/off a deferred start of port X rx/tx queue. Only" + " take effect when port X is stopped.\n\n" + "port (port_id) (rxq|txq) (queue_id) setup\n" "Setup a rx/tx queue of port X.\n\n" @@ -2439,6 +2443,92 @@ cmdline_parse_inst_t cmd_config_rxtx_queue = { }, }; +/* *** configure port rxq/txq deferred start on/off *** */ +struct cmd_config_deferred_start_rxtx_queue { + cmdline_fixed_string_t port; + portid_t port_id; + cmdline_fixed_string_t rxtxq; + uint16_t qid; + cmdline_fixed_string_t opname; + cmdline_fixed_string_t state; +}; + +static void +cmd_config_deferred_start_rxtx_queue_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_config_deferred_start_rxtx_queue *res = parsed_result; + struct rte_port *port; + uint8_t isrx; + uint8_t ison; + uint8_t needreconfig = 0; + + if (port_id_is_invalid(res->port_id, ENABLED_WARN)) + return; + + if (port_is_started(res->port_id) != 0) { + printf("Please stop port %u first\n", res->port_id); + return; + } + + port = &ports[res->port_id]; + + isrx = !strcmp(res->rxtxq, "rxq"); + + if (isrx && rx_queue_id_is_invalid(res->qid)) + return; + else if (!isrx && tx_queue_id_is_invalid(res->qid)) + return; + + ison = !strcmp(res->state, "on"); + + if (isrx && port->rx_conf[res->qid].rx_deferred_start != ison) { 
+ port->rx_conf[res->qid].rx_deferred_start = ison; + needreconfig = 1; + } else if (!isrx && port->tx_conf[res->qid].tx_deferred_start != ison) { + port->tx_conf[res->qid].tx_deferred_start = ison; + needreconfig = 1; + } + + if (needreconfig) + cmd_reconfig_device_queue(res->port_id, 0, 1); +} + +cmdline_parse_token_string_t cmd_config_deferred_start_rxtx_queue_port = + TOKEN_STRING_INITIALIZER(struct cmd_config_deferred_start_rxtx_queue, + port, "port"); +cmdline_parse_token_num_t cmd_config_deferred_start_rxtx_queue_port_id = + TOKEN_NUM_INITIALIZER(struct cmd_config_deferred_start_rxtx_queue, + port_id, UINT16); +cmdline_parse_token_string_t cmd_config_deferred_start_rxtx_queue_rxtxq = + TOKEN_STRING_INITIALIZER(struct cmd_config_deferred_start_rxtx_queue, + rxtxq, "rxq#txq"); +cmdline_parse_token_num_t cmd_config_deferred_start_rxtx_queue_qid = + TOKEN_NUM_INITIALIZER(struct cmd_config_deferred_start_rxtx_queue, + qid, UINT16); +cmdline_parse_token_string_t cmd_config_deferred_start_rxtx_queue_opname = + TOKEN_STRING_INITIALIZER(struct cmd_config_deferred_start_rxtx_queue, + opname, "deferred_start"); +cmdline_parse_token_string_t cmd_config_deferred_start_rxtx_queue_state = + TOKEN_STRING_INITIALIZER(struct cmd_config_deferred_start_rxtx_queue, + state, "on#off"); + +cmdline_parse_inst_t cmd_config_deferred_start_rxtx_queue = { + .f = cmd_config_deferred_start_rxtx_queue_parsed, + .data = NULL, + .help_str = "port rxq|txq deferred_start on|off", + .tokens = { + (void *)&cmd_config_deferred_start_rxtx_queue_port, + (void *)&cmd_config_deferred_start_rxtx_queue_port_id, + (void *)&cmd_config_deferred_start_rxtx_queue_rxtxq, + (void *)&cmd_config_deferred_start_rxtx_queue_qid, + (void *)&cmd_config_deferred_start_rxtx_queue_opname, + (void *)&cmd_config_deferred_start_rxtx_queue_state, + NULL, + }, +}; + /* *** configure port rxq/txq setup *** */ struct cmd_setup_rxtx_queue { cmdline_fixed_string_t port; @@ -17709,6 +17799,7 @@ cmdline_p
[dpdk-dev] [PATCH v2 0/4] net/failsafe: support deferred queue start
Since the topic is raised in the multicast address list patchset, I'd like to highlight it here as well. The current version uses failsafe ops directly on sync to synchronize queue state, which iterates over all sub-devices. For sub-devices that are already in sync it does not go to the driver, since the ethdev functions check the current state and do nothing if it is already OK. In theory it is possible to limit it to inactive devices and use the ethdev API instead of direct ops, but it requires a few more lines of code. v2: - fix ops ordering - update testpmd documentation - add Gaëtan's acks Ian Dolzhansky (4): app/testpmd: add queue deferred start switch net/failsafe: add checks for deferred queue setup net/failsafe: add Rx queue start and stop functions net/failsafe: add Tx queue start and stop functions app/test-pmd/cmdline.c | 91 +++ doc/guides/nics/features/failsafe.ini | 1 + doc/guides/rel_notes/release_18_11.rst | 15 ++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 + drivers/net/failsafe/failsafe_ether.c | 88 +++ drivers/net/failsafe/failsafe_ops.c | 167 +++- 6 files changed, 368 insertions(+), 1 deletion(-) -- 2.17.1
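The queue state synchronization discussed in this cover letter can be modelled in a few lines. The sketch below is a simplification under stated assumptions: `mock_queue`, `mock_subdev_start` and `mock_sync_queues` are hypothetical names, a sub-device is reduced to an array of queue states, and starting or stopping a queue is reduced to an assignment. It mirrors the two rules visible in the Rx and Tx sync helpers of the series: a deferred queue that the failsafe device already started must be started manually, and a queue the failsafe device stopped must be stopped manually.

```c
#include <assert.h>
#include <stddef.h>

enum q_state { Q_STOPPED = 0, Q_STARTED = 1 };

/* Hypothetical view of one failsafe queue: its config and current state. */
struct mock_queue {
	int deferred_start; /* non-zero: queue is not started on device start */
	enum q_state state; /* state recorded at the failsafe level */
};

/* What a freshly started sub-device looks like: deferred queues stay down. */
static void
mock_subdev_start(const struct mock_queue *fs, enum q_state *sub, size_t n)
{
	size_t i;

	assert(fs != NULL && sub != NULL);
	for (i = 0; i < n; i++)
		sub[i] = fs[i].deferred_start ? Q_STOPPED : Q_STARTED;
}

/* Bring the sub-device queue states in line with the failsafe states. */
static void
mock_sync_queues(const struct mock_queue *fs, enum q_state *sub, size_t n)
{
	size_t i;

	assert(fs != NULL && sub != NULL);
	for (i = 0; i < n; i++) {
		if (fs[i].deferred_start && fs[i].state == Q_STARTED)
			sub[i] = Q_STARTED; /* was started manually earlier */
		else if (fs[i].state == Q_STOPPED)
			sub[i] = Q_STOPPED; /* was stopped manually earlier */
	}
}
```

After `mock_subdev_start` followed by `mock_sync_queues`, every sub-device queue state equals the failsafe one, which is the invariant the series appears to be after for hot-plugged sub-devices.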
[dpdk-dev] [PATCH v2 4/4] net/failsafe: add Tx queue start and stop functions
From: Ian Dolzhansky Support Tx queue deferred start. Signed-off-by: Ian Dolzhansky Signed-off-by: Andrew Rybchenko Acked-by: Gaetan Rivet --- doc/guides/nics/features/failsafe.ini | 2 +- doc/guides/rel_notes/release_18_11.rst | 4 +- drivers/net/failsafe/failsafe_ether.c | 44 +++ drivers/net/failsafe/failsafe_ops.c| 77 -- 4 files changed, 120 insertions(+), 7 deletions(-) diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini index 712c0b7f7..74eae4a62 100644 --- a/doc/guides/nics/features/failsafe.ini +++ b/doc/guides/nics/features/failsafe.ini @@ -7,7 +7,7 @@ Link status = Y Link status event= Y Rx interrupt = Y -Queue start/stop = P +Queue start/stop = Y MTU update = Y Jumbo frame = Y Promiscuous mode = Y diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index d6af403b1..d8faf9ed7 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -76,8 +76,8 @@ New Features Updated the failsafe driver including the following changes: - * Support for Rx queues start and stop. - * Support for Rx queues deferred start. + * Support for Rx and Tx queues start and stop. + * Support for Rx and Tx queues deferred start. 
* **Added ability to switch queue deferred start flag on testpmd app.** diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c index 305deed63..191f95f14 100644 --- a/drivers/net/failsafe/failsafe_ether.c +++ b/drivers/net/failsafe/failsafe_ether.c @@ -407,6 +407,47 @@ failsafe_eth_dev_rx_queues_sync(struct rte_eth_dev *dev) return 0; } +static int +failsafe_eth_dev_tx_queues_sync(struct rte_eth_dev *dev) +{ + struct txq *txq; + int ret; + uint16_t i; + + for (i = 0; i < dev->data->nb_tx_queues; i++) { + txq = dev->data->tx_queues[i]; + + if (txq->info.conf.tx_deferred_start && + dev->data->tx_queue_state[i] == + RTE_ETH_QUEUE_STATE_STARTED) { + /* +* The subdevice Tx queue does not launch on device +* start if deferred start flag is set. It needs to be +* started manually in case an appropriate failsafe Tx +* queue has been started earlier. +*/ + ret = dev->dev_ops->tx_queue_start(dev, i); + if (ret) { + ERROR("Could not synchronize Tx queue %d", i); + return ret; + } + } else if (dev->data->tx_queue_state[i] == + RTE_ETH_QUEUE_STATE_STOPPED) { + /* +* The subdevice Tx queue needs to be stopped manually +* in case an appropriate failsafe Tx queue has been +* stopped earlier. 
+*/ + ret = dev->dev_ops->tx_queue_stop(dev, i); + if (ret) { + ERROR("Could not synchronize Tx queue %d", i); + return ret; + } + } + } + return 0; +} + int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev) { @@ -466,6 +507,9 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev) if (ret) goto err_remove; ret = failsafe_eth_dev_rx_queues_sync(dev); + if (ret) + goto err_remove; + ret = failsafe_eth_dev_tx_queues_sync(dev); if (ret) goto err_remove; return 0; diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c index 402c5b62c..9237e6331 100644 --- a/drivers/net/failsafe/failsafe_ops.c +++ b/drivers/net/failsafe/failsafe_ops.c @@ -172,6 +172,7 @@ static void fs_set_queues_state_start(struct rte_eth_dev *dev) { struct rxq *rxq; + struct txq *txq; uint16_t i; for (i = 0; i < dev->data->nb_rx_queues; i++) { @@ -180,6 +181,12 @@ fs_set_queues_state_start(struct rte_eth_dev *dev) dev->data->rx_queue_state[i] = RTE_ETH_QUEUE_STATE_STARTED; } + for (i = 0; i < dev->data->nb_tx_queues; i++) { + txq = dev->data->tx_queues[i]; + if (!txq->info.conf.tx_deferred_start) + dev->data->tx_queue_state[i] = + RTE_ETH_QUEUE_STATE_STARTED; + } } static int @@ -232,6 +239,8 @@ fs_set_queues_state_stop(struct rte_eth_dev *dev) for (i = 0; i < dev->data->nb_rx_queues; i++) dev->data-
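The per-queue decision made by failsafe_eth_dev_tx_queues_sync() in the patch above can be sketched in isolation. This is only an illustration: the toy_* names and enums are hypothetical stand-ins, not failsafe or ethdev identifiers, and the real function calls the driver ops rather than returning an action.

```c
/* Hypothetical stand-ins for RTE_ETH_QUEUE_STATE_STOPPED/STARTED. */
enum toy_q_state { TOY_Q_STOPPED = 0, TOY_Q_STARTED = 1 };

/* What the sync step would do to the sub-device queue. */
enum toy_q_action { TOY_Q_NONE, TOY_Q_START, TOY_Q_STOP };

/*
 * Mirror of the if / else-if chain in the patch:
 * - a deferred-start queue that failsafe already reports as started
 *   must be started by hand on the sub-device;
 * - a queue that failsafe reports as stopped must be stopped by hand;
 * - everything else is left to the normal device start.
 */
static enum toy_q_action
toy_txq_sync_action(int deferred_start, enum toy_q_state state)
{
	if (deferred_start && state == TOY_Q_STARTED)
		return TOY_Q_START;
	if (state == TOY_Q_STOPPED)
		return TOY_Q_STOP;
	return TOY_Q_NONE;
}
```

A non-deferred queue in the started state yields TOY_Q_NONE, which matches the comment in the patch: such queues launch on device start without help.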
[dpdk-dev] [PATCH v2 2/4] net/failsafe: add checks for deferred queue setup
From: Ian Dolzhansky Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD") Cc: sta...@dpdk.org Signed-off-by: Ian Dolzhansky Signed-off-by: Andrew Rybchenko Acked-by: Gaetan Rivet --- drivers/net/failsafe/failsafe_ops.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c index d989a17bf..9608d09cb 100644 --- a/drivers/net/failsafe/failsafe_ops.c +++ b/drivers/net/failsafe/failsafe_ops.c @@ -338,6 +338,11 @@ fs_rx_queue_setup(struct rte_eth_dev *dev, uint8_t i; int ret; + if (rx_conf->rx_deferred_start) { + ERROR("Rx queue deferred start is not supported"); + return -EINVAL; + } + fs_lock(dev, 0); rxq = dev->data->rx_queues[rx_queue_id]; if (rxq != NULL) { @@ -495,6 +500,11 @@ fs_tx_queue_setup(struct rte_eth_dev *dev, uint8_t i; int ret; + if (tx_conf->tx_deferred_start) { + ERROR("Tx queue deferred start is not supported"); + return -EINVAL; + } + fs_lock(dev, 0); txq = dev->data->tx_queues[tx_queue_id]; if (txq != NULL) { -- 2.17.1
Re: [dpdk-dev] [PATCH] test: restructure and cleanup ring PMD test
Adding Bruce in CC Thanks M.P.Jananee >-Original Message- >From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jananee >Parthasarathy >Sent: Thursday, September 20, 2018 6:54 PM >To: dev@dpdk.org >Cc: Pattan, Reshma ; Chaitanya Babu, TalluriX > >Subject: [dpdk-dev] [PATCH] test: restructure and cleanup ring PMD test > >From: Chaitanya Babu Talluri > >Divided main test to smaller logical tests. >Registered with UT framework. >Added cleanup of the resources else ring creation fails during consecutive test >runs. >Freed the allocated mempool, rings and uninitalized the drivers. > >Signed-off-by: Chaitanya Babu Talluri >--- > test/test/test_pmd_ring.c | 312 --- >--- > 1 file changed, 188 insertions(+), 124 deletions(-) > >diff --git a/test/test/test_pmd_ring.c b/test/test/test_pmd_ring.c index >19d7d20a0..f7d46d834 100644 >--- a/test/test/test_pmd_ring.c >+++ b/test/test/test_pmd_ring.c >@@ -2,20 +2,22 @@ > * Copyright(c) 2010-2015 Intel Corporation > */ > #include "test.h" >+#include > > #include > > #include > #include >- >-static struct rte_mempool *mp; >-static int tx_porta, rx_portb, rxtx_portc, rxtx_portd, rxtx_porte; >+#include > > #define SOCKET0 0 > #define RING_SIZE 256 > #define NUM_RINGS 2 > #define NB_MBUF 512 > >+static struct rte_mempool *mp; >+struct rte_ring *rxtx[NUM_RINGS]; >+static int tx_porta, rx_portb, rxtx_portc, rxtx_portd, rxtx_porte; > > static int > test_ethdev_configure_port(int port) >@@ -71,21 +73,21 @@ test_send_basic_packets(void) > > if (rte_eth_tx_burst(tx_porta, 0, pbufs, RING_SIZE/2) < RING_SIZE/2) { > printf("Failed to transmit packet burst port %d\n", tx_porta); >- return -1; >+ return TEST_FAILED; > } > > if (rte_eth_rx_burst(rx_portb, 0, pbufs, RING_SIZE) != RING_SIZE/2) { > printf("Failed to receive packet burst on port %d\n", >rx_portb); >- return -1; >+ return TEST_FAILED; > } > > for (i = 0; i < RING_SIZE/2; i++) > if (pbufs[i] != &bufs[i]) { > printf("Error: received data does not match that >transmitted\n"); >- return -1; 
>+ return TEST_FAILED; > } > >- return 0; >+ return TEST_SUCCESS; > } > > static int >@@ -212,7 +214,7 @@ test_stats_reset(int port) } > > static int >-test_pmd_ring_pair_create_attach(int portd, int porte) >+test_pmd_ring_pair_create_attach(void) > { > struct rte_eth_stats stats, stats2; > struct rte_mbuf buf, *pbuf = &buf; >@@ -220,185 +222,217 @@ test_pmd_ring_pair_create_attach(int portd, int >porte) > > memset(&null_conf, 0, sizeof(struct rte_eth_conf)); > >- if ((rte_eth_dev_configure(portd, 1, 1, &null_conf) < 0) >- || (rte_eth_dev_configure(porte, 1, 1, &null_conf) < 0)) { >+ if ((rte_eth_dev_configure(rxtx_portd, 1, 1, &null_conf) < 0) >+ || (rte_eth_dev_configure(rxtx_porte, 1, 1, &null_conf) < 0)) { > printf("Configure failed for port\n"); >- return -1; >+ return TEST_FAILED; > } > >- if ((rte_eth_tx_queue_setup(portd, 0, RING_SIZE, SOCKET0, NULL) < 0) >- || (rte_eth_tx_queue_setup(porte, 0, RING_SIZE, SOCKET0, >NULL) < 0)) { >+ if ((rte_eth_tx_queue_setup(rxtx_portd, 0, RING_SIZE, >+ SOCKET0, NULL) < 0) >+ || (rte_eth_tx_queue_setup(rxtx_porte, 0, RING_SIZE, >+ SOCKET0, NULL) < 0)) { > printf("TX queue setup failed\n"); >- return -1; >+ return TEST_FAILED; > } > >- if ((rte_eth_rx_queue_setup(portd, 0, RING_SIZE, SOCKET0, NULL, mp) >< 0) >- || (rte_eth_rx_queue_setup(porte, 0, RING_SIZE, SOCKET0, >NULL, mp) < 0)) { >+ if ((rte_eth_rx_queue_setup(rxtx_portd, 0, RING_SIZE, >+ SOCKET0, NULL, mp) < 0) >+ || (rte_eth_rx_queue_setup(rxtx_porte, 0, RING_SIZE, >+ SOCKET0, NULL, mp) < 0)) { > printf("RX queue setup failed\n"); >- return -1; >+ return TEST_FAILED; > } > >- if ((rte_eth_dev_start(portd) < 0) >- || (rte_eth_dev_start(porte) < 0)) { >+ if ((rte_eth_dev_start(rxtx_portd) < 0) >+ || (rte_eth_dev_start(rxtx_porte) < 0)) { > printf("Error starting port\n"); >- return -1; >+ return TEST_FAILED; > } > >- rte_eth_stats_reset(portd); >+ rte_eth_stats_reset(rxtx_portd); > /* check stats of port, should all be zero */ >- rte_eth_stats_get(portd, &stats); >+ 
rte_eth_stats_get(rxtx_portd, &stats); > if (stats.ipackets != 0 || stats.opackets != 0 || > stats.ibytes != 0 || stats.obytes != 0 || > stats.ierrors != 0 || stats.oerrors !
Re: [dpdk-dev] [RFC] ipsec: new library for IPsec data-path processing
Hi Konstantin, On 9/18/2018 6:12 PM, Ananyev, Konstantin wrote: I am not saying this should be the ONLY way to do as it does not work very well with non NPU/FPGA class of SoC. So how about making the proposed IPSec library as plugin/driver to rte_security. As I mentioned above, I don't think that pushing whole IPSec data-path into rte_security is the best possible approach. Though I probably understand your concern: In RFC code we always do whole prepare/process in SW (attach/remove ESP headers/trailers, so paddings etc.), i.e. right now only device types: RTE_SECURITY_ACTION_TYPE_NONE and RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO are covered. Though there are devices where most of prepare/process can be done in HW (RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL/RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL), plus in future could be devices where prepare/process would be split between HW/SW in a custom way. Is that so? To address that issue I suppose we can do: 1. Add support for RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL security devices into ipsec. We planned to do it anyway, just don't have it done yet. 2. For custom case - introduce RTE_SECURITY_ACTION_TYPE_INLINE_CUSTOM and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_CUSTOM and add into rte_security_ops new functions: uint16_t lookaside_prepare(struct rte_security_session *sess, struct rte_mbuf *mb[], struct rte_crypto_op *cop[], uint16_t num); uint16_t lookaside_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct rte_crypto_op *cop[], uint16_t num); uint16_t inline_process(struct rte_security_session *sess, struct rte_mbuf *mb[], struct rte_crypto_op *cop[], uint16_t num); So for custom HW, PMD can overwrite normal prepare/process behavior. 
Actually after another thought: My previous assumption (probably a wrong one) was that both RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL and RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices can do whole data-path ipsec processing totally in HW - no need for any SW support (except init/config). Now looking at dpaa and dpaa2 devices (the only ones that support RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL right now) I am not so sure about that - looks like some SW help might be needed for replay window updates, etc. Hemant, Shreyansh - can you guys confirm what is expected from RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL devices (HW/SW roles/responsibilities)? About RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL - I didn't find any driver inside DPDK source tree that does support that capability. So my question is: are there any devices/drivers that do support it? If so, where could the source code be found, and what are HW/SW roles/responsibilities for that type of devices? Konstantin In case of LOOKASIDE, the protocol errors like antireplay and sequence number overflow shall be the responsibility of either PMD or the HW. It should notify the application that the error has occurred and the application needs to decide what to do next. As Jerin said in other email, the roles/responsibility of the PMD in case of inline proto and lookaside case, nothing much is required from the application to do any processing for ipsec. As per my understanding, the proposed RFC is to make the application code cleaner for the protocol processing. 1. For inline proto and lookaside there won't be any change in the data path. The main changes would be in the control path. 2. But in case of inline crypto and RTE_SECURITY_ACTION_TYPE_NONE, the protocol processing will be done in the library and there would be changes in both control and data path. As the rte_security currently provide generic APIs for control path only and we may have it expanded for protocol specific datapath processing. 
So for the application, working with inline crypto/ inline proto would be quite similar and it won't need to do some extra processing for inline crypto. Same will be the case for RTE_SECURITY_ACTION_TYPE_NONE and lookaside. We may have the protocol specific APIs reside inside the rte_security and we can use either the crypto/net PMD underneath it. Moving the SPD lookup inside the ipsec library may not be beneficial in terms of performance as well as configurability for the application. It would just be based on the rss hash. Please let me know if my understanding is not correct anywhere. -Akhil
Re: [dpdk-dev] [PATCH] net: fix Intel prepare function for IP checksum offload
> > Current Intel tx prepare function does not properly handle the > case where only IP checksum is requested, without requesting > any L4 checksum or TSO: IP checksum is not properly reset to 0 > and output packet may contain invalid IP checksum. > > Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation") > Cc: sta...@dpdk.org > > Signed-off-by: Didier Pallard > --- > lib/librte_net/rte_net.h | 20 > 1 file changed, 8 insertions(+), 12 deletions(-) > > diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h > index b6ab6e1d57b2..e59760a0a108 100644 > --- a/lib/librte_net/rte_net.h > +++ b/lib/librte_net/rte_net.h > @@ -122,14 +122,16 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, > uint64_t ol_flags) > (ol_flags & PKT_TX_OUTER_IPV6)) > inner_l3_offset += m->outer_l2_len + m->outer_l3_len; > > - if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) { > - if (ol_flags & PKT_TX_IPV4) { > - ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, > - inner_l3_offset); > + if (ol_flags & PKT_TX_IPV4) { > + ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, > + inner_l3_offset); > > - if (ol_flags & PKT_TX_IP_CKSUM) > - ipv4_hdr->hdr_checksum = 0; > + if (ol_flags & PKT_TX_IP_CKSUM) > + ipv4_hdr->hdr_checksum = 0; > + } > > + if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) { > + if (ol_flags & PKT_TX_IPV4) { > udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + > m->l3_len); > udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, > @@ -146,12 +148,6 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, > uint64_t ol_flags) > } else if ((ol_flags & PKT_TX_TCP_CKSUM) || > (ol_flags & PKT_TX_TCP_SEG)) { > if (ol_flags & PKT_TX_IPV4) { > - ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, > - inner_l3_offset); > - > - if (ol_flags & PKT_TX_IP_CKSUM) > - ipv4_hdr->hdr_checksum = 0; > - > /* non-TSO tcp or TSO */ > tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + > m->l3_len); > -- Acked-by: Konstantin Ananyev > 2.11.0
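The effect of the fix above is that the IPv4 header checksum is zeroed before, and independently of, any L4 handling. A minimal sketch of that ordering follows, assuming toy flag names and a toy packet struct; none of these are DPDK definitions, and toy_phdr_cksum() merely stands in for rte_ipv4_phdr_cksum().

```c
#include <stdint.h>

/* Hypothetical offload flags, stand-ins for the PKT_TX_* mbuf flags. */
#define TOY_TX_IPV4      (1u << 0)
#define TOY_TX_IP_CKSUM  (1u << 1)
#define TOY_TX_UDP_CKSUM (1u << 2)

/* Toy packet; real code locates the headers inside the mbuf data. */
struct toy_pkt {
	uint16_t ip_cksum;  /* IPv4 header checksum field */
	uint16_t udp_cksum; /* UDP checksum field */
};

/* Stand-in for the pseudo-header checksum; returns a fixed marker. */
static uint16_t
toy_phdr_cksum(void)
{
	return 0xBEEF;
}

static void
toy_cksum_prepare(struct toy_pkt *p, uint32_t ol_flags)
{
	/* Runs first, whether or not an L4 offload is requested. */
	if ((ol_flags & TOY_TX_IPV4) && (ol_flags & TOY_TX_IP_CKSUM))
		p->ip_cksum = 0;

	/* L4 handling no longer carries its own copy of the reset. */
	if (ol_flags & TOY_TX_UDP_CKSUM)
		p->udp_cksum = toy_phdr_cksum();
}
```

With only TOY_TX_IPV4 | TOY_TX_IP_CKSUM set, the IP checksum field is cleared and the UDP field is left untouched, which is the case the commit message describes as previously mishandled.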
Re: [dpdk-dev] [PATCH v2 2/3] kni: fix kni fifo synchronization
> -Original Message- > > Date: Wed, 19 Sep 2018 21:42:39 +0800 > > From: Phil Yang > > To: dev@dpdk.org > > CC: n...@arm.com, jerin.ja...@caviumnetworks.com, > > kkokkilaga...@caviumnetworks.com, honnappa.nagaraha...@arm.com, > > gavin...@arm.com > > Subject: [PATCH v2 2/3] kni: fix kni fifo synchronization > > X-Mailer: git-send-email 2.7.4 > > > > + Ferruh Yigit > > > > > With existing code in kni_fifo_put, rx_q values are not being updated > > before updating fifo_write. While reading rx_q in kni_net_rx_normal, > > This is causing the sync issue on other core. The same situation > > happens in kni_fifo_get as well. > > > > So syncing the values by adding C11 atomic memory barriers to make > > sure the values being synced before updating fifo_write and fifo_read. > > > > Fixes: 3fc5ca2 ("kni: initial import") > > Signed-off-by: Phil Yang > > Reviewed-by: Honnappa Nagarahalli > > Reviewed-by: Gavin Hu > > --- > > .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 > > lib/librte_kni/rte_kni_fifo.h | 30 > > +- > > 2 files changed, 34 insertions(+), 1 deletion(-) > > > > diff --git > > a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > index cfa9448..1fd713b 100644 > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > @@ -54,8 +54,13 @@ struct rte_kni_request { > > * Writing should never overwrite the read position > > */ > > struct rte_kni_fifo { > > +#ifndef RTE_USE_C11_MEM_MODEL > > volatile unsigned write; /**< Next position to be written*/ > > volatile unsigned read; /**< Next position to be read */ > > +#else > > + unsigned write; /**< Next position to be written*/ > > + unsigned read; /**< Next position to be read */ > > +#endif > > unsigned len;/**< Circular buffer length */ > > unsigned elem_size; /**< Pointer size - for 32/64 bit OS */ > > void *volatile buffer[]; /**< The 
buffer contains mbuf pointers > > */ > > diff --git a/lib/librte_kni/rte_kni_fifo.h > > b/lib/librte_kni/rte_kni_fifo.h index ac26a8c..f4171a1 100644 > > --- a/lib/librte_kni/rte_kni_fifo.h > > +++ b/lib/librte_kni/rte_kni_fifo.h > > @@ -28,8 +28,13 @@ kni_fifo_put(struct rte_kni_fifo *fifo, void > > **data, unsigned num) { > > unsigned i = 0; > > unsigned fifo_write = fifo->write; > > - unsigned fifo_read = fifo->read; > > unsigned new_write = fifo_write; > > +#ifdef RTE_USE_C11_MEM_MODEL > > + unsigned fifo_read = __atomic_load_n(&fifo->read, > > +__ATOMIC_ACQUIRE); > > +#else > > + unsigned fifo_read = fifo->read; #endif > > Correct. My apologies, did not follow your comment here. Do you want us to correct anything here? '#endif' is not appearing on the correct line in the email, but it shows up fine on the patch work. > > > > > > for (i = 0; i < num; i++) { > > new_write = (new_write + 1) & (fifo->len - 1); @@ > > -39,7 +44,12 @@ kni_fifo_put(struct rte_kni_fifo *fifo, void **data, > unsigned num) > > fifo->buffer[fifo_write] = data[i]; > > fifo_write = new_write; > > } > > +#ifdef RTE_USE_C11_MEM_MODEL > > + __atomic_store_n(&fifo->write, fifo_write, __ATOMIC_RELEASE); > > +#else > > + rte_smp_wmb(); > > fifo->write = fifo_write; > > +#endif > > Correct. > > return i; > > } > > > > @@ -51,7 +61,12 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void > > **data, unsigned num) { > > unsigned i = 0; > > unsigned new_read = fifo->read; > > +#ifdef RTE_USE_C11_MEM_MODEL > > + unsigned fifo_write = __atomic_load_n(&fifo->write, > > +__ATOMIC_ACQUIRE); #else > > unsigned fifo_write = fifo->write; > > +#endif > > Correct. 
> > > + > > for (i = 0; i < num; i++) { > > if (new_read == fifo_write) > > break; > > @@ -59,7 +74,12 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, > unsigned num) > > data[i] = fifo->buffer[new_read]; > > new_read = (new_read + 1) & (fifo->len - 1); > > } > > +#ifdef RTE_USE_C11_MEM_MODEL > > + __atomic_store_n(&fifo->read, new_read, __ATOMIC_RELEASE); > > +#else > > + rte_smp_wmb(); > > fifo->read = new_read; > > +#endif > > Correct. > > > return i; > > } > > > > @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void > > **data, unsigned num) static inline uint32_t kni_fifo_count(struct > > rte_kni_fifo *fifo) { > > +#ifdef RTE_USE_C11_MEM_MODEL > > + unsigned fifo_write = __atomic_load_n(&fifo->write, > > + __ATOMIC_ACQUIRE); > > + u
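The acquire/release pairing discussed in this review can be sketched as a tiny single-producer, single-consumer ring. This is a simplified illustration, not the KNI code: the toy_fifo type, its fixed length, and the function names are assumptions, and the __atomic builtins are the GCC/Clang ones the patch itself uses.

```c
#include <stddef.h>

#define TOY_FIFO_LEN 8u /* must be a power of two */

struct toy_fifo {
	unsigned write; /* next slot the producer fills */
	unsigned read;  /* next slot the consumer drains */
	void *buffer[TOY_FIFO_LEN];
};

/* Producer side: returns how many pointers were stored. */
static unsigned
toy_fifo_put(struct toy_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned w = f->write; /* only the producer modifies this field */
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		unsigned next = (w + 1) & (TOY_FIFO_LEN - 1);

		if (next == r)
			break; /* full: one slot always stays empty */
		f->buffer[w] = data[i];
		w = next;
	}
	/* Release: the slot contents are published before the new index. */
	__atomic_store_n(&f->write, w, __ATOMIC_RELEASE);
	return i;
}

/* Consumer side: returns how many pointers were fetched. */
static unsigned
toy_fifo_get(struct toy_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned r = f->read; /* only the consumer modifies this field */
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		if (r == w)
			break; /* empty */
		data[i] = f->buffer[r];
		r = (r + 1) & (TOY_FIFO_LEN - 1);
	}
	/* Release: the reads above complete before the slot is handed back. */
	__atomic_store_n(&f->read, r, __ATOMIC_RELEASE);
	return i;
}
```

The acquire load of the other side's index pairs with that side's release store, so a consumer that observes the new write index also observes the pointers stored before it.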
[dpdk-dev] [PATCH v2 7/7] eal: improve musl compatibility
Musl complains about pthread id being of wrong size, because on musl, pthread_t is a struct pointer, not an unsigned int. Fix the printing code by casting pthread id to unsigned pointer type and adjusting the format specifier to be of appropriate size. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal.c| 4 ++-- lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index e59ac6577..1d6a9ac44 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -979,8 +979,8 @@ rte_eal_init(int argc, char **argv) ret = eal_thread_dump_affinity(cpuset, sizeof(cpuset)); - RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%x;cpuset=[%s%s])\n", - rte_config.master_lcore, (int)thread_id, cpuset, + RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%zx;cpuset=[%s%s])\n", + rte_config.master_lcore, (uintptr_t)thread_id, cpuset, ret == 0 ? "" : "..."); RTE_LCORE_FOREACH_SLAVE(i) { diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c index b496fc711..379773b68 100644 --- a/lib/librte_eal/linuxapp/eal/eal_thread.c +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c @@ -121,8 +121,8 @@ eal_thread_loop(__attribute__((unused)) void *arg) ret = eal_thread_dump_affinity(cpuset, sizeof(cpuset)); - RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%x;cpuset=[%s%s])\n", - lcore_id, (int)thread_id, cpuset, ret == 0 ? "" : "..."); + RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%zx;cpuset=[%s%s])\n", + lcore_id, (uintptr_t)thread_id, cpuset, ret == 0 ? "" : "..."); /* read on our pipe to get commands */ while (1) { -- 2.17.1
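The cast described above can be tried outside of EAL with a small helper. This sketch deviates from the patch in one labeled way: it uses PRIxPTR from <inttypes.h>, the format macro defined for uintptr_t, where the patch uses %zx. The helper name format_tid is hypothetical, and the cast assumes pthread_t is an integer or a pointer, as on glibc and musl.

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a thread id as hex into buf and return snprintf()'s result.
 * Assumes pthread_t converts to uintptr_t (integer on glibc, pointer
 * on musl); a libc with a struct pthread_t would not compile this.
 */
static int
format_tid(char *buf, size_t len, pthread_t tid)
{
	return snprintf(buf, len, "tid=%" PRIxPTR, (uintptr_t)tid);
}
```

Calling format_tid(buf, sizeof(buf), pthread_self()) fills buf with a string that starts with "tid=" followed by hex digits; the digits themselves differ from run to run.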
[dpdk-dev] [PATCH v2 6/7] string_fns: improve musl compatibility
Musl wraps various string functions such as strlcpy in order to harden them. However, the fortify wrappers are included without including the actual string functions being wrapped, which throws missing definition compile errors. Fix by including string.h in string functions header. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_string_fns.h | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_eal/common/include/rte_string_fns.h b/lib/librte_eal/common/include/rte_string_fns.h index ecd141b85..295844ad2 100644 --- a/lib/librte_eal/common/include/rte_string_fns.h +++ b/lib/librte_eal/common/include/rte_string_fns.h @@ -16,6 +16,7 @@ extern "C" { #endif #include +#include /** * Takes string "string" parameter and splits it at character "delim" -- 2.17.1
[dpdk-dev] [PATCH v2 5/7] mem: improve musl compatibility
When built against musl, fcntl.h doesn't silently get included. Fix by including it explicitly. Bugzilla ID: 31 Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memory.c | 1 + lib/librte_eal/linuxapp/eal/eal_memory.c | 1 + 2 files changed, 2 insertions(+) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 0b69804ff..9b5eacc57 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -2,6 +2,7 @@ * Copyright(c) 2010-2014 Intel Corporation */ +#include #include #include #include diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index e3ac24815..256cab526 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -5,6 +5,7 @@ #define _FILE_OFFSET_BITS 64 #include +#include #include #include #include -- 2.17.1
[dpdk-dev] [PATCH v2 3/7] fbarray: improve musl compatibility
When built against musl, fcntl.h doesn't silently get included. Fix by including it explicitly. Bugzilla ID: 34 Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_fbarray.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index ba6c4ae39..ea0735cb8 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -2,6 +2,7 @@ * Copyright(c) 2017-2018 Intel Corporation */ +#include #include #include #include -- 2.17.1
[dpdk-dev] [PATCH v2 1/7] linuxapp: build with _GNU_SOURCE defined by default
We use _GNU_SOURCE all over the place, but often times we miss defining it, resulting in broken builds on musl. Rather than fixing every library's and driver's and application's makefile, fix it by simply defining _GNU_SOURCE by default for all Linuxapp builds. Signed-off-by: Anatoly Burakov --- app/meson.build | 9 - drivers/bus/pci/linux/Makefile | 2 -- drivers/meson.build | 6 ++ drivers/net/softnic/conn.c | 1 - examples/meson.build | 6 ++ lib/librte_eal/linuxapp/eal/Makefile | 16 lib/meson.build | 6 ++ mk/exec-env/linuxapp/rte.vars.mk | 2 ++ test/test/meson.build| 5 + 9 files changed, 33 insertions(+), 20 deletions(-) diff --git a/app/meson.build b/app/meson.build index 99e0b93ec..c9a52a22b 100644 --- a/app/meson.build +++ b/app/meson.build @@ -11,13 +11,20 @@ apps = ['pdump', # for BSD only lib_execinfo = cc.find_library('execinfo', required: false) +default_cflags = machine_args + +# on Linux, specify -D_GNU_SOURCE unconditionally +if host_machine.system() == 'linux' + default_cflags += '-D_GNU_SOURCE' +endif + foreach app:apps build = true name = app allow_experimental_apis = false sources = [] includes = [] - cflags = machine_args + cflags = default_cflags objs = [] # other object files to link against, used e.g. 
for # instruction-set optimized versions of code diff --git a/drivers/bus/pci/linux/Makefile b/drivers/bus/pci/linux/Makefile index 96ea1d540..90404468b 100644 --- a/drivers/bus/pci/linux/Makefile +++ b/drivers/bus/pci/linux/Makefile @@ -4,5 +4,3 @@ SRCS += pci.c SRCS += pci_uio.c SRCS += pci_vfio.c - -CFLAGS += -D_GNU_SOURCE diff --git a/drivers/meson.build b/drivers/meson.build index 47b4215a3..74fec716d 100644 --- a/drivers/meson.build +++ b/drivers/meson.build @@ -16,6 +16,12 @@ default_cflags = machine_args if cc.has_argument('-Wno-format-truncation') default_cflags += '-Wno-format-truncation' endif + +# on Linux, specify -D_GNU_SOURCE unconditionally +if host_machine.system() == 'linux' + default_cflags += '-D_GNU_SOURCE' +endif + foreach class:driver_classes drivers = [] std_deps = [] diff --git a/drivers/net/softnic/conn.c b/drivers/net/softnic/conn.c index 990cf40fc..8b6658088 100644 --- a/drivers/net/softnic/conn.c +++ b/drivers/net/softnic/conn.c @@ -8,7 +8,6 @@ #include #include -#define __USE_GNU #include #include diff --git a/examples/meson.build b/examples/meson.build index 4ee7a1114..70c22eb62 100644 --- a/examples/meson.build +++ b/examples/meson.build @@ -22,6 +22,12 @@ default_cflags = machine_args if cc.has_argument('-Wno-format-truncation') default_cflags += '-Wno-format-truncation' endif + +# on Linux, specify -D_GNU_SOURCE unconditionally +if host_machine.system() == 'linux' + default_cflags += '-D_GNU_SOURCE' +endif + foreach example: examples name = example build = true diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index fd92c75c2..bfee453bc 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -85,22 +85,6 @@ SRCS-y += rte_cycles.c CFLAGS_eal_common_cpuflags.o := $(CPUFLAGS_LIST) -CFLAGS_eal.o := -D_GNU_SOURCE -CFLAGS_eal_interrupts.o := -D_GNU_SOURCE -CFLAGS_eal_vfio_mp_sync.o := -D_GNU_SOURCE -CFLAGS_eal_timer.o := -D_GNU_SOURCE -CFLAGS_eal_lcore.o := 
-D_GNU_SOURCE -CFLAGS_eal_memalloc.o := -D_GNU_SOURCE -CFLAGS_eal_thread.o := -D_GNU_SOURCE -CFLAGS_eal_log.o := -D_GNU_SOURCE -CFLAGS_eal_common_log.o := -D_GNU_SOURCE -CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE -CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE -CFLAGS_eal_common_options.o := -D_GNU_SOURCE -CFLAGS_eal_common_thread.o := -D_GNU_SOURCE -CFLAGS_eal_common_lcore.o := -D_GNU_SOURCE -CFLAGS_rte_cycles.o := -D_GNU_SOURCE - # workaround for a gcc bug with noreturn attribute # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) diff --git a/lib/meson.build b/lib/meson.build index 3acc67e6e..2c7ea436a 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -32,6 +32,12 @@ if cc.has_argument('-Wno-format-truncation') endif enabled_libs = [] # used to print summary at the end + +# on Linux, specify -D_GNU_SOURCE unconditionally +if host_machine.system() == 'linux' + default_cflags += '-D_GNU_SOURCE' +endif + foreach l:libraries build = true name = l diff --git a/mk/exec-env/linuxapp/rte.vars.mk b/mk/exec-env/linuxapp/rte.vars.mk index 3129edc8c..91b778fcc 100644 --- a/mk/exec-env/linuxapp/rte.vars.mk +++ b/mk/exec-env/linuxapp/rte.vars.mk @@ -17,6 +17,8 @@ else EXECENV_CFLAGS = -pthread endif +EXECENV_CFLAGS += -D_GNU_SOURCE + EXECENV_LDLIBS = EXECENV_ASFLAGS = diff --git a/test/test/meson.build b/test/test/meson
[dpdk-dev] [PATCH v2 2/7] pci/vfio: improve musl compatibility
Musl already has PAGE_SIZE defined, and our define clashed with it. Use the existing define when it is available, and define PAGE_SIZE ourselves only otherwise. Bugzilla ID: 36 Signed-off-by: Anatoly Burakov --- drivers/bus/pci/linux/pci_vfio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c index 686386d6a..85f6c9803 100644 --- a/drivers/bus/pci/linux/pci_vfio.c +++ b/drivers/bus/pci/linux/pci_vfio.c @@ -35,7 +35,9 @@ #ifdef VFIO_PRESENT +#ifndef PAGE_SIZE #define PAGE_SIZE (sysconf(_SC_PAGESIZE)) +#endif #define PAGE_MASK (~(PAGE_SIZE - 1)) static struct rte_tailq_elem rte_vfio_tailq = { -- 2.17.1
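The guard in the diff above can be exercised on its own. A minimal sketch, assuming a POSIX system; the helper page_floor is hypothetical and is not part of pci_vfio.c, and the extra guard around PAGE_MASK is an addition for this standalone example only.

```c
#include <unistd.h>

/* Keep the libc's PAGE_SIZE when one is already defined. */
#ifndef PAGE_SIZE
#define PAGE_SIZE (sysconf(_SC_PAGESIZE))
#endif

/* Guarded here only so the sketch builds wherever a mask exists. */
#ifndef PAGE_MASK
#define PAGE_MASK (~(PAGE_SIZE - 1))
#endif

/* Round an address-sized value down to the start of its page. */
static unsigned long
page_floor(unsigned long addr)
{
	return addr & (unsigned long)PAGE_MASK;
}
```

When the fallback is used, PAGE_SIZE expands to a sysconf() call, so it is a runtime value and cannot appear in array sizes or preprocessor conditions.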
[dpdk-dev] [PATCH v2 0/7] Improve core EAL musl compatibility
This patchset fixes numerous issues with musl compatibility in the core EAL libraries. It does not fix anything beyond core EAL (so, PCI driver is still broken, so are a few other drivers), but it's a good start. Tested on container with Alpine Linux. Alpine dependencies: build-base bsd-compat-headers libexecinfo-dev linux-headers numactl-dev For numactl-dev, testing repository needs to be enabled: echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories If successful (using a very broad definition of "success"), the build should fail somewhere in PCI bus driver in UIO. v1 -> v2: - Fixed patch 2 to use existing define if available - Fixed patch 7 to use proper format specifier and cast pthread ID to unsigned pointer type Anatoly Burakov (7): linuxapp: build with _GNU_SOURCE defined by default pci/vfio: improve musl compatibility fbarray: improve musl compatibility eal/hugepage_info: improve musl compatibility mem: improve musl compatibility string_fns: improve musl compatibility eal: improve musl compatibility app/meson.build | 9 - drivers/bus/pci/linux/Makefile | 2 -- drivers/bus/pci/linux/pci_vfio.c| 2 ++ drivers/meson.build | 6 ++ drivers/net/softnic/conn.c | 1 - examples/meson.build| 6 ++ lib/librte_eal/common/eal_common_fbarray.c | 1 + lib/librte_eal/common/eal_common_memory.c | 1 + lib/librte_eal/common/include/rte_string_fns.h | 1 + lib/librte_eal/linuxapp/eal/Makefile| 16 lib/librte_eal/linuxapp/eal/eal.c | 4 ++-- lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 1 + lib/librte_eal/linuxapp/eal/eal_memory.c| 1 + lib/librte_eal/linuxapp/eal/eal_thread.c| 4 ++-- lib/meson.build | 6 ++ mk/exec-env/linuxapp/rte.vars.mk| 2 ++ test/test/meson.build | 5 + 17 files changed, 44 insertions(+), 24 deletions(-) -- 2.17.1
[dpdk-dev] [PATCH v2 4/7] eal/hugepage_info: improve musl compatibility
When built against musl, fcntl.h doesn't silently get included. Fix by including it explicitly. Bugzilla ID: 33 Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 3a7d4b222..0eab1cf71 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include -- 2.17.1
Re: [dpdk-dev] [PATCH v2 2/3] kni: fix kni fifo synchronization
-Original Message- > Date: Thu, 20 Sep 2018 15:20:53 + > From: Honnappa Nagarahalli > To: Jerin Jacob , "Phil Yang (Arm > Technology China)" > CC: "dev@dpdk.org" , nd , > "kkokkilaga...@caviumnetworks.com" , > "Gavin Hu (Arm Technology China)" , > "ferruh.yi...@intel.com" > Subject: RE: [PATCH v2 2/3] kni: fix kni fifo synchronization > > > > -Original Message- > > > Date: Wed, 19 Sep 2018 21:42:39 +0800 > > > From: Phil Yang > > > To: dev@dpdk.org > > > CC: n...@arm.com, jerin.ja...@caviumnetworks.com, > > > kkokkilaga...@caviumnetworks.com, honnappa.nagaraha...@arm.com, > > > gavin...@arm.com > > > Subject: [PATCH v2 2/3] kni: fix kni fifo synchronization > > > X-Mailer: git-send-email 2.7.4 > > > > > > > + Ferruh Yigit > > > > > > > > With existing code in kni_fifo_put, rx_q values are not being updated > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal, > > > This is causing the sync issue on other core. The same situation > > > happens in kni_fifo_get as well. > > > > > > So syncing the values by adding C11 atomic memory barriers to make > > > sure the values being synced before updating fifo_write and fifo_read. 
> > > > > > Fixes: 3fc5ca2 ("kni: initial import") > > > Signed-off-by: Phil Yang > > > Reviewed-by: Honnappa Nagarahalli > > > Reviewed-by: Gavin Hu > > > --- > > > .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 > > > lib/librte_kni/rte_kni_fifo.h | 30 > > > +- > > > 2 files changed, 34 insertions(+), 1 deletion(-) > > > > > > diff --git > > > a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > > b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > > index cfa9448..1fd713b 100644 > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h > > > @@ -54,8 +54,13 @@ struct rte_kni_request { > > > * Writing should never overwrite the read position > > > */ > > > struct rte_kni_fifo { > > > +#ifndef RTE_USE_C11_MEM_MODEL > > > volatile unsigned write; /**< Next position to be written*/ > > > volatile unsigned read; /**< Next position to be read */ > > > +#else > > > + unsigned write; /**< Next position to be written*/ > > > + unsigned read; /**< Next position to be read */ > > > +#endif > > > unsigned len;/**< Circular buffer length */ > > > unsigned elem_size; /**< Pointer size - for 32/64 bit OS > > > */ > > > void *volatile buffer[]; /**< The buffer contains mbuf > > > pointers */ > > > diff --git a/lib/librte_kni/rte_kni_fifo.h > > > b/lib/librte_kni/rte_kni_fifo.h index ac26a8c..f4171a1 100644 > > > --- a/lib/librte_kni/rte_kni_fifo.h > > > +++ b/lib/librte_kni/rte_kni_fifo.h > > > @@ -28,8 +28,13 @@ kni_fifo_put(struct rte_kni_fifo *fifo, void > > > **data, unsigned num) { > > > unsigned i = 0; > > > unsigned fifo_write = fifo->write; > > > - unsigned fifo_read = fifo->read; > > > unsigned new_write = fifo_write; > > > +#ifdef RTE_USE_C11_MEM_MODEL > > > + unsigned fifo_read = __atomic_load_n(&fifo->read, > > > +__ATOMIC_ACQUIRE); > > > +#else > > > + unsigned fifo_read = fifo->read; #endif > > > > Correct. 
> > My apologies, did not follow your comment here. Do you want us to correct > anything here? '#endif' is not appearing on the correct line in the email, > but it shows up fine on the patch work. No. What I meant is, code is correct. > > > > > > > > > > > for (i = 0; i < num; i++) { > > > new_write = (new_write + 1) & (fifo->len - 1); @@ > > > -39,7 +44,12 @@ kni_fifo_put(struct rte_kni_fifo *fifo, void **data, > > unsigned num) > > > fifo->buffer[fifo_write] = data[i]; > > > fifo_write = new_write; > > > } > > > +#ifdef RTE_USE_C11_MEM_MODEL > > > + __atomic_store_n(&fifo->write, fifo_write, __ATOMIC_RELEASE); > > > +#else > > > + rte_smp_wmb(); > > > fifo->write = fifo_write; > > > +#endif > > > > Correct. > > > return i; > > > } > > > > > > @@ -51,7 +61,12 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void > > > **data, unsigned num) { > > > unsigned i = 0; > > > unsigned new_read = fifo->read; > > > +#ifdef RTE_USE_C11_MEM_MODEL > > > + unsigned fifo_write = __atomic_load_n(&fifo->write, > > > +__ATOMIC_ACQUIRE); #else > > > unsigned fifo_write = fifo->write; > > > +#endif > > > > Correct. > > > > > + > > > for (i = 0; i < num; i++) { > > > if (new_read == fifo_write) > > > break; > > > @@ -59,7 +74,12 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, > > unsigned num) > > > data[i] = fifo->buffer[new_read]; > > > new_read = (new_re
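For readers following this thread, the acquire/release pairing under review can be sketched in isolation. This is a minimal single-producer/single-consumer model, assuming GCC/Clang __atomic builtins as used in the patch. The demo_fifo names and the fixed length are hypothetical, and the release store of the read index in demo_fifo_get() is an assumption, since the quoted diff is cut off before that point.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_LEN 8u /* hypothetical size; must be a power of two */

struct demo_fifo {
	unsigned write;  /* next position to be written (producer-owned) */
	unsigned read;   /* next position to be read (consumer-owned) */
	unsigned len;    /* circular buffer length */
	void *buffer[DEMO_FIFO_LEN];
};

void
demo_fifo_init(struct demo_fifo *fifo)
{
	assert((DEMO_FIFO_LEN & (DEMO_FIFO_LEN - 1)) == 0);
	fifo->write = 0;
	fifo->read = 0;
	fifo->len = DEMO_FIFO_LEN;
}

/* Producer side: fill slots first, then publish the new write index. */
unsigned
demo_fifo_put(struct demo_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned fifo_write = fifo->write;
	unsigned new_write = fifo_write;
	/* Acquire pairs with the consumer's release store of 'read'. */
	unsigned fifo_read = __atomic_load_n(&fifo->read, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		new_write = (new_write + 1) & (fifo->len - 1);
		if (new_write == fifo_read)
			break; /* full: never overwrite the read position */
		fifo->buffer[fifo_write] = data[i];
		fifo_write = new_write;
	}
	/* Release: slot contents become visible before the index update. */
	__atomic_store_n(&fifo->write, fifo_write, __ATOMIC_RELEASE);
	return i;
}

/* Consumer side: observe the write index, copy out, publish read index. */
unsigned
demo_fifo_get(struct demo_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned new_read = fifo->read;
	/* Acquire pairs with the producer's release store of 'write'. */
	unsigned fifo_write = __atomic_load_n(&fifo->write, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		if (new_read == fifo_write)
			break; /* empty */
		data[i] = fifo->buffer[new_read];
		new_read = (new_read + 1) & (fifo->len - 1);
	}
	/* Assumed, not shown in the quoted diff: release store of 'read'. */
	__atomic_store_n(&fifo->read, new_read, __ATOMIC_RELEASE);
	return i;
}
```

The point of the pairing is that a consumer which sees the updated write index is also guaranteed to see the buffer entries written before it. That ordering is what the commit message says was missing in the original code.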
[dpdk-dev] [PATCH v3 1/2] vhost: introduce API to get vDPA device number
Signed-off-by: Xiaolong Ye --- lib/librte_vhost/rte_vdpa.h| 3 +++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vdpa.c| 6 ++ 3 files changed, 10 insertions(+) diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index 90465ca26..b8223e337 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -84,4 +84,7 @@ rte_vdpa_find_device_id(struct rte_vdpa_dev_addr *addr); struct rte_vdpa_device * __rte_experimental rte_vdpa_get_device(int did); +/* Get current available vdpa device number */ +int __rte_experimental +rte_vdpa_get_device_num(void); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index da220dd02..ae39b6e21 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -67,6 +67,7 @@ EXPERIMENTAL { rte_vdpa_unregister_device; rte_vdpa_find_device_id; rte_vdpa_get_device; + rte_vdpa_get_device_num; rte_vhost_driver_attach_vdpa_device; rte_vhost_driver_detach_vdpa_device; rte_vhost_driver_get_vdpa_device_id; diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c index c82fd4370..c2c5dff1d 100644 --- a/lib/librte_vhost/vdpa.c +++ b/lib/librte_vhost/vdpa.c @@ -113,3 +113,9 @@ rte_vdpa_get_device(int did) return vdpa_devices[did]; } + +int +rte_vdpa_get_device_num(void) +{ + return vdpa_device_num; +} -- 2.17.1
[dpdk-dev] [PATCH v3 0/2] introduce vdpa sample
Hi, This patchset introduces vdpa sample to demonstrate the vDPA use case. v3 changes: * list cmd would show queue number and supported features of vdpa devices. * address Xiao's review comments v2 changes: * fix a compilation error reported by Rosen * improve create cmd in interactive mode and add two new cmds: list, * quit * add application documentation Xiaolong Ye (2): vhost: introduce API to get vDPA device number examples/vdpa: introduce a new sample for vDPA MAINTAINERS| 2 + doc/guides/sample_app_ug/index.rst | 1 + doc/guides/sample_app_ug/vdpa.rst | 115 +++ examples/Makefile | 2 +- examples/vdpa/Makefile | 32 ++ examples/vdpa/main.c | 458 + examples/vdpa/meson.build | 16 + lib/librte_vhost/rte_vdpa.h| 3 + lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vdpa.c| 6 + 10 files changed, 635 insertions(+), 1 deletion(-) create mode 100644 doc/guides/sample_app_ug/vdpa.rst create mode 100644 examples/vdpa/Makefile create mode 100644 examples/vdpa/main.c create mode 100644 examples/vdpa/meson.build -- 2.17.1
[dpdk-dev] [PATCH v3 2/2] examples/vdpa: introduce a new sample for vDPA
The vdpa sample application creates vhost-user sockets by using the vDPA backend. vDPA stands for vhost Data Path Acceleration which utilizes virtio ring compatible devices to serve virtio driver directly to enable datapath acceleration. As vDPA driver can help to set up vhost datapath, this application doesn't need to launch dedicated worker threads for vhost enqueue/dequeue operations. Signed-off-by: Xiao Wang Signed-off-by: Xiaolong Ye --- MAINTAINERS| 2 + doc/guides/sample_app_ug/index.rst | 1 + doc/guides/sample_app_ug/vdpa.rst | 115 examples/Makefile | 2 +- examples/vdpa/Makefile | 32 ++ examples/vdpa/main.c | 458 + examples/vdpa/meson.build | 16 + 7 files changed, 625 insertions(+), 1 deletion(-) create mode 100644 doc/guides/sample_app_ug/vdpa.rst create mode 100644 examples/vdpa/Makefile create mode 100644 examples/vdpa/main.c create mode 100644 examples/vdpa/meson.build diff --git a/MAINTAINERS b/MAINTAINERS index 5967c1dd3..5656f18e8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -683,6 +683,8 @@ F: doc/guides/sample_app_ug/vhost.rst F: examples/vhost_scsi/ F: doc/guides/sample_app_ug/vhost_scsi.rst F: examples/vhost_crypto/ +F: examples/vdpa/ +F: doc/guides/sample_app_ug/vdpa.rst Vhost PMD M: Maxime Coquelin diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index 5bedf4f6f..74b12af85 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -45,6 +45,7 @@ Sample Applications User Guides vhost vhost_scsi vhost_crypto +vdpa netmap_compatibility ip_pipeline test_pipeline diff --git a/doc/guides/sample_app_ug/vdpa.rst b/doc/guides/sample_app_ug/vdpa.rst new file mode 100644 index 0..44fe6736d --- /dev/null +++ b/doc/guides/sample_app_ug/vdpa.rst @@ -0,0 +1,115 @@ +.. SPDX-License-Identifier: BSD-3-Clause +Copyright(c) 2018 Intel Corporation. + +Vdpa Sample Application +=== + +The vdpa sample application creates vhost-user sockets by using the +vDPA backend. 
vDPA stands for vhost Data Path Acceleration which utilizes +virtio ring compatible devices to serve virtio driver directly to enable +datapath acceleration. As vDPA driver can help to set up vhost datapath, +this application doesn't need to launch dedicated worker threads for vhost +enqueue/dequeue operations. + +Testing steps +- + +This section shows the steps of how to start VMs with vDPA vhost-user +backend and verify network connection & live migration. + +Build +~ + +To compile the sample application see :doc:`compiling`. + +The application is located in the ``vdpa`` sub-directory. + +Start the vdpa example +~~ + +.. code-block:: console + +./vdpa [EAL options] -- [--interactive|-i] or [--iface SOCKET_PATH] + +where + +* --iface specifies the path prefix of the UNIX domain socket file, e.g. + /tmp/vhost-user-, then the socket files will be named as /tmp/vhost-user- + (n starts from 0). +* --interactive means run the vdpa sample in interactive mode, currently 4 + internal cmds are supported: + + 1. help: show help message + 2. list: list all available vdpa devices + 3. create: create a new vdpa port with socket file and vdpa device address + 4. quit: unregister vhost driver and exit the application + +Take IFCVF driver for example: + +.. code-block:: console + +./vdpa --log-level=9 -c 0x6 -n 4 --socket-mem 1024,1024 \ +-w :06:00.2,vdpa=1 -w :06:00.3,vdpa=1 \ +-- --interactive + +.. note:: +We need to bind vfio-pci to VFs before running vdpa sample. + +* modprobe vfio-pci +* ./usertools/dpdk-devbind.py -b vfio-pci 06:00.2 06:00.3 + +Then we can create 2 vdpa ports in interactive cmdline. + +.. code-block:: console + +vdpa> list +device id device address +0 :06:00.2 +1 :06:00.3 +vdpa> create /tmp/vdpa-socket0 :06:00.2 +vdpa> create /tmp/vdpa-socket1 :06:00.3 + +.. _vdpa_app_run_vm: + +Start the VMs +~ + +.. 
code-block:: console + + qemu-system-x86_64 -cpu host -enable-kvm \ + + -mem-prealloc \ + -chardev socket,id=char0,path= \ + -netdev type=vhost-user,id=vdpa,chardev=char0 \ + -device virtio-net-pci,netdev=vdpa,mac=00:aa:bb:cc:dd:ee \ + +After the VMs launches, we can login the VMs and configure the ip, verify the +network connection via ping or netperf. + +.. note:: +Suggest to use QEMU 3.0.0 which extends vhost-user for vDPA. + +Live Migration +~~ +vDPA supports cross-backend live migration, user can migrate SW vhost backend +VM to vDPA backend VM and vice versa. Here are the detailed steps. Assume A is +the source host with SW vhost VM and B is the destination host with vDPA. + +1. Start vd
[dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
Currently, command-line switches for legacy mem mode or single-file segments mode are only stored in internal config. This leads to a situation where these flags have to always match between primary and secondary, which is bad for usability. Fix this by storing these flags in the shared config as well, so that secondary process can know if the primary was launched in single-file segments or legacy mem mode. This bumps the EAL ABI, however there's an EAL deprecation notice already in place[1] for a different feature, so that's OK. [1] http://patches.dpdk.org/patch/43502/ Signed-off-by: Anatoly Burakov --- Notes: v2: - Added documentation on ABI break doc/guides/rel_notes/rel_description.rst | 5 + doc/guides/rel_notes/release_18_11.rst| 6 +- .../common/include/rte_eal_memconfig.h| 4 lib/librte_eal/linuxapp/eal/Makefile | 2 +- lib/librte_eal/linuxapp/eal/eal.c | 20 +++ lib/librte_eal/meson.build| 2 +- 6 files changed, 36 insertions(+), 3 deletions(-) diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst index 8f285566f..3fd289939 100644 --- a/doc/guides/rel_notes/rel_description.rst +++ b/doc/guides/rel_notes/rel_description.rst @@ -10,3 +10,8 @@ release version |release| and previous releases. It lists new features, fixed bugs, API and ABI changes and known issues. For instructions on compiling and running the release, see the :ref:`DPDK Getting Started Guide `. + +* eal: new ABI version for EAL library due to adding ``legacy_mem`` and + ``single_file_segments`` values to ``rte_config`` structure on account of + improving DPDK usability when using either ``--legacy-mem`` or + ``--single-file-segments`` flags. diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 3ae6b3f58..34acf01d9 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -83,6 +83,10 @@ ABI Changes Also, make sure to start the actual text at the margin. 
= +* eal: added ``legacy_mem`` and ``single_file_segments`` values to + ``rte_config`` structure on account of improving DPDK usability when + using either ``--legacy-mem`` or ``--single-file-segments`` flags. + Removed Items - @@ -129,7 +133,7 @@ The libraries prepended with a plus sign were incremented in this version. librte_compressdev.so.1 librte_cryptodev.so.5 librte_distributor.so.1 - librte_eal.so.8 + + librte_eal.so.9 librte_ethdev.so.10 librte_eventdev.so.4 librte_flow_classify.so.1 diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index aff0688dd..62a21c2dc 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -77,6 +77,10 @@ struct rte_mem_config { * exact same address the primary process maps it. */ uint64_t mem_cfg_addr; + + /* legacy mem and single file segments options are shared */ + uint32_t legacy_mem; + uint32_t single_file_segments; } __attribute__((__packed__)); diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index fd92c75c2..5c16bc40f 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH) EXPORT_MAP := ../../rte_eal_version.map VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR) -LIBABIVER := 8 +LIBABIVER := 9 VPATH += $(RTE_SDK)/lib/librte_eal/common diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index e59ac6577..4a55d3b69 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -352,6 +352,24 @@ eal_proc_type_detect(void) return ptype; } +/* copies data from internal config to shared config */ +static void +eal_update_mem_config(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + mcfg->legacy_mem = internal_config.legacy_mem; + mcfg->single_file_segments = 
internal_config.single_file_segments; +} + +/* copies data from shared config to internal config */ +static void +eal_update_internal_config(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + internal_config.legacy_mem = mcfg->legacy_mem; + internal_config.single_file_segments = mcfg->single_file_segments; +} + /* Sets up rte_config structure with the pointer to shared memory config.*/ static void rte_config_init(void) @@ -361,11 +379,13 @@ rte_config_init(void) switch (rte_config.process_type){ case RTE_PROC_PRIMARY: rte_eal_config_create(); + eal_update_mem_config();
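The intent of the two helpers above can be modelled without EAL. The sketch below uses hypothetical demo_* types standing in for internal_config and rte_mem_config. The secondary-process branch is an assumption inferred from the commit message, since the quoted diff ends before that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_proc_type { DEMO_PROC_PRIMARY, DEMO_PROC_SECONDARY };

/* Stand-in for the fields added to the shared memory config. */
struct demo_mem_config {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

/* Stand-in for the per-process internal config. */
struct demo_internal_config {
	unsigned legacy_mem;
	unsigned single_file_segments;
};

/*
 * Primary publishes its command-line flags to the shared config;
 * secondary adopts whatever the primary published (assumed behaviour).
 */
void
demo_config_init(enum demo_proc_type type,
		struct demo_internal_config *internal,
		struct demo_mem_config *shared)
{
	assert(internal != NULL && shared != NULL);

	switch (type) {
	case DEMO_PROC_PRIMARY:
		shared->legacy_mem = internal->legacy_mem;
		shared->single_file_segments = internal->single_file_segments;
		break;
	case DEMO_PROC_SECONDARY:
		internal->legacy_mem = shared->legacy_mem;
		internal->single_file_segments = shared->single_file_segments;
		break;
	}
}
```

With this arrangement a secondary process started without --legacy-mem still ends up with the primary's setting, which is the usability improvement the commit message describes.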
Re: [dpdk-dev] [PATCH v3 1/5] vhost: unify VhostUserMsg usage
On 15-Sep-18 6:20 AM, Nikolay Nikolaev wrote:
> Use the typedef version of struct VhostUserMsg. Also unify the related
> parameter name.
>
> Signed-off-by: Nikolay Nikolaev
> ---

I'm probably missing some background on this, but according to the DPDK coding
style guide, typedef structs are to be avoided [1].

[1] https://doc.dpdk.org/guides-18.08/contributing/coding_style.html#typedefs

-- 
Thanks,
Anatoly
Re: [dpdk-dev] [PATCH] test: disable alarm autotest in FreeBSD
On 19-Sep-18 3:39 PM, Pallantla Poornima wrote:
> Disabled the alarm_autotest UT in FreeBSD
>
> Interrupts are not supported in FreeBSD.
> Alarm API depends on interrupts, so disabled
> alarm test on FreeBSD.
>
> Signed-off-by: Pallantla Poornima
> ---

Reviewed-by: Anatoly Burakov

-- 
Thanks,
Anatoly
Re: [dpdk-dev] [PATCH v2 04/13] bus/vdev: add device matching field driver
19/09/2018 18:03, Gaetan Rivet: > The vdev bus parses a field "driver", matching > a vdev driver name with one passed as follows: > >"bus=vdev,driver=" I think the property should be "name". We already have a "driver" category. So it may be "bus=vdev,name=mytap/driver=tap" Until now we were using the name of the driver as a prefix for the device name because it was the only way of knowing the driver to use. With a richer syntax like above, this restriction can be removed.
Re: [dpdk-dev] [PATCH] app/testpmd: fix missing jump action in flow action
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Reshma Pattan > Sent: Wednesday, September 19, 2018 3:01 PM > To: adrien.mazarg...@6wind.com; dev@dpdk.org > Cc: Pattan, Reshma > Subject: [dpdk-dev] [PATCH] app/testpmd: fix missing jump action in flow > action > > Added missing JUMP flow action in flow_action array. > Without this the flow rule cannot be created for JUMP action. > > Fixes: 938a184a18 ("app/testpmd: implement basic support for flow API") > > Signed-off-by: Reshma Pattan Acked-by: Bernard Iremonger
Re: [dpdk-dev] [PATCH v2 07/13] ethdev: add device matching field name
19/09/2018 18:03, Gaetan Rivet: > The eth device class can now parse a field name, > matching the eth_dev name with one passed as > >"class=eth,name=xx" I am not sure what is the purpose of the "name" property. I think we should not need it to choose a port by its ethdev name. If you are thinking about a vdev, we can use the rte_device name (at bus level).
Re: [dpdk-dev] [PATCH 00/17] net/qede: add enhancements and fixes
On 9/8/2018 9:30 PM, Rasesh Mody wrote: > This patchset adds enhancements and fixes for QEDE PMD. > > Rasesh Mody (8): > net/qede/base: fix to handle stag update event > net/qede/base: add support for OneView APIs > net/qede/base: get pre-negotiated values for stag and bw > net/qede: fix to program HW regs with ether type > net/qede/base: limit number of non ethernet queues to 64 > net/qede/base: correct MCP error handler's log verbosity > net/qede/base: fix logic for sfp get/set > net/qede/base: use pointer for bytes len read > > Shahed Shaikh (9): > net/qede/base: use trust mode for forced MAC limitations > net/qede: reorganize filter code > net/qede: fix flow director bug for IPv6 filter > net/qede: refactor fdir code into generic aRFS > net/qede: add support for generic flow API > net/qede: fix Rx buffer size calculation > net/qede: add support for Rx descriptor status > net/qede/base: fix MFW FLR flow bug > net/qede: add support for dev reset Series applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH v6] app/testpmd: add forwarding mode to simulate a noisy neighbour
On 09/18/2018 10:35 AM, Jens Freimann wrote:
> This adds a new forwarding mode to testpmd to simulate
> more realistic behavior of a guest machine engaged in receiving
> and sending packets performing Virtual Network Function (VNF).
>
> The goal is to enable a simple way of measuring performance impact on
> cache and memory footprint utilization from various VNF co-located on
> the same host machine. For this it does:
>
> * Buffer packets in a FIFO:
>
> Create a fifo to buffer received packets. Once it overflows, put
> those packets into the actual tx queue. The fifo is created per tx
> queue and its size can be set with the --noisy-tx-sw-buffer-size
> commandline parameter.
>
> A second commandline parameter is used to set a timeout in
> milliseconds after which the fifo is flushed.
>
> --noisy-tx-sw-buffer-size [packet numbers]
> Keep the mbuf in a FIFO and forward the overflowing packets from the
> FIFO. This queue is per TX-queue (after all other packet processing).
>
> --noisy-tx-sw-buffer-flushtime [delay]
> Flush the packet queue if no packets have been seen during
> [delay]. As long as packets are seen, the timer is reset.
>
> Add several options to simulate route lookups (memory reads) in tables
> that can be quite large, as well as route hit statistics update.
> These options simulate the whole stack traversal and
> will thrash the cache. Memory access is random.
>
> * simulate route lookups:
>
> Allocate a buffer and perform reads and writes on it as specified by
> commandline options:
>
> --noisy-lkup-memory [size]
> Size of the VNF internal memory (MB), in which the random
> read/write will be done, allocated by rte_malloc (hugepages).
>
> --noisy-lkup-num-writes [num]
> Number of random writes in memory to perform per packet,
> simulating hit-flags update. 64 bits per write,
> all writes in different cache lines.
> > --noisy-lkup-num-reads [num] > Number of random reads in memory per packet should be > performed, simulating FIB/table lookups. 64 bits per read, > all write in different cache lines. > > --noisy-lkup-num-reads-writes [num] > Number of random reads and writes in memory per packet should > be performed, simulating stats update. 64 bits per read-write, all > reads and writes in different cache lines. > > Signed-off-by: Jens Freimann > --- Hi Jens, thanks for the new version. A small few remaining comments below, Kevin. > + > +static void > +noisy_fwd_begin(portid_t pi) > +{ > + struct noisy_config *n; > + char name[NOISY_STRSIZE]; > + > + noisy_cfg[pi] = rte_zmalloc("testpmd noisy fifo and timers", > + sizeof(struct noisy_config), > + RTE_CACHE_LINE_SIZE); > + if (noisy_cfg == NULL) { Looks like it should be 'if (noisy_cfg[pi] == NULL)' > + rte_exit(EXIT_FAILURE, > + "rte_zmalloc(%d) struct noisy_config) \ > + failed\n", (int) pi); > + } > + n = noisy_cfg[pi]; > + n->do_buffering = noisy_tx_sw_bufsz > 0; > + n->do_sim = noisy_lkup_num_writes + noisy_lkup_num_reads + > + noisy_lkup_num_reads_writes; > + n->do_flush = noisy_tx_sw_buf_flush_time > 0; > + > + if (n->do_buffering) { > + snprintf(name, NOISY_STRSIZE, NOISY_RING, pi); > + n->f = rte_ring_create(name, noisy_tx_sw_bufsz, > + rte_socket_id(), 0); > + if (!n->f) > + rte_exit(EXIT_FAILURE, > + "rte_ring_create(%d), size %d) \ > + failed\n", (int) pi, > + noisy_tx_sw_bufsz); > + } > + if (noisy_lkup_mem_sz > 0) { > + n->vnf_mem = (char *) rte_zmalloc("vnf sim memory", > + noisy_lkup_mem_sz * 1024 * 1024, > + RTE_CACHE_LINE_SIZE); > + if (!n->vnf_mem) > + rte_exit(EXIT_FAILURE, > + "rte_zmalloc(%" PRIu64 ") for vnf \ > + memory) failed\n", noisy_lkup_mem_sz); > + } else if (n->do_sim) { > + rte_exit(EXIT_FAILURE, "--noisy-lkup-memory-size \ > + must be > 0\n"); > + } > +} > + > +struct fwd_engine noisy_vnf_engine = { > + .fwd_mode_name = "noisy", > + .port_fwd_begin = noisy_fwd_begin, > + .port_fwd_end = 
noisy_fwd_end, > + .packet_fwd = pkt_burst_noisy_vnf, > +}; > + new blank line at EOF. + warning: 1 line adds whitespace errors. > diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c > index 9220e1c1b..3231b0c51 100644 > --- a/app/test-pmd/parameters.c > +++ b/app/test-pmd/parameters.c > @@ -625,6 +625,12 @@ launch_args_parse(int argc, char** argv) > { "vxlan-gpe-port", 1, 0, 0 }, > { "mlockall", 0, 0, 0 }, > { "no-mlockall",0, 0,
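The lookup simulation described in the commit message (random 64-bit reads and writes spread over a large buffer) can be sketched as follows. This is not the code from the patch: all demo_* names, the 64-byte line size and the xorshift generator are assumptions made for illustration, and unlike the description it does not guarantee that every access lands in a distinct cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CACHE_LINE 64u /* assumed cache line size */

static uint64_t demo_rand_state = 88172645463325252ULL;

/* Small xorshift64 generator, so the sketch has no external dependency. */
uint64_t
demo_rand(void)
{
	uint64_t x = demo_rand_state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	demo_rand_state = x;
	return x;
}

/* Write 64 bits at the start of 'num' randomly chosen cache lines. */
void
demo_sim_writes(char *mem, size_t mem_sz, unsigned num)
{
	size_t lines = mem_sz / DEMO_CACHE_LINE;
	unsigned i;

	assert(mem != NULL);
	if (lines == 0)
		return;
	for (i = 0; i < num; i++) {
		size_t off = (size_t)(demo_rand() % lines) * DEMO_CACHE_LINE;
		uint64_t val = demo_rand();

		memcpy(mem + off, &val, sizeof(val));
	}
}

/* Read 64 bits from 'num' random cache lines; XOR keeps the reads live. */
uint64_t
demo_sim_reads(const char *mem, size_t mem_sz, unsigned num)
{
	size_t lines = mem_sz / DEMO_CACHE_LINE;
	uint64_t acc = 0;
	unsigned i;

	assert(mem != NULL);
	if (lines == 0)
		return 0;
	for (i = 0; i < num; i++) {
		size_t off = (size_t)(demo_rand() % lines) * DEMO_CACHE_LINE;
		uint64_t val;

		memcpy(&val, mem + off, sizeof(val));
		acc ^= val;
	}
	return acc;
}
```

In the forwarding mode itself this kind of loop would run once per packet against a buffer far larger than the cache. Returning the XOR of the values read is only there to stop the compiler from discarding the loads.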
[dpdk-dev] [PATCH v4 1/5] eventdev: add eth Tx adapter APIs
The ethernet Tx adapter abstracts the transmit stage of an
event driven packet processing application. The transmit
stage may be implemented with eventdev PMD support or use a
rte_service function implemented in the adapter. These APIs
provide a common configuration and control interface and
a transmit API for the eventdev PMD implementation.

The transmit port is specified using mbuf::port. The transmit
queue is specified using the rte_event_eth_tx_adapter_txq_set()
function.

Signed-off-by: Nikhil Rao
Acked-by: Jerin Jacob
---
 lib/librte_eventdev/rte_event_eth_tx_adapter.h | 462 +
 lib/librte_mbuf/rte_mbuf.h | 5 +-
 MAINTAINERS | 5 +
 doc/api/doxy-api-index.md | 1 +
 4 files changed, 472 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eventdev/rte_event_eth_tx_adapter.h

diff --git a/lib/librte_eventdev/rte_event_eth_tx_adapter.h b/lib/librte_eventdev/rte_event_eth_tx_adapter.h
new file mode 100644
index 000..3e0d5c6
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_eth_tx_adapter.h
@@ -0,0 +1,462 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef _RTE_EVENT_ETH_TX_ADAPTER_
+#define _RTE_EVENT_ETH_TX_ADAPTER_
+
+/**
+ * @file
+ *
+ * RTE Event Ethernet Tx Adapter
+ *
+ * The event ethernet Tx adapter provides configuration and data path APIs
+ * for the ethernet transmit stage of an event driven packet processing
+ * application. These APIs abstract the implementation of the transmit stage
+ * and allow the application to use eventdev PMD support or a common
+ * implementation.
+ *
+ * In the common implementation, the application enqueues mbufs to the adapter
+ * which runs as a rte_service function. The service function dequeues events
+ * from its event port and transmits the mbufs referenced by these events.
+ * + * The ethernet Tx event adapter APIs are: + * + * - rte_event_eth_tx_adapter_create() + * - rte_event_eth_tx_adapter_create_ext() + * - rte_event_eth_tx_adapter_free() + * - rte_event_eth_tx_adapter_start() + * - rte_event_eth_tx_adapter_stop() + * - rte_event_eth_tx_adapter_queue_add() + * - rte_event_eth_tx_adapter_queue_del() + * - rte_event_eth_tx_adapter_stats_get() + * - rte_event_eth_tx_adapter_stats_reset() + * - rte_event_eth_tx_adapter_enqueue() + * - rte_event_eth_tx_adapter_event_port_get() + * - rte_event_eth_tx_adapter_service_id_get() + * + * The application creates the adapter using + * rte_event_eth_tx_adapter_create() or rte_event_eth_tx_adapter_create_ext(). + * + * The adapter will use the common implementation when the eventdev PMD + * does not have the RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT capability. + * The common implementation uses an event port that is created using the port + * configuration parameter passed to rte_event_eth_tx_adapter_create(). The + * application can get the port identifier using + * rte_event_eth_tx_adapter_event_port_get() and must link an event queue to + * this port. + * + * If the eventdev PMD has the RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT + * flags set, Tx adapter events should be enqueued using the + * rte_event_eth_tx_adapter_enqueue() function, else the application should + * use rte_event_enqueue_burst(). + * + * Transmit queues can be added and deleted from the adapter using + * rte_event_eth_tx_adapter_queue_add()/del() APIs respectively. + * + * The application can start and stop the adapter using the + * rte_event_eth_tx_adapter_start/stop() calls. + * + * The common adapter implementation uses an EAL service function as described + * before and its execution is controlled using the rte_service APIs. The + * rte_event_eth_tx_adapter_service_id_get() + * function can be used to retrieve the adapter's service function ID. 
+ * + * The ethernet port and transmit queue index to transmit the mbuf on are + * specified using the mbuf port and the higher 16 bits of + * struct rte_mbuf::hash::sched:hi. The application should use the + * rte_event_eth_tx_adapter_txq_set() and rte_event_eth_tx_adapter_txq_get() + * functions to access the transmit queue index since it is expected that the + * transmit queue will be eventually defined within struct rte_mbuf and using + * these macros will help with minimizing application impact due to + * a change in how the transmit queue index is specified. + */ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include + +#include "rte_eventdev.h" + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Adapter configuration structure + * + * @see rte_event_eth_tx_adapter_create_ext + * @see rte_event_eth_tx_adapter_conf_cb + */ +struct rte_event_eth_tx_adapter_conf { + uint8_t event_port_id; + /**< Event port identifier, the adapter service function dequeues mbuf +* events from this port. +* @see RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT +
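The header comment above says the transmit queue index lives in the upper 16 bits of the mbuf's sched "hi" word and should only be touched through accessor functions. A minimal model of such accessors is sketched below. The demo_sched struct and the demo_txq_set()/demo_txq_get() names are hypothetical stand-ins, and the bit-masking approach is an assumption; it is not necessarily how rte_event_eth_tx_adapter_txq_set() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf hash.sched fields. */
struct demo_sched {
	uint32_t lo;
	uint32_t hi;
};

/* Store the Tx queue index in the upper 16 bits, keeping the lower 16. */
void
demo_txq_set(struct demo_sched *s, uint16_t queue)
{
	assert(s != NULL);
	s->hi = (s->hi & 0x0000FFFFu) | ((uint32_t)queue << 16);
}

/* Read the Tx queue index back from the upper 16 bits. */
uint16_t
demo_txq_get(const struct demo_sched *s)
{
	assert(s != NULL);
	return (uint16_t)(s->hi >> 16);
}
```

Keeping every access behind a set/get pair is what lets the storage location move later (for example into a dedicated mbuf field, as the comment anticipates) without touching application code.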
[dpdk-dev] [PATCH v4 3/5] eventdev: add eth Tx adapter implementation
This patch implements the Tx adapter APIs by invoking the corresponding eventdev PMD callbacks and also provides the common rte_service function based implementation when the eventdev PMD support is absent. Signed-off-by: Nikhil Rao --- config/rte_config.h|1 + lib/librte_eventdev/rte_event_eth_tx_adapter.c | 1138 config/common_base |2 +- lib/librte_eventdev/Makefile |2 + lib/librte_eventdev/meson.build|6 +- lib/librte_eventdev/rte_eventdev_version.map | 12 + 6 files changed, 1158 insertions(+), 3 deletions(-) create mode 100644 lib/librte_eventdev/rte_event_eth_tx_adapter.c diff --git a/config/rte_config.h b/config/rte_config.h index ee84f04..73e71af 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -69,6 +69,7 @@ #define RTE_EVENT_TIMER_ADAPTER_NUM_MAX 32 #define RTE_EVENT_ETH_INTR_RING_SIZE 1024 #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32 +#define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32 /* rawdev defines */ #define RTE_RAWDEV_MAX_DEVS 10 diff --git a/lib/librte_eventdev/rte_event_eth_tx_adapter.c b/lib/librte_eventdev/rte_event_eth_tx_adapter.c new file mode 100644 index 000..aae0378 --- /dev/null +++ b/lib/librte_eventdev/rte_event_eth_tx_adapter.c @@ -0,0 +1,1138 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation. 
+ */ +#include +#include +#include + +#include "rte_eventdev_pmd.h" +#include "rte_event_eth_tx_adapter.h" + +#define TXA_BATCH_SIZE 32 +#define TXA_SERVICE_NAME_LEN 32 +#define TXA_MEM_NAME_LEN 32 +#define TXA_FLUSH_THRESHOLD1024 +#define TXA_RETRY_CNT 100 +#define TXA_MAX_NB_TX 128 +#define TXA_INVALID_DEV_ID INT32_C(-1) +#define TXA_INVALID_SERVICE_ID INT64_C(-1) + +#define txa_evdev(id) (&rte_eventdevs[txa_dev_id_array[(id)]]) + +#define txa_dev_caps_get(id) txa_evdev((id))->dev_ops->eth_tx_adapter_caps_get + +#define txa_dev_adapter_create(t) txa_evdev(t)->dev_ops->eth_tx_adapter_create + +#define txa_dev_adapter_create_ext(t) \ + txa_evdev(t)->dev_ops->eth_tx_adapter_create + +#define txa_dev_adapter_free(t) txa_evdev(t)->dev_ops->eth_tx_adapter_free + +#define txa_dev_queue_add(id) txa_evdev(id)->dev_ops->eth_tx_adapter_queue_add + +#define txa_dev_queue_del(t) txa_evdev(t)->dev_ops->eth_tx_adapter_queue_del + +#define txa_dev_start(t) txa_evdev(t)->dev_ops->eth_tx_adapter_start + +#define txa_dev_stop(t) txa_evdev(t)->dev_ops->eth_tx_adapter_stop + +#define txa_dev_stats_reset(t) txa_evdev(t)->dev_ops->eth_tx_adapter_stats_reset + +#define txa_dev_stats_get(t) txa_evdev(t)->dev_ops->eth_tx_adapter_stats_get + +#define RTE_EVENT_ETH_TX_ADAPTER_ID_VALID_OR_ERR_RET(id, retval) \ +do { \ + if (!txa_valid_id(id)) { \ + RTE_EDEV_LOG_ERR("Invalid eth Rx adapter id = %d", id); \ + return retval; \ + } \ +} while (0) + +#define TXA_CHECK_OR_ERR_RET(id) \ +do {\ + int ret; \ + RTE_EVENT_ETH_TX_ADAPTER_ID_VALID_OR_ERR_RET((id), -EINVAL); \ + ret = txa_init(); \ + if (ret != 0) \ + return ret; \ + if (!txa_adapter_exist((id))) \ + return -EINVAL; \ +} while (0) + +/* Tx retry callback structure */ +struct txa_retry { + /* Ethernet port id */ + uint16_t port_id; + /* Tx queue */ + uint16_t tx_queue; + /* Adapter ID */ + uint8_t id; +}; + +/* Per queue structure */ +struct txa_service_queue_info { + /* Queue has been added */ + uint8_t added; + /* Retry callback argument 
*/ + struct txa_retry txa_retry; + /* Tx buffer */ + struct rte_eth_dev_tx_buffer *tx_buf; +}; + +/* PMD private structure */ +struct txa_service_data { + /* Max mbufs processed in any service function invocation */ + uint32_t max_nb_tx; + /* Number of Tx queues in adapter */ + uint32_t nb_queues; + /* Synchronization with data path */ + rte_spinlock_t tx_lock; + /* Event port ID */ + uint8_t port_id; + /* Event device identifier */ + uint8_t eventdev_id; + /* Highest port id supported + 1 */ + uint16_t dev_count; + /* Loop count to flush Tx buffers */ + int loop_cnt; + /* Per ethernet device structure */ + struct txa_service_ethdev *txa_ethdev; + /* Statistics */ + struct rte_event_eth_tx_adapter_stats stats; + /* Adapter Identifier */ + uint8_t id; + /* Conf arg must be freed */ + uint8_t conf_free; + /* Configuration callback */ + rte_event_eth_tx_adapter_conf_cb conf_cb; + /* Configuration callback argument */ + void *conf_arg; + /* socket id */ + int socket_id; + /* Per adapter EAL service */ + int64_t service_id; + /* Memory allocation name */ + char mem_name[TXA_MEM_NAME_LEN]; +} __rte_ca