Re: [dpdk-dev] [PATCH v2 09/11] mempool/dpaa: prepare to remove register memory area op
On 03/25/2018 07:20 PM, Andrew Rybchenko wrote: Populate mempool driver callback is executed a bit later than register memory area, provides the same information and will substitute the later since it gives more flexibility and in addition to notification about memory area allows to customize how mempool objects are stored in memory. Signed-off-by: Andrew Rybchenko --- v1 -> v2: - fix build error because of prototype mismatch drivers/mempool/dpaa/dpaa_mempool.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c index 7b82f4b..0dcb488 100644 --- a/drivers/mempool/dpaa/dpaa_mempool.c +++ b/drivers/mempool/dpaa/dpaa_mempool.c @@ -264,10 +264,9 @@ dpaa_mbuf_get_count(const struct rte_mempool *mp) } static int -dpaa_register_memory_area(const struct rte_mempool *mp, - char *vaddr __rte_unused, - rte_iova_t paddr __rte_unused, - size_t len) +dpaa_populate(struct rte_mempool *mp, unsigned int max_objs, + char *vaddr, rte_iova_t paddr, size_t len, Self NACK, 'void *vaddr' must be above + rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg) { struct dpaa_bp_info *bp_info; unsigned int total_elt_sz; @@ -289,7 +288,9 @@ dpaa_register_memory_area(const struct rte_mempool *mp, if (len >= total_elt_sz * mp->size) bp_info->flags |= DPAA_MPOOL_SINGLE_SEGMENT; - return 0; + return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, + obj_cb, obj_cb_arg); + } struct rte_mempool_ops dpaa_mpool_ops = { @@ -299,7 +300,7 @@ struct rte_mempool_ops dpaa_mpool_ops = { .enqueue = dpaa_mbuf_free_bulk, .dequeue = dpaa_mbuf_alloc_bulk, .get_count = dpaa_mbuf_get_count, - .register_memory_area = dpaa_register_memory_area, + .populate = dpaa_populate, }; MEMPOOL_REGISTER_OPS(dpaa_mpool_ops);
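For readers following the new API: below is a minimal sketch of a driver-side "populate" callback in the shape this series introduces (generic code, not the dpaa implementation; note the 'void *vaddr' prototype matching the self-NACK above). The driver inspects the memory area for its own bookkeeping and then delegates object placement to the default helper.

#include <rte_mempool.h>

static int
my_driver_populate(struct rte_mempool *mp, unsigned int max_objs,
		void *vaddr, rte_iova_t iova, size_t len,
		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	size_t total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
	int single_segment = (len >= total_elt_sz * mp->size);

	/* A real driver would record 'single_segment' in its private data,
	 * as the dpaa patch does with DPAA_MPOOL_SINGLE_SEGMENT. */
	(void)single_segment;

	/* Fall back to the default object layout. */
	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
			obj_cb, obj_cb_arg);
}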
Re: [dpdk-dev] [PATCH] net/bnxt: convert to SPDX license tag
Hi Ajit/Scott, On 3/26/2018 12:37 AM, Ajit Khaparde wrote: From: Scott Branden Update the license header on bnxt files to be the standard BSD-3-Clause license used for the rest of DPDK, bring the files in compliance with the DPDK licensing policy. Signed-off-by: Scott Branden Signed-off-by: Ajit Khaparde diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index b5a0badfc..967c94d14 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -1,34 +1,7 @@ -/*- - * BSD LICENSE - * - * Copyright(c) Broadcom Limited. +// SPDX-License-Identifier: BSD-3-Clause +/* + * Copyright(c) 2014-2018 Broadcom * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * * Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * * Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in - * the documentation and/or other materials provided with the - * distribution. - * * Neither the name of Broadcom Corporation nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #ifndef _BNXT_H_ diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c index 737bb060a..bb57c6f4f 100644 --- a/drivers/net/bnxt/bnxt_cpr.c +++ b/drivers/net/bnxt/bnxt_cpr.c @@ -1,34 +1,7 @@ -/*- - * BSD LICENSE - * - * Copyright(c) Broadcom Limited. +// SPDX-License-Identifier: BSD-3-Clause +/* + * Copyright(c) 2014-2018 Broadcom * All rights reserved. It is preferred to use "/*" commenting instead of "//" in dpdk for first line SPDX identifer. It will maintain the consistency and help in future automation for license check. Regards, Hemant
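In other words, the preferred first lines of a .c or .h file would look like this (using the Broadcom copyright from the patch as the example):

/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2014-2018 Broadcom
 * All rights reserved.
 */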
Re: [dpdk-dev] [PATCH] net/octeontx: use the new offload APIs
Hi Ferruh, On Wed, Mar 21, 2018 at 07:25:57PM +, Ferruh Yigit wrote: > On 3/8/2018 7:07 PM, Pavan Nikhilesh wrote: > > Use the new Rx/Tx offload APIs and remove the old style offloads. > > > > Signed-off-by: Pavan Nikhilesh > > --- > > > > Checkpatch reports falsepositive for PRIx64 > > > > drivers/net/octeontx/octeontx_ethdev.c | 82 > > ++ > > drivers/net/octeontx/octeontx_ethdev.h | 3 ++ > > 2 files changed, 67 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/net/octeontx/octeontx_ethdev.c > > b/drivers/net/octeontx/octeontx_ethdev.c > > index b739c0b39..0448e3557 100644 > > --- a/drivers/net/octeontx/octeontx_ethdev.c > > +++ b/drivers/net/octeontx/octeontx_ethdev.c > > @@ -262,6 +262,8 @@ octeontx_dev_configure(struct rte_eth_dev *dev) > > struct rte_eth_rxmode *rxmode = &conf->rxmode; > > struct rte_eth_txmode *txmode = &conf->txmode; > > struct octeontx_nic *nic = octeontx_pmd_priv(dev); > > + uint64_t configured_offloads; > > + uint64_t unsupported_offloads; > > int ret; > > > > PMD_INIT_FUNC_TRACE(); > > @@ -283,34 +285,43 @@ octeontx_dev_configure(struct rte_eth_dev *dev) > > return -EINVAL; > > } > > > > - if (!rxmode->hw_strip_crc) { > > + configured_offloads = rxmode->offloads; > > + > > + if (!(configured_offloads & DEV_RX_OFFLOAD_CRC_STRIP)) { > > PMD_INIT_LOG(NOTICE, "can't disable hw crc strip"); > > - rxmode->hw_strip_crc = 1; > > + configured_offloads |= DEV_RX_OFFLOAD_CRC_STRIP; > > } > > Just as a heads up this will be changed in this release [1], CRC_STRIP will be > the default behavior without requiring a flag. Thanks for the heads up, will remove the flag. > > [1] > https://dpdk.org/ml/archives/dev/2018-March/093255.html > > > > > - if (rxmode->hw_ip_checksum) { > > + if (configured_offloads & DEV_RX_OFFLOAD_CHECKSUM) { > > PMD_INIT_LOG(NOTICE, "rxcksum not supported"); > > - rxmode->hw_ip_checksum = 0; > > + configured_offloads &= ~DEV_RX_OFFLOAD_CHECKSUM; > > } > > No need to specific check for DEV_RX_OFFLOAD_CHECKSUM, if it is not announced > as > supported below unsupported_offloads will cover it. > > > > > - if (rxmode->split_hdr_size) { > > - octeontx_log_err("rxmode does not support split header"); > > - return -EINVAL; > > - } > > + unsupported_offloads = configured_offloads & ~OCTEONTX_RX_OFFLOADS; > > > > - if (rxmode->hw_vlan_filter) { > > - octeontx_log_err("VLAN filter not supported"); > > - return -EINVAL; > > + if (unsupported_offloads) { > > + PMD_INIT_LOG(ERR, "Rx offloads 0x%" PRIx64 " are not supported. > > " > > + "Requested 0x%" PRIx64 " supported 0x%" PRIx64 "\n", > > + unsupported_offloads, configured_offloads, > > + (uint64_t)OCTEONTX_RX_OFFLOADS); > > + return -ENOTSUP; > > } > > > > - if (rxmode->hw_vlan_extend) { > > - octeontx_log_err("VLAN extended not supported"); > > - return -EINVAL; > > + configured_offloads = txmode->offloads; > > + > > + if (!(configured_offloads & DEV_TX_OFFLOAD_MT_LOCKFREE)) { > > + PMD_INIT_LOG(NOTICE, "cant disable lockfree tx"); > > + configured_offloads |= DEV_TX_OFFLOAD_MT_LOCKFREE; > > } > > > > - if (rxmode->enable_lro) { > > - octeontx_log_err("LRO not supported"); > > - return -EINVAL; > > + unsupported_offloads = configured_offloads & ~OCTEONTX_TX_OFFLOADS; > > + > > + if (unsupported_offloads) { > > + PMD_INIT_LOG(ERR, "Tx offloads 0x%" PRIx64 " are not supported." 
> > + "Requested 0x%" PRIx64 " supported 0x%" PRIx64 ".\n", > > + unsupported_offloads, configured_offloads, > > + (uint64_t)OCTEONTX_TX_OFFLOADS); > > + return -ENOTSUP; > > } > > > > if (conf->link_speeds & ETH_LINK_SPEED_FIXED) { > > @@ -750,10 +761,11 @@ octeontx_dev_tx_queue_setup(struct rte_eth_dev *dev, > > uint16_t qidx, > > struct octeontx_txq *txq = NULL; > > uint16_t dq_num; > > int res = 0; > > + uint64_t configured_offloads; > > + uint64_t unsupported_offloads; > > > > RTE_SET_USED(nb_desc); > > RTE_SET_USED(socket_id); > > - RTE_SET_USED(tx_conf); > > > > dq_num = (nic->port_id * PKO_VF_NUM_DQ) + qidx; > > > > @@ -771,6 +783,17 @@ octeontx_dev_tx_queue_setup(struct rte_eth_dev *dev, > > uint16_t qidx, > > dev->data->tx_queues[qidx] = NULL; > > } > > > > + configured_offloads = tx_conf->offloads; > > + > > + unsupported_offloads = configured_offloads & ~OCTEONTX_TX_OFFLOADS; > > + if (unsupported_offloads) { > > + PMD_INIT_LOG(ERR, "Tx offloads 0x%" PRIx64 " are not supported." > > + "Requested 0x%" PRIx64 " supported 0x%" PRIx64 ".\n", > > + unsupported_offloads, configured_offloads, > > +
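To sum up the pattern Ferruh suggests above: rather than checking each unsupported flag individually, mask the requested offloads against the PMD's supported set and reject anything left over in one place. A generic sketch (MY_PMD_RX_OFFLOADS is an illustrative placeholder, not the real octeontx macro):

#include <errno.h>
#include <stdint.h>
#include <rte_ethdev.h>

#define MY_PMD_RX_OFFLOADS (DEV_RX_OFFLOAD_CRC_STRIP | DEV_RX_OFFLOAD_VLAN_STRIP)

static int
check_rx_offloads(const struct rte_eth_rxmode *rxmode)
{
	uint64_t unsupported = rxmode->offloads & ~(uint64_t)MY_PMD_RX_OFFLOADS;

	if (unsupported) {
		/* log the requested vs. supported masks here */
		return -ENOTSUP;
	}
	return 0;
}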
Re: [dpdk-dev] [PATCH v5 1/2] eal: rename IPC sync request to pending request
On 3/24/2018 8:46 PM, Anatoly Burakov wrote: Signed-off-by: Anatoly Burakov Suggested-by: Jianfeng Tan Acked-by: Jianfeng Tan
Re: [dpdk-dev] [PATCH v4] net/i40e: fix intr callback unregister by adding retry
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z > Sent: Tuesday, March 20, 2018 6:02 PM > To: wangyunjian; dev@dpdk.org; Rybalchenko, Kirill > Cc: ca...@huawei.com > Subject: Re: [dpdk-dev] [PATCH v4] net/i40e: fix intr callback unregister by > adding retry > > > > > -Original Message- > > From: wangyunjian [mailto:wangyunj...@huawei.com] > > Sent: Tuesday, March 20, 2018 3:01 PM > > To: dev@dpdk.org; Zhang, Qi Z ; Rybalchenko, > > Kirill > > Cc: ca...@huawei.com; Yunjian Wang > > Subject: [dpdk-dev] [PATCH v4] net/i40e: fix intr callback unregister > > by adding retry > > > > From: Yunjian Wang > > > > The nic's interrupt source has some active callbacks, when the port hotplug. > > Add a retry to give more port's a chance to uninit before returning an > > error. > > > > Fixes: d42aaf30008b ("i40e: support port hotplug") > > > > Signed-off-by: Yunjian Wang > > Reviewed-by: Kirill Rybalchenko > > Acked-by: Qi Zhang Applied to dpdk-next-net-intel, thanks! /Helin
[dpdk-dev] [PATCH v3 2/2] octeontx: move mbox to common folder
Move commonly used functions across mempool, event and net devices to a common folder in drivers. Signed-off-by: Pavan Nikhilesh --- drivers/common/Makefile| 4 ++ drivers/common/meson.build | 1 + drivers/common/octeontx/Makefile | 24 drivers/common/octeontx/meson.build| 6 ++ .../{mempool => common}/octeontx/octeontx_mbox.c | 65 +- .../{mempool => common}/octeontx/octeontx_mbox.h | 14 + .../octeontx/rte_common_octeontx_version.map | 9 +++ drivers/event/octeontx/Makefile| 4 +- drivers/event/octeontx/meson.build | 5 +- .../{mempool => event}/octeontx/octeontx_ssovf.c | 20 ++- drivers/mempool/octeontx/Makefile | 5 +- drivers/mempool/octeontx/meson.build | 6 +- drivers/mempool/octeontx/octeontx_fpavf.c | 4 -- drivers/mempool/octeontx/octeontx_pool_logs.h | 9 --- .../octeontx/rte_mempool_octeontx_version.map | 6 -- drivers/net/octeontx/Makefile | 3 +- mk/rte.app.mk | 4 ++ 17 files changed, 144 insertions(+), 45 deletions(-) create mode 100644 drivers/common/octeontx/Makefile create mode 100644 drivers/common/octeontx/meson.build rename drivers/{mempool => common}/octeontx/octeontx_mbox.c (83%) rename drivers/{mempool => common}/octeontx/octeontx_mbox.h (66%) create mode 100644 drivers/common/octeontx/rte_common_octeontx_version.map rename drivers/{mempool => event}/octeontx/octeontx_ssovf.c (92%) diff --git a/drivers/common/Makefile b/drivers/common/Makefile index 192066307..0fd223761 100644 --- a/drivers/common/Makefile +++ b/drivers/common/Makefile @@ -4,4 +4,8 @@ include $(RTE_SDK)/mk/rte.vars.mk +ifeq ($(CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF)$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL),yy) +DIRS-y += octeontx +endif + include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/common/meson.build b/drivers/common/meson.build index ab774b8ef..5f6341b8f 100644 --- a/drivers/common/meson.build +++ b/drivers/common/meson.build @@ -2,5 +2,6 @@ # Copyright(c) 2018 Cavium, Inc std_deps = ['eal'] +drivers = ['octeontx'] config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON' driver_name_fmt = 'rte_common_@0@' diff --git a/drivers/common/octeontx/Makefile b/drivers/common/octeontx/Makefile new file mode 100644 index 0..dfdb9f196 --- /dev/null +++ b/drivers/common/octeontx/Makefile @@ -0,0 +1,24 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Cavium, Inc +# + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_common_octeontx.a + +CFLAGS += $(WERROR_FLAGS) +EXPORT_MAP := rte_common_octeontx_version.map + +LIBABIVER := 1 + +# +# all source are stored in SRCS-y +# +SRCS-y += octeontx_mbox.c + +LDLIBS += -lrte_eal + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/common/octeontx/meson.build b/drivers/common/octeontx/meson.build new file mode 100644 index 0..8a28ce800 --- /dev/null +++ b/drivers/common/octeontx/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Cavium, Inc +# + +sources = files('octeontx_mbox.c' +) diff --git a/drivers/mempool/octeontx/octeontx_mbox.c b/drivers/common/octeontx/octeontx_mbox.c similarity index 83% rename from drivers/mempool/octeontx/octeontx_mbox.c rename to drivers/common/octeontx/octeontx_mbox.c index f8cb6a453..c98e110f3 100644 --- a/drivers/mempool/octeontx/octeontx_mbox.c +++ b/drivers/common/octeontx/octeontx_mbox.c @@ -11,7 +11,6 @@ #include #include "octeontx_mbox.h" -#include "octeontx_pool_logs.h" /* Mbox operation timeout in seconds */ #define MBOX_WAIT_TIME_SEC 3 @@ -60,6 +59,17 @@ struct mbox_ram_hdr { }; }; +int octeontx_logtype_mbox; + +RTE_INIT(otx_init_log); +static void +otx_init_log(void) +{ + 
octeontx_logtype_mbox = rte_log_register("pmd.octeontx.mbox"); + if (octeontx_logtype_mbox >= 0) + rte_log_set_level(octeontx_logtype_mbox, RTE_LOG_NOTICE); +} + static inline void mbox_msgcpy(volatile uint8_t *d, volatile const uint8_t *s, uint16_t size) { @@ -181,22 +191,49 @@ mbox_send(struct mbox *m, struct octeontx_mbox_hdr *hdr, const void *txmsg, return res; } -static inline int -mbox_setup(struct mbox *m) +int +octeontx_mbox_set_ram_mbox_base(uint8_t *ram_mbox_base) +{ + struct mbox *m = &octeontx_mbox; + + if (m->init_once) + return -EALREADY; + + if (ram_mbox_base == NULL) { + mbox_log_err("Invalid ram_mbox_base=%p", ram_mbox_base); + return -EINVAL; + } + + m->ram_mbox_base = ram_mbox_base; + + if (m->reg != NULL) { + rte_spinlock_init(&m->lock); + m->init_once = 1; + } + + return 0; +} + +int +octeontx_mbox_set_reg(uint8_t *reg) { - if (unlikely(m->init_once == 0)) { + struct
[dpdk-dev] [PATCH v3 1/2] drivers: add common folder
Add driver/common folder and skeleton makefile for adding commonly used functions across mempool, event and net devices. Signed-off-by: Pavan Nikhilesh --- v3 Changes: - Fix common lib naming scheme. v2 Changes: - Removed dependency on bus. Based on discussion on ml http://dpdk.org/ml/archives/dev/2018-March/092822.html http://dpdk.org/ml/archives/dev/2018-March/093271.html drivers/Makefile | 13 +++-- drivers/common/Makefile| 7 +++ drivers/common/meson.build | 6 ++ drivers/meson.build| 11 ++- 4 files changed, 26 insertions(+), 11 deletions(-) create mode 100644 drivers/common/Makefile create mode 100644 drivers/common/meson.build diff --git a/drivers/Makefile b/drivers/Makefile index ee65c87b0..d279c4892 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -3,18 +3,19 @@ include $(RTE_SDK)/mk/rte.vars.mk +DIRS-y += common DIRS-y += bus DIRS-y += mempool -DEPDIRS-mempool := bus +DEPDIRS-mempool := bus common DIRS-y += net -DEPDIRS-net := bus mempool +DEPDIRS-net := bus common mempool DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += bbdev -DEPDIRS-bbdev := bus mempool +DEPDIRS-bbdev := bus common mempool DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto -DEPDIRS-crypto := bus mempool +DEPDIRS-crypto := bus common mempool DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event -DEPDIRS-event := bus mempool net +DEPDIRS-event := bus common mempool net DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += raw -DEPDIRS-raw := bus mempool net event +DEPDIRS-raw := bus common mempool net event include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/common/Makefile b/drivers/common/Makefile new file mode 100644 index 0..192066307 --- /dev/null +++ b/drivers/common/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Cavium, Inc +# + +include $(RTE_SDK)/mk/rte.vars.mk + +include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/common/meson.build b/drivers/common/meson.build new file mode 100644 index 0..ab774b8ef --- /dev/null +++ b/drivers/common/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Cavium, Inc + +std_deps = ['eal'] +config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON' +driver_name_fmt = 'rte_common_@0@' diff --git a/drivers/meson.build b/drivers/meson.build index b41a0f18e..5a0b5bc34 100644 --- a/drivers/meson.build +++ b/drivers/meson.build @@ -2,11 +2,12 @@ # Copyright(c) 2017 Intel Corporation # Defines the order in which the drivers are buit. -driver_classes = ['bus', - 'mempool', # depends on bus. - 'net', # depends on bus and mempool. - 'crypto', # depenss on bus, mempool (net in future). - 'event'] # depends on bus, mempool and net. +driver_classes = ['common', + 'bus', + 'mempool', # depends on bus and common. + 'net', # depends on bus, common and mempool. + 'crypto', # depenss on bus, common and mempool (net in future). + 'event'] # depends on bus, common, mempool and net. foreach class:driver_classes drivers = [] -- 2.16.2
Re: [dpdk-dev] [PATCH v3 7/7] MAINTAINERS: add myself as virtio crypto PMD maintainer
> -Original Message- > From: Jay Zhou [mailto:jianjay.z...@huawei.com] > Sent: Sunday, March 25, 2018 9:34 AM > To: dev@dpdk.org > Cc: De Lara Guarch, Pablo ; Zhang, Roy Fan > ; tho...@monjalon.net; > arei.gong...@huawei.com; Zeng, Xin ; > weidong.hu...@huawei.com; wangxinxin.w...@huawei.com; > longpe...@huawei.com; jianjay.z...@huawei.com > Subject: [PATCH v3 7/7] MAINTAINERS: add myself as virtio crypto PMD > maintainer > > Signed-off-by: Jay Zhou > --- > MAINTAINERS | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index a646ca3..be1b394 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -719,6 +719,12 @@ F: drivers/crypto/snow3g/ > F: doc/guides/cryptodevs/snow3g.rst > F: doc/guides/cryptodevs/features/snow3g.ini > > +Virtio > +M: Jay Zhou > +F: drivers/crypto/virtio/ > +F: doc/guides/cryptodevs/virtio.rst > +F: doc/guides/cryptodevs/features/virtio.ini > + > ZUC > M: Pablo de Lara > F: drivers/crypto/zuc/ > -- > 1.8.3.1 > Reviewed-by: Fan Zhang Acked-by: Fan Zhang
Re: [dpdk-dev] [PATCH v2] test/crypto: add MRVL to hash test cases
> -Original Message- > From: Tomasz Duszynski [mailto:t...@semihalf.com] > Sent: Wednesday, March 14, 2018 1:13 PM > To: dev@dpdk.org > Cc: Richardson, Bruce ; De Lara Guarch, Pablo > ; Doherty, Declan > ; Tomasz Duszynski ; > sta...@dpdk.org > Subject: [PATCH v2] test/crypto: add MRVL to hash test cases > > MRVL Crypto PMD supports most of the hash algorithms covered by test > suites thus specific bits should be set in pmd_masks. > > Otherwise blockcipher authonly test returns success even though no real > tests have been executed. > > Fixes: 84e0ded38ac5 ("test/crypto: add mrvl crypto unit tests") > Cc: sta...@dpdk.org > > Signed-off-by: Tomasz Duszynski Acked-by: Pablo de Lara
Re: [dpdk-dev] [PATCH v5] ethdev: return named opaque type instead of void pointer
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Neil Horman > Sent: Saturday, March 24, 2018 2:09 AM > To: Richardson, Bruce > Cc: Yigit, Ferruh ; Mcnamara, John > ; Kovacevic, Marko > ; Thomas Monjalon ; Pattan, > Reshma ; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v5] ethdev: return named opaque type instead > of void pointer > > On Fri, Mar 23, 2018 at 05:00:34PM +, Bruce Richardson wrote: > > On Wed, Mar 21, 2018 at 09:04:01AM -0400, Neil Horman wrote: > > > On Tue, Mar 20, 2018 at 04:34:04PM +, Ferruh Yigit wrote: > > > > "struct rte_eth_rxtx_callback" is defined as internal data structure and > > > > used as named opaque type. > > > > > > > > So the functions that are adding callbacks can return objects in this > > > > type instead of void pointer. > > > > > > > > Also const qualifier added to "struct rte_eth_rxtx_callback *" to > > > > protect it better from application modification. > > > > > > > > Signed-off-by: Ferruh Yigit > > > > --- > > > > v2: > > > > * keep using struct * in parameters, instead add callback functions > > > > return struct rte_eth_rxtx_callback pointer. > > > > > > > > v4: > > > > * Remove deprecation notice. LIBABIVER already increased in this release > > > > > > > > v5: > > > > * add const qualifier to rte_eth_rxtx_callback > > > I still wish we could find a way to remove the inline functions and truly > > > protect that struct, but a const is definately better than nothing > > > > > > Acked-by: Neil Horman > > > > > Actually, I think we should do exactly that - convert the rx and tx burst > > calls into actual function calls (and consider any other inlined functions > > too). The cost would be the overhead of making an additional function call > > per-burst, which is likely to be pretty minimal for most common burst sizes > > e.g. 32. > > > > We did some quick tests here with the i40e driver, and for a burst size of > > 32 saw less than 1% perf drop, and for even a small burst of 8 saw less > > than 5% drop. Note that this is testing with testpmd, which has nothing but > > I/O in the datapath. A real-world app is likely to do far more with the > > packets and therefore see proportionally far less perf hit. > > > > Thoughts? > > > > /Bruce > > > > PS: This un-inlining could probably be applied to other device types too, > > e.g. cryptodev (though probably not eventdev as it tends to have smaller > > bursts in some use-cases). > > > I would be 1000% on board with this conversion. +1 from me too. It would allow to greatly simplify support and further development for ethdev. Konstantin
Re: [dpdk-dev] [PATCH v3 3/7] net/dpaa2: change into dynamic logging
On 3/23/2018 5:34 PM, Shreyansh Jain wrote: Signed-off-by: Shreyansh Jain --- config/common_base| 5 - config/defconfig_arm64-dpaa2-linuxapp-gcc | 8 - doc/guides/nics/dpaa2.rst | 42 +++-- drivers/net/dpaa2/Makefile| 6 - drivers/net/dpaa2/base/dpaa2_hw_dpni.c| 30 ++-- drivers/net/dpaa2/dpaa2_ethdev.c | 290 +++--- drivers/net/dpaa2/dpaa2_pmd_logs.h| 41 + drivers/net/dpaa2/dpaa2_rxtx.c| 59 +++--- 8 files changed, 259 insertions(+), 222 deletions(-) create mode 100644 drivers/net/dpaa2/dpaa2_pmd_logs.h Acked-by: Hemant Agrawal
Re: [dpdk-dev] [PATCH v3 2/7] mempool/dpaa2: change to dynamic logging
On 3/23/2018 5:34 PM, Shreyansh Jain wrote: Signed-off-by: Shreyansh Jain --- drivers/mempool/dpaa2/Makefile| 6 --- drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 60 +-- drivers/mempool/dpaa2/dpaa2_hw_mempool_logs.h | 38 + 3 files changed, 75 insertions(+), 29 deletions(-) create mode 100644 drivers/mempool/dpaa2/dpaa2_hw_mempool_logs.h Acked-by: Hemant Agrawal
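These dynamic-logging conversions (here and in the octeontx mbox patch earlier in this digest) follow the same pattern: register a named logtype from a constructor, default it to NOTICE, and wrap rte_log() in a driver-local macro. A generic sketch with illustrative names (not the dpaa2 code itself):

#include <rte_eal.h>
#include <rte_log.h>

int example_logtype;

/* Constructor-time registration, mirroring the RTE_INIT usage in the patches. */
RTE_INIT(example_init_log);
static void
example_init_log(void)
{
	example_logtype = rte_log_register("pmd.example");
	if (example_logtype >= 0)
		rte_log_set_level(example_logtype, RTE_LOG_NOTICE);
}

/* Driver-local logging macro built on the registered logtype. */
#define EXAMPLE_LOG(level, fmt, args...) \
	rte_log(RTE_LOG_ ## level, example_logtype, "example: " fmt "\n", ## args)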
Re: [dpdk-dev] [PATCH v3 3/3] net/i40e: enable runtime queue setup
> -Original Message- > From: Ananyev, Konstantin > Sent: Monday, March 26, 2018 3:46 AM > To: Zhang, Qi Z ; tho...@monjalon.net > Cc: dev@dpdk.org; Xing, Beilei ; Wu, Jingjing > ; Lu, Wenzhuo > Subject: RE: [PATCH v3 3/3] net/i40e: enable runtime queue setup > > Hi Qi, > > > > > Expose the runtime queue configuration capability and enhance > > i40e_dev_[rx|tx]_queue_setup to handle the situation when device > > already started. > > > > Signed-off-by: Qi Zhang > > --- > > v3: > > - no queue start/stop in setup/release > > - return fail when required rx/tx function conflict with > > exist setup > > > > drivers/net/i40e/i40e_ethdev.c | 4 +++ > > drivers/net/i40e/i40e_rxtx.c | 64 > ++ > > 2 files changed, 68 insertions(+) > > > > diff --git a/drivers/net/i40e/i40e_ethdev.c > > b/drivers/net/i40e/i40e_ethdev.c index 508b4171c..68960dcaa 100644 > > --- a/drivers/net/i40e/i40e_ethdev.c > > +++ b/drivers/net/i40e/i40e_ethdev.c > > @@ -3197,6 +3197,10 @@ i40e_dev_info_get(struct rte_eth_dev *dev, > struct rte_eth_dev_info *dev_info) > > DEV_TX_OFFLOAD_GRE_TNL_TSO | > > DEV_TX_OFFLOAD_IPIP_TNL_TSO | > > DEV_TX_OFFLOAD_GENEVE_TNL_TSO; > > + dev_info->runtime_queue_setup_capa = > > + DEV_RUNTIME_RX_QUEUE_SETUP | > > + DEV_RUNTIME_TX_QUEUE_SETUP; > > + > > dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) * > > sizeof(uint32_t); > > dev_info->reta_size = pf->hash_lut_size; diff --git > > a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index > > 1217e5a61..9eb009d63 100644 > > --- a/drivers/net/i40e/i40e_rxtx.c > > +++ b/drivers/net/i40e/i40e_rxtx.c > > @@ -1712,6 +1712,7 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev > *dev, > > uint16_t len, i; > > uint16_t reg_idx, base, bsf, tc_mapping; > > int q_offset, use_def_burst_func = 1; > > + int ret = 0; > > > > if (hw->mac.type == I40E_MAC_VF || hw->mac.type == > I40E_MAC_X722_VF) { > > vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); > > @@ -1841,6 +1842,36 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev > *dev, > > rxq->dcb_tc = i; > > } > > > > + if (dev->data->dev_started) { > > + ret = i40e_rx_queue_init(rxq); > > + if (ret != I40E_SUCCESS) { > > + PMD_DRV_LOG(ERR, > > + "Failed to do RX queue initialization"); > > + return ret; > > + } > > We probably also have to do here: > > if (use_def_burst_func != 0 && ad-> rx_bulk_alloc_allowed) {error;} > > and we have to do that before we assign ad-> rx_bulk_alloc_allowed (inside > rx_queue_setup() few lines above). Got your point and agree with all following comments. 
Thanks Qi > > > > + /* check vector conflict */ > > + if (ad->rx_vec_allowed) { > > + if (i40e_rxq_vec_setup(rxq)) { > > + PMD_DRV_LOG(ERR, "Failed vector rx setup"); > > + i40e_dev_rx_queue_release(rxq); > > + return -EINVAL; > > + } > > + } > > + /* check scatterred conflict */ > > + if (!dev->data->scattered_rx) { > > + uint16_t buf_size = > > + (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) - > > + RTE_PKTMBUF_HEADROOM); > > + > > + if ((rxq->max_pkt_len + 2 * I40E_VLAN_TAG_SIZE) > > > + buf_size) { > > + PMD_DRV_LOG(ERR, "Scattered rx is required"); > > + i40e_dev_rx_queue_release(rxq); > > + return -EINVAL; > > + } > > + } > > + } > > + > > return 0; > > } > > > > @@ -1980,6 +2011,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev > *dev, > > const struct rte_eth_txconf *tx_conf) { > > struct i40e_hw *hw = > I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > + struct i40e_adapter *ad = > > + I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); > > struct i40e_vsi *vsi; > > struct i40e_pf *pf = NULL; > > struct i40e_vf *vf = NULL; > > @@ -1989,6 +2022,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev > *dev, > > uint16_t tx_rs_thresh, tx_free_thresh; > > uint16_t reg_idx, i, base, bsf, tc_mapping; > > int q_offset; > > + int ret = 0; > > > > if (hw->mac.type == I40E_MAC_VF || hw->mac.type == > I40E_MAC_X722_VF) { > > vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); > > @@ -2162,6 +2196,36 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev > *dev, > > txq->dcb_tc = i; > > } > > > > + if (dev->data->dev_started) { > > + ret = i40e_tx_queue_init(txq); > > + if (ret != I40E_SUCCESS) { > > +
Re: [dpdk-dev] [PATCH] crypto/mrvl: add missing library dependencies
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Tomasz Duszynski > Sent: Wednesday, March 21, 2018 9:45 AM > To: dev@dpdk.org > Cc: d...@marvell.com; nsams...@marvell.com; j...@semihalf.com; > jianbo@arm.com; De Lara Guarch, Pablo > ; sta...@dpdk.org; Tomasz Duszynski > > Subject: [dpdk-dev] [PATCH] crypto/mrvl: add missing library dependencies > > While trying to do a shared build one will get linkage error since a couple of > library dependencies are missing from a makefile. > > At some point there was a batch update of all PMDs but mrvl crypto was > missed back then. > > Necessary makefile changes were introduced in commit cbc12b0a96f5 > ("mk: do not generate LDLIBS from directory dependencies") > > Signed-off-by: Tomasz Duszynski Applied to dpdk-next-crypto. Thanks, Pablo
Re: [dpdk-dev] [PATCH v2] test/crypto: add MRVL to hash test cases
> -Original Message- > From: De Lara Guarch, Pablo > Sent: Monday, March 26, 2018 9:29 AM > To: 'Tomasz Duszynski' ; dev@dpdk.org > Cc: Richardson, Bruce ; Doherty, Declan > ; sta...@dpdk.org > Subject: RE: [PATCH v2] test/crypto: add MRVL to hash test cases > > > > > -Original Message- > > From: Tomasz Duszynski [mailto:t...@semihalf.com] > > Sent: Wednesday, March 14, 2018 1:13 PM > > To: dev@dpdk.org > > Cc: Richardson, Bruce ; De Lara Guarch, > > Pablo ; Doherty, Declan > > ; Tomasz Duszynski ; > > sta...@dpdk.org > > Subject: [PATCH v2] test/crypto: add MRVL to hash test cases > > > > MRVL Crypto PMD supports most of the hash algorithms covered by test > > suites thus specific bits should be set in pmd_masks. > > > > Otherwise blockcipher authonly test returns success even though no > > real tests have been executed. > > > > Fixes: 84e0ded38ac5 ("test/crypto: add mrvl crypto unit tests") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Tomasz Duszynski > > Acked-by: Pablo de Lara Applied to dpdk-next-crypto. Thanks, Pablo
[dpdk-dev] [PATCH v4 1/3] ether: support runtime queue setup
The patch let etherdev driver expose the capability flag through rte_eth_dev_info_get when it support runtime queue configuraiton, then base on the flag rte_eth_[rx|tx]_queue_setup could decide continue to setup the queue or just return fail when device already started. Signed-off-by: Qi Zhang --- v3: - not overload deferred start - rename deferred setup to runtime setup v2: - enhance comment doc/guides/nics/features.rst | 8 lib/librte_ether/rte_ethdev.c | 30 ++ lib/librte_ether/rte_ethdev.h | 7 +++ 3 files changed, 33 insertions(+), 12 deletions(-) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index 1b4fb979f..6983faa4e 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -892,7 +892,15 @@ Documentation describes performance values. See ``dpdk.org/doc/perf/*``. +.. _nic_features_queue_runtime_setup_capabilities: +Queue runtime setup capabilities +- + +Supports queue setup / release after device started. + +* **[provides] rte_eth_dev_info**: ``runtime_queue_config_capa:DEV_RUNTIME_RX_QUEUE_SETUP,DEV_RUNTIME_TX_QUEUE_SETUP``. +* **[related] API**: ``rte_eth_dev_info_get()``. .. _nic_features_other: diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 0590f0c10..343b1a6c0 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id, return -EINVAL; } - if (dev->data->dev_started) { - RTE_PMD_DEBUG_TRACE( - "port %d must be stopped to allow configuration\n", port_id); - return -EBUSY; - } - RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP); RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP); @@ -1474,6 +1468,15 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id, return -EINVAL; } + if (dev->data->dev_started && + !(dev_info.runtime_queue_setup_capa & + DEV_RUNTIME_RX_QUEUE_SETUP)) + return -EBUSY; + + if (dev->data->rx_queue_state[rx_queue_id] != + RTE_ETH_QUEUE_STATE_STOPPED) + return -EBUSY; + rxq = dev->data->rx_queues; if (rxq[rx_queue_id]) { RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id, return -EINVAL; } - if (dev->data->dev_started) { - RTE_PMD_DEBUG_TRACE( - "port %d must be stopped to allow configuration\n", port_id); - return -EBUSY; - } - RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP); RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP); @@ -1596,6 +1593,15 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id, return -EINVAL; } + if (dev->data->dev_started && + !(dev_info.runtime_queue_setup_capa & + DEV_RUNTIME_TX_QUEUE_SETUP)) + return -EBUSY; + + if (dev->data->rx_queue_state[tx_queue_id] != + RTE_ETH_QUEUE_STATE_STOPPED) + return -EBUSY; + txq = dev->data->tx_queues; if (txq[tx_queue_id]) { RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 036153306..4e2088458 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -981,6 +981,11 @@ struct rte_eth_conf { */ #define DEV_TX_OFFLOAD_SECURITY 0x0002 +#define DEV_RUNTIME_RX_QUEUE_SETUP 0x0001 +/**< Deferred setup rx queue */ +#define DEV_RUNTIME_TX_QUEUE_SETUP 0x0002 +/**< Deferred setup tx queue */ + /* * If new Tx offload capabilities are defined, they also must be * mentioned in rte_tx_offload_names in rte_ethdev.c file. 
@@ -1029,6 +1034,8 @@ struct rte_eth_dev_info { /** Configured number of rx/tx queues */ uint16_t nb_rx_queues; /**< Number of RX queues. */ uint16_t nb_tx_queues; /**< Number of TX queues. */ + uint64_t runtime_queue_setup_capa; + /**< queues can be setup after dev_start (DEV_DEFERRED_). */ }; /** -- 2.13.6
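From the application side, the capability added here would be used roughly as in the sketch below: query the flag from dev_info and only attempt a post-start queue setup when the PMD advertises it. The names (runtime_queue_setup_capa, DEV_RUNTIME_RX_QUEUE_SETUP) are as proposed in this version of the patch; error handling is trimmed.

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

static int
setup_rx_queue_at_runtime(uint16_t port_id, uint16_t qid, uint16_t nb_desc,
		struct rte_mempool *mp)
{
	struct rte_eth_dev_info info;

	rte_eth_dev_info_get(port_id, &info);
	if (!(info.runtime_queue_setup_capa & DEV_RUNTIME_RX_QUEUE_SETUP))
		return -ENOTSUP; /* this PMD still needs the port stopped */

	/* NULL rx_conf selects the driver defaults. */
	return rte_eth_rx_queue_setup(port_id, qid, nb_desc,
			rte_eth_dev_socket_id(port_id), NULL, mp);
}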
[dpdk-dev] [PATCH v4 0/3] runtime queue setup
According to the existing implementation, rte_eth_[rx|tx]_queue_setup will always return failure if the device is already started (rte_eth_dev_start). This can't satisfy the usage when an application wants to defer setup of part of the queues while keeping traffic running on the queues that are already set up. example: rte_eth_dev_config(nb_rxq = 2, nb_txq = 2) rte_eth_rx_queue_setup(idx = 0 ...) rte_eth_tx_queue_setup(idx = 0 ...) rte_eth_dev_start(...) /* [rx|tx]_burst is ready to start on queue 0 */ rte_eth_rx_queue_setup(idx=1 ...) /* fail */ Basically this is not a general hardware limitation, because for NICs like i40e and ixgbe it is not necessary to stop the whole device before configuring a fresh queue or reconfiguring an existing queue with no traffic on it. The patch lets the ethdev driver expose a capability flag through rte_eth_dev_info_get when it supports deferred queue configuration; based on these flags, rte_eth_[rx|tx]_queue_setup can decide to continue setting up the queue or just return failure when the device is already started. v4: - fix i40e rx/tx function conflict handling. - no need for a conflict check for the first rx/tx queue at runtime setup. - fix missing offload parameter in testpmd cmdline. v3: - do not overload deferred start. - rename deferred setup to runtime setup. - remove unnecessary testpmd parameters (patch 2/4 of v2) - add offload support to testpmd queue setup command line - i40e fix: return failure when the required rx/tx function conflicts with the existing setup. v2: - enhance comment in rte_ethdev.h Qi Zhang (3): ether: support runtime queue setup app/testpmd: add command for queue setup net/i40e: enable runtime queue setup app/test-pmd/cmdline.c | 129 ++ doc/guides/nics/features.rst| 8 ++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 + drivers/net/i40e/i40e_ethdev.c | 4 + drivers/net/i40e/i40e_rxtx.c| 195 lib/librte_ether/rte_ethdev.c | 30 +++-- lib/librte_ether/rte_ethdev.h | 7 + 7 files changed, 345 insertions(+), 35 deletions(-) -- 2.13.6

[dpdk-dev] [PATCH v4 2/3] app/testpmd: add command for queue setup
Add new command to setup queue: queue setup (rx|tx) (port_id) (queue_idx) (ring_size) (offloads) rte_eth_[rx|tx]_queue_setup will be called corresponsively Signed-off-by: Qi Zhang --- v4: - fix missing offload in command line. v3: - add offload parameter to queue setup command. - couple code refactory. app/test-pmd/cmdline.c | 129 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 ++ 2 files changed, 136 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index d1dc1de6c..1b0bbd9f4 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -774,6 +774,9 @@ static void cmd_help_long_parsed(void *parsed_result, "port tm hierarchy commit (port_id) (clean_on_fail)\n" " Commit tm hierarchy.\n\n" + "queue setup (rx|tx) (port_id) (queue_idx) (ring_size) (offloads)\n" + " setup a not started queue or re-setup a started queue.\n\n" + , list_pkt_forwarding_modes() ); } @@ -16030,6 +16033,131 @@ cmdline_parse_inst_t cmd_load_from_file = { }, }; +/* Queue Setup */ + +/* Common result structure for queue setup */ +struct cmd_queue_setup_result { + cmdline_fixed_string_t queue; + cmdline_fixed_string_t setup; + cmdline_fixed_string_t rxtx; + portid_t port_id; + uint16_t queue_idx; + uint16_t ring_size; + uint64_t offloads; +}; + +/* Common CLI fields for queue setup */ +cmdline_parse_token_string_t cmd_queue_setup_queue = + TOKEN_STRING_INITIALIZER(struct cmd_queue_setup_result, queue, "queue"); +cmdline_parse_token_string_t cmd_queue_setup_setup = + TOKEN_STRING_INITIALIZER(struct cmd_queue_setup_result, setup, "setup"); +cmdline_parse_token_string_t cmd_queue_setup_rxtx = + TOKEN_STRING_INITIALIZER(struct cmd_queue_setup_result, rxtx, "rx#tx"); +cmdline_parse_token_num_t cmd_queue_setup_port_id = + TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, port_id, UINT16); +cmdline_parse_token_num_t cmd_queue_setup_queue_idx = + TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, queue_idx, UINT16); +cmdline_parse_token_num_t cmd_queue_setup_ring_size = + TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, ring_size, UINT16); +cmdline_parse_token_num_t cmd_queue_setup_offloads = + TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, offloads, UINT64); + +static void +cmd_queue_setup_parsed( + void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_queue_setup_result *res = parsed_result; + struct rte_port *port; + struct rte_mempool *mp; + unsigned int socket_id; + uint8_t rx = 1; + int ret; + + if (port_id_is_invalid(res->port_id, ENABLED_WARN)) + return; + + if (!strcmp(res->rxtx, "tx")) + rx = 0; + + if (rx && res->ring_size <= rx_free_thresh) { + printf("Invalid ring_size, must >= rx_free_thresh: %d\n", + rx_free_thresh); + return; + } + + if (rx && res->queue_idx >= nb_rxq) { + printf("Invalid rx queue index, must < nb_rxq: %d\n", + nb_rxq); + return; + } + + if (!rx && res->queue_idx >= nb_txq) { + printf("Invalid tx queue index, must < nb_txq: %d\n", + nb_txq); + return; + } + + port = &ports[res->port_id]; + if (rx) { + struct rte_eth_rxconf rxconf = port->rx_conf; + + rxconf.offloads = res->offloads; + socket_id = rxring_numa[res->port_id]; + if (!numa_support || socket_id == NUMA_NO_CONFIG) + socket_id = port->socket_id; + + mp = mbuf_pool_find(socket_id); + if (mp == NULL) { + printf("Failed to setup RX queue: " + "No mempool allocation" + " on the socket %d\n", + rxring_numa[res->port_id]); + return; + } + ret = rte_eth_rx_queue_setup(res->port_id, +res->queue_idx, +res->ring_size, +socket_id, 
+&rxconf, +mp); + if (ret) + printf("Failed to setup RX queue\n"); + } else { + struct rte_eth_txconf txconf = port->tx_conf; + + txconf.offloads = res->offloads; + socket_id = txring_numa[res->port_id]; + if (!numa_support || socket_id
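With the syntax above, setting up Rx queue 1 of port 0 at runtime with a 256-entry ring and no extra offloads would be entered as (illustrative values):

testpmd> queue setup rx 0 1 256 0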
[dpdk-dev] [PATCH v4 3/3] net/i40e: enable runtime queue setup
Expose the runtime queue configuration capability and enhance i40e_dev_[rx|tx]_queue_setup to handle the situation when device already started. Signed-off-by: Qi Zhang --- v4: - fix rx/tx conflict check. - no need conflict check for first rx/tx queue at runtime setup. v3: - no queue start/stop in setup/release - return fail when required rx/tx function conflict with exist setup drivers/net/i40e/i40e_ethdev.c | 4 + drivers/net/i40e/i40e_rxtx.c | 195 - 2 files changed, 176 insertions(+), 23 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 508b4171c..68960dcaa 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -3197,6 +3197,10 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) DEV_TX_OFFLOAD_GRE_TNL_TSO | DEV_TX_OFFLOAD_IPIP_TNL_TSO | DEV_TX_OFFLOAD_GENEVE_TNL_TSO; + dev_info->runtime_queue_setup_capa = + DEV_RUNTIME_RX_QUEUE_SETUP | + DEV_RUNTIME_TX_QUEUE_SETUP; + dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) * sizeof(uint32_t); dev_info->reta_size = pf->hash_lut_size; diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 1217e5a61..101c20ba0 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -1692,6 +1692,75 @@ i40e_dev_supported_ptypes_get(struct rte_eth_dev *dev) return NULL; } +static int +i40e_dev_first_rx_queue(struct rte_eth_dev *dev, + uint16_t queue_idx) +{ + uint16_t i; + + for (i = 0; i < dev->data->nb_rx_queues; i++) { + if (i != queue_idx && dev->data->rx_queues[i]) + return 0; + } + + return 1; +} + +static int +i40e_dev_rx_queue_setup_runtime(struct rte_eth_dev *dev, + struct i40e_rx_queue *rxq) +{ + struct i40e_adapter *ad = + I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); + int use_def_burst_func = + check_rx_burst_bulk_alloc_preconditions(rxq); + uint16_t buf_size = + (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) - + RTE_PKTMBUF_HEADROOM); + int use_scattered_rx = + ((rxq->max_pkt_len + 2 * I40E_VLAN_TAG_SIZE) > buf_size) ? + 1 : 0; + + if (i40e_rx_queue_init(rxq) != I40E_SUCCESS) { + PMD_DRV_LOG(ERR, + "Failed to do RX queue initialization"); + return -EINVAL; + } + + if (i40e_dev_first_rx_queue(dev, rxq->queue_id)) { + /** +* If it is the first queue to setup, +* set all flags to default and call +* i40e_set_rx_function. +*/ + ad->rx_bulk_alloc_allowed = true; + ad->rx_vec_allowed = true; + dev->data->scattered_rx = use_scattered_rx; + if (use_def_burst_func) + ad->rx_bulk_alloc_allowed = false; + i40e_set_rx_function(dev); + return 0; + } + + /* check bulk alloc conflict */ + if (ad->rx_bulk_alloc_allowed && use_def_burst_func) { + PMD_DRV_LOG(ERR, "Can't use default burst."); + return -EINVAL; + } + /* check scatterred conflict */ + if (!dev->data->scattered_rx && use_scattered_rx) { + PMD_DRV_LOG(ERR, "Scattered rx is required."); + return -EINVAL; + } + /* check vector conflict */ + if (ad->rx_vec_allowed && i40e_rxq_vec_setup(rxq)) { + PMD_DRV_LOG(ERR, "Failed vector rx setup."); + return -EINVAL; + } + + return 0; +} + int i40e_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx, @@ -1808,25 +1877,6 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev, i40e_reset_rx_queue(rxq); rxq->q_set = TRUE; - dev->data->rx_queues[queue_idx] = rxq; - - use_def_burst_func = check_rx_burst_bulk_alloc_preconditions(rxq); - - if (!use_def_burst_func) { -#ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC - PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are " -"satisfied. 
Rx Burst Bulk Alloc function will be " -"used on port=%d, queue=%d.", -rxq->port_id, rxq->queue_id); -#endif /* RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC */ - } else { - PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are " -"not satisfied, Scattered Rx is requested, " -"or RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC is " -"not enabled on port=
Re: [dpdk-dev] [PATCH] assign QAT cryptodev to correct NUMA node
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Trahe, Fiona > Sent: Friday, March 9, 2018 6:18 PM > To: Lee Roberts ; Griffin, John > ; Jain, Deepak K > Cc: dev@dpdk.org; Trahe, Fiona > Subject: Re: [dpdk-dev] [PATCH] assign QAT cryptodev to correct NUMA > node > > Hi Lee, > Thanks for investigating this. > > > -Original Message- > > From: Lee Roberts [mailto:lee.robe...@hpe.com] > > Sent: Friday, March 9, 2018 6:01 PM > > To: Griffin, John ; Trahe, Fiona > > ; Jain, Deepak K > > Cc: dev@dpdk.org; Lee Roberts > > Subject: [PATCH] assign QAT cryptodev to correct NUMA node > > > > rte_cryptodev_pmd_init_params should use NUMA node of the QAT > device > > for its socket_id rather than the socket_id of the initializing process. > > > > Signed-off-by: Lee Roberts > Acked-by: Fiona Trahe Modified title to "crypto/qat: assign device to correct NUMA node" Applied to dpdk-next-crypto. Thanks, Pablo
Re: [dpdk-dev] [PATCH 0/3] add ifcvf driver
Hi Maxime, > -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Sunday, March 25, 2018 5:51 PM > To: Wang, Xiao W ; dev@dpdk.org > Cc: Wang, Zhihong ; y...@fridaylinux.org; Liang, > Cunming ; Xu, Rosen ; Chen, > Junjie J ; Daly, Dan > Subject: Re: [PATCH 0/3] add ifcvf driver > > > > On 03/23/2018 11:27 AM, Wang, Xiao W wrote: > > Hi Maxime, > > > >> -Original Message- > >> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > >> Sent: Thursday, March 22, 2018 4:48 AM > >> To: Wang, Xiao W ; dev@dpdk.org > >> Cc: Wang, Zhihong ; y...@fridaylinux.org; Liang, > >> Cunming ; Xu, Rosen ; > Chen, > >> Junjie J ; Daly, Dan > >> Subject: Re: [PATCH 0/3] add ifcvf driver > >> > >> Hi Xiao, > >> > >> On 03/15/2018 05:49 PM, Wang, Xiao W wrote: > >>> Hi Maxime, > >>> > -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Sunday, March 11, 2018 2:24 AM > To: Wang, Xiao W ; dev@dpdk.org > Cc: Wang, Zhihong ; y...@fridaylinux.org; > Liang, > Cunming ; Xu, Rosen ; > >> Chen, > Junjie J ; Daly, Dan > Subject: Re: [PATCH 0/3] add ifcvf driver > > Hi Xiao, > > On 03/10/2018 12:08 AM, Xiao Wang wrote: > > This patch set has dependency on > http://dpdk.org/dev/patchwork/patch/35635/ > > (vhost: support selective datapath); > > > > ifc VF is compatible with virtio vring operations, this driver > > implements > > vDPA driver ops which configures ifc VF to be a vhost data path > accelerator. > > > > ifcvf driver uses vdev as a control domain to manage ifc VFs that belong > > to it. It registers vDPA device ops to vhost lib to enable these VFs to > > be > > used as vhost data path accelerator. > > > > Live migration feature is supported by ifc VF and this driver enables > > it based on vhost lib. > > > > vDPA needs to create different containers for different devices, thus > > this > > patch set adds APIs in eal/vfio to support multiple container. > Thanks for this! That will avoind having to duplicate these functions > for every new offload driver. > > > > > > Junjie Chen (1): > > eal/vfio: add support for multiple container > > > > Xiao Wang (2): > > bus/pci: expose sysfs parsing API > > Still, I'm not convinced the offload device should be a virtual device. > It is a real PCI device, why not having a new device type for offload > devices, and let the device to be probed automatically by the existing > device model? > >>> > >>> IFC VFs are generated from SRIOV, with the PF driven by kernel driver. > >>> In DPDK we need to have something to represent PF, to register itself as > >>> a vDPA engine, so a virtual device is used for this purpose. > >> I went through the code, and something is not clear to me. > >> > >> Why do we need to have a representation of the PF in DPDK? > >> Why cannot we just bind at VF level? > > > > 1. With the vdev representation we could use it to talk to PF kernel driver > > to > do flow configuration, we can implement > > flow API on the vdev in future for this purpose. Using a vdev allows > introducing this kind of control plane thing. > > > > 2. When port representor is ready, we would integrate it into ifcvf driver, > then each VF will have a > > Representor port. For now we don’t have port representor, so this patch set > manages VF resource internally. > > Ok, we may need to have a vdev to represent the PF, but we need to be > able to bind at VF level anyway. Device management on VF level is feasible, according to the previous port-representor patch. 
A tuple of (PF_addr, VF_index) can identify a certain VF; we have vport_mask and a device addr to describe a PF, and we can specify a VF index to create a representor port, so the VF port creation will be on-demand at VF level. +struct port_rep_parameters { + uint64_t vport_mask; + struct { + char bus[RTE_DEV_NAME_MAX_LEN]; + char device[RTE_DEV_NAME_MAX_LEN]; + } parent; +}; +int +rte_representor_port_register(char *pf_addr_str, + uint32_t vport_id, uint16_t *port_id) Besides, IFCVF supports live migration, and vDPA makes better use of the IFCVF device than QEMU does (this patch has enabled the LM feature). vDPA is the main usage model for IFCVF, and one DPDK application taking control of all the VF resource management is a straightforward usage model. Best Regards, Xiao > > Else, how do you support passing two VFs of the same PF to different > DPDK applications? > Or have some VFs managed by Kernel or QEMU and some by the DPDK > application? My feeling is that current implementation is creating an > artificial constraint. > > Isn't there a possibility to have the virtual representation for the PF > to be probed separately? Or created automatically when the first VF of a > PF is p
Re: [dpdk-dev] [PATCH v3] net/mlx4: support CRC strip toggling
On Sun, Mar 25, 2018 at 08:19:29PM +, Ophir Munk wrote: > Previous to this commit mlx4 CRC stripping was executed by default and > there was no verbs API to disable it. > > Signed-off-by: Ophir Munk > --- > v1: initial version > v2: following internal reviews > v3: following dpdk.org mailing list reviews Except for the remaining extra space mentioned below :) Acked-by: Adrien Mazarguil > diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c > index ee93daf..eea6e93 100644 > --- a/drivers/net/mlx4/mlx4.c > +++ b/drivers/net/mlx4/mlx4.c > @@ -562,7 +562,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct > rte_pci_device *pci_dev) > (device_attr.vendor_part_id == >PCI_DEVICE_ID_MELLANOX_CONNECTX3PRO); > DEBUG("L2 tunnel checksum offloads are %ssupported", > - (priv->hw_csum_l2tun ? "" : "not ")); > + priv->hw_csum_l2tun ? "" : "not "); > priv->hw_rss_sup = device_attr_ex.rss_caps.rx_hash_fields_mask; > if (!priv->hw_rss_sup) { > WARN("no RSS capabilities reported; disabling support" > @@ -578,6 +578,10 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct > rte_pci_device *pci_dev) > } > DEBUG("supported RSS hash fields mask: %016" PRIx64, > priv->hw_rss_sup); > + priv->hw_fcs_strip = !!(device_attr_ex.raw_packet_caps & > + IBV_RAW_PACKET_CAP_SCATTER_FCS); I know the extra space before IBV_RAW_PACKET_CAP_SCATTER_FCS is present in the original mlx5 code, but it's misaligned there also. This line should be aligned with "device_attr_ex.raw_packet_caps" for consistency. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
On Sun, Mar 25, 2018 at 09:17:20PM +0300, Vladimir Medvedkin wrote: > Hi, > > 2018-03-14 14:09 GMT+03:00 Bruce Richardson : > > > On Wed, Feb 21, 2018 at 09:44:54PM +, Medvedkin Vladimir wrote: > > > RIB is an alternative to current LPM library. > > > It solves the following problems > > > - Increases the speed of control plane operations against lpm such as > > >adding/deleting routes > > > - Adds abstraction from dataplane algorithms, so it is possible to add > > >different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc > > >in addition to current dir24_8 > > > - It is possible to keep user defined application specific additional > > >information in struct rte_rib_node which represents route entry. > > >It can be next hop/set of next hops (i.e. active and feasible), > > >pointers to link rte_rib_node based on some criteria (i.e. next_hop), > > >plenty of additional control plane information. > > > - For dir24_8 implementation it is possible to remove > > rte_lpm_tbl_entry.depth > > >field that helps to save 6 bits. > > > - Also new dir24_8 implementation supports different next_hop sizes > > >(1/2/4/8 bytes per next hop) > > > - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary > > operator. > > >Instead it returns special default value if there is no route. > > > > > > Signed-off-by: Medvedkin Vladimir > > > --- > > > config/common_base | 6 + > > > doc/api/doxy-api.conf | 1 + > > > lib/Makefile | 2 + > > > lib/librte_rib/Makefile| 22 ++ > > > lib/librte_rib/rte_dir24_8.c | 482 ++ > > +++ > > > lib/librte_rib/rte_dir24_8.h | 116 > > > lib/librte_rib/rte_rib.c | 526 ++ > > +++ > > > lib/librte_rib/rte_rib.h | 322 +++ > > > lib/librte_rib/rte_rib_version.map | 18 ++ > > > mk/rte.app.mk | 1 + > > > 10 files changed, 1496 insertions(+) > > > create mode 100644 lib/librte_rib/Makefile > > > create mode 100644 lib/librte_rib/rte_dir24_8.c > > > create mode 100644 lib/librte_rib/rte_dir24_8.h > > > create mode 100644 lib/librte_rib/rte_rib.c > > > create mode 100644 lib/librte_rib/rte_rib.h > > > create mode 100644 lib/librte_rib/rte_rib_version.map > > > > > > > First pass review comments. For now just reviewed the main public header > > file rte_rib.h. Later reviews will cover the other files as best I can. > > > > /Bruce > > > > > > > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h > > > new file mode 100644 > > > index 000..6eac8fb > > > --- /dev/null > > > +++ b/lib/librte_rib/rte_rib.h > > > @@ -0,0 +1,322 @@ > > > +/* SPDX-License-Identifier: BSD-3-Clause > > > + * Copyright(c) 2018 Vladimir Medvedkin > > > + */ > > > + > > > +#ifndef _RTE_RIB_H_ > > > +#define _RTE_RIB_H_ > > > + > > > +/** > > > + * @file > > > + * Compressed trie implementation for Longest Prefix Match > > > + */ > > > + > > > +/** @internal Macro to enable/disable run-time checks. */ > > > +#if defined(RTE_LIBRTE_RIB_DEBUG) > > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {\ > > > + if (cond) \ > > > + return retval; \ > > > +} while (0) > > > +#else > > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) > > > +#endif > > > > use RTE_ASSERT? > > > it was done just like it was done in the LPM lib. But if you think it > should be RTE_ASSERT so be it. > > > > > > > + > > > +#define RTE_RIB_VALID_NODE 1 > > > > should there be an INVALID_NODE macro? 
> > > No > > > > > > > +#define RTE_RIB_GET_NXT_ALL 0 > > > +#define RTE_RIB_GET_NXT_COVER1 > > > + > > > +#define RTE_RIB_INVALID_ROUTE0 > > > +#define RTE_RIB_VALID_ROUTE 1 > > > + > > > +/** Max number of characters in RIB name. */ > > > +#define RTE_RIB_NAMESIZE 64 > > > + > > > +/** Maximum depth value possible for IPv4 RIB. */ > > > +#define RTE_RIB_MAXDEPTH 32 > > > > I think we should have IPv4 in the name here. Will it not be extended to > > support IPv6 in future? > > > I think there should be a separate implementation of the library for ipv6 > I can understand the need for a separate LPM implementation, but should they both not be under the same rib library? > > > > > > + > > > +/** > > > + * Macro to check if prefix1 {key1/depth1} > > > + * is covered by prefix2 {key2/depth2} > > > + */ > > > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2) > > \ > > > + key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\ > > > + && (depth1 > depth2)) > > Neat check! > > > > Any particular reason for using UINT64_MAX here rather than UINT32_MAX? > > in case when depth2 = 0 UINT32_MAX shifted left by 32 bit will remain > UINT32_MAX because shift count will be masked to 5 bits. > > I think you can avoi
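Picking up the UINT64_MAX point from this exchange: a 32-bit constant shifted by 32 would be undefined behaviour (and on x86 the shift count is masked to 5 bits, leaving UINT32_MAX), whereas the 64-bit constant keeps the shift well defined and the truncated result is the intended mask. A small standalone check, not part of the patch:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t depth2;

	for (depth2 = 0; depth2 <= 32; depth2 += 8) {
		uint32_t mask = (uint32_t)(UINT64_MAX << (32 - depth2));
		/* depth2 = 0 gives 0x00000000, depth2 = 32 gives 0xffffffff */
		printf("depth2=%2" PRIu32 " mask=0x%08" PRIx32 "\n", depth2, mask);
	}
	return 0;
}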
[dpdk-dev] [PATCH v3 01/10] lib/librte_vhost: add external backend support
This patch adds external backend support to vhost library. The patch provides new APIs for the external backend to register private data, plus pre and post vhost-user message handlers. Signed-off-by: Fan Zhang --- lib/librte_vhost/rte_vhost.h | 45 ++- lib/librte_vhost/vhost.c | 23 +- lib/librte_vhost/vhost.h | 8 ++-- lib/librte_vhost/vhost_user.c | 29 +--- 4 files changed, 98 insertions(+), 7 deletions(-) diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index d332069..591b731 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2017 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ #ifndef _RTE_VHOST_H_ @@ -88,6 +88,33 @@ struct vhost_device_ops { }; /** + * function prototype for external virtio device to handler device specific + * vhost user messages + * + * @param extern_data + * private data for external backend + * @param msg + * Message pointer + * @param payload + * Message payload + * @param require_reply + * If the handler requires sending a reply, this varaible shall be written 1, + * otherwise 0 + * @return + * 0 on success, -1 on failure + */ +typedef int (*rte_vhost_msg_handler)(int vid, void *msg, + uint32_t *require_reply); + +/** + * pre and post vhost user message handlers + */ +struct rte_vhost_user_dev_extern_ops { + rte_vhost_msg_handler pre_vhost_user_msg_handler; + rte_vhost_msg_handler post_vhost_user_msg_handler; +}; + +/** * Convert guest physical address to host virtual address * * @param mem @@ -434,6 +461,22 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx); */ uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid); +/** + * register external vhost backend + * + * @param vid + * vhost device ID + * @param extern_data + * private data for external backend + * @param ops + * ops that process external vhost user messages + * @return + * 0 on success, -1 on failure + */ +int +rte_vhost_user_register_extern_backend(int vid, void *extern_data, + struct rte_vhost_user_dev_extern_ops *ops); + #ifdef __cplusplus } #endif diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index a407067..0932537 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2016 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ #include @@ -627,3 +627,24 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid) return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx; } + +int +rte_vhost_user_register_extern_backend(int vid, void *extern_data, + struct rte_vhost_user_dev_extern_ops *ops) +{ + struct virtio_net *dev; + + dev = get_device(vid); + if (dev == NULL) + return -1; + + dev->extern_data = extern_data; + if (ops) { + dev->extern_ops.pre_vhost_user_msg_handler = + ops->pre_vhost_user_msg_handler; + dev->extern_ops.post_vhost_user_msg_handler = + ops->post_vhost_user_msg_handler; + } + + return 0; +} diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index d947bc9..6aaa46c 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ #ifndef _VHOST_NET_CDEV_H_ @@ -241,8 +241,12 @@ struct virtio_net { struct guest_page *guest_pages; int slave_req_fd; -} __rte_cache_aligned; + /* private data for external virtio device */ + void*extern_data; + /* pre 
and post vhost user message handlers for externel backend */ + struct rte_vhost_user_dev_extern_ops extern_ops; +} __rte_cache_aligned; #define VHOST_LOG_PAGE 4096 diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 90ed211..c064cb3 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2016 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ #include @@ -50,6 +50,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = { [VHOST_USER_NET_SET_MTU] = "VHOST_USER_NET_SET_MTU", [VHOST_USER_SET_SLAVE_REQ_FD] = "VHOST_USER_SET_SLAVE_REQ_FD", [VHOST_USER_IOTLB_MSG] = "VHOST_USER_IOTLB_MSG", + [VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS", + [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS", }; static uint64_t @@ -1379,6 +1381,18 @@ vhost_user_msg_hand
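For illustration, below is a minimal sketch of how an external backend could use the new registration API introduced in this patch. Only rte_vhost_user_register_extern_backend() and the handler prototype come from the patch; the context structure, handler body and function names are placeholder assumptions.

#include <rte_common.h>
#include <rte_vhost.h>

/* Hypothetical backend context -- placeholder, not part of the patch. */
struct my_backend_ctx {
	int state;
};

static struct my_backend_ctx my_ctx;

/* Post handler: called after the library has processed a vhost-user
 * message; write 1 to *require_reply only if a reply must be sent. */
static int
my_post_msg_handler(int vid, void *msg, uint32_t *require_reply)
{
	RTE_SET_USED(vid);
	RTE_SET_USED(msg);
	*require_reply = 0;
	return 0;
}

static int
attach_my_backend(int vid)
{
	struct rte_vhost_user_dev_extern_ops ops = {
		.pre_vhost_user_msg_handler = NULL,
		.post_vhost_user_msg_handler = my_post_msg_handler,
	};

	return rte_vhost_user_register_extern_backend(vid, &my_ctx, &ops);
}

A crypto (or other) backend would typically call such a function from its device creation path, keeping its private state reachable through the extern_data pointer in later message handling.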
[dpdk-dev] [PATCH v3 03/10] lib/librte_vhost: add session message handler
This patch adds session message handler to vhost crypto. Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 428 1 file changed, 428 insertions(+) create mode 100644 lib/librte_vhost/vhost_crypto.c diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c new file mode 100644 index 000..c639b20 --- /dev/null +++ b/lib/librte_vhost/vhost_crypto.c @@ -0,0 +1,428 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2017-2018 Intel Corporation + */ + +#include + +#include +#include +#include +#ifdef RTE_LIBRTE_VHOST_DEBUG +#include +#endif +#include "vhost.h" +#include "vhost_user.h" +#include "rte_vhost_crypto.h" + +#define NB_MEMPOOL_OBJS(1024) +#define NB_CRYPTO_DESCRIPTORS (1024) +#define NB_CACHE_OBJS (128) + +#define SESSION_MAP_ENTRIES(1024) /**< Max nb sessions per vdev */ +#define MAX_KEY_SIZE (32) +#define VHOST_CRYPTO_MAX_IV_LEN(16) +#define MAX_COUNT_DOWN_TIMES (100) + +#define INHDR_LEN (sizeof(struct virtio_crypto_inhdr)) +#define IV_OFFSET (sizeof(struct rte_crypto_op) + \ + sizeof(struct rte_crypto_sym_op)) + +#ifdef RTE_LIBRTE_VHOST_DEBUG +#define VC_LOG_ERR(fmt, args...) \ + RTE_LOG(ERR, USER1, "[%s] %s() line %u: " fmt "\n", \ + "Vhost-Crypto", __func__, __LINE__, ## args) +#define VC_LOG_INFO(fmt, args...) \ + RTE_LOG(INFO, USER1, "[%s] %s() line %u: " fmt "\n",\ + "Vhost-Crypto", __func__, __LINE__, ## args) + +#define VC_LOG_DBG(fmt, args...) \ + RTE_LOG(DEBUG, USER1, "[%s] %s() line %u: " fmt "\n", \ + "Vhost-Crypto", __func__, __LINE__, ## args) +#else +#define VC_LOG_ERR(fmt, args...) \ + RTE_LOG(ERR, USER1, "[VHOST-Crypto]: " fmt "\n", ## args) +#define VC_LOG_INFO(fmt, args...) \ + RTE_LOG(INFO, USER1, "[VHOST-Crypto]: " fmt "\n", ## args) +#define VC_LOG_DBG(fmt, args...) +#endif + +#define VIRTIO_CRYPTO_FEATURES ((1 << VIRTIO_F_NOTIFY_ON_EMPTY) | \ + (1 << VIRTIO_RING_F_INDIRECT_DESC) |\ + (1 << VIRTIO_RING_F_EVENT_IDX) |\ + (1 << VIRTIO_CRYPTO_SERVICE_CIPHER) | \ + (1 << VIRTIO_CRYPTO_SERVICE_HASH) | \ + (1 << VIRTIO_CRYPTO_SERVICE_MAC) | \ + (1 << VIRTIO_CRYPTO_SERVICE_AEAD) | \ + (1 << VIRTIO_NET_F_CTRL_VQ)) + + +#define GPA_TO_VVA(t, m, a)((t)(uintptr_t)rte_vhost_gpa_to_vva(m, a)) + +/* Macro to get the buffer at the end of rte_crypto_op */ +#define REQ_OP_OFFSET (IV_OFFSET + VHOST_CRYPTO_MAX_IV_LEN) + +/** + * 1-to-1 mapping between RTE_CRYPTO_*ALGO* and VIRTIO_CRYPTO_*ALGO*, for + * algorithms not supported by RTE_CRYPTODEV, the -VIRTIO_CRYPTO_NOTSUPP is + * returned. + */ +static int cipher_algo_transform[] = { + RTE_CRYPTO_CIPHER_NULL, + RTE_CRYPTO_CIPHER_ARC4, + RTE_CRYPTO_CIPHER_AES_ECB, + RTE_CRYPTO_CIPHER_AES_CBC, + RTE_CRYPTO_CIPHER_AES_CTR, + -VIRTIO_CRYPTO_NOTSUPP, /* VIRTIO_CRYPTO_CIPHER_DES_ECB */ + RTE_CRYPTO_CIPHER_DES_CBC, + RTE_CRYPTO_CIPHER_3DES_ECB, + RTE_CRYPTO_CIPHER_3DES_CBC, + RTE_CRYPTO_CIPHER_3DES_CTR, + RTE_CRYPTO_CIPHER_KASUMI_F8, + RTE_CRYPTO_CIPHER_SNOW3G_UEA2, + RTE_CRYPTO_CIPHER_AES_F8, + RTE_CRYPTO_CIPHER_AES_XTS, + RTE_CRYPTO_CIPHER_ZUC_EEA3 +}; + +/** + * VIRTIO_CRYTPO_AUTH_* indexes are not sequential, the gaps are filled with + * -VIRTIO_CRYPTO_BADMSG errors. 
+ */ +static int auth_algo_transform[] = { + RTE_CRYPTO_AUTH_NULL, + RTE_CRYPTO_AUTH_MD5_HMAC, + RTE_CRYPTO_AUTH_SHA1_HMAC, + RTE_CRYPTO_AUTH_SHA224_HMAC, + RTE_CRYPTO_AUTH_SHA256_HMAC, + RTE_CRYPTO_AUTH_SHA384_HMAC, + RTE_CRYPTO_AUTH_SHA512_HMAC, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, + -VIRTIO_CRYPTO_BADMSG, +
[dpdk-dev] [PATCH v3 00/10] lib/librte_vhost: introduce new vhost user crypto backend support
This patchset adds crypto backend support to the vhost_user library, including a proof-of-concept sample application. The implementation follows the virtio-crypto specification and has been tested with qemu 2.11.50 (with several patches applied, detailed later) with Fedora 24 running in the frontend. The vhost_crypto library acts as a "bridge" that translates virtio-crypto requests to DPDK crypto operations, so it is a purely software implementation. However, it does require the user to provide the DPDK Cryptodev ID so it knows how to handle virtio-crypto session creation and deletion messages. Currently the implementation supports only AES-CBC-128 and HMAC-SHA1 in cipher-only/chaining modes and does not support sessionless mode yet. The guest can use the standard virtio-crypto driver to set up sessions and send encryption/decryption requests to the backend. The vhost-crypto sample application provided in this patchset does the actual crypto work. To make this patchset work, a few tweaks need to be done: In the host: 1. Download the qemu source code. 2. Configure and compile your qemu with the vhost-crypto option enabled. 3. Apply this patchset to the latest DPDK code and recompile DPDK. 4. Compile and run the vhost-crypto sample application. ./examples/vhost_crypto/build/vhost-crypto -l 11,12,13 -w :86:01.0 \ --socket-mem 2048,2048 Where :86:01.0 is the QAT PCI address. You may use AES-NI-MB if it is not available. The sample application requires 2 lcores: 1 master and 1 worker. The application will create a UNIX socket file /tmp/vhost_crypto1.socket. It is possible to use other flags like --zero-copy and --guest-polling. 5. Start your qemu application. Here is my command: qemu/x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -cpu host \ -smp 2 -m 1G -hda ~/path-to-your/image.qcow \ -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \ -mem-prealloc -numa node,memdev=mem -chardev \ socket,id=charcrypto0,path=/tmp/vhost_crypto1.socket \ -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0 \ -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0 6. Once the guest is booted, the Linux virtio_crypto kernel module is loaded by default. You shall see the following logs in your dmesg: [ 17.611044] virtio_crypto: loading out-of-tree module taints kernel. [ 17.611083] virtio_crypto: module verification failed: signature and/or ... [ 17.611723] virtio_crypto virtio0: max_queues: 1, max_cipher_key_len: ... [ 17.612156] virtio_crypto virtio0: will run requests pump with realtime ... [ 18.376100] virtio_crypto virtio0: Accelerator is ready The virtio_crypto driver in the guest is now up and running. 7. The remaining steps are the same as the Testing section in https://wiki.qemu.org/Features/VirtioCrypto 8. It is possible to use the DPDK Virtio Crypto PMD (https://dpdk.org/dev/patchwork/patch/36471/) in the guest together with this patchset to achieve optimal performance. v3: - Changed external vhost backend private data and message handling - Added experimental tag to rte_vhost_crypto_set_zero_copy() v2: - Moved vhost_crypto_data_req data from crypto op to source mbuf. - Removed ZERO-COPY flag from config option and made it changeable at runtime. - Guest-polling mode possible. - Simplified vring descriptor access procedure. - Work with both LKCF and DPDK Virtio-Crypto PMD guest drivers. 
Fan Zhang (10): lib/librte_vhost: add external backend support lib/librte_vhost: add virtio-crypto user message structure lib/librte_vhost: add session message handler lib/librte_vhost: add request handler lib/librte_vhost: add head file lib/librte_vhost: add public function implementation lib/librte_vhost: update version map lib/librte_vhost: update makefile examples/vhost_crypto: add vhost crypto sample application doc: update prog guide and sample app guide doc/guides/prog_guide/vhost_lib.rst | 21 + doc/guides/sample_app_ug/index.rst|1 + doc/guides/sample_app_ug/vhost_crypto.rst | 84 ++ examples/vhost_crypto/Makefile| 32 + examples/vhost_crypto/main.c | 598 ++ lib/librte_vhost/Makefile |6 +- lib/librte_vhost/meson.build |8 +- lib/librte_vhost/rte_vhost.h | 45 +- lib/librte_vhost/rte_vhost_crypto.h | 125 +++ lib/librte_vhost/rte_vhost_version.map| 11 + lib/librte_vhost/vhost.c | 23 +- lib/librte_vhost/vhost.h |8 +- lib/librte_vhost/vhost_crypto.c | 1269 + lib/librte_vhost/vhost_user.c | 29 +- lib/librte_vhost/vhost_user.h | 36 +- 15 files changed, 2279 insertions(+), 17 deletions(-) create mode 100644 doc/guides/sample_app_ug/vhost_crypto.rst create mode 100644 examples/vhost_crypto/Makefile create mode 100644 examples/vhost_crypto/main.c create mode 100644 lib/librte_vhost/rte_vhost_crypto.h create mo
[dpdk-dev] [PATCH v3 05/10] lib/librte_vhost: add head file
This patch adds public head file API for vhost crypto. Signed-off-by: Fan Zhang --- lib/librte_vhost/rte_vhost_crypto.h | 125 1 file changed, 125 insertions(+) create mode 100644 lib/librte_vhost/rte_vhost_crypto.h diff --git a/lib/librte_vhost/rte_vhost_crypto.h b/lib/librte_vhost/rte_vhost_crypto.h new file mode 100644 index 000..339a939 --- /dev/null +++ b/lib/librte_vhost/rte_vhost_crypto.h @@ -0,0 +1,125 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2017-2018 Intel Corporation + */ + +#ifndef _VHOST_CRYPTO_H_ +#define _VHOST_CRYPTO_H_ + +#include +#include +#include +#include +#include "rte_vhost.h" + +#ifndef MAX_DATA_QUEUES +#define MAX_DATA_QUEUES(1) +#endif + +#define VIRTIO_CRYPTO_CTRL_QUEUE (0) +#define VIRTIO_CRYPTO_MAX_NUM_DEVS (64) +#define VIRTIO_CRYPTO_MAX_NUM_BURST_VQS(64) + +/** Feature bits */ +#define VIRTIO_CRYPTO_F_CIPHER_SESSION_MODE(1) +#define VIRTIO_CRYPTO_F_HASH_SESSION_MODE (2) +#define VIRTIO_CRYPTO_F_MAC_SESSION_MODE (3) +#define VIRTIO_CRYPTO_F_AEAD_SESSION_MODE (4) + +#define VHOST_CRYPTO_MBUF_POOL_SIZE(8192) +#define VHOST_CRYPTO_MAX_BURST_SIZE(64) + +enum rte_vhost_crypto_zero_copy { + RTE_VHOST_CRYPTO_ZERO_COPY_DISABLE = 0, + RTE_VHOST_CRYPTO_ZERO_COPY_ENABLE, + RTE_VHOST_CRYPTO_MAX_ZERO_COPY_OPTIONS +}; + +/** + * Create Vhost-crypto instance + * + * @param vid + * The identifier of the vhost device. + * @param cryptodev_id + * The identifier of DPDK Cryptodev, the same cryptodev_id can be assigned to + * multiple Vhost-crypto devices. + * @param sess_pool + * The pointer to the created cryptodev session pool with the private data size + * matches the target DPDK Cryptodev. + * @param socket_id + * NUMA Socket ID to allocate resources on. * + * @return + * 0 if the Vhost Crypto Instance is created successfully. + * Negative integer if otherwise + */ +int +rte_vhost_crypto_create(int vid, uint8_t cryptodev_id, + struct rte_mempool *sess_pool, int socket_id); + +/** + * Free the Vhost-crypto instance + * + * @param vid + * The identifier of the vhost device. + * @return + * 0 if the Vhost Crypto Instance is created successfully. + * Negative integer if otherwise. + */ +int +rte_vhost_crypto_free(int vid); + +/** + * Enable or disable zero copy feature + * + * @param vid + * The identifier of the vhost device. + * @param option + * Flag of zero copy feature. + * @return + * 0 if completed successfully. + * Negative integer if otherwise. + */ +int __rte_experimental +rte_vhost_crypto_set_zero_copy(int vid, enum rte_vhost_crypto_zero_copy option); + +/** + * Fetch a number of vring descriptors from virt-queue and translate to DPDK + * crypto operations. After this function is executed, the user can enqueue + * the processed ops to the target cryptodev. + * + * @param vid + * The identifier of the vhost device. + * @param qid + * Virtio queue index. + * @param ops + * The address of an array of pointers to *rte_crypto_op* structures that must + * be large enough to store *nb_ops* pointers in it. + * @param nb_ops + * The maximum number of operations to be fetched and translated. + * @return + * The number of fetched and processed vhost crypto request operations. + */ +uint16_t +rte_vhost_crypto_fetch_requests(int vid, uint32_t qid, + struct rte_crypto_op **ops, uint16_t nb_ops); +/** + * Finalize the dequeued crypto ops. After the translated crypto ops are + * dequeued from the cryptodev, this function shall be called to write the + * processed data back to the vring descriptor (if no-copy is turned off). 
+ * + * @param ops + * The address of an array of *rte_crypto_op* structure that was dequeued + * from cryptodev. + * @param nb_ops + * The number of operations contained in the array. + * @callfds + * The callfd number(s) contained in this burst + * @nb_callfds + * The number of call_fd numbers exist in the callfds + * @return + * The number of ops processed. + */ +uint16_t +rte_vhost_crypto_finalize_requests(struct rte_crypto_op **ops, + uint16_t nb_ops, int *callfds, uint16_t *nb_callfds); + +#endif /**< _VHOST_CRYPTO_H_ */ -- 2.7.4
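Tying the header above together, here is a rough sketch of one polling iteration of a vhost-crypto worker, assuming the instance was already created with rte_vhost_crypto_create(). The burst size, parameter names and the eventfd kick at the end are illustrative assumptions; the fetch/finalize calls and their signatures come from this header.

#include <sys/eventfd.h>
#include <rte_cryptodev.h>
#include <rte_vhost_crypto.h>

#define BURST_SZ 64

/* One polling iteration for a single vhost-crypto device. */
static void
vhost_crypto_poll(int vid, uint32_t vq_id, uint8_t cdev_id, uint16_t cdev_qp)
{
	struct rte_crypto_op *ops[BURST_SZ];
	struct rte_crypto_op *done[BURST_SZ];
	int callfds[BURST_SZ];
	uint16_t nb_callfds = 0;
	uint16_t nb, i;

	/* Translate virtio-crypto descriptors into DPDK crypto ops. */
	nb = rte_vhost_crypto_fetch_requests(vid, vq_id, ops, BURST_SZ);
	if (nb == 0)
		return;

	/* Hand the translated ops to the cryptodev; a full implementation
	 * would retry any ops that could not be enqueued. */
	rte_cryptodev_enqueue_burst(cdev_id, cdev_qp, ops, nb);

	/* Collect finished ops and write the results back to the vrings. */
	nb = rte_cryptodev_dequeue_burst(cdev_id, cdev_qp, done, BURST_SZ);
	if (nb == 0)
		return;

	rte_vhost_crypto_finalize_requests(done, nb, callfds, &nb_callfds);

	/* Kick the guest via the returned callfds unless it is polling. */
	for (i = 0; i < nb_callfds; i++)
		eventfd_write(callfds[i], (eventfd_t)1);
}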
[dpdk-dev] [PATCH v3 02/10] lib/librte_vhost: add virtio-crypto user message structure
This patch adds virtio-crypto spec user message structure to vhost_user. Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_user.h | 36 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index d4bd604..48cdb24 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ #ifndef _VHOST_NET_USER_H @@ -20,13 +20,15 @@ #define VHOST_USER_PROTOCOL_F_REPLY_ACK3 #define VHOST_USER_PROTOCOL_F_NET_MTU 4 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5 +#define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 #define VHOST_USER_PROTOCOL_FEATURES ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \ (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\ (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \ (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \ (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \ -(1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ)) +(1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \ +(1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION)) typedef enum VhostUserRequest { VHOST_USER_NONE = 0, @@ -52,7 +54,9 @@ typedef enum VhostUserRequest { VHOST_USER_NET_SET_MTU = 20, VHOST_USER_SET_SLAVE_REQ_FD = 21, VHOST_USER_IOTLB_MSG = 22, - VHOST_USER_MAX + VHOST_USER_CRYPTO_CREATE_SESS = 26, + VHOST_USER_CRYPTO_CLOSE_SESS = 27, + VHOST_USER_MAX = 28 } VhostUserRequest; typedef enum VhostUserSlaveRequest { @@ -79,6 +83,30 @@ typedef struct VhostUserLog { uint64_t mmap_offset; } VhostUserLog; +/* Comply with Cryptodev-Linux */ +#define VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH 512 +#define VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH64 + +/* Same structure as vhost-user backend session info */ +typedef struct VhostUserCryptoSessionParam { + int64_t session_id; + uint32_t op_code; + uint32_t cipher_algo; + uint32_t cipher_key_len; + uint32_t hash_algo; + uint32_t digest_len; + uint32_t auth_key_len; + uint32_t aad_len; + uint8_t op_type; + uint8_t dir; + uint8_t hash_mode; + uint8_t chaining_dir; + uint8_t *ciphe_key; + uint8_t *auth_key; + uint8_t cipher_key_buf[VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH]; + uint8_t auth_key_buf[VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH]; +} VhostUserCryptoSessionParam; + typedef struct VhostUserMsg { union { VhostUserRequest master; @@ -99,6 +127,7 @@ typedef struct VhostUserMsg { VhostUserMemory memory; VhostUserLoglog; struct vhost_iotlb_msg iotlb; + VhostUserCryptoSessionParam crypto_session; } payload; int fds[VHOST_MEMORY_MAX_NREGIONS]; } __attribute((packed)) VhostUserMsg; @@ -108,7 +137,6 @@ typedef struct VhostUserMsg { /* The version of the protocol we support */ #define VHOST_USER_VERSION0x1 - /* vhost_user.c */ int vhost_user_msg_handler(int vid, int fd); int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm); -- 2.7.4
[dpdk-dev] [PATCH v3 04/10] lib/librte_vhost: add request handler
This patch adds the implementation that parses virtio crypto request to dpdk crypto operation. Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 593 1 file changed, 593 insertions(+) diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c index c639b20..1d85829 100644 --- a/lib/librte_vhost/vhost_crypto.c +++ b/lib/librte_vhost/vhost_crypto.c @@ -426,3 +426,596 @@ vhost_crypto_msg_post_handler(int vid, void *msg, uint32_t *require_reply) return ret; } + +static __rte_always_inline struct vring_desc * +find_write_desc(struct vring_desc *head, struct vring_desc *desc) +{ + if (desc->flags & VRING_DESC_F_WRITE) + return desc; + + while (desc->flags & VRING_DESC_F_NEXT) { + desc = &head[desc->next]; + if (desc->flags & VRING_DESC_F_WRITE) + return desc; + } + + return NULL; +} + +static struct virtio_crypto_inhdr * +reach_inhdr(struct vring_desc *head, struct rte_vhost_memory *mem, + struct vring_desc *desc) +{ + while (desc->flags & VRING_DESC_F_NEXT) + desc = &head[desc->next]; + + return GPA_TO_VVA(struct virtio_crypto_inhdr *, mem, desc->addr); +} + +static __rte_always_inline int +move_desc(struct vring_desc *head, struct vring_desc **cur_desc, + uint32_t size) +{ + struct vring_desc *desc = *cur_desc; + int left = size; + + rte_prefetch0(&head[desc->next]); + left -= desc->len; + + while ((desc->flags & VRING_DESC_F_NEXT) && left > 0) { + desc = &head[desc->next]; + rte_prefetch0(&head[desc->next]); + left -= desc->len; + } + + if (unlikely(left < 0)) { + VC_LOG_ERR("Incorrect virtio descriptor"); + return -1; + } + + *cur_desc = &head[desc->next]; + return 0; +} + +static int +copy_data(void *dst_data, struct vring_desc *head, struct rte_vhost_memory *mem, + struct vring_desc **cur_desc, uint32_t size) +{ + struct vring_desc *desc = *cur_desc; + uint32_t to_copy; + uint8_t *data = dst_data; + uint8_t *src; + int left = size; + + rte_prefetch0(&head[desc->next]); + to_copy = RTE_MIN(desc->len, (uint32_t)left); + src = GPA_TO_VVA(uint8_t *, mem, desc->addr); + rte_memcpy((uint8_t *)data, src, to_copy); + left -= to_copy; + + while ((desc->flags & VRING_DESC_F_NEXT) && left > 0) { + desc = &head[desc->next]; + rte_prefetch0(&head[desc->next]); + to_copy = RTE_MIN(desc->len, (uint32_t)left); + src = GPA_TO_VVA(uint8_t *, mem, desc->addr); + rte_memcpy(data + size - left, src, to_copy); + left -= to_copy; + } + + if (unlikely(left < 0)) { + VC_LOG_ERR("Incorrect virtio descriptor"); + return -1; + } + + *cur_desc = &head[desc->next]; + + return 0; +} + +static __rte_always_inline void * +get_data_ptr(struct vring_desc *head, struct rte_vhost_memory *mem, + struct vring_desc **cur_desc, uint32_t size) +{ + void *data; + + data = GPA_TO_VVA(void *, mem, (*cur_desc)->addr); + if (unlikely(!data)) { + VC_LOG_ERR("Failed to get object"); + return NULL; + } + + if (unlikely(move_desc(head, cur_desc, size) < 0)) + return NULL; + + return data; +} + +static int +write_back_data(struct rte_crypto_op *op, struct vhost_crypto_data_req *vc_req) +{ + struct rte_mbuf *mbuf = op->sym->m_dst; + struct vring_desc *head = vc_req->head; + struct rte_vhost_memory *mem = vc_req->mem; + struct vring_desc *desc = vc_req->wb_desc; + int left = vc_req->wb_len; + uint32_t to_write; + uint8_t *src_data = mbuf->buf_addr, *dst; + + rte_prefetch0(&head[desc->next]); + to_write = RTE_MIN(desc->len, (uint32_t)left); + dst = GPA_TO_VVA(uint8_t *, mem, desc->addr); + rte_memcpy(dst, src_data, to_write); + left -= to_write; + src_data += to_write; + +#ifdef RTE_LIBRTE_VHOST_DEBUG + 
printf("desc addr %llu len %u:", desc->addr, desc->len); + rte_hexdump(stdout, "", dst, to_write); +#endif + + while ((desc->flags & VRING_DESC_F_NEXT) && left > 0) { + desc = &head[desc->next]; + rte_prefetch0(&head[desc->next]); + to_write = RTE_MIN(desc->len, (uint32_t)left); + dst = GPA_TO_VVA(uint8_t *, mem, desc->addr); + rte_memcpy(dst, src_data, to_write); +#ifdef RTE_LIBRTE_VHOST_DEBUG + printf("desc addr %llu len %u:", desc->addr, desc->len); + rte_hexdump(stdout, "DST:", dst, to_write); +#endif + left -= to_write; + src_data += to_write; + } + + if (unlikel
[dpdk-dev] [PATCH v3 06/10] lib/librte_vhost: add public function implementation
This patch adds public API implementation to vhost crypto. Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 248 1 file changed, 248 insertions(+) diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c index 1d85829..f64261a 100644 --- a/lib/librte_vhost/vhost_crypto.c +++ b/lib/librte_vhost/vhost_crypto.c @@ -1019,3 +1019,251 @@ vhost_crypto_complete_one_vm_requests(struct rte_crypto_op **ops, return processed; } + +int +rte_vhost_crypto_create(int vid, uint8_t cryptodev_id, + struct rte_mempool *sess_pool, int socket_id) +{ + struct virtio_net *dev = get_device(vid); + struct rte_vhost_user_dev_extern_ops ops; + struct rte_hash_parameters params = {0}; + struct vhost_crypto *vcrypto; + char name[128]; + int ret; + + if (vid >= VIRTIO_CRYPTO_MAX_NUM_DEVS || !dev) { + VC_LOG_ERR("Invalid vid %i", vid); + return -EINVAL; + } + + ret = rte_vhost_driver_set_features(dev->ifname, + VIRTIO_CRYPTO_FEATURES); + if (ret < 0) { + VC_LOG_ERR("Error setting features"); + return -1; + } + + vcrypto = rte_zmalloc_socket(NULL, sizeof(*vcrypto), + RTE_CACHE_LINE_SIZE, socket_id); + if (!vcrypto) { + VC_LOG_ERR("Insufficient memory"); + return -ENOMEM; + } + + vcrypto->sess_pool = sess_pool; + vcrypto->cid = cryptodev_id; + vcrypto->cache_session_id = UINT64_MAX; + vcrypto->last_session_id = 1; + vcrypto->dev = dev; + vcrypto->option = RTE_VHOST_CRYPTO_ZERO_COPY_DISABLE; + + snprintf(name, 127, "HASH_VHOST_CRYPT_%u", (uint32_t)vid); + params.name = name; + params.entries = SESSION_MAP_ENTRIES; + params.hash_func = rte_jhash; + params.key_len = sizeof(uint64_t); + params.socket_id = socket_id; + vcrypto->session_map = rte_hash_create(¶ms); + if (!vcrypto->session_map) { + VC_LOG_ERR("Failed to creath session map"); + ret = -ENOMEM; + goto error_exit; + } + + snprintf(name, 127, "MBUF_POOL_VM_%u", (uint32_t)vid); + vcrypto->mbuf_pool = rte_pktmbuf_pool_create(name, + VHOST_CRYPTO_MBUF_POOL_SIZE, 512, + sizeof(struct vhost_crypto_data_req), + RTE_MBUF_DEFAULT_DATAROOM * 2 + RTE_PKTMBUF_HEADROOM, + rte_socket_id()); + if (!vcrypto->mbuf_pool) { + VC_LOG_ERR("Failed to creath mbuf pool"); + ret = -ENOMEM; + goto error_exit; + } + + ops.pre_vhost_user_msg_handler = NULL; + ops.post_vhost_user_msg_handler = vhost_crypto_msg_post_handler; + + if (rte_vhost_user_register_extern_backend(dev->vid, (void *)vcrypto, + &ops) < 0) { + VC_LOG_ERR("Failed to register device"); + goto error_exit; + } + + return 0; + +error_exit: + if (vcrypto->session_map) + rte_hash_free(vcrypto->session_map); + if (vcrypto->mbuf_pool) + rte_mempool_free(vcrypto->mbuf_pool); + + rte_free(vcrypto); + + return ret; +} + +int +rte_vhost_crypto_free(int vid) +{ + struct virtio_net *dev = get_device(vid); + struct vhost_crypto *vcrypto; + + if (unlikely(dev == NULL)) { + VC_LOG_ERR("Invalid vid %i", vid); + return -EINVAL; + } + + vcrypto = dev->extern_data; + if (unlikely(vcrypto == NULL)) { + VC_LOG_ERR("Cannot find required data, is it initialized?"); + return -ENOENT; + } + + rte_hash_free(vcrypto->session_map); + rte_mempool_free(vcrypto->mbuf_pool); + rte_free(vcrypto); + + dev->extern_data = NULL; + dev->extern_ops.post_vhost_user_msg_handler = NULL; + dev->extern_ops.pre_vhost_user_msg_handler = NULL; + + return 0; +} + +int __rte_experimental +rte_vhost_crypto_set_zero_copy(int vid, enum rte_vhost_crypto_zero_copy option) +{ + struct virtio_net *dev = get_device(vid); + struct vhost_crypto *vcrypto; + + if (unlikely(dev == NULL)) { + VC_LOG_ERR("Invalid vid %i", vid); + return -EINVAL; + } + + if 
(unlikely(option < 0 || option >= + RTE_VHOST_CRYPTO_MAX_ZERO_COPY_OPTIONS)) { + VC_LOG_ERR("Invalid option %i", option); + return -EINVAL; + } + + vcrypto = (struct vhost_crypto *)dev->extern_data; + if (unlikely(vcrypto == NULL)) { + VC_LOG_ERR("Cannot find required data, is it initialized?"); + return -ENOENT; + } + + if (vcrypto->option == (uint8_t)option) + return 0; + + if (!(rte_mempool_full(vcrypto->mbuf_po
[dpdk-dev] [PATCH v3 08/10] lib/librte_vhost: update makefile
This patch updates the Makefile of vhost library to enable vhost crypto compiling. Signed-off-by: Fan Zhang --- lib/librte_vhost/Makefile| 6 -- lib/librte_vhost/meson.build | 8 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 5d6c6ab..95a6a93 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -23,8 +23,10 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net # all source are stored in SRCS-y SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \ vhost_user.c virtio_net.c +ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV), y) +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_crypto.c +endif # install includes -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h - +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vhost_crypto.h include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build index 9e8c0e7..36f1e27 100644 --- a/lib/librte_vhost/meson.build +++ b/lib/librte_vhost/meson.build @@ -1,5 +1,5 @@ # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017 Intel Corporation +# Copyright(c) 2017-2018 Intel Corporation if host_machine.system() != 'linux' build = false @@ -10,6 +10,6 @@ endif version = 4 allow_experimental_apis = true sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vhost.c', 'vhost_user.c', - 'virtio_net.c') -headers = files('rte_vhost.h') -deps += ['ethdev'] + 'virtio_net.c', 'virtio_crypto.c') +headers = files('rte_vhost.h', 'rte_vhost_crypto.h') +deps += ['ethdev', 'cryptodev'] -- 2.7.4
[dpdk-dev] [PATCH v3 07/10] lib/librte_vhost: update version map
Signed-off-by: Fan Zhang --- lib/librte_vhost/rte_vhost_version.map | 11 +++ 1 file changed, 11 insertions(+) diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index df01031..935eebd 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -59,3 +59,14 @@ DPDK_18.02 { rte_vhost_vring_call; } DPDK_17.08; + +DPDK_18.05 { + global: + + rte_vhost_user_register_extern_backend; + rte_vhost_crypto_create; + rte_vhost_crypto_free; + rte_vhost_crypto_fetch_requests; + rte_vhost_crypto_finalize_requests; + +} DPDK_18.02; -- 2.7.4
Re: [dpdk-dev] [PATCH v3 1/2] Add RIB library
On Sun, Mar 25, 2018 at 09:35:35PM +0300, Vladimir Medvedkin wrote: > 2018-03-15 17:27 GMT+03:00 Bruce Richardson : > > > On Thu, Feb 22, 2018 at 10:50:55PM +, Medvedkin Vladimir wrote: > > > RIB is an alternative to current LPM library. > > > It solves the following problems > > > - Increases the speed of control plane operations against lpm such as > > >adding/deleting routes > > > - Adds abstraction from dataplane algorithms, so it is possible to add > > >different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc > > >in addition to current dir24_8 > > > - It is possible to keep user defined application specific additional > > >information in struct rte_rib_node which represents route entry. > > >It can be next hop/set of next hops (i.e. active and feasible), > > >pointers to link rte_rib_node based on some criteria (i.e. next_hop), > > >plenty of additional control plane information. > > > - For dir24_8 implementation it is possible to remove > > >rte_lpm_tbl_entry.depth field that helps to save 6 bits. > > > - Also new dir24_8 implementation supports different next_hop sizes > > >(1/2/4/8 bytes per next hop) > > > - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate > > >ternary operator. > > >Instead it returns special default value if there is no route. > > > > > > Signed-off-by: Medvedkin Vladimir > > > > More comments inline below. Mostly for rte_rib.c file. > > > > /Bruce > > > > > > > + while (cur != NULL) { > > > + if ((cur->key == key) && (cur->depth == depth) && > > > + (cur->flag & RTE_RIB_VALID_NODE)) > > > + return cur; > > > + if ((cur->depth > depth) || > > > + (((uint64_t)key >> (32 - cur->depth)) != > > > + ((uint64_t)cur->key >> (32 - cur->depth > > > + break; > > > + cur = RTE_RIB_GET_NXT_NODE(cur, key); > > > + } > > > + return NULL; > > > +} > > > + > > > +struct rte_rib_node * > > > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, > > > + uint8_t depth, struct rte_rib_node *cur, int flag) > > > +{ > > > + struct rte_rib_node *tmp, *prev = NULL; > > > + > > > + if (cur == NULL) { > > > + tmp = rib->trie; > > > + while ((tmp) && (tmp->depth < depth)) > > > + tmp = RTE_RIB_GET_NXT_NODE(tmp, key); > > > + } else { > > > + tmp = cur; > > > + while ((tmp->parent != NULL) && > > (RTE_RIB_IS_RIGHT_NODE(tmp) || > > > + (tmp->parent->right == NULL))) { > > > + tmp = tmp->parent; > > > + if ((tmp->flag & RTE_RIB_VALID_NODE) && > > > + (RTE_RIB_IS_COVERED(tmp->key, tmp->depth, > > > + key, depth))) > > > + return tmp; > > > + } > > > + tmp = (tmp->parent) ? tmp->parent->right : NULL; > > > + } > > > + while (tmp) { > > > + if ((tmp->flag & RTE_RIB_VALID_NODE) && > > > + (RTE_RIB_IS_COVERED(tmp->key, tmp->depth, > > > + key, depth))) { > > > + prev = tmp; > > > + if (flag == RTE_RIB_GET_NXT_COVER) > > > + return prev; > > > + } > > > + tmp = (tmp->left) ? tmp->left : tmp->right; > > > + } > > > + return prev; > > > +} > > > > I think this function could do with some comments explaining the logic > > behind it. > > > This function traverses on subtree and retrieves more specific routes for a > given in args key/depth prefix (treat it like a top of the subtree). > Traverse without using recursion but using some kind of stack. It uses *cur > argument like a pointer to the last returned node to resume retrieval after > cur node. > Yes. Please add such an explanation into the code for future patches. [Same with the other additional explanations in your reply email]. Thanks, /Bruce
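To make the explanation in the reply concrete, the iteration pattern it describes might look like the sketch below: each call resumes after the previously returned node, so repeated calls enumerate the more-specific routes covered by a given prefix. The header name, flag value and example prefix are assumptions for illustration; only rte_rib_tree_get_nxt() and its argument list come from the quoted patch.

#include <stdio.h>
#include <rte_rib.h>	/* header name assumed */

/* Enumerate the more-specific (covered) routes under 10.0.0.0/8. */
static void
dump_covered_routes(struct rte_rib *rib)
{
	uint32_t prefix = 10u << 24;	/* 10.0.0.0 */
	uint8_t depth = 8;
	struct rte_rib_node *node = NULL;

	/* Passing the previously returned node back in resumes the walk
	 * after it; starting from NULL begins at the top of the subtree. */
	while ((node = rte_rib_tree_get_nxt(rib, prefix, depth, node,
			RTE_RIB_GET_NXT_COVER)) != NULL) {
		/* node->key and node->depth describe one covered route. */
		printf("route 0x%x/%u\n", (unsigned)node->key, node->depth);
	}
}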
[dpdk-dev] [PATCH v3 09/10] examples/vhost_crypto: add vhost crypto sample application
This patch adds vhost_crypto sample application to DPDK. Signed-off-by: Fan Zhang --- examples/vhost_crypto/Makefile | 32 +++ examples/vhost_crypto/main.c | 598 + 2 files changed, 630 insertions(+) create mode 100644 examples/vhost_crypto/Makefile create mode 100644 examples/vhost_crypto/main.c diff --git a/examples/vhost_crypto/Makefile b/examples/vhost_crypto/Makefile new file mode 100644 index 000..1bb65e8 --- /dev/null +++ b/examples/vhost_crypto/Makefile @@ -0,0 +1,32 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2017-2018 Intel Corporation + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overridden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +ifneq ($(CONFIG_RTE_EXEC_ENV),"linuxapp") +$(info This application can only operate in a linuxapp environment, \ +please change the definition of the RTE_TARGET environment variable) +all: +else + +# binary name +APP = vhost-crypto + +# all source are stored in SRCS-y +SRCS-y := main.c + +CFLAGS += -DALLOW_EXPERIMENTAL_API +CFLAGS += -O2 -D_FILE_OFFSET_BITS=64 +CFLAGS += $(WERROR_FLAGS) +CFLAGS += -D_GNU_SOURCE + +include $(RTE_SDK)/mk/rte.extapp.mk + +endif diff --git a/examples/vhost_crypto/main.c b/examples/vhost_crypto/main.c new file mode 100644 index 000..d5d1525 --- /dev/null +++ b/examples/vhost_crypto/main.c @@ -0,0 +1,598 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2017-2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#define NB_VIRTIO_QUEUES (1) +#define MAX_PKT_BURST (64) +#define MAX_IV_LEN (32) +#define NB_MEMPOOL_OBJS(8192) +#define NB_CRYPTO_DESCRIPTORS (4096) +#define NB_CACHE_OBJS (128) +#define SESSION_MAP_ENTRIES(1024) +#define REFRESH_TIME_SEC (3) + +#define MAX_NB_SOCKETS (32) +#define DEF_SOCKET_FILE"/tmp/vhost_crypto1.socket" + +struct vhost_crypto_options { + char *socket_files[MAX_NB_SOCKETS]; + uint32_t nb_sockets; + uint8_t cid; + uint16_t qid; + uint32_t zero_copy; + uint32_t guest_polling; +} options; + +struct vhost_crypto_info { + int vids[MAX_NB_SOCKETS]; + struct rte_mempool *sess_pool; + struct rte_mempool *cop_pool; + uint32_t lcore_id; + uint8_t cid; + uint32_t qid; + uint32_t nb_vids; + volatile uint32_t initialized[MAX_NB_SOCKETS]; + +} info; + +#define SOCKET_FILE_KEYWORD"socket-file" +#define CRYPTODEV_ID_KEYWORD "cdev-id" +#define CRYPTODEV_QUEUE_KEYWORD"cdev-queue-id" +#define ZERO_COPY_KEYWORD "zero-copy" +#define POLLING_KEYWORD"guest-polling" + +uint64_t vhost_cycles[2], last_v_cycles[2]; +uint64_t outpkt_amount; + +/** support *SOCKET_FILE_PATH:CRYPTODEV_ID* format */ +static int +parse_socket_arg(char *arg) +{ + uint32_t nb_sockets = options.nb_sockets; + size_t len = strlen(arg); + + if (nb_sockets >= MAX_NB_SOCKETS) { + RTE_LOG(ERR, USER1, "Too many socket files!\n"); + return -ENOMEM; + } + + options.socket_files[nb_sockets] = rte_malloc(NULL, len, 0); + if (!options.socket_files[nb_sockets]) { + RTE_LOG(ERR, USER1, "Insufficient memory\n"); + return -ENOMEM; + } + + rte_memcpy(options.socket_files[nb_sockets], arg, len); + + options.nb_sockets++; + + return 0; +} + +static int +parse_cryptodev_id(const char *q_arg) +{ + char *end = NULL; + uint64_t pm; + + /* parse decimal string */ + pm = strtoul(q_arg, &end, 10); + if ((pm == '\0') || (end == NULL) || (*end != '\0')) { + RTE_LOG(ERR, USER1, 
"Invalid Cryptodev ID %s\n", q_arg); + return -1; + } + + if (pm > rte_cryptodev_count()) { + RTE_LOG(ERR, USER1, "Invalid Cryptodev ID %s\n", q_arg); + return -1; + } + + options.cid = (uint8_t)pm; + + return 0; +} + +static int +parse_cdev_queue_id(const char *q_arg) +{ + char *end = NULL; + uint64_t pm; + + /* parse decimal string */ + pm = strtoul(q_arg, &end, 10); + if (pm == UINT64_MAX) { + RTE_LOG(ERR, USER1, "Invalid Cryptodev Queue ID %s\n", q_arg); + return -1; + } + + options.qid = (uint16_t)pm; + + return 0; +} + +static void +vhost_crypto_usage(const char *prgname) +{ + printf("%s [EAL options] --\n" + " --%s SOCKET-FILE-PATH\n" + " --%s CRYPTODEV_ID: crypto device id\n" + " --%s CDEV_QUEUE_ID: crypto device q
[dpdk-dev] [PATCH v3 10/10] doc: update prog guide and sample app guide
Signed-off-by: Fan Zhang --- doc/guides/prog_guide/vhost_lib.rst | 21 doc/guides/sample_app_ug/index.rst| 1 + doc/guides/sample_app_ug/vhost_crypto.rst | 84 +++ 3 files changed, 106 insertions(+) create mode 100644 doc/guides/sample_app_ug/vhost_crypto.rst diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 18227b6..9d6680c 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -160,6 +160,27 @@ The following is an overview of some key Vhost API functions: Receives (dequeues) ``count`` packets from guest, and stored them at ``pkts``. +* ``rte_vhost_crypto_create(vid, cryptodev_id, sess_mempool, socket_id)`` + + As an extension of new_device(), this function adds virtio-crypto workload + acceleration capability to the device. All crypto workload is processed by + DPDK cryptodev with the device ID of ``cryptodev_id``. + +* ``rte_vhost_crypto_fetch_requests(vid, queue_id, ops, nb_ops)`` + + Receives (dequeues) ``nb_ops`` virtio-crypto requests from guest, parses + them to DPDK Crypto Operations, and fills the ``ops`` with parsing results. + +* ``rte_vhost_crypto_finalize_requests(queue_id, ops, nb_ops)`` + + After the ``ops`` are dequeued from Cryptodev, finalizes the jobs and + notifies the guest(s). + +* ``rte_vhost_user_register_extern_backend(vid, extern_data, ops)`` + + This function register private data, and pre and post vhost-user message + handlers. + Vhost-user Implementations -- diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index e87afda..57e8354 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -44,6 +44,7 @@ Sample Applications User Guides vmdq_dcb_forwarding vhost vhost_scsi +vhost_crypto netmap_compatibility ip_pipeline test_pipeline diff --git a/doc/guides/sample_app_ug/vhost_crypto.rst b/doc/guides/sample_app_ug/vhost_crypto.rst new file mode 100644 index 000..8e6cb2a --- /dev/null +++ b/doc/guides/sample_app_ug/vhost_crypto.rst @@ -0,0 +1,84 @@ +.. SPDX-License-Identifier: BSD-3-Clause +Copyright(c) 2017-2018 Intel Corporation. + + +Vhost_Crypto Sample Application +=== + +The vhost_crypto sample application implemented a simple Crypto device, +which used as the backend of Qemu vhost-user-crypto device. Similar with +vhost-user-net and vhost-user-scsi device, the sample application used +domain socket to communicate with Qemu, and the virtio ring was processed +by vhost_crypto sample application. + +Testing steps +- + +This section shows the steps how to start a VM with the crypto device as +fast data path for critical application. + +Compiling the Application +- + +To compile the sample application see :doc:`compiling`. + +The application is located in the ``examples`` sub-directory. + +Start the vhost_crypto example +~~ + +.. code-block:: console + +./vhost_crypto [EAL options] -- [--socket-file PATH] +[--cdev-id ID] [--cdev-queue-id ID] [--zero-copy] [--guest-polling] + +where, + +* socket-file PATH: the path of UNIX socket file to be created, multiple + instances of this config item is supported. Upon absence of this item, + the default socket-file `/tmp/vhost_crypto1.socket` is used. + +* cdev-id ID: the target DPDK Cryptodev's ID to process the actual crypto + workload. Upon absence of this item the default value of `0` will be used. + For details of DPDK Cryptodev, please refer to DPDK Cryptodev Library + Programmers' Guide. 
+ +* cdev-queue-id ID: the target DPDK Cryptodev's queue ID to process the + actual crypto workload. Upon absence of this item the default value of `0` + will be used. For details of DPDK Cryptodev, please refer to DPDK Cryptodev + Library Programmers' Guide. + +* zero-copy: the presence of this item means the ZERO-COPY feature will be + enabled. Otherwise it is disabled. PLEASE NOTE the ZERO-COPY feature is still + in experimental stage and may cause the problem like segmentation fault. If + the user wants to use LKCF in the guest, this feature shall be turned off. + +* guest-polling: the presence of this item means the application assumes the + guest works in polling mode, thus will NOT notify the guest completion of + processing. + +The application requires that crypto devices capable of performing +the specified crypto operation are available on application initialization. +This means that HW crypto device/s must be bound to a DPDK driver or +a SW crypto device/s (virtual crypto PMD) must be created (using --vdev). + +.. _vhost_crypto_app_run_vm: + +Start the VM + + +.. code-block:: console + +qemu-system-x86_64 -machine accel=kvm \ +-m $mem -object memory-backend-file,id=mem,size=$mem,\ +mem-path=/dev/hugep
[dpdk-dev] [PATCH 1/2] net/mlx5: enforce RSS key length limitation
RSS hash key must be 40 Bytes long. Cc: sta...@dpdk.org Signed-off-by: Shahaf Shuler --- drivers/net/mlx5/mlx5_ethdev.c | 3 ++- drivers/net/mlx5/mlx5_rss.c| 7 +++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index f5511ce70..365101af9 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -329,7 +329,8 @@ mlx5_dev_configure(struct rte_eth_dev *dev) if (use_app_rss_key && (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len != rss_hash_default_key_len)) { - /* MLX5 RSS only support 40bytes key. */ + DRV_LOG(ERR, "port %u RSS key len must be %zu Bytes long", + dev->data->port_id, rss_hash_default_key_len); rte_errno = EINVAL; return -rte_errno; } diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c index 5ac650163..ceaa570ef 100644 --- a/drivers/net/mlx5/mlx5_rss.c +++ b/drivers/net/mlx5/mlx5_rss.c @@ -48,6 +48,13 @@ mlx5_rss_hash_update(struct rte_eth_dev *dev, return -rte_errno; } if (rss_conf->rss_key && rss_conf->rss_key_len) { + if (rss_conf->rss_key_len != rss_hash_default_key_len) { + DRV_LOG(ERR, + "port %u RSS key len must be %zu Bytes long", + dev->data->port_id, rss_hash_default_key_len); + rte_errno = ENOTSUP; + return -rte_errno; + } priv->rss_conf.rss_key = rte_realloc(priv->rss_conf.rss_key, rss_conf->rss_key_len, 0); if (!priv->rss_conf.rss_key) { -- 2.12.0
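With this check in place, an application updating the RSS key on an mlx5 port has to supply exactly 40 bytes. A minimal sketch using the standard ethdev API is below; the key bytes and the selected hash fields are placeholder values.

#include <rte_ethdev.h>

/* mlx5 accepts only a 40-byte RSS hash key. */
static int
update_rss_key(uint16_t port_id)
{
	/* Placeholder key: only the length matters for this example;
	 * unspecified bytes are zero-initialized. */
	static uint8_t hash_key[40] = { 0x6d, 0x5a, 0x6d, 0x5a };
	struct rte_eth_rss_conf rss_conf = {
		.rss_key = hash_key,
		.rss_key_len = sizeof(hash_key),	/* must be exactly 40 */
		.rss_hf = ETH_RSS_IP,
	};

	return rte_eth_dev_rss_hash_update(port_id, &rss_conf);
}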
Re: [dpdk-dev] [PATCH 06/17] net/virtio-user: add option to use packed queues
On Mon, Mar 19, 2018 at 04:33:27PM +0800, Tiwei Bie wrote: On Fri, Mar 16, 2018 at 04:21:09PM +0100, Jens Freimann wrote: [...] diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index 2636490..ee291b3 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -278,6 +278,8 @@ VIRTIO_USER_ARG_QUEUE_SIZE, #define VIRTIO_USER_ARG_INTERFACE_NAME "iface" VIRTIO_USER_ARG_INTERFACE_NAME, +#define VIRTIO_USER_ARG_VERSION_1_1 "version_1_1" + VIRTIO_USER_ARG_VERSION_1_1, Maybe we can enable packed-ring by default for virtio-user. If we really need a flag to enable it, the devarg name should be packed_ring instead of version_1_1. Thinking about it, we should probably just get rid of this patch and let feature negotiation do its thing. Thanks! regards, Jens

[dpdk-dev] [PATCH 2/2] net/mlx5: fix RSS key len query
The RSS key length returned by rte_eth_dev_info_get command was taken from the PMD private structure. This structure initialization was done only after the port configuration. Considering Mellanox device supports only 40B long RSS key, reporting the fixed number instead. Fixes: 29c1d8bb3e79 ("net/mlx5: handle a single RSS hash key for all protocols") Cc: sta...@dpdk.org Cc: nelio.laranje...@6wind.com Signed-off-by: Shahaf Shuler --- drivers/net/mlx5/mlx5_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 365101af9..b6f5101cf 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -428,7 +428,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info) info->if_index = if_nametoindex(ifname); info->reta_size = priv->reta_idx_n ? priv->reta_idx_n : config->ind_table_max_size; - info->hash_key_size = priv->rss_conf.rss_key_len; + info->hash_key_size = rss_hash_default_key_len; info->speed_capa = priv->link_speed_capa; info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK; } -- 2.12.0
[dpdk-dev] [PATCH V16 0/3] add device event monitor framework
About hot plug in DPDK: we already have a proactive way to add/remove devices through APIs (rte_eal_hotplug_add/remove), and also have the fail-safe driver to offload the fail-safe work from the app user. But there is still no general mechanism to monitor hotplug events for all drivers; today the hotplug interrupt event differs between devices and drivers, such as mlx4, PCI drivers and others. Take the hot removal event for example: not all PCI drivers expose a remove interrupt, so in order to make the hot plug feature easy to use for PCI drivers, something must be done to detect the remove event at the kernel level and offer a new line of interrupt to user land. Based on the uevent of the kobject mechanism in the kernel, we can use it to monitor the hot plug status of devices, not only uio/vfio devices on the PCI bus, but also others, such as cpu/usb/pci-express bus devices. The idea is as below. a. The uevent message from FD monitoring which will be useful. remove@/devices/pci:80/:80:02.2/:82:00.0/:83:03.0/:84:00.2/uio/uio2 ACTION=remove DEVPATH=/devices/pci:80/:80:02.2/:82:00.0/:83:03.0/:84:00.2/uio/uio2 SUBSYSTEM=uio MAJOR=243 MINOR=2 DEVNAME=uio2 SEQNUM=11366 b. Add a uevent monitoring mechanism: add several general APIs to enable uevent monitoring. c. Add a common uevent handler and a uevent failure handler. Uevents should be handled at the bus or device layer, and memory read/write failures on hot removal should be handled correctly before detach. d. Show an example of how to use the uevent monitor: enable uevent monitoring in testpmd or fail-safe to show the usage. patchset history: v16->v15: 1.remove some linux related code out of the eal common layer 2.fix some readability issues. v15->v14: 1.use the existing eal interrupt epoll instead of rte_service for the monitor thread, 2.add new device event handle type in eal interrupt. 3.remove the uevent type check and any policy from eal, let it be checked and managed in the user's callback. 4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature. v14->v13: 1.add __rte_experimental on function definitions and fix bsd build issue v13->v12: 1.fix some logic and null check issues 2.fix monitor stop func issue v12->v11: 1.identify null param in callback for monitoring all devices' uevents v11->v10: 1:fix some typos and add experimental tag in new file. 2:modify callback register calling. v10->v9: 1.fix prefix issue. 2.use a common callback list for all devices and all types instead of adding a callback parameter into the device struct. 3.delete some unused parts. v9->v8: split the patch set into small and explicit patches v8->v7: 1.use rte_service to replace pthread management. 2.fix define and copyright issues 3.fix some lock issues v7->v6: 1.modify vdev part according to the vdev rework 2.re-define and split the func into common and bus specific code 3.fix some incorrect issues. 4.fix the system hang after sending packets. v6->v5: 1.add hot plug policy: in eal, by default prepare hot plug work for all pci devices, then let the app decide which devices need hot plug. 2.modify to manage event callbacks in each device. 3.fix some system hang issues when igb_uio releases. 4.modify the pci part to bus-pci based on the bus rework. 5.add hot plug policy in app, show an example of using a hotplug list to decide which devices need hot plug. v5->v4: 1.Move uevent monitor epolling from eal interrupt to eal device layer. 
2.Redefine the eal device API for common code, and distinguish between linux and bsd 3.Add failure handler helper api in bus layer. Add function to find device by name. 4.Replace individual fd binding per device with a common fd to poll all devices. 5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before user adds a device 6.Refine some coding style and typo issues 7.add new callback to process hot insertion v4->v3: 1.move uevent monitor api from eal interrupt to eal device layer. 2.create uevent type and struct in eal device. 3.move uevent handler for each driver to eal layer. 4.add uevent failure handler to process signal fault issue. 5.add example of requesting and using uevent monitoring in testpmd. v3->v2: 1.refine some return errors 2.refine the string searching logic to avoid memory issues v2->v1: 1.remove the global variable hotplug_fd, add uevent_fd in rte_intr_handle to let each pci device maintain its own fd, to fix the dual device fd issue. 2.fix some typos. Jeff Guo (3): eal: add device event handle in interrupt thread eal: add device event monitor framework app/testpmd: enable device hotplug monitoring app/test-pmd/parameters.c | 5 +- app/test-pmd/testpmd.c | 195 - app/test-pmd/testpmd.h | 11 ++ lib/librte_eal/bsdapp/eal/Makefile | 1 + lib/libr
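For readers unfamiliar with the kernel side, the uevent message shown in the cover letter is delivered through a NETLINK_KOBJECT_UEVENT socket. The sketch below is generic Linux code illustrating that mechanism only; it is not the EAL implementation added by these patches.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
	struct sockaddr_nl addr = {
		.nl_family = AF_NETLINK,
		.nl_groups = 1,		/* kernel uevent multicast group */
	};
	char buf[4096];
	ssize_t len;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	/* Each datagram holds "ACTION@DEVPATH" followed by KEY=VALUE pairs,
	 * e.g. the remove@.../uio/uio2 message shown above. */
	while ((len = recv(fd, buf, sizeof(buf) - 1, 0)) > 0) {
		buf[len] = '\0';
		if (strncmp(buf, "remove@", 7) == 0)
			printf("hot removal: %s\n", buf + 7);
	}

	close(fd);
	return 0;
}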
[dpdk-dev] [PATCH V16 3/3] app/testpmd: enable device hotplug monitoring
Use testpmd for example, to show an application how to use device event mechanism to monitor the hotplug event, involve both hot removal event and the hot insertion event. The process is that, testpmd first enable hotplug monitoring and register the user's callback, when device being hotplug insertion or hotplug removal, the eal monitor the event and call user's callbacks, the application according their hot plug policy to detach or attach the device from the bus. Signed-off-by: Jeff Guo --- v16->v15: 1.modify log and patch description. --- app/test-pmd/parameters.c | 5 +- app/test-pmd/testpmd.c| 195 +- app/test-pmd/testpmd.h| 11 +++ 3 files changed, 209 insertions(+), 2 deletions(-) diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index 97d22b8..825d602 100644 --- a/app/test-pmd/parameters.c +++ b/app/test-pmd/parameters.c @@ -186,6 +186,7 @@ usage(char* progname) printf(" --flow-isolate-all: " "requests flow API isolated mode on all ports at initialization time.\n"); printf(" --tx-offloads=0x: hexadecimal bitmask of TX queue offloads\n"); + printf(" --hot-plug: enalbe hot plug for device.\n"); } #ifdef RTE_LIBRTE_CMDLINE @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv) { "print-event",1, 0, 0 }, { "mask-event", 1, 0, 0 }, { "tx-offloads",1, 0, 0 }, + { "hot-plug", 0, 0, 0 }, { 0, 0, 0, 0 }, }; @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv) rte_exit(EXIT_FAILURE, "invalid mask-event argument\n"); } - + if (!strcmp(lgopts[opt_idx].name, "hot-plug")) + hot_plug = 1; break; case 'h': usage(argv[0]); diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 4c0e258..bb1ac8f 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */ */ uint8_t rmv_interrupt = 1; /* enabled by default */ + +uint8_t hot_plug = 0; /**< hotplug disabled by default. */ + /* * Display or mask ether events * Default to all events except VF_MBOX @@ -384,6 +388,8 @@ uint8_t bitrate_enabled; struct gro_status gro_ports[RTE_MAX_ETHPORTS]; uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES; +static struct hotplug_request_list hp_list; + /* Forward function declarations */ static void map_port_queue_stats_mapping_registers(portid_t pi, struct rte_port *port); @@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask); static int eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param, void *ret_param); +static int eth_dev_event_callback(char *device_name, + enum rte_dev_event_type type, + void *param); +static int eth_dev_event_callback_register(portid_t port_id); +static bool in_hotplug_list(const char *dev_name); + +static int hotplug_list_add(struct rte_device *device, + enum rte_kernel_driver device_kdrv); /* * Check if all the ports are started. 
@@ -1853,6 +1867,27 @@ reset_port(portid_t pid) printf("Done\n"); } +static int +eth_dev_event_callback_register(portid_t port_id) +{ + int diag; + char device_name[128]; + + snprintf(device_name, sizeof(device_name), + "%s", rte_eth_devices[port_id].device->name); + + /* register the dev_event callback */ + + diag = rte_dev_callback_register(device_name, + eth_dev_event_callback, (void *)(intptr_t)port_id); + if (diag) { + printf("Failed to setup dev_event callback\n"); + return -1; + } + + return 0; +} + void attach_port(char *identifier) { @@ -1869,6 +1904,8 @@ attach_port(char *identifier) if (rte_eth_dev_attach(identifier, &pi)) return; + eth_dev_event_callback_register(pi); + socket_id = (unsigned)rte_eth_dev_socket_id(pi); /* if socket_id is invalid, set to 0 */ if (check_socket_id(socket_id) < 0) @@ -1880,6 +1917,12 @@ attach_port(char *identifier) ports[pi].port_status = RTE_PORT_STOPPED; + if (hot_plug) { + hotplug_list_add(rte_eth_devices[pi].device, +rte_eth_devices[pi].data->kdrv); + eth_dev_event_callback_register(pi); + } + printf("Port %d
[dpdk-dev] [PATCH V16 2/3] eal: add device event monitor framework
This patch aims to add a general device event monitor mechanism at EAL device layer, for device hotplug awareness and actions adopted accordingly. It could also expand for all other type of device event monitor, but not in this scope at the stage. To get started, users firstly register or unregister callbacks through the new added APIs. Callbacks can be some device specific, or for all devices. -rte_dev_callback_register -rte_dev_callback_unregister Then application shall call below new added APIs to enable/disable the mechanism: - rte_dev_event_monitor_start - rte_dev_event_monitor_stop Use hotplug case for example, when device hotplug insertion or hotplug removal, we will get notified from kernel, then call user's callbacks accordingly to handle it, such as detach or attach the device from the bus, and could be benifit for futher fail-safe or live-migration. Signed-off-by: Jeff Guo --- v16->v15: 1.remove some linux related code out of eal common layer 2.fix some uneasy readble issue. --- lib/librte_eal/bsdapp/eal/Makefile | 1 + lib/librte_eal/bsdapp/eal/eal_dev.c | 19 + lib/librte_eal/common/eal_common_dev.c | 145 lib/librte_eal/common/eal_private.h | 24 ++ lib/librte_eal/common/include/rte_dev.h | 92 lib/librte_eal/linuxapp/eal/Makefile| 1 + lib/librte_eal/linuxapp/eal/eal_dev.c | 20 + 7 files changed, 302 insertions(+) create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index dd455e6..c0921dd 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c # from common dir SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c new file mode 100644 index 000..ad606b3 --- /dev/null +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include + +int __rte_experimental +rte_dev_event_monitor_start(void) +{ + RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); + return -1; +} + +int __rte_experimental +rte_dev_event_monitor_stop(void) +{ + RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); + return -1; +} diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index cd07144..3a1bbb6 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -14,9 +14,34 @@ #include #include #include +#include +#include #include "eal_private.h" +/* spinlock for device callbacks */ +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER; + +/** + * The device event callback description. + * + * It contains callback address to be registered by user application, + * the pointer to the parameters for callback, and the device name. 
+ */ +struct dev_event_callback { + TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */ + rte_dev_event_cb_fn cb_fn;/**< Callback address */ + void *cb_arg; /**< Callback parameter */ + char *dev_name; /**< Callback devcie name, NULL is for all device */ + uint32_t active;/**< Callback is executing */ +}; + +/** @internal Structure to keep track of registered callbacks */ +TAILQ_HEAD(dev_event_cb_list, dev_event_callback); + +/* The device event callback list for all registered callbacks. */ +static struct dev_event_cb_list dev_event_cbs; + static int cmp_detached_dev_name(const struct rte_device *dev, const void *_name) { @@ -207,3 +232,123 @@ rte_eal_hotplug_remove(const char *busname, const char *devname) rte_eal_devargs_remove(busname, devname); return ret; } + +static struct dev_event_callback * __rte_experimental +dev_event_cb_find(const char *device_name, rte_dev_event_cb_fn cb_fn, + void *cb_arg) +{ + struct dev_event_callback *event_cb = NULL; + + TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) { + if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) { + if (device_name == NULL && event_cb->dev_name == NULL) + break; + if (device_name == NULL || event_cb->dev_name == NULL) + continue; + if (!strcmp(event_cb->dev_name, device_name)) + break; +
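From the application's point of view, the new API boils down to registering a callback and starting the monitor. A minimal sketch is below; the callback shape follows the testpmd patch in this series, while the event handling policy (and the exact rte_dev_event_type values, which are not shown in this excerpt) are assumptions left to the application.

#include <stdio.h>
#include <rte_common.h>
#include <rte_dev.h>

/* Report the event; a real application would attach or detach the
 * device here according to its own hot plug policy. */
static int
dev_event_cb(char *device_name, enum rte_dev_event_type type, void *arg)
{
	RTE_SET_USED(arg);
	printf("device %s: event %d\n", device_name, (int)type);
	return 0;
}

static int
enable_device_event_monitoring(void)
{
	/* A NULL name registers the callback for all devices; a specific
	 * device name can be passed instead, as testpmd does per port. */
	if (rte_dev_callback_register(NULL, dev_event_cb, NULL) < 0)
		return -1;

	return rte_dev_event_monitor_start();
}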
[dpdk-dev] [PATCH v2] net/octeontx: use the new offload APIs
Use the new Rx/Tx offload APIs and remove the old style offloads. Signed-off-by: Pavan Nikhilesh Acked-by: Santosh Shukla --- drivers/net/octeontx/octeontx_ethdev.c | 82 +- drivers/net/octeontx/octeontx_ethdev.h | 3 ++ 2 files changed, 65 insertions(+), 20 deletions(-) diff --git a/drivers/net/octeontx/octeontx_ethdev.c b/drivers/net/octeontx/octeontx_ethdev.c index b739c0b39..3eb765eb1 100644 --- a/drivers/net/octeontx/octeontx_ethdev.c +++ b/drivers/net/octeontx/octeontx_ethdev.c @@ -262,6 +262,8 @@ octeontx_dev_configure(struct rte_eth_dev *dev) struct rte_eth_rxmode *rxmode = &conf->rxmode; struct rte_eth_txmode *txmode = &conf->txmode; struct octeontx_nic *nic = octeontx_pmd_priv(dev); + uint64_t configured_offloads; + uint64_t unsupported_offloads; int ret; PMD_INIT_FUNC_TRACE(); @@ -283,34 +285,38 @@ octeontx_dev_configure(struct rte_eth_dev *dev) return -EINVAL; } - if (!rxmode->hw_strip_crc) { + configured_offloads = rxmode->offloads; + + if (!(configured_offloads & DEV_RX_OFFLOAD_CRC_STRIP)) { PMD_INIT_LOG(NOTICE, "can't disable hw crc strip"); - rxmode->hw_strip_crc = 1; + configured_offloads |= DEV_RX_OFFLOAD_CRC_STRIP; } - if (rxmode->hw_ip_checksum) { - PMD_INIT_LOG(NOTICE, "rxcksum not supported"); - rxmode->hw_ip_checksum = 0; - } + unsupported_offloads = configured_offloads & ~OCTEONTX_RX_OFFLOADS; - if (rxmode->split_hdr_size) { - octeontx_log_err("rxmode does not support split header"); - return -EINVAL; + if (unsupported_offloads) { + PMD_INIT_LOG(ERR, "Rx offloads 0x%" PRIx64 " are not supported. " + "Requested 0x%" PRIx64 " supported 0x%" PRIx64 "\n", + unsupported_offloads, configured_offloads, + (uint64_t)OCTEONTX_RX_OFFLOADS); + return -ENOTSUP; } - if (rxmode->hw_vlan_filter) { - octeontx_log_err("VLAN filter not supported"); - return -EINVAL; - } + configured_offloads = txmode->offloads; - if (rxmode->hw_vlan_extend) { - octeontx_log_err("VLAN extended not supported"); - return -EINVAL; + if (!(configured_offloads & DEV_TX_OFFLOAD_MT_LOCKFREE)) { + PMD_INIT_LOG(NOTICE, "cant disable lockfree tx"); + configured_offloads |= DEV_TX_OFFLOAD_MT_LOCKFREE; } - if (rxmode->enable_lro) { - octeontx_log_err("LRO not supported"); - return -EINVAL; + unsupported_offloads = configured_offloads & ~OCTEONTX_TX_OFFLOADS; + + if (unsupported_offloads) { + PMD_INIT_LOG(ERR, "Tx offloads 0x%" PRIx64 " are not supported." 
+ "Requested 0x%" PRIx64 " supported 0x%" PRIx64 ".\n", + unsupported_offloads, configured_offloads, + (uint64_t)OCTEONTX_TX_OFFLOADS); + return -ENOTSUP; } if (conf->link_speeds & ETH_LINK_SPEED_FIXED) { @@ -630,6 +636,7 @@ octeontx_dev_info(struct rte_eth_dev *dev, dev_info->default_rxconf = (struct rte_eth_rxconf) { .rx_free_thresh = 0, .rx_drop_en = 0, + .offloads = OCTEONTX_RX_OFFLOADS, }; dev_info->default_txconf = (struct rte_eth_txconf) { @@ -750,10 +757,11 @@ octeontx_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t qidx, struct octeontx_txq *txq = NULL; uint16_t dq_num; int res = 0; + uint64_t configured_offloads; + uint64_t unsupported_offloads; RTE_SET_USED(nb_desc); RTE_SET_USED(socket_id); - RTE_SET_USED(tx_conf); dq_num = (nic->port_id * PKO_VF_NUM_DQ) + qidx; @@ -771,6 +779,22 @@ octeontx_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t qidx, dev->data->tx_queues[qidx] = NULL; } + configured_offloads = tx_conf->offloads; + + if (!(configured_offloads & DEV_TX_OFFLOAD_MT_LOCKFREE)) { + PMD_INIT_LOG(NOTICE, "cant disable lockfree tx"); + configured_offloads |= DEV_TX_OFFLOAD_MT_LOCKFREE; + } + + unsupported_offloads = configured_offloads & ~OCTEONTX_TX_OFFLOADS; + if (unsupported_offloads) { + PMD_INIT_LOG(ERR, "Tx offloads 0x%" PRIx64 " are not supported." + "Requested 0x%" PRIx64 " supported 0x%" PRIx64 ".\n", + unsupported_offloads, configured_offloads, + (uint64_t)OCTEONTX_TX_OFFLOADS); + return -ENOTSUP; + } + /* Allocating tx queue data structure */ txq = rte_zmalloc_socket("ethdev TX queue", sizeof(struct octeontx_txq), RTE_CACHE_LINE_SIZE, n
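For readers following the offload API conversion above, here is a minimal application-side sketch (not part of the patch; the port id, queue counts and chosen offload bits are placeholders) of how the new-style per-port offload request looks, which a converted PMD such as octeontx then validates in its configure callback:

#include <string.h>
#include <rte_ethdev.h>

static int
configure_port_new_offloads(uint16_t port_id)
{
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	/* Transitional flag: tell ethdev to use the offloads bit-masks
	 * below instead of the legacy rxmode bit-fields. */
	conf.rxmode.ignore_offload_bitfield = 1;
	conf.rxmode.offloads = DEV_RX_OFFLOAD_CRC_STRIP;
	conf.txmode.offloads = DEV_TX_OFFLOAD_MT_LOCKFREE;

	/* A converted PMD rejects unsupported bits with -ENOTSUP. */
	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}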
Re: [dpdk-dev] [PATCH 1/2] net/mlx5: enforce RSS key length limitation
On Mon, Mar 26, 2018 at 01:12:18PM +0300, Shahaf Shuler wrote: > RSS hash key must be 40 Bytes long. > > Cc: sta...@dpdk.org > > Signed-off-by: Shahaf Shuler > --- > drivers/net/mlx5/mlx5_ethdev.c | 3 ++- > drivers/net/mlx5/mlx5_rss.c| 7 +++ > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index f5511ce70..365101af9 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -329,7 +329,8 @@ mlx5_dev_configure(struct rte_eth_dev *dev) > if (use_app_rss_key && > (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len != >rss_hash_default_key_len)) { > - /* MLX5 RSS only support 40bytes key. */ > + DRV_LOG(ERR, "port %u RSS key len must be %zu Bytes long", > + dev->data->port_id, rss_hash_default_key_len); > rte_errno = EINVAL; > return -rte_errno; > } > diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c > index 5ac650163..ceaa570ef 100644 > --- a/drivers/net/mlx5/mlx5_rss.c > +++ b/drivers/net/mlx5/mlx5_rss.c > @@ -48,6 +48,13 @@ mlx5_rss_hash_update(struct rte_eth_dev *dev, > return -rte_errno; > } > if (rss_conf->rss_key && rss_conf->rss_key_len) { > + if (rss_conf->rss_key_len != rss_hash_default_key_len) { > + DRV_LOG(ERR, > + "port %u RSS key len must be %zu Bytes long", > + dev->data->port_id, rss_hash_default_key_len); > + rte_errno = ENOTSUP; Should be EINVAL when values are incorrect. > + return -rte_errno; > + } > priv->rss_conf.rss_key = rte_realloc(priv->rss_conf.rss_key, >rss_conf->rss_key_len, 0); > if (!priv->rss_conf.rss_key) { > -- > 2.12.0 Thanks, -- Nélio Laranjeiro 6WIND
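As a usage note (an illustrative sketch only, not from the patch; the port id, queue counts and hash functions are placeholders), an application targeting mlx5 would pass a hash key of exactly 40 bytes at configure time, otherwise the check added above makes the configuration fail with EINVAL:

#include <rte_ethdev.h>

#define APP_RSS_KEY_LEN 40	/* key length mlx5 enforces in the patch above */

static uint8_t app_rss_key[APP_RSS_KEY_LEN]; /* filled by the application */

static int
configure_rss(uint16_t port_id)
{
	struct rte_eth_conf conf = {
		.rxmode = { .mq_mode = ETH_MQ_RX_RSS },
		.rx_adv_conf = {
			.rss_conf = {
				.rss_key = app_rss_key,
				.rss_key_len = APP_RSS_KEY_LEN,
				.rss_hf = ETH_RSS_IP,
			},
		},
	};

	return rte_eth_dev_configure(port_id, 4, 4, &conf);
}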
[dpdk-dev] [PATCH V16 0/4] add device event monitor framework
About hot plug in DPDK: we already have a proactive way to add/remove devices through the APIs (rte_eal_hotplug_add/remove), and we also have the fail-safe driver to offload the fail-safe work from the app user. But there is still no general mechanism to monitor hotplug events for all drivers; today the hotplug interrupt event differs between devices and drivers, such as mlx4, PCI drivers and others. Take the hot removal event for example: not all PCI drivers expose a remove interrupt, so in order to make the hot plug feature easy to use for PCI drivers, something must be done to detect the remove event at the kernel level and offer a new line of interrupt to user land. Based on the kobject uevent mechanism in the kernel, we can monitor the hot plug status of devices, covering not only uio/vfio devices on the PCI bus but also others, such as cpu/usb/pci-express bus devices. The idea is as follows.

a. The uevent message from FD monitoring which will be useful:
remove@/devices/pci:80/:80:02.2/:82:00.0/:83:03.0/:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci:80/:80:02.2/:82:00.0/:83:03.0/:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add a uevent monitoring mechanism: add several general APIs to enable uevent monitoring.

c. Add a common uevent handler and a uevent failure handler: device uevents should be handled at the bus or device layer, and memory read/write failures on hot removal should be handled correctly before any detach behavior.

d. Show an example of how to use the uevent monitor: enable uevent monitoring in testpmd or fail-safe to show the usage.

patchset history:
v16->v15:
1.remove some Linux-related code out of the EAL common layer
2.fix some readability issues.
v15->v14:
1.use the existing EAL interrupt epoll instead of rte_service for the monitor thread,
2.add a new device event handle type in EAL interrupt.
3.remove the uevent type check and any policy from EAL; let them be checked and managed in the user's callback.
4.add a "--hot-plug" configure parameter in testpmd to switch the hotplug feature.
v14->v13:
1.add __rte_experimental on function definitions and fix a BSD build issue
v13->v12:
1.fix some logic and NULL check issues
2.fix a monitor stop func issue
v12->v11:
1.identify a NULL param in the callback for monitoring all devices' uevents
v11->v10:
1:modify some typos and add the experimental tag in new files.
2:modify the callback register calling.
v10->v9:
1.fix a prefix issue.
2.use a common callback list for all devices and all types instead of adding a callback parameter into the device struct.
3.delete some unused parts.
v9->v8:
split the patch set into small and explicit patches
v8->v7:
1.use rte_service to replace pthread management.
2.fix define and copyright issues
3.fix some lock issues
v7->v6:
1.modify the vdev part according to the vdev rework
2.re-define and split the func into common and bus-specific code
3.fix some incorrect issues.
4.fix the system hang after sending packets.
v6->v5:
1.add a hot plug policy: in EAL, by default prepare the hot plug work for all PCI devices, then let the app manage and decide which devices need hot plug.
2.modify to manage event callbacks in each device.
3.fix some system hang issues on igb_uio release.
4.modify the PCI part to bus-pci based on the bus rework.
5.add a hot plug policy in the app; show an example of using a hotplug list to manage and decide which devices need hot plug.
v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd 3.Add failure handler helper api in bus layer.Add function of find device by name. 4.Replace of individual fd bind with single device, use a common fd to polling all device. 5.Add to register hot insertion monitoring and process, add function to auto bind driver befor user add device 6.Refine some coding style and typos issue 7.add new callback to process hot insertion v4->v3: 1.move uevent monitor api from eal interrupt to eal device layer. 2.create uevent type and struct in eal device. 3.move uevent handler for each driver to eal layer. 4.add uevent failure handler to process signal fault issue. 5.add example for request and use uevent monitoring in testpmd. v3->v2: 1.refine some return error 2.refine the string searching logic to avoid memory issue v2->v1: 1.remove global variables of hotplug_fd, add uevent_fd in rte_intr_handle to let each pci device self maintain it fd, to fix dual device fd issue. 2.refine some typo error. Jeff Guo (4): eal: add device event handle in interrupt thread eal: add device event monitor framework eal/linux: uevent parse and process app/testpmd: enable device hotplug monitoring app/test-pmd/parameters.c | 5 +- app/test-pmd/testpmd.c | 195 +- app/test-pmd/testpmd.h | 11 + lib/librte_eal/bsdap
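For readers unfamiliar with the uevent source described in point a. above, the following standalone sketch (plain Linux code, not the EAL implementation from this series; buffer size and error handling are reduced to a minimum) shows how a process receives such kernel messages over a NETLINK_KOBJECT_UEVENT socket:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
	struct sockaddr_nl addr;
	char buf[4096];
	ssize_t len;
	int fd;

	fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return 1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 0xffffffff; /* subscribe to all kobject groups */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return 1;
	}

	/* One raw event: "remove@/devices/.../uio/uio2" followed by
	 * NUL-separated ACTION=, DEVPATH=, SUBSYSTEM=... variables. */
	len = recv(fd, buf, sizeof(buf) - 1, 0);
	if (len > 0) {
		buf[len] = '\0';
		printf("uevent header: %s\n", buf);
	}
	close(fd);
	return 0;
}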
[dpdk-dev] [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring
Use testpmd for example, to show an application how to use device event mechanism to monitor the hotplug event, involve both hot removal event and the hot insertion event. The process is that, testpmd first enable hotplug monitoring and register the user's callback, when device being hotplug insertion or hotplug removal, the eal monitor the event and call user's callbacks, the application according their hot plug policy to detach or attach the device from the bus. Signed-off-by: Jeff Guo --- 1.modify log and patch description. --- app/test-pmd/parameters.c | 5 +- app/test-pmd/testpmd.c| 195 +- app/test-pmd/testpmd.h| 11 +++ 3 files changed, 209 insertions(+), 2 deletions(-) diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index 97d22b8..825d602 100644 --- a/app/test-pmd/parameters.c +++ b/app/test-pmd/parameters.c @@ -186,6 +186,7 @@ usage(char* progname) printf(" --flow-isolate-all: " "requests flow API isolated mode on all ports at initialization time.\n"); printf(" --tx-offloads=0x: hexadecimal bitmask of TX queue offloads\n"); + printf(" --hot-plug: enalbe hot plug for device.\n"); } #ifdef RTE_LIBRTE_CMDLINE @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv) { "print-event",1, 0, 0 }, { "mask-event", 1, 0, 0 }, { "tx-offloads",1, 0, 0 }, + { "hot-plug", 0, 0, 0 }, { 0, 0, 0, 0 }, }; @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv) rte_exit(EXIT_FAILURE, "invalid mask-event argument\n"); } - + if (!strcmp(lgopts[opt_idx].name, "hot-plug")) + hot_plug = 1; break; case 'h': usage(argv[0]); diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 4c0e258..bb1ac8f 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */ */ uint8_t rmv_interrupt = 1; /* enabled by default */ + +uint8_t hot_plug = 0; /**< hotplug disabled by default. */ + /* * Display or mask ether events * Default to all events except VF_MBOX @@ -384,6 +388,8 @@ uint8_t bitrate_enabled; struct gro_status gro_ports[RTE_MAX_ETHPORTS]; uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES; +static struct hotplug_request_list hp_list; + /* Forward function declarations */ static void map_port_queue_stats_mapping_registers(portid_t pi, struct rte_port *port); @@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask); static int eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param, void *ret_param); +static int eth_dev_event_callback(char *device_name, + enum rte_dev_event_type type, + void *param); +static int eth_dev_event_callback_register(portid_t port_id); +static bool in_hotplug_list(const char *dev_name); + +static int hotplug_list_add(struct rte_device *device, + enum rte_kernel_driver device_kdrv); /* * Check if all the ports are started. 
@@ -1853,6 +1867,27 @@ reset_port(portid_t pid) printf("Done\n"); } +static int +eth_dev_event_callback_register(portid_t port_id) +{ + int diag; + char device_name[128]; + + snprintf(device_name, sizeof(device_name), + "%s", rte_eth_devices[port_id].device->name); + + /* register the dev_event callback */ + + diag = rte_dev_callback_register(device_name, + eth_dev_event_callback, (void *)(intptr_t)port_id); + if (diag) { + printf("Failed to setup dev_event callback\n"); + return -1; + } + + return 0; +} + void attach_port(char *identifier) { @@ -1869,6 +1904,8 @@ attach_port(char *identifier) if (rte_eth_dev_attach(identifier, &pi)) return; + eth_dev_event_callback_register(pi); + socket_id = (unsigned)rte_eth_dev_socket_id(pi); /* if socket_id is invalid, set to 0 */ if (check_socket_id(socket_id) < 0) @@ -1880,6 +1917,12 @@ attach_port(char *identifier) ports[pi].port_status = RTE_PORT_STOPPED; + if (hot_plug) { + hotplug_list_add(rte_eth_devices[pi].device, +rte_eth_devices[pi].data->kdrv); + eth_dev_event_callback_register(pi); + } + printf("Port %d is attach
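To make the hotplug policy concrete, here is an illustrative callback in the spirit of testpmd's eth_dev_event_callback above (a sketch only: the prototype follows this patch series, while the detach-on-removal policy and the names are the application's choice):

#include <stdint.h>
#include <rte_dev.h>
#include <rte_ethdev.h>
#include <rte_log.h>

static int
app_dev_event_cb(char *device_name, enum rte_dev_event_type type, void *arg)
{
	uint16_t port_id = (uint16_t)(intptr_t)arg;
	char name[RTE_ETH_NAME_MAX_LEN];

	switch (type) {
	case RTE_DEV_EVENT_REMOVE:
		RTE_LOG(INFO, USER1, "device %s removed, detaching port %u\n",
			device_name, (unsigned int)port_id);
		/* Application policy: stop using the port and detach it. */
		rte_eth_dev_detach(port_id, name);
		break;
	case RTE_DEV_EVENT_ADD:
		RTE_LOG(INFO, USER1, "device %s added\n", device_name);
		/* The attach policy is likewise up to the application. */
		break;
	default:
		break;
	}
	return 0;
}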
[dpdk-dev] [PATCH V16 1/4] eal: add device event handle in interrupt thread
Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for device event interrupt monitor. Signed-off-by: Jeff Guo --- v16->v15: split into small patch base on the function --- lib/librte_eal/common/include/rte_eal_interrupts.h | 1 + lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 - 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h index 3f792a9..6eb4932 100644 --- a/lib/librte_eal/common/include/rte_eal_interrupts.h +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h @@ -34,6 +34,7 @@ enum rte_intr_handle_type { RTE_INTR_HANDLE_ALARM,/**< alarm handle */ RTE_INTR_HANDLE_EXT, /**< external handler */ RTE_INTR_HANDLE_VDEV, /**< virtual device */ + RTE_INTR_HANDLE_DEV_EVENT,/**< device event handle */ RTE_INTR_HANDLE_MAX /**< count of elements */ }; diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index f86f22f..842acaa 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -674,7 +674,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds) bytes_read = 0; call = true; break; - + case RTE_INTR_HANDLE_DEV_EVENT: + bytes_read = 0; + call = true; + break; default: bytes_read = 1; break; -- 2.7.4
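As background (an illustrative sketch, not the code of patch 3/4 itself): the new handle type lets an fd such as the uevent netlink socket be serviced by the existing EAL interrupt thread, with the registered callback doing the actual read, since bytes_read is 0 for this type:

#include <rte_interrupts.h>

static struct rte_intr_handle dev_event_intr = { .fd = -1 };

static void
dev_event_handler(void *cb_arg)
{
	/* Read and parse the pending uevent from dev_event_intr.fd here. */
	(void)cb_arg;
}

static int
hook_dev_event_fd(int netlink_fd)
{
	dev_event_intr.fd = netlink_fd;
	dev_event_intr.type = RTE_INTR_HANDLE_DEV_EVENT;

	/* The interrupt thread will only invoke the callback; it does not
	 * consume the fd itself (bytes_read == 0 in the hunk above). */
	return rte_intr_callback_register(&dev_event_intr,
					  dev_event_handler, NULL);
}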
[dpdk-dev] [PATCH V16 2/4] eal: add device event monitor framework
This patch aims to add a general device event monitor mechanism at EAL device layer, for device hotplug awareness and actions adopted accordingly. It could also expand for all other type of device event monitor, but not in this scope at the stage. To get started, users firstly register or unregister callbacks through the new added APIs. Callbacks can be some device specific, or for all devices. -rte_dev_callback_register -rte_dev_callback_unregister Then application shall call below new added APIs to enable/disable the mechanism: - rte_dev_event_monitor_start - rte_dev_event_monitor_stop Use hotplug case for example, when device hotplug insertion or hotplug removal, we will get notified from kernel, then call user's callbacks accordingly to handle it, such as detach or attach the device from the bus, and could be benifit for futher fail-safe or live-migration. Signed-off-by: Jeff Guo --- v16->v15: 1.remove some linux related code out of eal common layer 2.fix some uneasy readble issue. --- lib/librte_eal/bsdapp/eal/Makefile | 1 + lib/librte_eal/bsdapp/eal/eal_dev.c | 19 + lib/librte_eal/common/eal_common_dev.c | 145 lib/librte_eal/common/eal_private.h | 24 ++ lib/librte_eal/common/include/rte_dev.h | 92 lib/librte_eal/linuxapp/eal/Makefile| 1 + lib/librte_eal/linuxapp/eal/eal_dev.c | 20 + 7 files changed, 302 insertions(+) create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index dd455e6..c0921dd 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c # from common dir SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c new file mode 100644 index 000..ad606b3 --- /dev/null +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include + +int __rte_experimental +rte_dev_event_monitor_start(void) +{ + RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); + return -1; +} + +int __rte_experimental +rte_dev_event_monitor_stop(void) +{ + RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n"); + return -1; +} diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index cd07144..3a1bbb6 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -14,9 +14,34 @@ #include #include #include +#include +#include #include "eal_private.h" +/* spinlock for device callbacks */ +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER; + +/** + * The device event callback description. + * + * It contains callback address to be registered by user application, + * the pointer to the parameters for callback, and the device name. 
+ */ +struct dev_event_callback { + TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */ + rte_dev_event_cb_fn cb_fn;/**< Callback address */ + void *cb_arg; /**< Callback parameter */ + char *dev_name; /**< Callback devcie name, NULL is for all device */ + uint32_t active;/**< Callback is executing */ +}; + +/** @internal Structure to keep track of registered callbacks */ +TAILQ_HEAD(dev_event_cb_list, dev_event_callback); + +/* The device event callback list for all registered callbacks. */ +static struct dev_event_cb_list dev_event_cbs; + static int cmp_detached_dev_name(const struct rte_device *dev, const void *_name) { @@ -207,3 +232,123 @@ rte_eal_hotplug_remove(const char *busname, const char *devname) rte_eal_devargs_remove(busname, devname); return ret; } + +static struct dev_event_callback * __rte_experimental +dev_event_cb_find(const char *device_name, rte_dev_event_cb_fn cb_fn, + void *cb_arg) +{ + struct dev_event_callback *event_cb = NULL; + + TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) { + if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) { + if (device_name == NULL && event_cb->dev_name == NULL) + break; + if (device_name == NULL || event_cb->dev_name == NULL) + continue; + if (!strcmp(event_cb->dev_name, device_name)) + break; +
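For clarity, the intended application-side lifecycle of the APIs named in the commit message looks roughly like the sketch below (the callback prototype follows the testpmd patch of this series; the callback body and names are placeholders):

#include <rte_dev.h>
#include <rte_log.h>

static int
log_dev_event_cb(char *device_name, enum rte_dev_event_type type, void *arg)
{
	RTE_LOG(INFO, USER1, "event %d on device %s\n", type, device_name);
	(void)arg;
	return 0;
}

static int
enable_device_events(void)
{
	int ret;

	/* NULL device name: be notified for all devices. */
	ret = rte_dev_callback_register(NULL, log_dev_event_cb, NULL);
	if (ret != 0)
		return ret;
	return rte_dev_event_monitor_start();
}

static void
disable_device_events(void)
{
	rte_dev_event_monitor_stop();
	rte_dev_callback_unregister(NULL, log_dev_event_cb, NULL);
}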
[dpdk-dev] [PATCH V16 3/4] eal/linux: uevent parse and process
In order to handle the uevent which have been detected from the kernel side, add uevent parse and process function to translate the uevent into device event, which user has subscribe to monitor. Signed-off-by: Jeff Guo --- 1.move all linux specific together --- lib/librte_eal/linuxapp/eal/eal_dev.c | 214 +- 1 file changed, 211 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c index 5ab5830..90094c0 100644 --- a/lib/librte_eal/linuxapp/eal/eal_dev.c +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c @@ -2,19 +2,227 @@ * Copyright(c) 2018 Intel Corporation */ -#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include #include +#include +#include +#include +#include + +#include "eal_private.h" +#include "eal_thread.h" + +static struct rte_intr_handle intr_handle = {.fd = -1 }; +static bool monitor_not_started = true; + +#define EAL_UEV_MSG_LEN 4096 +#define EAL_UEV_MSG_ELEM_LEN 128 + +/* identify the system layer which event exposure from */ +enum eal_dev_event_subsystem { + EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */ + EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */ + EAL_DEV_EVENT_SUBSYSTEM_MAX +}; + +static int +dev_uev_monitor_fd_new(void) +{ + int uevent_fd; + + uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | + SOCK_NONBLOCK, + NETLINK_KOBJECT_UEVENT); + if (uevent_fd < 0) { + RTE_LOG(ERR, EAL, "create uevent fd failed\n"); + return -1; + } + return uevent_fd; +} + +static int +dev_uev_monitor_create(int netlink_fd) +{ + struct sockaddr_nl addr; + int ret; + int size = 64 * 1024; + int nonblock = 1; + + memset(&addr, 0, sizeof(addr)); + addr.nl_family = AF_NETLINK; + addr.nl_pid = 0; + addr.nl_groups = 0x; + + if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) { + RTE_LOG(ERR, EAL, "bind failed\n"); + goto err; + } + + setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size)); + + ret = ioctl(netlink_fd, FIONBIO, &nonblock); + if (ret != 0) { + RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n"); + goto err; + } + return 0; +err: + close(netlink_fd); + return -1; +} + +static void +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length) +{ + char action[EAL_UEV_MSG_ELEM_LEN]; + char subsystem[EAL_UEV_MSG_ELEM_LEN]; + char dev_path[EAL_UEV_MSG_ELEM_LEN]; + char pci_slot_name[EAL_UEV_MSG_ELEM_LEN]; + int i = 0; + + memset(action, 0, EAL_UEV_MSG_ELEM_LEN); + memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN); + memset(dev_path, 0, EAL_UEV_MSG_ELEM_LEN); + memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN); + + while (i < length) { + for (; i < length; i++) { + if (*buf) + break; + buf++; + } + if (!strncmp(buf, "ACTION=", 7)) { + buf += 7; + i += 7; + snprintf(action, sizeof(action), "%s", buf); + } else if (!strncmp(buf, "DEVPATH=", 8)) { + buf += 8; + i += 8; + snprintf(dev_path, sizeof(dev_path), "%s", buf); + } else if (!strncmp(buf, "SUBSYSTEM=", 10)) { + buf += 10; + i += 10; + snprintf(subsystem, sizeof(subsystem), "%s", buf); + } else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) { + buf += 14; + i += 14; + snprintf(pci_slot_name, sizeof(subsystem), "%s", buf); + event->devname = pci_slot_name; + } + for (; i < length; i++) { + if (*buf == '\0') + break; + buf++; + } + } + + if (!strncmp(subsystem, "uio", 3)) + event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO; + else if (!strncmp(subsystem, "pci", 3)) + event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI; + if 
(!strncmp(action, "add", 3)) + event->type = RTE_DEV_EVENT_ADD; + if (!strncmp(action, "remove", 6)) + event->type = RTE_DEV_EVENT_REMOVE; +} + +static int +dev_uev_receive(int fd, struct rte_dev_event *uevent) +{ + int ret; + char buf[EAL_UEV_MSG_LEN]; + + memset(uevent, 0, sizeof(struct rte_dev_event)); + memset(buf, 0, EAL_UEV_MSG_LEN); + + ret = recv(fd, buf, EAL_UE
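A note on the payload format dev_uev_parse() walks: after the "action@devpath" header, the uevent body is a sequence of NUL-terminated "KEY=value" strings, which is why the parser skips NUL bytes between elements. A tiny standalone illustration (buffer contents are made up):

#include <stdio.h>
#include <string.h>

int main(void)
{
	static const char payload[] =
		"ACTION=remove\0"
		"DEVPATH=/devices/pci0000:80/0000:82:00.0/uio/uio2\0"
		"SUBSYSTEM=uio\0"
		"DEVNAME=uio2\0";
	size_t i = 0;

	while (i < sizeof(payload) - 1) {
		const char *kv = &payload[i];

		printf("%s\n", kv);	/* one KEY=value element */
		i += strlen(kv) + 1;	/* step over the terminating NUL */
	}
	return 0;
}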
Re: [dpdk-dev] [PATCH v2 04/18] eal: add lightweight kvarg parsing utility
On Fri, Mar 23, 2018 at 02:12:36PM +0100, Gaëtan Rivet wrote: > On Fri, Mar 23, 2018 at 07:54:11AM -0400, Neil Horman wrote: > > On Fri, Mar 23, 2018 at 10:31:22AM +0100, Gaëtan Rivet wrote: > > > On Thu, Mar 22, 2018 at 08:53:49PM -0400, Neil Horman wrote: > > > > On Thu, Mar 22, 2018 at 05:27:51PM +0100, Gaëtan Rivet wrote: > > > > > On Thu, Mar 22, 2018 at 10:10:37AM -0400, Neil Horman wrote: > > > > > > On Wed, Mar 21, 2018 at 05:32:24PM +, Wiles, Keith wrote: > > > > > > > > > > > > > > > > > > > > > > On Mar 21, 2018, at 12:15 PM, Gaetan Rivet > > > > > > > > wrote: > > > > > > > > > > > > > > > > This library offers a quick way to parse parameters passed with > > > > > > > > a > > > > > > > > key=value syntax. > > > > > > > > > > > > > > > > A single function is needed and finds the relevant element > > > > > > > > within the > > > > > > > > text. No dynamic allocation is performed. It is possible to > > > > > > > > chain the > > > > > > > > parsing of each pairs for quickly scanning a list. > > > > > > > > > > > > > > > > This utility is private to the EAL and should allow avoiding > > > > > > > > having to > > > > > > > > move around the more complete librte_kvargs. > > > > > > > > > > > > > > What is the big advantage with this code and the librte_kvargs > > > > > > > code. Is it just no allocation, rte_kvargs needs to be build > > > > > > > before parts of EAL or what? > > > > > > > > > > > > > > My concern is we have now two flavors one in EAL and one in > > > > > > > librte_kvargs, would it not be more reasonable to improve > > > > > > > rte_kvargs to remove your objections? I am all for fast, better, > > > > > > > stronger code :-) > > > > > > > > > > > > > +1, this really doesn't make much sense to me. Two parsing > > > > > > routines seems like > > > > > > its just asking for us to have to fix parsing bugs in two places. > > > > > > If allocation > > > > > > is a concern, I don't see why you can't just change the malloc in > > > > > > rte_kvargs_parse to an automatic allocation on the stack, or a > > > > > > preallocation set > > > > > > of kvargs that can be shared from init time. > > > > > > > > > > I think the existing allocation scheme is fine for other usages (in > > > > > drivers and so on). Not for what I wanted to do. > > > > > > > > > Ok, but thats an adressable issue. you can bifurcate the parse > > > > function to an > > > > internal function that accepts any preallocated kvargs struct, and > > > > export two > > > > wrapper functions, one which allocates the struct from the heap, > > > > another which > > > > allocated automatically on the stack. > > > > > > > > > > Sure, everything is possible. > > > > > Ok. > > > > > > > > librte_kvargs isn't > > > > > > necessecarily > > > > > > the best parsing library ever, but its not bad, and it just seems > > > > > > wrong to go > > > > > > re-inventing the wheel. > > > > > > > > > > > > > > > > It serves a different purpose than the one I'm pursuing. > > > > > > > > > > This helper is lightweight and private. If I wanted to integrate my > > > > > needs with librte_kvargs, I would be adding new functionalities, > > > > > making > > > > > it more complex, and for a use-case that is useless for the vast > > > > > majority of users of the lib. > > > > > > > > > Ok, to that end: > > > > > > > > 1) Privacy is not an issue (at least from my understanding of what your > > > > doing). 
> > > > If we start with the assumption that librte_kvargs is capable of > > > > satisfying your > > > > needs (even if its not done in an optimal way), the fact that your > > > > version of > > > > the function is internal to the library doesn't seem overly relevant, > > > > unless > > > > theres something critical to that privacy that I'm missing. > > > > > > > > > > Privacy is only a point I brought up to say that the impact of this > > > function is minimal. People looking to parse their kvargs should not > > > have any ambiguity regarding how they should do so. Only librte_kvargs > > > is available. > > > > > Ok, would you also council others developing dpdk apps to write their own > > parsing routines when what they needed was trivial for the existing library? > > You are people too :) > > > > > > 2) Lightweight function seems like something that can be integrated > > > > with > > > > librte_kvargs. Looking at it, what may I ask in librte_kvargs is > > > > insufficiently > > > > non-performant for your needs, specifically? We talked about the heap > > > > allocation above, is there something else? The string duplication > > > > perhaps? > > > > > > > > > > > > > > Mostly the way to use it. > > > The filter strings are > > > bus=value,.../class=value,... > > > > > > where either bus= list or class= list can be omitted, but at least one > > > must appear. > > > > > Ok, so whats the problem with using librte_kvargs for that? Is it that the > > list >
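For reference in this discussion, the existing librte_kvargs flow for a plain "key=value,key=value" string is roughly the following (illustrative argument string and handler; a sketch, not a proposal):

#include <rte_kvargs.h>
#include <rte_log.h>

static int
print_value(const char *key, const char *value, void *opaque)
{
	RTE_LOG(INFO, USER1, "%s = %s\n", key, value);
	(void)opaque;
	return 0;
}

static int
parse_example(void)
{
	struct rte_kvargs *kvlist;
	int ret;

	/* Heap-allocates and duplicates the string, as discussed above. */
	kvlist = rte_kvargs_parse("bus=pci,class=eth", NULL);
	if (kvlist == NULL)
		return -1;

	ret = rte_kvargs_process(kvlist, "bus", print_value, NULL);
	rte_kvargs_free(kvlist);
	return ret;
}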
Re: [dpdk-dev] [PATCH 2/2] net/mlx5: fix RSS key len query
On Mon, Mar 26, 2018 at 01:12:19PM +0300, Shahaf Shuler wrote: > The RSS key length returned by rte_eth_dev_info_get command was taken > from the > PMD private structure. This structure initialization was done only after > the port configuration. > > Considering Mellanox device supports only 40B long RSS key, reporting > the fixed number instead. > > Fixes: 29c1d8bb3e79 ("net/mlx5: handle a single RSS hash key for all > protocols") > Cc: sta...@dpdk.org > Cc: nelio.laranje...@6wind.com > > Signed-off-by: Shahaf Shuler > --- > drivers/net/mlx5/mlx5_ethdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 365101af9..b6f5101cf 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -428,7 +428,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct > rte_eth_dev_info *info) > info->if_index = if_nametoindex(ifname); > info->reta_size = priv->reta_idx_n ? > priv->reta_idx_n : config->ind_table_max_size; > - info->hash_key_size = priv->rss_conf.rss_key_len; > + info->hash_key_size = rss_hash_default_key_len; > info->speed_capa = priv->link_speed_capa; > info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK; > } > -- > 2.12.0 Acked-by: Nelio Laranjeiro Regards, -- Nélio Laranjeiro 6WIND
Re: [dpdk-dev] [PATCH v3] net/mlx4: support CRC strip toggling
On 3/25/2018 9:19 PM, Ophir Munk wrote: > Previous to this commit mlx4 CRC stripping was executed by default and > there was no verbs API to disable it. Are you aware of the discussion about CRC [1]? Is this patch compatible with plans? [1] https://dpdk.org/dev/patchwork/patch/36415/ > Signed-off-by: Ophir Munk > --- > v1: initial version > v2: following internal reviews > v3: following dpdk.org mailing list reviews <...>
Re: [dpdk-dev] [PATCH 2/2] dev: use rte_kvargs
On Fri, Mar 23, 2018 at 07:45:03PM +0100, Gaetan Rivet wrote: > Signed-off-by: Gaetan Rivet > --- > > Cc: Neil Horman > I'm actually ok with this, but as Keith noted, I'm not sure why you didn't just: 1) Add the ability to create a grouping key, so that key value pairs could contain a list of comma separated values (something like '{}' to denote that everything between the characters was the value in a kv pair, regardless of other tokenizing characters in the value). 2) Add the ability to recursively parse the value into a list of tokens 3) Layer your functionality on top of (1) and (2), as Keith noted Neil > I find using rte_parse_kv cleaner. > The function rte_dev_iterator_init is already ugly enough as it is. > This is really not helping. > > lib/librte_eal/common/eal_common_dev.c | 127 > + > lib/librte_eal/linuxapp/eal/Makefile | 1 + > 2 files changed, 83 insertions(+), 45 deletions(-) > > diff --git a/lib/librte_eal/common/eal_common_dev.c > b/lib/librte_eal/common/eal_common_dev.c > index 21703b777..9f1a0ebda 100644 > --- a/lib/librte_eal/common/eal_common_dev.c > +++ b/lib/librte_eal/common/eal_common_dev.c > @@ -15,6 +15,7 @@ > #include > #include > #include > +#include > #include > > #include "eal_private.h" > @@ -270,12 +271,15 @@ rte_eal_hotplug_remove(const char *busname, const char > *devname) > } > > int __rte_experimental > -rte_dev_iterator_init(struct rte_dev_iterator *it, const char *str) > +rte_dev_iterator_init(struct rte_dev_iterator *it, > + const char *devstr) > { > - struct rte_bus *bus = NULL; > + struct rte_kvargs *kvlist = NULL; > struct rte_class *cls = NULL; > - struct rte_kvarg kv; > - char *slash; > + struct rte_bus *bus = NULL; > + struct rte_kvargs_pair *kv; > + char *slash = NULL; > + char *str = NULL; > > /* Having both busstr and clsstr NULL is illegal, >* marking this iterator as invalid unless > @@ -283,98 +287,131 @@ rte_dev_iterator_init(struct rte_dev_iterator *it, > const char *str) >*/ > it->busstr = NULL; > it->clsstr = NULL; > + str = strdup(devstr); > + if (str == NULL) { > + rte_errno = ENOMEM; > + goto get_out; > + } > + slash = strchr(str, '/'); > + if (slash != NULL) { > + slash[0] = '\0'; > + slash = strchr(devstr, '/') + 1; > + } > /* Safety checks and prep-work */ > - if (rte_parse_kv(str, &kv)) { > + kvlist = rte_kvargs_parse(str, NULL); > + if (kvlist == NULL) { > RTE_LOG(ERR, EAL, "Could not parse: %s\n", str); > rte_errno = EINVAL; > - return -rte_errno; > + goto get_out; > } > it->device = NULL; > it->class_device = NULL; > - if (strcmp(kv.key, "bus") == 0) { > - bus = rte_bus_find_by_name(kv.value); > + kv = &kvlist->pairs[0]; > + if (strcmp(kv->key, "bus") == 0) { > + bus = rte_bus_find_by_name(kv->value); > if (bus == NULL) { > RTE_LOG(ERR, EAL, "Could not find bus \"%s\"\n", > - kv.value); > + kv->value); > rte_errno = EFAULT; > - return -rte_errno; > + goto get_out; > } > - slash = strchr(str, '/'); > if (slash != NULL) { > - if (rte_parse_kv(slash + 1, &kv)) { > + rte_kvargs_free(kvlist); > + kvlist = rte_kvargs_parse(slash, NULL); > + if (kvlist == NULL) { > RTE_LOG(ERR, EAL, "Could not parse: %s\n", > - slash + 1); > + slash); > rte_errno = EINVAL; > - return -rte_errno; > + goto get_out; > } > - cls = rte_class_find_by_name(kv.value); > + kv = &kvlist->pairs[0]; > + if (strcmp(kv->key, "class")) { > + RTE_LOG(ERR, EAL, "Additional layer must be a > class\n"); > + rte_errno = EINVAL; > + goto get_out; > + } > + cls = rte_class_find_by_name(kv->value); > if (cls == NULL) { > RTE_LOG(ERR, EAL, "Could not find class > 
\"%s\"\n", > - kv.value); > + kv->value); > rte_errno = EFAULT; > - return -rte_errno; > + goto get_out; > } > } > - } else if (strcmp(kv.key, "class") == 0) { > -
Re: [dpdk-dev] [PATCH] net/mlx5: add supported hash function check
On Thu, Mar 22, 2018 at 10:42:44AM +, Xueming(Steven) Li wrote: > Just remind, denying unsupported hash function in rte_eth_dev_configure() > might > impact some user app using PMD that simply ignoring them silently. If the default behavior from other devices is to use only possible values, this device should to the same instead of refusing it. > Testpmd command "port config rss all" should be updated as well > to 'all' supported values from rte_eth_dev_info, I'll include this change in > next version. > > > -Original Message- > > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] > > Sent: Monday, March 19, 2018 4:30 PM > > To: Xueming(Steven) Li > > Cc: Adrien Mazarguil ; Shahaf Shuler > > ; dev@dpdk.org > > Subject: Re: [PATCH] net/mlx5: add supported hash function check > > > > On Sun, Mar 18, 2018 at 03:37:20PM +0800, Xueming Li wrote: > > > Add supported RSS hash function check in device configuration to have > > > better error verbosity for application developers. > > > > > > Signed-off-by: Xueming Li > > > --- > > > drivers/net/mlx5/mlx5_ethdev.c | 8 > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c > > > b/drivers/net/mlx5/mlx5_ethdev.c index b73cb53..175a1ff 100644 > > > --- a/drivers/net/mlx5/mlx5_ethdev.c > > > +++ b/drivers/net/mlx5/mlx5_ethdev.c > > > @@ -346,6 +346,14 @@ struct ethtool_link_settings { > > > rx_offloads, supp_rx_offloads); > > > return ENOTSUP; > > > } > > > + if (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf & > > > + MLX5_RSS_HF_MASK) { > > > + ERROR("Some RSS hash function not supported " > > > + "requested 0x%" PRIx64 " supported 0x%" PRIx64, > > > + dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf, > > > + (uint64_t)(~MLX5_RSS_HF_MASK)); > > > + return ENOTSUP; > > > + } > > > if (use_app_rss_key && > > > (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len != > > >rss_hash_default_key_len)) { > > > -- > > > 1.8.3.1 > > > > > > > I would answer than an application should not try to configure something > > not advertise by the device. > > This information is present in struct rte_eth_dev_info returned by > > mlx5_dev_infos_get() and thus the devops of the device. > > > > Seems rte_eth_dev_configure() should be fixed to avoid configuring wrong > > values. > > > > Regards, > > > > -- > > Nélio Laranjeiro > > 6WIND -- Nélio Laranjeiro 6WIND
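The application-side counterpart suggested here would be to only request what the PMD advertises, e.g. (sketch only, the port id is a placeholder):

#include <rte_ethdev.h>

static uint64_t
supported_rss_hf(uint16_t port_id, uint64_t wanted)
{
	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* Keep only the hash functions the PMD reports as supported. */
	return wanted & dev_info.flow_type_rss_offloads;
}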
Re: [dpdk-dev] [PATCH] compressdev: implement API
> -Original Message- > From: Thomas Monjalon [mailto:tho...@monjalon.net] > Sent: Saturday, March 24, 2018 1:02 AM > To: Trahe, Fiona > Cc: dev@dpdk.org; ahmed.mans...@nxp.com; shally.ve...@cavium.com; De Lara > Guarch, Pablo > ; fiona.tr...@gmail.com > Subject: Re: [dpdk-dev] [PATCH] compressdev: implement API > > 23/03/2018 19:08, Trahe, Fiona: > > From: Thomas Monjalon [mailto:tho...@monjalon.net] > > > 02/02/2018 19:25, Fiona Trahe: > > > > lib/librte_compressdev/rte_comp.h | 503 > > > > > > Why rte_comp.h instead of the more consistent rte_compress.h? > > [Fiona] I did originally... but ran into difficulty with horribly names > > like > > RTE_COMPRESS_COMPRESS > > RTE_COMPRESS_DECOMPRESS > > rte_compress_compress_xform > > rte_compress_decompress_xform > > So compress is both the module prefix and the name of one of the actions. > > I could have used compressdev - but names were very long. > > So decided to opt for using > > _compressdev_ in names to do with the device and > > _comp_ in names to do with the compression service > > > > Also I could have used compdev instead of compressdev, > > but I felt compress should be in the lib name > > I understand your concerns. > I don't like "comp" very much because it sounds like "comparison". > However, I don't have a better idea. > Sometimes naming is more difficult than coding :) [Fiona] True :)
Re: [dpdk-dev] [PATCH v3] net/mlx4: support CRC strip toggling
On Mon, Mar 26, 2018 at 12:38:22PM +0100, Ferruh Yigit wrote: > On 3/25/2018 9:19 PM, Ophir Munk wrote: > > Previous to this commit mlx4 CRC stripping was executed by default and > > there was no verbs API to disable it. > > Are you aware of the discussion about CRC [1]? Is this patch compatible with > plans? > > [1] > https://dpdk.org/dev/patchwork/patch/36415/ I wasn't aware of this notice. Looks like it makes this patch unnecessary since mlx4 always strip by default; this patch makes it configurable at will and only exposes the capability when HW supports its configuration (i.e. the ability to leave CRC inside mbuf). We'd just need mlx4 to not expose DEV_RX_OFFLOAD_CRC_STRIP at all in mlx5_get_rx_queue_offloads() in order to fully comply. I leave it up to you, I don't mind if we include this patch only to revert it later when we finally get rid of DEV_RX_OFFLOAD_CRC_STRIP. > > Signed-off-by: Ophir Munk > > --- > > v1: initial version > > v2: following internal reviews > > v3: following dpdk.org mailing list reviews > > <...> -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH] event/opdl: fix atomic queue race condition issue
> From: Ma, Liang J > Sent: Tuesday, March 13, 2018 11:34 AM > To: jerin.ja...@caviumnetworks.com > Cc: dev@dpdk.org; Van Haaren, Harry ; Jain, Deepak > K ; Geary, John ; Mccarthy, > Peter > Subject: [PATCH] event/opdl: fix atomic queue race condition issue > > If application link one atomic queue to multiple ports, > and each worker core update flow_id, there will have a > chance to hit race condition issue and lead to double processing > same event. This fix solve the problem and eliminate > the race condition issue. > > Fixes: 4236ce9bf5bf ("event/opdl: add OPDL ring infrastructure library") > General notes - Spaces around & % << >> and other bitwise manipulations (https://dpdk.org/doc/guides/contributing/coding_style.html#operators) - I've noted a few below, but there are more - Usually checkpatch flags these - I'm curious why it didn't in this case It would be nice if we didn't have to rely on __atomic_load_n() and friends, however I don't see a better alternative. Given other DPDK components are also using __atomic_* functions, no objection here. > @@ -520,7 +528,17 @@ opdl_stage_claim_singlethread(struct opdl_stage *s, void > *entries, > > for (j = 0; j < num_entries; j++) { > ev = (struct rte_event *)get_slot(t, s->head+j); > - if ((ev->flow_id%s->nb_instance) == s->instance_id) { Spaces around the % > + > + event = __atomic_load_n(&(ev->event), > + __ATOMIC_ACQUIRE); > + > + opa_id = OPDL_OPA_MASK&(event>>OPDL_OPA_OFFSET); Spaces & > + flow_id = OPDL_FLOWID_MASK&event; Spaces & > + > + if (opa_id >= s->queue_id) > + continue; > + > + if ((flow_id%s->nb_instance) == s->instance_id) { Spaces % Will re-review v2. Cheers, -Harry
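To illustrate the two review points (an acquire load pairing with the writer's release store, and spaces around binary operators), a generic snippet follows; the masks, offsets and predicate are placeholders, not the opdl code:

#include <stdint.h>

#define OPA_OFFSET	8
#define OPA_MASK	0xffu
#define FLOWID_MASK	0xffu

static inline int
claim_matches(const uint32_t *event_word, uint32_t nb_instance,
	      uint32_t instance_id)
{
	uint32_t event, opa_id, flow_id;

	/* Acquire load: pairs with the producer's release store. */
	event = __atomic_load_n(event_word, __ATOMIC_ACQUIRE);

	opa_id = OPA_MASK & (event >> OPA_OFFSET);	/* spaces around & and >> */
	flow_id = FLOWID_MASK & event;

	/* Placeholder predicate in the style of the patch. */
	return opa_id != 0 && (flow_id % nb_instance) == instance_id;
}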
Re: [dpdk-dev] [PATCH v3] net/mlx4: support CRC strip toggling
26/03/2018 13:54, Adrien Mazarguil: > On Mon, Mar 26, 2018 at 12:38:22PM +0100, Ferruh Yigit wrote: > > On 3/25/2018 9:19 PM, Ophir Munk wrote: > > > Previous to this commit mlx4 CRC stripping was executed by default and > > > there was no verbs API to disable it. > > > > Are you aware of the discussion about CRC [1]? Is this patch compatible > > with plans? > > > > [1] > > https://dpdk.org/dev/patchwork/patch/36415/ > > I wasn't aware of this notice. Looks like it makes this patch unnecessary > since mlx4 always strip by default; this patch makes it configurable at will > and only exposes the capability when HW supports its configuration (i.e. the > ability to leave CRC inside mbuf). > > We'd just need mlx4 to not expose DEV_RX_OFFLOAD_CRC_STRIP at all in > mlx5_get_rx_queue_offloads() in order to fully comply. > > I leave it up to you, I don't mind if we include this patch only to revert > it later when we finally get rid of DEV_RX_OFFLOAD_CRC_STRIP. A new flag to keep CRC will be introduced. We will need toggling anyway.
[dpdk-dev] DPDK techboard minutes of February 28
Meeting notes for the DPDK technical board meeting held on 2018-02-28

Attendees:
- Bruce Richardson
- Ferruh Yigit
- Hemant Agrawal
- Konstantin Ananyev
- Olivier Matz
- Stephen Hemminger
- Thomas Monjalon
- Yuanhan Liu

1/ There was an approval to publish the draft for the Windows port in https://dpdk.org/browse/draft/dpdk-draft-windows/

2/ The default policy is to host draft code on GitHub, unless the technical board decides an exception when the work is significant enough to warrant hosting on dpdk.org. It is possible to pin 6 GitHub projects from https://github.com/dpdk, which will give visibility to draft trees. The name of such a tree should be prefixed "dpdk-draft-X". The project description on GitHub should highlight its temporary purpose: "devel only tree for drafting X feature"

3/ There is an agreement to have 1 active maintainer per dpdk.org tree, plus 1 backup maintainer. The trees and their maintainers should be listed at the top of the MAINTAINERS file.

4/ The progress of the offloads API deprecation was discussed. It was confirmed to try converting all PMDs for 18.05. The PMDs which are late in this conversion could be disabled. More emails were planned to be sent to make sure nobody missed it.
[dpdk-dev] DPDK techboard minutes of March 14
Meeting notes for the DPDK technical board meeting held on 2018-03-14

Attendees:
- Bruce Richardson
- Ferruh Yigit
- Hemant Agrawal
- Jerin Jacob
- Konstantin Ananyev
- Olivier Matz
- Stephen Hemminger
- Thomas Monjalon
- Yuanhan Liu

0/ Check action items from last meetings
- list of files and initial authors for SPDX compliance: Hemant will send the list soon.
- list of gaps in the new build system: Bruce has not documented them yet.
- old patches in patchwork: Thomas should work with the patchwork community to get more tooling. We must send comments in old patch threads and close some.
- backup maintainer guidelines: Some sub-trees (net, crypto, event) should find a backup.
- MAINTAINERS file for trees: Thomas will send a patch to sort trees at the beginning of the file.

1/ Stable releases roadmap: http://dpdk.org/ml/archives/web/2018-March/000615.html
- stable maintainers: Kevin Traynor was approved as maintainer of the LTS branch 18.11.
- release schedule: It looks difficult to have an early first stable release. The earliest can be after the next RC1 validation tests have passed. Suggestions should be sent in the email thread.

2/ Check progress of a few changes:
- memory rework: This big change is under testing and still on track for 18.05.
- offload API: Patches are missing for some PMDs. Update: there are commitments for all of them in 18.05. The deprecation notice will be updated to remove only the old driver interface in 18.05. The application interface (API) should be removed in 18.08.
- uevent hotplug: Still a lot of opens to discuss but not enough interest in emails.
- devargs syntax: Gaetan sent some patches which must be reviewed quickly.
- CRC_STRIP offload: Default should be CRC stripping. If a PMD lacks the capability, it can be emulated. A new flag to keep the CRC must be introduced.
- minimal kernel requirement: Not discussed because the meeting ran out of time.
[dpdk-dev] [PATCH 3/3] net/sfc: add device parameter to choose FW variant
From: Roman Zhukov Add support of choice the preferred firmware variant to use in device parameters. Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko --- doc/guides/nics/sfc_efx.rst | 15 ++ drivers/net/sfc/sfc.c| 123 ++- drivers/net/sfc/sfc_ethdev.c | 1 + drivers/net/sfc/sfc_kvargs.c | 1 + drivers/net/sfc/sfc_kvargs.h | 12 + 5 files changed, 151 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst index 2e4c3d8..2bd29cc 100644 --- a/doc/guides/nics/sfc_efx.rst +++ b/doc/guides/nics/sfc_efx.rst @@ -330,6 +330,21 @@ boolean parameters value. firmware version is 6.2.1.1033 or higher, otherwise any positive value will select a fixed update period of **1000** milliseconds +- ``fw_variant`` [dont-care|full-feature|ultra-low-latency| + capture-packed-stream] (default **dont-care**) + + Choose the preferred firmware variant to use. In order for the selected + option to have an effect, the **sfboot** utility must be configured with the + **auto** firmware-variant option. The preferred firmware variant applies to + all ports on the NIC. + **dont-care** ensures that the driver can attach to an unprivileged function. + The datapath firmware type to use is controlled by the **sfboot** + utility. + **full-feature** chooses full featured firmware. + **ultra-low-latency** chooses firmware with fewer features but lower latency. + **capture-packed-stream** chooses firmware for SolarCapture packed stream + mode. + Dynamic Logging Parameters ~~ diff --git a/drivers/net/sfc/sfc.c b/drivers/net/sfc/sfc.c index 2a326fc..e2ba720 100644 --- a/drivers/net/sfc/sfc.c +++ b/drivers/net/sfc/sfc.c @@ -20,6 +20,7 @@ #include "sfc_ev.h" #include "sfc_rx.h" #include "sfc_tx.h" +#include "sfc_kvargs.h" int @@ -740,6 +741,126 @@ sfc_detach(struct sfc_adapter *sa) sa->state = SFC_ADAPTER_UNINITIALIZED; } +static int +sfc_kvarg_fv_variant_handler(__rte_unused const char *key, +const char *value_str, void *opaque) +{ + uint32_t *value = opaque; + + if (strcasecmp(value_str, SFC_KVARG_FW_VARIANT_DONT_CARE) == 0) + *value = EFX_FW_VARIANT_DONT_CARE; + else if (strcasecmp(value_str, SFC_KVARG_FW_VARIANT_FULL_FEATURED) == 0) + *value = EFX_FW_VARIANT_FULL_FEATURED; + else if (strcasecmp(value_str, SFC_KVARG_FW_VARIANT_LOW_LATENCY) == 0) + *value = EFX_FW_VARIANT_LOW_LATENCY; + else if (strcasecmp(value_str, SFC_KVARG_FW_VARIANT_PACKED_STREAM) == 0) + *value = EFX_FW_VARIANT_PACKED_STREAM; + else + return -EINVAL; + + return 0; +} + +static int +sfc_get_fw_variant(struct sfc_adapter *sa, efx_fw_variant_t *efv) +{ + efx_nic_fw_info_t enfi; + int rc; + + rc = efx_nic_get_fw_version(sa->nic, &enfi); + if (rc != 0) + return rc; + else if (!enfi.enfi_dpcpu_fw_ids_valid) + return ENOTSUP; + + /* +* Firmware variant can be uniquely identified by the RxDPCPU +* firmware id +*/ + switch (enfi.enfi_rx_dpcpu_fw_id) { + case EFX_RXDP_FULL_FEATURED_FW_ID: + *efv = EFX_FW_VARIANT_FULL_FEATURED; + break; + + case EFX_RXDP_LOW_LATENCY_FW_ID: + *efv = EFX_FW_VARIANT_LOW_LATENCY; + break; + + case EFX_RXDP_PACKED_STREAM_FW_ID: + *efv = EFX_FW_VARIANT_PACKED_STREAM; + break; + + default: + /* +* Other firmware variants are not considered, since they are +* not supported in the device parameters +*/ + *efv = EFX_FW_VARIANT_DONT_CARE; + break; + } + + return 0; +} + +static const char * +sfc_fw_variant2str(efx_fw_variant_t efv) +{ + switch (efv) { + case EFX_RXDP_FULL_FEATURED_FW_ID: + return SFC_KVARG_FW_VARIANT_FULL_FEATURED; + case EFX_RXDP_LOW_LATENCY_FW_ID: + return 
SFC_KVARG_FW_VARIANT_LOW_LATENCY; + case EFX_RXDP_PACKED_STREAM_FW_ID: + return SFC_KVARG_FW_VARIANT_PACKED_STREAM; + default: + return "unknown"; + } +} + +static int +sfc_nic_probe(struct sfc_adapter *sa) +{ + efx_nic_t *enp = sa->nic; + efx_fw_variant_t preferred_efv; + efx_fw_variant_t efv; + int rc; + + preferred_efv = EFX_FW_VARIANT_DONT_CARE; + rc = sfc_kvargs_process(sa, SFC_KVARG_FW_VARIANT, + sfc_kvarg_fv_variant_handler, + &preferred_efv); + if (rc != 0) { + sfc_err(sa, "invalid %s parameter value", SFC_KVARG_FW_VARIANT); + return rc; + } + + rc = efx_nic_probe(enp, preferred_efv); + if (rc == EACCES) { + /* Unp
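Usage-wise (a sketch; the PCI address, core mask and the rest of the EAL arguments are placeholders), the new parameter is passed as a device argument appended to the whitelist entry, using one of the values documented in sfc_efx.rst above:

#include <rte_eal.h>

int main(void)
{
	char *eal_args[] = {
		"app",
		"-w", "0000:01:00.0,fw_variant=ultra-low-latency",
		"-c", "0x3",
	};

	/* The devarg string after the comma is handed to the sfc PMD,
	 * which parses fw_variant with the kvargs handler shown above. */
	if (rte_eal_init(5, eal_args) < 0)
		return 1;
	return 0;
}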
[dpdk-dev] [PATCH 2/3] net/sfc/base: add values for RxDPCPU firmware id recognition
From: Roman Zhukov Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko --- drivers/net/sfc/base/efx.h | 7 +++ drivers/net/sfc/base/efx_nic.c | 12 2 files changed, 19 insertions(+) diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h index 4994745..fd9f059 100644 --- a/drivers/net/sfc/base/efx.h +++ b/drivers/net/sfc/base/efx.h @@ -1294,6 +1294,13 @@ extern const efx_nic_cfg_t * efx_nic_cfg_get( __inefx_nic_t *enp); +/* RxDPCPU firmware id values by which FW variant can be identified */ +#defineEFX_RXDP_FULL_FEATURED_FW_ID0x0 +#defineEFX_RXDP_LOW_LATENCY_FW_ID 0x1 +#defineEFX_RXDP_PACKED_STREAM_FW_ID0x2 +#defineEFX_RXDP_RULES_ENGINE_FW_ID 0x5 +#defineEFX_RXDP_DPDK_FW_ID 0x6 + typedef struct efx_nic_fw_info_s { /* Basic FW version information */ uint16_tenfi_mc_fw_version[4]; diff --git a/drivers/net/sfc/base/efx_nic.c b/drivers/net/sfc/base/efx_nic.c index 3be32ad..8014dee 100644 --- a/drivers/net/sfc/base/efx_nic.c +++ b/drivers/net/sfc/base/efx_nic.c @@ -604,6 +604,18 @@ efx_nic_get_fw_version( EFSYS_ASSERT3U(enp->en_mod_flags, &, EFX_MOD_MCDI); EFSYS_ASSERT3U(enp->en_features, &, EFX_FEATURE_MCDI); + /* Ensure RXDP_FW_ID codes match with MC_CMD_GET_CAPABILITIES codes */ + EFX_STATIC_ASSERT(EFX_RXDP_FULL_FEATURED_FW_ID == + MC_CMD_GET_CAPABILITIES_OUT_RXDP); + EFX_STATIC_ASSERT(EFX_RXDP_LOW_LATENCY_FW_ID == + MC_CMD_GET_CAPABILITIES_OUT_RXDP_LOW_LATENCY); + EFX_STATIC_ASSERT(EFX_RXDP_PACKED_STREAM_FW_ID == + MC_CMD_GET_CAPABILITIES_OUT_RXDP_PACKED_STREAM); + EFX_STATIC_ASSERT(EFX_RXDP_RULES_ENGINE_FW_ID == + MC_CMD_GET_CAPABILITIES_OUT_RXDP_RULES_ENGINE); + EFX_STATIC_ASSERT(EFX_RXDP_DPDK_FW_ID == + MC_CMD_GET_CAPABILITIES_OUT_RXDP_DPDK); + rc = efx_mcdi_version(enp, mc_fw_version, NULL, NULL); if (rc != 0) goto fail2; -- 2.7.4
[dpdk-dev] [PATCH 1/3] net/sfc/base: add support to choose firmware variant
From: Gautam Dawar Signed-off-by: Gautam Dawar Signed-off-by: Andrew Rybchenko --- drivers/net/sfc/base/efx.h | 15 ++- drivers/net/sfc/base/efx_impl.h | 1 + drivers/net/sfc/base/efx_mcdi.c | 14 ++ drivers/net/sfc/base/efx_nic.c | 23 ++- drivers/net/sfc/sfc.c | 2 +- 5 files changed, 48 insertions(+), 7 deletions(-) diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h index bb903e5..4994745 100644 --- a/drivers/net/sfc/base/efx.h +++ b/drivers/net/sfc/base/efx.h @@ -129,9 +129,22 @@ efx_nic_create( __inefsys_lock_t *eslp, __deref_out efx_nic_t **enpp); +/* EFX_FW_VARIANT codes map one to one on MC_CMD_FW codes */ +typedef enum efx_fw_variant_e { + EFX_FW_VARIANT_FULL_FEATURED, + EFX_FW_VARIANT_LOW_LATENCY, + EFX_FW_VARIANT_PACKED_STREAM, + EFX_FW_VARIANT_HIGH_TX_RATE, + EFX_FW_VARIANT_PACKED_STREAM_HASH_MODE_1, + EFX_FW_VARIANT_RULES_ENGINE, + EFX_FW_VARIANT_DPDK, + EFX_FW_VARIANT_DONT_CARE = 0x +} efx_fw_variant_t; + extern __checkReturn efx_rc_t efx_nic_probe( - __inefx_nic_t *enp); + __inefx_nic_t *enp, + __inefx_fw_variant_t efv); extern __checkReturn efx_rc_t efx_nic_init( diff --git a/drivers/net/sfc/base/efx_impl.h b/drivers/net/sfc/base/efx_impl.h index a1bd03d..b1d4f57 100644 --- a/drivers/net/sfc/base/efx_impl.h +++ b/drivers/net/sfc/base/efx_impl.h @@ -647,6 +647,7 @@ struct efx_nic_s { const efx_ev_ops_t *en_eevop; const efx_tx_ops_t *en_etxop; const efx_rx_ops_t *en_erxop; + efx_fw_variant_tefv; #if EFSYS_OPT_FILTER efx_filter_ten_filter; const efx_filter_ops_t *en_efop; diff --git a/drivers/net/sfc/base/efx_mcdi.c b/drivers/net/sfc/base/efx_mcdi.c index a78a226..d8b4598 100644 --- a/drivers/net/sfc/base/efx_mcdi.c +++ b/drivers/net/sfc/base/efx_mcdi.c @@ -1264,13 +1264,19 @@ efx_mcdi_drv_attach( req.emr_out_length = MC_CMD_DRV_ATTACH_EXT_OUT_LEN; /* -* Use DONT_CARE for the datapath firmware type to ensure that the -* driver can attach to an unprivileged function. The datapath firmware -* type to use is controlled by the 'sfboot' utility. +* Typically, client drivers use DONT_CARE for the datapath firmware +* type to ensure that the driver can attach to an unprivileged +* function. The datapath firmware type to use is controlled by the +* 'sfboot' utility. +* If a client driver wishes to attach with a specific datapath firmware +* type, that can be passed in second argument of efx_nic_probe API. One +* such example is the ESXi native driver that attempts attaching with +* FULL_FEATURED datapath firmware type first and fall backs to +* DONT_CARE datapath firmware type if MC_CMD_DRV_ATTACH fails. */ MCDI_IN_SET_DWORD(req, DRV_ATTACH_IN_NEW_STATE, attach ? 
1 : 0); MCDI_IN_SET_DWORD(req, DRV_ATTACH_IN_UPDATE, 1); - MCDI_IN_SET_DWORD(req, DRV_ATTACH_IN_FIRMWARE_ID, MC_CMD_FW_DONT_CARE); + MCDI_IN_SET_DWORD(req, DRV_ATTACH_IN_FIRMWARE_ID, enp->efv); efx_mcdi_execute(enp, &req); diff --git a/drivers/net/sfc/base/efx_nic.c b/drivers/net/sfc/base/efx_nic.c index 35e84e3..3be32ad 100644 --- a/drivers/net/sfc/base/efx_nic.c +++ b/drivers/net/sfc/base/efx_nic.c @@ -290,7 +290,8 @@ efx_nic_create( __checkReturn efx_rc_t efx_nic_probe( - __inefx_nic_t *enp) + __inefx_nic_t *enp, + __inefx_fw_variant_t efv) { const efx_nic_ops_t *enop; efx_rc_t rc; @@ -301,7 +302,27 @@ efx_nic_probe( #endif /* EFSYS_OPT_MCDI */ EFSYS_ASSERT(!(enp->en_mod_flags & EFX_MOD_PROBE)); + /* Ensure FW variant codes match with MC_CMD_FW codes */ + EFX_STATIC_ASSERT(EFX_FW_VARIANT_FULL_FEATURED == + MC_CMD_FW_FULL_FEATURED); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_LOW_LATENCY == + MC_CMD_FW_LOW_LATENCY); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_PACKED_STREAM == + MC_CMD_FW_PACKED_STREAM); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_HIGH_TX_RATE == + MC_CMD_FW_HIGH_TX_RATE); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_PACKED_STREAM_HASH_MODE_1 == + MC_CMD_FW_PACKED_STREAM_HASH_MODE_1); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_RULES_ENGINE == + MC_CMD_FW_RULES_ENGINE); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_DPDK == + MC_CMD_FW_DPDK); + EFX_STATIC_ASSERT(EFX_FW_VARIANT_DONT_CARE == + (int)MC_CMD_FW_DONT_CARE); + enop = enp->en_enop; + enp->efv = efv; + if ((rc = enop->eno_probe(enp)) != 0) goto fail1; diff --git a/drivers/net/sfc/sfc.c b/drivers/net/sfc/sfc.c index 681e117..2a326fc 100644 --- a/drivers/net/sfc/sfc.c +++ b/drivers/net
[dpdk-dev] [PATCH 0/3] net/sfc: add device parameter to choose FW variant
Patch 'net/sfc: add device parameter to choose FW variant' has a checkpatches.sh warning since a positive errno is used inside the driver. Gautam Dawar (1): net/sfc/base: add support to choose firmware variant Roman Zhukov (2): net/sfc/base: add values for RxDPCPU firmware id recognition net/sfc: add device parameter to choose FW variant doc/guides/nics/sfc_efx.rst | 15 + drivers/net/sfc/base/efx.h | 22 ++- drivers/net/sfc/base/efx_impl.h | 1 + drivers/net/sfc/base/efx_mcdi.c | 14 +++-- drivers/net/sfc/base/efx_nic.c | 35 +++- drivers/net/sfc/sfc.c | 123 +++- drivers/net/sfc/sfc_ethdev.c| 1 + drivers/net/sfc/sfc_kvargs.c| 1 + drivers/net/sfc/sfc_kvargs.h| 12 9 files changed, 217 insertions(+), 7 deletions(-) -- 2.7.4
Re: [dpdk-dev] [PATCH 0/3] add ifcvf driver
On 03/26/2018 11:05 AM, Wang, Xiao W wrote: Hi Maxime, -Original Message- From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] Sent: Sunday, March 25, 2018 5:51 PM To: Wang, Xiao W ; dev@dpdk.org Cc: Wang, Zhihong ; y...@fridaylinux.org; Liang, Cunming ; Xu, Rosen ; Chen, Junjie J ; Daly, Dan Subject: Re: [PATCH 0/3] add ifcvf driver On 03/23/2018 11:27 AM, Wang, Xiao W wrote: Hi Maxime, -Original Message- From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] Sent: Thursday, March 22, 2018 4:48 AM To: Wang, Xiao W ; dev@dpdk.org Cc: Wang, Zhihong ; y...@fridaylinux.org; Liang, Cunming ; Xu, Rosen ; Chen, Junjie J ; Daly, Dan Subject: Re: [PATCH 0/3] add ifcvf driver Hi Xiao, On 03/15/2018 05:49 PM, Wang, Xiao W wrote: Hi Maxime, -Original Message- From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] Sent: Sunday, March 11, 2018 2:24 AM To: Wang, Xiao W ; dev@dpdk.org Cc: Wang, Zhihong ; y...@fridaylinux.org; Liang, Cunming ; Xu, Rosen ; Chen, Junjie J ; Daly, Dan Subject: Re: [PATCH 0/3] add ifcvf driver Hi Xiao, On 03/10/2018 12:08 AM, Xiao Wang wrote: This patch set has dependency on http://dpdk.org/dev/patchwork/patch/35635/ (vhost: support selective datapath); ifc VF is compatible with virtio vring operations, this driver implements vDPA driver ops which configures ifc VF to be a vhost data path accelerator. ifcvf driver uses vdev as a control domain to manage ifc VFs that belong to it. It registers vDPA device ops to vhost lib to enable these VFs to be used as vhost data path accelerator. Live migration feature is supported by ifc VF and this driver enables it based on vhost lib. vDPA needs to create different containers for different devices, thus this patch set adds APIs in eal/vfio to support multiple container. Thanks for this! That will avoind having to duplicate these functions for every new offload driver. Junjie Chen (1): eal/vfio: add support for multiple container Xiao Wang (2): bus/pci: expose sysfs parsing API Still, I'm not convinced the offload device should be a virtual device. It is a real PCI device, why not having a new device type for offload devices, and let the device to be probed automatically by the existing device model? IFC VFs are generated from SRIOV, with the PF driven by kernel driver. In DPDK we need to have something to represent PF, to register itself as a vDPA engine, so a virtual device is used for this purpose. I went through the code, and something is not clear to me. Why do we need to have a representation of the PF in DPDK? Why cannot we just bind at VF level? 1. With the vdev representation we could use it to talk to PF kernel driver to do flow configuration, we can implement flow API on the vdev in future for this purpose. Using a vdev allows introducing this kind of control plane thing. 2. When port representor is ready, we would integrate it into ifcvf driver, then each VF will have a Representor port. For now we don’t have port representor, so this patch set manages VF resource internally. Ok, we may need to have a vdev to represent the PF, but we need to be able to bind at VF level anyway. Device management on VF level is feasible, according to the previous port-representor patch. A tuple of (PF_addr, VF_index) can identify a certain VF, we have vport_mask and device addr to describe a PF, and we can specify a VF index to create a representor port, so , the VF port creation will be on-demand at VF level. 
+struct port_rep_parameters { + uint64_t vport_mask; + struct { + char bus[RTE_DEV_NAME_MAX_LEN]; + char device[RTE_DEV_NAME_MAX_LEN]; + } parent; +}; +int +rte_representor_port_register(char *pf_addr_str, + uint32_t vport_id, uint16_t *port_id) IIUC, even with this using port representor, we'll still have the problem of having the VFs probed first as Virtio driver, right? In my opinion, the IFCVF devices in offload mode are to be managed differently than having a way to represent on host side VFs assigned to a VM. In offload case, you have a real device to deal with, else we wouldn't have to bind it with VFIO. Maybe we could have a real device probed as proposed yesterday [0], and this device gets registered to the port representor for the PF? Thanks, Maxime Besides, IFCVF supports live migration, vDPA exerts IFCVF device better than QEMU (this patch has enabled LM feature). vDPA is the main usage model for IFCVF, and one DPDK application taking control of all the VF resource management is a straightforward usage model. Best Regards, Xiao Else, how do you support passing two VFs of the same PF to different DPDK applications? Or have some VFs managed by Kernel or QEMU and some by the DPDK application? My feeling is that current implementation is creating an artificial constraint. Isn't there a possibility to have the virtual representation for the PF to be probed sepa
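For readers following the thread, the sketch below shows how the on-demand, per-VF registration described here could look. It is purely hypothetical: the rte_representor_port_register() prototype quoted above is a proposal from the port-representor series, not an accepted DPDK API, and the helper name is made up for illustration.

#include <stdint.h>
#include <stdio.h>

/* Prototype as quoted in the thread (proposed API, not in DPDK) */
int rte_representor_port_register(char *pf_addr_str, uint32_t vport_id,
				  uint16_t *port_id);

/* Create one representor ethdev port per VF of the given PF */
static int
create_vf_representors(char *pf_addr_str, uint32_t nb_vfs)
{
	uint32_t vf;
	uint16_t port_id;
	int ret;

	for (vf = 0; vf < nb_vfs; vf++) {
		/* The tuple (PF address, VF index) identifies one VF */
		ret = rte_representor_port_register(pf_addr_str, vf, &port_id);
		if (ret != 0) {
			printf("VF %u: representor registration failed\n", vf);
			return ret;
		}
		printf("VF %u exposed as ethdev port %u\n",
		       vf, (unsigned int)port_id);
	}
	return 0;
}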
Re: [dpdk-dev] [PATCH] event/opdl: fix atomic queue race condition issue
On 26 Mar 05:29, Van Haaren, Harry wrote: > > From: Ma, Liang J > > Sent: Tuesday, March 13, 2018 11:34 AM > > To: jerin.ja...@caviumnetworks.com > > Cc: dev@dpdk.org; Van Haaren, Harry ; Jain, > > Deepak > > K ; Geary, John ; Mccarthy, > > Peter > > Subject: [PATCH] event/opdl: fix atomic queue race condition issue > > > > If application link one atomic queue to multiple ports, > > and each worker core update flow_id, there will have a > > chance to hit race condition issue and lead to double processing > > same event. This fix solve the problem and eliminate > > the race condition issue. > > > > Fixes: 4236ce9bf5bf ("event/opdl: add OPDL ring infrastructure library") > > > > General notes > - Spaces around & % << >> and other bitwise manipulations > (https://dpdk.org/doc/guides/contributing/coding_style.html#operators) > - I've noted a few below, but there are more > - Usually checkpatch flags these - I'm curious why it didn't in this case > > It would be nice if we didn't have to rely on __atomic_load_n() and friends, > however I don't see a better alternative. Given other DPDK components are > also using __atomic_* functions, no objection here. > > > > > > @@ -520,7 +528,17 @@ opdl_stage_claim_singlethread(struct opdl_stage *s, > > void > > *entries, > > > > for (j = 0; j < num_entries; j++) { > > ev = (struct rte_event *)get_slot(t, s->head+j); > > - if ((ev->flow_id%s->nb_instance) == s->instance_id) { > > Spaces around the % > > > + > > + event = __atomic_load_n(&(ev->event), > > + __ATOMIC_ACQUIRE); > > + > > + opa_id = OPDL_OPA_MASK&(event>>OPDL_OPA_OFFSET); > > Spaces & > > > + flow_id = OPDL_FLOWID_MASK&event; > > Spaces & > > > + > > + if (opa_id >= s->queue_id) > > + continue; > > + > > + if ((flow_id%s->nb_instance) == s->instance_id) { > > Spaces % > > > > > Will re-review v2. Cheers, -Harry Thanks, I will send V2 later.
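For reference, the operator-spacing changes requested above would make the flagged logic look roughly like the sketch below. The OPDL_* values here are illustrative placeholders, not the driver's real definitions; the point is only the spacing around %, & and >> per the DPDK coding style.

#include <stdint.h>

/* Placeholder values for illustration only; the real masks and offset are
 * defined inside the opdl driver. */
#define OPDL_OPA_OFFSET  32
#define OPDL_OPA_MASK    UINT64_C(0xff)
#define OPDL_FLOWID_MASK UINT64_C(0xfffff)

/* Decide whether this instance should process the event, with spaces
 * around the bitwise and modulo operators as requested in the review. */
static inline int
event_is_mine(uint64_t event, uint32_t queue_id, uint32_t nb_instance,
	      uint32_t instance_id)
{
	uint32_t opa_id = OPDL_OPA_MASK & (event >> OPDL_OPA_OFFSET);
	uint32_t flow_id = OPDL_FLOWID_MASK & event;

	if (opa_id >= queue_id)
		return 0;

	return (flow_id % nb_instance) == instance_id;
}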
Re: [dpdk-dev] [PATCH 2/2] dev: use rte_kvargs
Hi Neil, On Mon, Mar 26, 2018 at 07:38:19AM -0400, Neil Horman wrote: > On Fri, Mar 23, 2018 at 07:45:03PM +0100, Gaetan Rivet wrote: > > Signed-off-by: Gaetan Rivet > > --- > > > > Cc: Neil Horman > > Keep in mind that all of this is to achieve the trivial task I was doing in 20 lines or so. > I'm actually ok with this but as Keith noted, I'm not sure why you didn't > just: > > 1) Add the ability to create a grouping key, so that key value pairs could > contain a list of comma separated values (something like '{}' to denote that > everything between the characters was the value in a kv pair, regardless of > other tokenizing characters in the value). > > 2) Add the ability to recursively parse the value into a list of tokens > I don't need a recursive construct or a tree-like structure. I only need an alternative to '\0' to signify "end-of-list". This seems like an edge-case to librte_kvargs that would only be useful to a specific case. It does not seem a wise addition. So maybe I did not understand your suggestion. Can you give an example of inputs? I need to parse something like "bus=pci,vendor_id=0x8086/class=eth" (and I only care about bus=pci and class=eth). how can grouping help? My issue is that librte_kvargs would parse key:vendor_id value:0x8086/class and would then stumble on the unexpected '='. > 3) Layer your functionality on top of (1) and (2), as Keith noted The stack allocator seems like a nice-to-have that would interest people using librte_kvargs. I find librte_kvargs to be cumbersome. I cannot rewrite it from scratch, unless I update everything that relies on it as well. So I do not touch it, because I don't care *that* much. Why not simply leave my helper alongside? If people care enough about it and would prefer to use it over librte_kvargs, then maybe we could think about doing the effort of exposing it cleanly (or maybe they could). Right now, I see only me needing it and I do not see this effort as worth it. Regards, -- Gaëtan Rivet 6WIND
Re: [dpdk-dev] [PATCH v5 2/2] eal: add asynchronous request API to DPDK IPC
On 3/24/2018 8:46 PM, Anatoly Burakov wrote: This API is similar to the blocking API that is already present, but reply will be received in a separate callback by the caller (callback specified at the time of request, rather than registering for it in advance). Under the hood, we create a separate thread to deal with replies to asynchronous requests, that will just wait to be notified by the main thread, or woken up on a timer. Signed-off-by: Anatoly Burakov Generally, it looks great to me except some trivial nits, so Acked-by: Jianfeng Tan --- Notes: v5: - addressed review comments from Jianfeng - split into two patches to avoid rename noise - do not mark ignored message as processed v4: - rebase on top of latest IPC Improvements patchset [2] v3: - added support for MP_IGN messages introduced in IPC improvements v5 patchset v2: - fixed deadlocks and race conditions by not calling callbacks while iterating over sync request list - fixed use-after-free by making a copy of request - changed API to also give user a copy of original request, so that they know to which message the callback is a reply to - fixed missing .map file entries This patch is dependent upon previously published patchsets for IPC fixes [1] and improvements [2]. rte_mp_action_unregister and rte_mp_async_reply_unregister do the same thing - should we perhaps make it one function? [1] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/ [2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Improvements/ lib/librte_eal/common/eal_common_proc.c | 455 +++- lib/librte_eal/common/include/rte_eal.h | 36 +++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 479 insertions(+), 13 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index 52b6ab2..c86252c 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "eal_private.h" #include "eal_filesystem.h" @@ -60,13 +61,32 @@ struct mp_msg_internal { struct rte_mp_msg msg; }; +struct async_request_param { + rte_mp_async_reply_t clb; + struct rte_mp_reply user_reply; + struct timespec end; + int n_responses_processed; +}; + struct pending_request { TAILQ_ENTRY(pending_request) next; - int reply_received; + enum { + REQUEST_TYPE_SYNC, + REQUEST_TYPE_ASYNC + } type; char dst[PATH_MAX]; struct rte_mp_msg *request; struct rte_mp_msg *reply; - pthread_cond_t cond; + int reply_received; + RTE_STD_C11 + union { + struct { + struct async_request_param *param; + } async; + struct { + pthread_cond_t cond; + } sync; + }; }; TAILQ_HEAD(pending_request_list, pending_request); @@ -74,9 +94,12 @@ TAILQ_HEAD(pending_request_list, pending_request); static struct { struct pending_request_list requests; pthread_mutex_t lock; + pthread_cond_t async_cond; } pending_requests = { .requests = TAILQ_HEAD_INITIALIZER(pending_requests.requests), - .lock = PTHREAD_MUTEX_INITIALIZER + .lock = PTHREAD_MUTEX_INITIALIZER, + .async_cond = PTHREAD_COND_INITIALIZER + /**< used in async requests only */ }; /* forward declarations */ @@ -273,7 +296,12 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s) memcpy(sync_req->reply, msg, sizeof(*msg)); /* -1 indicates that we've been asked to ignore */ sync_req->reply_received = m->type == MP_REP ? 
1 : -1; - pthread_cond_signal(&sync_req->cond); + + if (sync_req->type == REQUEST_TYPE_SYNC) + pthread_cond_signal(&sync_req->sync.cond); + else if (sync_req->type == REQUEST_TYPE_ASYNC) + pthread_cond_signal( + &pending_requests.async_cond); } else RTE_LOG(ERR, EAL, "Drop mp reply: %s\n", msg->name); pthread_mutex_unlock(&pending_requests.lock); @@ -320,6 +348,189 @@ mp_handle(void *arg __rte_unused) } static int +timespec_cmp(const struct timespec *a, const struct timespec *b) +{ + if (a->tv_sec < b->tv_sec) + return -1; + if (a->tv_sec > b->tv_sec) + return 1; + if (a->tv_nsec < b->tv_nsec) + return -1; + if (a->tv_nsec > b->tv_nsec) + return 1; + return 0; +} + +enum
Re: [dpdk-dev] [PATCH v5 2/2] eal: add asynchronous request API to DPDK IPC
On 26-Mar-18 3:15 PM, Tan, Jianfeng wrote: On 3/24/2018 8:46 PM, Anatoly Burakov wrote: This API is similar to the blocking API that is already present, but reply will be received in a separate callback by the caller (callback specified at the time of request, rather than registering for it in advance). Under the hood, we create a separate thread to deal with replies to asynchronous requests, that will just wait to be notified by the main thread, or woken up on a timer. Signed-off-by: Anatoly Burakov Generally, it looks great to me except some trivial nits, so Acked-by: Jianfeng Tan Thanks! +static void +trigger_async_action(struct pending_request *sr) +{ + struct async_request_param *param; + struct rte_mp_reply *reply; + + param = sr->async.param; + reply = ¶m->user_reply; + + param->clb(sr->request, reply); + + /* clean up */ + free(sr->async.param->user_reply.msgs); How about simple "free(reply->msgs);"? I would prefer leaving it as is, as it makes it clear that i'm freeing everything to do with sync request. + + sync_req->type = REQUEST_TYPE_ASYNC; + strcpy(sync_req->dst, dst); + sync_req->request = req; + sync_req->reply = reply_msg; + sync_req->async.param = param; + + /* queue already locked by caller */ + + exist = find_sync_request(dst, req->name); + if (!exist) + TAILQ_INSERT_TAIL(&pending_requests.requests, sync_req, next); + if (exist) { else? Will fix in v6 @@ -744,9 +1027,155 @@ rte_mp_request(struct rte_mp_msg *req, struct rte_mp_reply *reply, } int __rte_experimental -rte_mp_reply(struct rte_mp_msg *msg, const char *peer) +rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, + rte_mp_async_reply_t clb) { + struct rte_mp_msg *copy; + struct pending_request *dummy; + struct async_request_param *param = NULL; No need to assign it to NULL. Will fix in v6. + /* we have to lock the request queue here, as we will be adding a bunch + * of requests to the queue at once, and some of the replies may arrive + * before we add all of the requests to the queue. + */ + pthread_mutex_lock(&pending_requests.lock); + + /* we have to ensure that callback gets triggered even if we don't send + * anything, therefore earlier we have allocated a dummy request. put it + * on the queue and fill it. we will remove it once we know we sent + * something. + */ Or we can add this dummy at last if it's necessary, instead of adding firstly and remove if not necessary? No strong option here. Yep, sure, will fix in v6. -- Thanks, Anatoly
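For readers following the review, a short usage sketch of the asynchronous request API under discussion. It assumes the rte_mp_request_async(req, ts, clb) prototype from the patch and a callback that receives the original request plus the reply, as suggested by trigger_async_action(); the message name "example_action" is illustrative only.

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <rte_eal.h>

/* Invoked from the IPC handling thread once all replies have arrived or
 * the timeout below has expired. */
static int
example_reply_cb(const struct rte_mp_msg *request,
		 const struct rte_mp_reply *reply)
{
	printf("request %s: %d of %d replies received\n",
	       request->name, reply->nb_received, reply->nb_sent);
	return 0;
}

static int
send_example_request(void)
{
	struct rte_mp_msg req;
	struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 }; /* 5s timeout */

	memset(&req, 0, sizeof(req));
	snprintf(req.name, sizeof(req.name), "%s", "example_action");

	/* Returns immediately; the reply is delivered to example_reply_cb() */
	return rte_mp_request_async(&req, &ts, example_reply_cb);
}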
[dpdk-dev] [PATCH v2] net/mrvl: rename PMD driver as mvpp2
From: Natalie Samsonov The name "mrvl" for Marvell PMD driver for PPv2 Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter is too generic and causes problem for adding new PMD drivers for other Marvell devices. Changed to "mvpp2" for specific Marvell PPv2 PMD. This patch doesn't introduce any change except renaming. Signed-off-by: Natalie Samsonov --- v2: prepare patch with --find-renames MAINTAINERS| 8 config/common_base | 2 +- doc/guides/cryptodevs/mrvl.rst | 4 ++-- doc/guides/nics/features/{mrvl.ini => mvpp2.ini} | 2 +- doc/guides/nics/index.rst | 2 +- doc/guides/nics/{mrvl.rst => mvpp2.rst}| 22 +++--- drivers/net/Makefile | 4 ++-- drivers/net/{mrvl => mvpp2}/Makefile | 8 drivers/net/{mrvl => mvpp2}/mrvl_ethdev.c | 4 ++-- drivers/net/{mrvl => mvpp2}/mrvl_ethdev.h | 0 drivers/net/{mrvl => mvpp2}/mrvl_flow.c| 0 drivers/net/{mrvl => mvpp2}/mrvl_qos.c | 0 drivers/net/{mrvl => mvpp2}/mrvl_qos.h | 0 .../net/{mrvl => mvpp2}/rte_pmd_mrvl_version.map | 0 mk/rte.app.mk | 2 +- 15 files changed, 29 insertions(+), 29 deletions(-) rename doc/guides/nics/features/{mrvl.ini => mvpp2.ini} (90%) rename doc/guides/nics/{mrvl.rst => mvpp2.rst} (96%) rename drivers/net/{mrvl => mvpp2}/Makefile (83%) rename drivers/net/{mrvl => mvpp2}/mrvl_ethdev.c (99%) rename drivers/net/{mrvl => mvpp2}/mrvl_ethdev.h (100%) rename drivers/net/{mrvl => mvpp2}/mrvl_flow.c (100%) rename drivers/net/{mrvl => mvpp2}/mrvl_qos.c (100%) rename drivers/net/{mrvl => mvpp2}/mrvl_qos.h (100%) rename drivers/net/{mrvl => mvpp2}/rte_pmd_mrvl_version.map (100%) diff --git a/MAINTAINERS b/MAINTAINERS index aa30bd9..30a9911 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -486,15 +486,15 @@ F: drivers/net/mlx5/ F: doc/guides/nics/mlx5.rst F: doc/guides/nics/features/mlx5.ini -Marvell mrvl +Marvell mvpp2 M: Jacek Siuda M: Tomasz Duszynski M: Dmitri Epshtein M: Natalie Samsonov M: Jianbo Liu -F: drivers/net/mrvl/ -F: doc/guides/nics/mrvl.rst -F: doc/guides/nics/features/mrvl.ini +F: drivers/net/mvpp2/ +F: doc/guides/nics/mvpp2.rst +F: doc/guides/nics/features/mvpp2.ini Microsoft vdev_netvsc - EXPERIMENTAL M: Matan Azrad diff --git a/config/common_base b/config/common_base index ee10b44..7abf7c6 100644 --- a/config/common_base +++ b/config/common_base @@ -383,7 +383,7 @@ CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y # # Compile Marvell PMD driver # -CONFIG_RTE_LIBRTE_MRVL_PMD=n +CONFIG_RTE_LIBRTE_MVPP2_PMD=n # # Compile virtual device driver for NetVSC on Hyper-V/Azure diff --git a/doc/guides/cryptodevs/mrvl.rst b/doc/guides/cryptodevs/mrvl.rst index 6a0b08c..443ebcd 100644 --- a/doc/guides/cryptodevs/mrvl.rst +++ b/doc/guides/cryptodevs/mrvl.rst @@ -86,7 +86,7 @@ Currently there are two driver specific compilation options in Toggle display of debugging messages. For a list of prerequisites please refer to `Prerequisites` section in -:ref:`MRVL Poll Mode Driver ` guide. +:ref:`MVPP2 Poll Mode Driver ` guide. MRVL CRYPTO PMD requires MUSDK built with EIP197 support thus following extra option must be passed to the library configuration script: @@ -123,7 +123,7 @@ operation: .. 
code-block:: console - ./l2fwd-crypto --vdev=net_mrvl,iface=eth0 --vdev=crypto_mrvl -- \ + ./l2fwd-crypto --vdev=eth_mvpp2,iface=eth0 --vdev=crypto_mrvl -- \ --cipher_op ENCRYPT --cipher_algo aes-cbc \ --cipher_key 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f \ --auth_op GENERATE --auth_algo sha1-hmac \ diff --git a/doc/guides/nics/features/mrvl.ini b/doc/guides/nics/features/mvpp2.ini similarity index 90% rename from doc/guides/nics/features/mrvl.ini rename to doc/guides/nics/features/mvpp2.ini index 8673a56..ef47546 100644 --- a/doc/guides/nics/features/mrvl.ini +++ b/doc/guides/nics/features/mvpp2.ini @@ -1,5 +1,5 @@ ; -; Supported features of the 'mrvl' network poll mode driver. +; Supported features of the 'mvpp2' network poll mode driver. ; ; Refer to default.ini for the full list of available PMD features. ; diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 59419f4..51c453d 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -30,7 +30,7 @@ Network Interface Controller Drivers liquidio mlx4 mlx5 -mrvl +mvpp2 nfp octeontx qede diff --git a/doc/guides/nics/mrvl.rst b/doc/guides/nics/mvpp2.rst similarity index 96% rename from doc/guides/nics/mrvl.rst rename to doc/guides/nics/mvpp2.rst index f9ec9d6..0408752 100644 --- a/doc/guides/nics/mrvl.rst +++ b/doc/guides/nics/mvpp2.rst @@ -29,12 +29,12 @@ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
Re: [dpdk-dev] [PATCH 2/2] dev: use rte_kvargs
> On Mar 26, 2018, at 8:59 AM, Gaëtan Rivet wrote: > > Hi Neil, > > On Mon, Mar 26, 2018 at 07:38:19AM -0400, Neil Horman wrote: >> On Fri, Mar 23, 2018 at 07:45:03PM +0100, Gaetan Rivet wrote: >>> Signed-off-by: Gaetan Rivet >>> --- >>> >>> Cc: Neil Horman >>> > > Keep in mind that all of this is to achieve the trivial task I was > doing in 20 lines or so. > >> I'm actually ok with this but as Keith noted, I'm not sure why you didn't >> just: >> >> 1) Add the ability to create a grouping key, so that key value pairs could >> contain a list of comma separated values (something like '{}' to denote that >> everything between the characters was the value in a kv pair, regardless of >> other tokenizing characters in the value). >> >> 2) Add the ability to recursively parse the value into a list of tokens >> > > I don't need a recursive construct or a tree-like structure. I only need > an alternative to '\0' to signify "end-of-list". > This seems like an edge-case to librte_kvargs that would only be useful > to a specific case. It does not seem a wise addition. > > So maybe I did not understand your suggestion. Can you give an example > of inputs? > > I need to parse something like > > "bus=pci,vendor_id=0x8086/class=eth" > (and I only care about bus=pci and class=eth). > > how can grouping help? My issue is that librte_kvargs would parse > > key:vendor_id > value:0x8086/class > > and would then stumble on the unexpected '=‘. Let me try to remember what I did here. Created a new list structure of top level keywords like ‘bus’, ‘class’, … The new list is passed into the new API that splits up the string by ‘/‘ then for each string you call into the original kvargs routines to parse the remaining items within each string. I believe I passed a function pointer in the new array of structures that allowed me to handle the top level keywords. What is passed to the top level keyword function is the string to be parsed by the normal kvargs routines. Each routine then used kvargs routines as normal. I am sure the layer could be done differently and the structure could maybe even contain the kvargs list or keywords if you want. Anyway it worked out pretty well, just adding new APIs to handle and layer on top of kvargs APIs. > >> 3) Layer your functionality on top of (1) and (2), as Keith noted > > The stack allocator seems like a nice-to-have that would interest > people using librte_kvargs. I find librte_kvargs to be cumbersome. I > cannot rewrite it from scratch, unless I update everything that relies > on it as well. So I do not touch it, because I don't care *that* much. > > Why not simply leave my helper alongside? If people care enough about > it and would prefer to use it over librte_kvargs, then maybe we could > think about doing the effort of exposing it cleanly (or maybe they could). > Right now, I see only me needing it and I do not see this effort as > worth it. > > Regards, > -- > Gaëtan Rivet > 6WIND Regards, Keith
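A minimal sketch of the layering described above, assuming an input of the form "bus=pci,vendor_id=0x8086/class=eth" from earlier in the thread: split the string on '/' first, then hand each plain "key=value,..." segment to the existing rte_kvargs parser. The helper names are illustrative only.

#include <stdlib.h>
#include <string.h>
#include <rte_kvargs.h>

/* Parse one '/'-delimited segment with the unmodified kvargs API */
static int
parse_one_layer(const char *segment)
{
	struct rte_kvargs *kvlist;

	kvlist = rte_kvargs_parse(segment, NULL);
	if (kvlist == NULL)
		return -1;

	/* ... look up "bus", "class", ... with rte_kvargs_process() ... */

	rte_kvargs_free(kvlist);
	return 0;
}

/* Split on '/' so that each layer is valid kvargs input on its own */
static int
parse_layers(const char *devargs)
{
	char *dup, *layer, *sp = NULL;
	int ret = 0;

	dup = strdup(devargs);
	if (dup == NULL)
		return -1;

	for (layer = strtok_r(dup, "/", &sp);
	     layer != NULL && ret == 0;
	     layer = strtok_r(NULL, "/", &sp))
		ret = parse_one_layer(layer);

	free(dup);
	return ret;
}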
[dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory
The callback allows to customize how objects are stored in the memory chunk. Default implementation of the callback which simply puts objects one by one is available. Suggested-by: Olivier Matz Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - fix memory leak if off is bigger than len RFCv2 -> v1: - advertise ABI changes in release notes - use consistent name for default callback: rte_mempool_op__default() - add opaque data pointer to populated object callback - move default callback to dedicated file doc/guides/rel_notes/deprecation.rst | 2 +- doc/guides/rel_notes/release_18_05.rst | 2 + lib/librte_mempool/rte_mempool.c | 23 --- lib/librte_mempool/rte_mempool.h | 90 lib/librte_mempool/rte_mempool_ops.c | 21 +++ lib/librte_mempool/rte_mempool_ops_default.c | 24 lib/librte_mempool/rte_mempool_version.map | 1 + 7 files changed, 149 insertions(+), 14 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index e02d4ca..c06fc67 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -72,7 +72,7 @@ Deprecation Notices - removal of ``get_capabilities`` mempool ops and related flags. - substitute ``register_memory_area`` with ``populate`` ops. - - addition of new ops to customize objects population and allocate contiguous + - addition of new op to allocate contiguous block of objects if underlying driver supports it. * mbuf: The control mbuf API will be removed in v18.05. The impacted diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst index 59583ea..abaefe5 100644 --- a/doc/guides/rel_notes/release_18_05.rst +++ b/doc/guides/rel_notes/release_18_05.rst @@ -84,6 +84,8 @@ ABI Changes A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops`` to allow to customize required memory size calculation. + A new callback ``populate`` has been added to ``rte_mempool_ops`` + to allow to customize objects population. Removed Items diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index dd2d0fe..d917dc7 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size) } static void -mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova) +mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque, +void *obj, rte_iova_t iova) { struct rte_mempool_objhdr *hdr; struct rte_mempool_objtlr *tlr __rte_unused; @@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova) tlr = __mempool_get_trailer(obj); tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE; #endif - - /* enqueue in ring */ - rte_mempool_ops_enqueue_bulk(mp, &obj, 1); } /* call obj_cb() for each mempool element */ @@ -407,17 +405,16 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr, else off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr; - while (off + total_elt_sz <= len && mp->populated_size < mp->size) { - off += mp->header_size; - if (iova == RTE_BAD_IOVA) - mempool_add_elem(mp, (char *)vaddr + off, - RTE_BAD_IOVA); - else - mempool_add_elem(mp, (char *)vaddr + off, iova + off); - off += mp->elt_size + mp->trailer_size; - i++; + if (off > len) { + ret = -EINVAL; + goto fail; } + i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size, + (char *)vaddr + off, + (iova == RTE_BAD_IOVA) ? 
RTE_BAD_IOVA : (iova + off), + len - off, mempool_add_elem, NULL); + /* not enough room to store one object */ if (i == 0) { ret = -EINVAL; diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 191255d..754261e 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -456,6 +456,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp, uint32_t obj_num, uint32_t pg_shift, size_t *min_chunk_size, size_t *align); +/** + * Function to be called for each populated object. + * + * @param[in] mp + * A pointer to the mempool structure. + * @param[in] opaque + * An opaque pointer passed to iterator. + * @param[in] vaddr + * Object virtual address. + * @param[in] iova + * Input/output virtual address of the object or RTE_BAD_IOVA. + */ +typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp, + void *opaque, void *vaddr, rte_iova_t iova); + +/** + * Populate memory pool objects us
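From a driver's point of view the new op typically looks like the sketch below: perform any driver-specific handling of the memory area, then delegate the actual object placement to the default helper, which reports each object through obj_cb and enqueues it into the pool. The example_* names are placeholders, not an existing driver (the dpaa and octeontx patches later in this series are the real instances of this pattern).

#include <rte_mempool.h>

/* Placeholder for driver-specific handling of the memory area (for example
 * programming a hardware buffer pool); not a real DPDK function. */
static int
example_register_area(struct rte_mempool *mp, void *vaddr, rte_iova_t iova,
		      size_t len)
{
	RTE_SET_USED(mp);
	RTE_SET_USED(vaddr);
	RTE_SET_USED(iova);
	RTE_SET_USED(len);
	return 0;
}

static int
example_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr,
		 rte_iova_t iova, size_t len,
		 rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	int ret;

	ret = example_register_area(mp, vaddr, iova, len);
	if (ret != 0)
		return ret;

	/* Let the default implementation slice the chunk into objects,
	 * report each one via obj_cb and enqueue it to the pool. */
	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
					       obj_cb, obj_cb_arg);
}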
[dpdk-dev] [PATCH v3 08/11] mempool/octeontx: prepare to remove register memory area op
Callback to populate pool objects has all required information and executed a bit later than register memory area callback. Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2 - none drivers/mempool/octeontx/rte_mempool_octeontx.c | 25 ++--- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c index 64ed528..ab94dfe 100644 --- a/drivers/mempool/octeontx/rte_mempool_octeontx.c +++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c @@ -152,26 +152,15 @@ octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp, } static int -octeontx_fpavf_register_memory_area(const struct rte_mempool *mp, - char *vaddr, rte_iova_t paddr, size_t len) -{ - RTE_SET_USED(paddr); - uint8_t gpool; - uintptr_t pool_bar; - - gpool = octeontx_fpa_bufpool_gpool(mp->pool_id); - pool_bar = mp->pool_id & ~(uint64_t)FPA_GPOOL_MASK; - - return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool); -} - -static int octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr, rte_iova_t iova, size_t len, rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg) { size_t total_elt_sz; size_t off; + uint8_t gpool; + uintptr_t pool_bar; + int ret; if (iova == RTE_BAD_IOVA) return -EINVAL; @@ -188,6 +177,13 @@ octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs, iova += off; len -= off; + gpool = octeontx_fpa_bufpool_gpool(mp->pool_id); + pool_bar = mp->pool_id & ~(uint64_t)FPA_GPOOL_MASK; + + ret = octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool); + if (ret < 0) + return ret; + return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len, obj_cb, obj_cb_arg); } @@ -199,7 +195,6 @@ static struct rte_mempool_ops octeontx_fpavf_ops = { .enqueue = octeontx_fpavf_enqueue, .dequeue = octeontx_fpavf_dequeue, .get_count = octeontx_fpavf_get_count, - .register_memory_area = octeontx_fpavf_register_memory_area, .calc_mem_size = octeontx_fpavf_calc_mem_size, .populate = octeontx_fpavf_populate, }; -- 2.7.4
[dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver
The patch series should be applied on top of [7]. The initial patch series [1] is split into two to simplify processing. The second series relies on this one and will add bucket mempool driver and related ops. The patch series has generic enhancements suggested by Olivier. Basically it adds driver callbacks to calculate required memory size and to populate objects using provided memory area. It allows to remove so-called capability flags used before to tell generic code how to allocate and slice allocated memory into mempool objects. Clean up which removes get_capabilities and register_memory_area is not strictly required, but I think right thing to do. Existing mempool drivers are updated. rte_mempool_populate_iova_tab() is also deprecated in v2 as agreed in [2]. Unfortunately it requires addition of -Wno-deprecated-declarations flag to librte_mempool since the function is used by deprecated earlier rte_mempool_populate_phys_tab(). If the later may be removed in the release, we can avoid addition of the flag to allow usage of deprecated functions. One open question remains from previous review [3]. The patch series interfere with memory hotplug for DPDK [4] ([5] to be precise). So, rebase may be required. A new patch is added to the series to rename MEMPOOL_F_NO_PHYS_CONTIG as MEMPOOL_F_NO_IOVA_CONTIG as agreed in [6]. MEMPOOL_F_CAPA_PHYS_CONTIG is not renamed since it removed in this patchset. It breaks ABI since changes rte_mempool_ops. Also it removes rte_mempool_ops_register_memory_area() and rte_mempool_ops_get_capabilities() since corresponding callbacks are removed. Internal global functions are not listed in map file since it is not a part of external API. [1] https://dpdk.org/ml/archives/dev/2018-January/088698.html [2] https://dpdk.org/ml/archives/dev/2018-March/093186.html [3] https://dpdk.org/ml/archives/dev/2018-March/093329.html [4] https://dpdk.org/ml/archives/dev/2018-March/092070.html [5] https://dpdk.org/ml/archives/dev/2018-March/092088.html [6] https://dpdk.org/ml/archives/dev/2018-March/093345.html [7] https://dpdk.org/ml/archives/dev/2018-March/093196.html v2 -> v3: - fix build error in mempool/dpaa: prepare to remove register memory area op v1 -> v2: - deprecate rte_mempool_populate_iova_tab() - add patch to fix memory leak if no objects are populated - add patch to rename MEMPOOL_F_NO_PHYS_CONTIG - minor fixes (typos, blank line at the end of file) - highlight meaning of min_chunk_size (when it is virtual or physical contiguous) - make sure that mempool is initialized in rte_mempool_populate_anon() - move patch to ensure that mempool is initialized earlier in the series RFCv2 -> v1: - split the series in two - squash octeontx patches which implement calc_mem_size and populate callbacks into the patch which removes get_capabilities since it is the easiest way to untangle the tangle of tightly related library functions and flags advertised by the driver - consistently name default callbacks - move default callbacks to dedicated file - see detailed description in patches RFCv1 -> RFCv2: - add driver ops to calculate required memory size and populate mempool objects, remove extra flags which were required before to control it - transition of octeontx and dpaa drivers to the new callbacks - change info API to get information from driver required to API user to know contiguous block size - remove get_capabilities (not required any more and may be substituted with more in info get API) - remove register_memory_area since it is substituted with populate callback which can 
do more - use SPDX tags - avoid all objects affinity to single lcore - fix bucket get_count - deprecate XMEM API - avoid introduction of a new function to flush cache - fix NO_CACHE_ALIGN case in bucket mempool Andrew Rybchenko (9): mempool: fix memhdr leak when no objects are populated mempool: rename flag to control IOVA-contiguous objects mempool: add op to calculate memory size to be allocated mempool: add op to populate objects using provided memory mempool: remove callback to get capabilities mempool: deprecate xmem functions mempool/octeontx: prepare to remove register memory area op mempool/dpaa: prepare to remove register memory area op mempool: remove callback to register memory area Artem V. Andreev (2): mempool: ensure the mempool is initialized before populating mempool: support flushing the default cache of the mempool doc/guides/rel_notes/deprecation.rst| 12 +- doc/guides/rel_notes/release_18_05.rst | 33 ++- drivers/mempool/dpaa/dpaa_mempool.c | 13 +- drivers/mempool/octeontx/rte_mempool_octeontx.c | 64 -- drivers/net/thunderx/nicvf_ethdev.c | 2 +- lib/librte_mempool/Makefile | 6 +- lib/librte_mempool/meson.build | 17 +- lib/librte_mempool/rte_mempool.c| 179
[dpdk-dev] [PATCH v3 09/11] mempool/dpaa: prepare to remove register memory area op
Populate mempool driver callback is executed a bit later than register memory area, provides the same information and will substitute the later since it gives more flexibility and in addition to notification about memory area allows to customize how mempool objects are stored in memory. Signed-off-by: Andrew Rybchenko --- v2 -> v3: - fix build error because of prototype mismatch (char * -> void *) v1 -> v2: - fix build error because of prototype mismatch drivers/mempool/dpaa/dpaa_mempool.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c index 7b82f4b..580e464 100644 --- a/drivers/mempool/dpaa/dpaa_mempool.c +++ b/drivers/mempool/dpaa/dpaa_mempool.c @@ -264,10 +264,9 @@ dpaa_mbuf_get_count(const struct rte_mempool *mp) } static int -dpaa_register_memory_area(const struct rte_mempool *mp, - char *vaddr __rte_unused, - rte_iova_t paddr __rte_unused, - size_t len) +dpaa_populate(struct rte_mempool *mp, unsigned int max_objs, + void *vaddr, rte_iova_t paddr, size_t len, + rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg) { struct dpaa_bp_info *bp_info; unsigned int total_elt_sz; @@ -289,7 +288,9 @@ dpaa_register_memory_area(const struct rte_mempool *mp, if (len >= total_elt_sz * mp->size) bp_info->flags |= DPAA_MPOOL_SINGLE_SEGMENT; - return 0; + return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len, + obj_cb, obj_cb_arg); + } struct rte_mempool_ops dpaa_mpool_ops = { @@ -299,7 +300,7 @@ struct rte_mempool_ops dpaa_mpool_ops = { .enqueue = dpaa_mbuf_free_bulk, .dequeue = dpaa_mbuf_alloc_bulk, .get_count = dpaa_mbuf_get_count, - .register_memory_area = dpaa_register_memory_area, + .populate = dpaa_populate, }; MEMPOOL_REGISTER_OPS(dpaa_mpool_ops); -- 2.7.4
[dpdk-dev] [PATCH v3 02/11] mempool: rename flag to control IOVA-contiguous objects
Flag MEMPOOL_F_NO_PHYS_CONTIG is renamed as MEMPOOL_F_NO_IOVA_CONTIG to follow IO memory contiguos terminology. MEMPOOL_F_NO_PHYS_CONTIG is kept for backward compatibility and deprecated. Suggested-by: Olivier Matz Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - added in v2 as discussed in [1] [1] https://dpdk.org/ml/archives/dev/2018-March/093345.html drivers/net/thunderx/nicvf_ethdev.c | 2 +- lib/librte_mempool/rte_mempool.c| 6 +++--- lib/librte_mempool/rte_mempool.h| 9 + 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c index 067f224..f3be744 100644 --- a/drivers/net/thunderx/nicvf_ethdev.c +++ b/drivers/net/thunderx/nicvf_ethdev.c @@ -1308,7 +1308,7 @@ nicvf_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qidx, } /* Mempool memory must be physically contiguous */ - if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG) { + if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG) { PMD_INIT_LOG(ERR, "Mempool memory must be physically contiguous"); return -EINVAL; } diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 80bf941..6ffa795 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -446,7 +446,7 @@ rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr, if (mp->nb_mem_chunks != 0) return -EEXIST; - if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG) + if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG) return rte_mempool_populate_iova(mp, vaddr, RTE_BAD_IOVA, pg_num * pg_sz, free_cb, opaque); @@ -500,7 +500,7 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char *addr, if (RTE_ALIGN_CEIL(len, pg_sz) != len) return -EINVAL; - if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG) + if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG) return rte_mempool_populate_iova(mp, addr, RTE_BAD_IOVA, len, free_cb, opaque); @@ -602,7 +602,7 @@ rte_mempool_populate_default(struct rte_mempool *mp) goto fail; } - if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG) + if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG) iova = RTE_BAD_IOVA; else iova = mz->iova; diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 8b1b7f7..e531a15 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -244,7 +244,8 @@ struct rte_mempool { #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is "single-producer".*/ #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is "single-consumer".*/ #define MEMPOOL_F_POOL_CREATED 0x0010 /**< Internal: pool is created. */ -#define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */ +#define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */ +#define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */ /** * This capability flag is advertised by a mempool handler, if the whole * memory area containing the objects must be physically contiguous. @@ -710,8 +711,8 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, void *); * - MEMPOOL_F_SC_GET: If this flag is set, the default behavior * when using rte_mempool_get() or rte_mempool_get_bulk() is * "single-consumer". Otherwise, it is "multi-consumers". - * - MEMPOOL_F_NO_PHYS_CONTIG: If set, allocated objects won't - * necessarily be contiguous in physical memory. + * - MEMPOOL_F_NO_IOVA_CONTIG: If set, allocated objects won't + * necessarily be contiguous in IO memory. * @return * The pointer to the new allocated mempool, on success. NULL on error * with rte_errno set appropriately. 
Possible rte_errno values include: @@ -1439,7 +1440,7 @@ rte_mempool_empty(const struct rte_mempool *mp) * A pointer (virtual address) to the element of the pool. * @return * The IO address of the elt element. - * If the mempool was created with MEMPOOL_F_NO_PHYS_CONTIG, the + * If the mempool was created with MEMPOOL_F_NO_IOVA_CONTIG, the * returned value is RTE_BAD_IOVA. */ static inline rte_iova_t -- 2.7.4
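A small usage sketch with the renamed flag, for a pool whose objects never need to be IOVA-contiguous (for instance one used only for software bookkeeping). The pool name and sizes are illustrative; the deprecated MEMPOOL_F_NO_PHYS_CONTIG alias still builds, but new code should use the new name.

#include <rte_mempool.h>

/* Create a pool of plain software objects that never hit hardware, so
 * IOVA-contiguity is not required. */
static struct rte_mempool *
create_sw_pool(void)
{
	return rte_mempool_create("sw_pool",
				  8192,	/* number of objects */
				  128,	/* object size in bytes */
				  0,	/* per-lcore cache size */
				  0,	/* private data size */
				  NULL, NULL,	/* no pool constructor */
				  NULL, NULL,	/* no per-object init */
				  SOCKET_ID_ANY,
				  MEMPOOL_F_NO_IOVA_CONTIG);
}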
[dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities
The callback was introduced to let generic code to know octeontx mempool driver requirements to use single physically contiguous memory chunk to store all objects and align object address to total object size. Now these requirements are met using a new callbacks to calculate required memory chunk size and to populate objects using provided memory chunk. These capability flags are not used anywhere else. Restricting capabilities to flags is not generic and likely to be insufficient to describe mempool driver features. If required in the future, API which returns structured information may be added. Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - fix typo - rebase on top of patch which renames MEMPOOL_F_NO_PHYS_CONTIG RFCv2 -> v1: - squash mempool/octeontx patches to add calc_mem_size and populate callbacks to this one in order to avoid breakages in the middle of patchset - advertise API changes in release notes doc/guides/rel_notes/deprecation.rst| 1 - doc/guides/rel_notes/release_18_05.rst | 11 + drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 + lib/librte_mempool/rte_mempool.c| 44 ++ lib/librte_mempool/rte_mempool.h| 52 +- lib/librte_mempool/rte_mempool_ops.c| 14 -- lib/librte_mempool/rte_mempool_ops_default.c| 15 +-- lib/librte_mempool/rte_mempool_version.map | 1 - 8 files changed, 68 insertions(+), 129 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index c06fc67..4deed9a 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -70,7 +70,6 @@ Deprecation Notices The following changes are planned: - - removal of ``get_capabilities`` mempool ops and related flags. - substitute ``register_memory_area`` with ``populate`` ops. - addition of new op to allocate contiguous block of objects if underlying driver supports it. diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst index abaefe5..c50f26c 100644 --- a/doc/guides/rel_notes/release_18_05.rst +++ b/doc/guides/rel_notes/release_18_05.rst @@ -66,6 +66,14 @@ API Changes Also, make sure to start the actual text at the margin. = +* **Removed mempool capability flags and related functions.** + + Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and + ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool + driver to customize generic mempool library behaviour. + Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be + used to achieve it without specific knowledge in the generic code. + ABI Changes --- @@ -86,6 +94,9 @@ ABI Changes to allow to customize required memory size calculation. A new callback ``populate`` has been added to ``rte_mempool_ops`` to allow to customize objects population. + Callback ``get_capabilities`` has been removed from ``rte_mempool_ops`` + since its features are covered by ``calc_mem_size`` and ``populate`` + callbacks. 
Removed Items diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c index d143d05..64ed528 100644 --- a/drivers/mempool/octeontx/rte_mempool_octeontx.c +++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c @@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp) return octeontx_fpa_bufpool_free_count(pool); } -static int -octeontx_fpavf_get_capabilities(const struct rte_mempool *mp, - unsigned int *flags) +static ssize_t +octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp, +uint32_t obj_num, uint32_t pg_shift, +size_t *min_chunk_size, size_t *align) { - RTE_SET_USED(mp); - *flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG | - MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS); - return 0; + ssize_t mem_size; + + /* +* Simply need space for one more object to be able to +* fulfil alignment requirements. +*/ + mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1, + pg_shift, + min_chunk_size, align); + if (mem_size >= 0) { + /* +* Memory area which contains objects must be physically +* contiguous. +*/ + *min_chunk_size = mem_size; + } + + return mem_size; } static int @@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp, return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool); } +static int +octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs
[dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions
Move rte_mempool_xmem_size() code to internal helper function since it is required in two places: deprecated rte_mempool_xmem_size() and non-deprecated rte_mempool_op_calc_mem_size_default(). Suggested-by: Olivier Matz Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - deprecate rte_mempool_populate_iova_tab() - add -Wno-deprecated-declarations to fix build errors because of rte_mempool_populate_iova_tab() deprecation - add @deprecated to deprecated functions description RFCv2 -> v1: - advertise deprecation in release notes - factor out default memory size calculation into non-deprecated internal function to avoid usage of deprecated function internally - remove test for deprecated functions to address build issue because of usage of deprecated functions (it is easy to allow usage of deprecated function in Makefile, but very complicated in meson) doc/guides/rel_notes/deprecation.rst | 7 --- doc/guides/rel_notes/release_18_05.rst | 11 ++ lib/librte_mempool/Makefile | 3 +++ lib/librte_mempool/meson.build | 12 +++ lib/librte_mempool/rte_mempool.c | 19 ++--- lib/librte_mempool/rte_mempool.h | 30 +++ lib/librte_mempool/rte_mempool_ops_default.c | 4 ++-- test/test/test_mempool.c | 31 8 files changed, 74 insertions(+), 43 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 4deed9a..473330d 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -60,13 +60,6 @@ Deprecation Notices - ``rte_eal_mbuf_default_mempool_ops`` * mempool: several API and ABI changes are planned in v18.05. - The following functions, introduced for Xen, which is not supported - anymore since v17.11, are hard to use, not used anywhere else in DPDK. - Therefore they will be deprecated in v18.05 and removed in v18.08: - - - ``rte_mempool_xmem_create`` - - ``rte_mempool_xmem_size`` - - ``rte_mempool_xmem_usage`` The following changes are planned: diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst index c50f26c..6a8db54 100644 --- a/doc/guides/rel_notes/release_18_05.rst +++ b/doc/guides/rel_notes/release_18_05.rst @@ -74,6 +74,17 @@ API Changes Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be used to achieve it without specific knowledge in the generic code. +* **Deprecated mempool xmem functions.** + + The following functions, introduced for Xen, which is not supported + anymore since v17.11, are hard to use, not used anywhere else in DPDK. 
+ Therefore they were deprecated in v18.05 and will be removed in v18.08: + + - ``rte_mempool_xmem_create`` + - ``rte_mempool_xmem_size`` + - ``rte_mempool_xmem_usage`` + - ``rte_mempool_populate_iova_tab`` + ABI Changes --- diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile index 072740f..2c46fdd 100644 --- a/lib/librte_mempool/Makefile +++ b/lib/librte_mempool/Makefile @@ -7,6 +7,9 @@ include $(RTE_SDK)/mk/rte.vars.mk LIB = librte_mempool.a CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 +# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab() +# from earlier deprecated rte_mempool_populate_phys_tab() +CFLAGS += -Wno-deprecated-declarations LDLIBS += -lrte_eal -lrte_ring EXPORT_MAP := rte_mempool_version.map diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build index 9e3b527..22e912a 100644 --- a/lib/librte_mempool/meson.build +++ b/lib/librte_mempool/meson.build @@ -1,6 +1,18 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation +extra_flags = [] + +# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab() +# from earlier deprecated rte_mempool_populate_phys_tab() +extra_flags += '-Wno-deprecated-declarations' + +foreach flag: extra_flags + if cc.has_argument(flag) + cflags += flag + endif +endforeach + version = 4 sources = files('rte_mempool.c', 'rte_mempool_ops.c', 'rte_mempool_ops_default.c') diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 40eedde..8c3b0b1 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags, /* - * Calculate maximum amount of memory required to store given number of objects. + * Internal function to calculate required memory chunk size shared + * by default implementation of the corresponding callback and + * deprecated external function. */ size_t -rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift, - __rte_unused unsigned int flags) +rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t tot
[dpdk-dev] [PATCH v3 01/11] mempool: fix memhdr leak when no objects are populated
Fixes: 84121f197187 ("mempool: store memory chunks in a list") Cc: sta...@dpdk.org Suggested-by: Olivier Matz Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - added in v2 as discussed in [1] [1] https://dpdk.org/ml/archives/dev/2018-March/093329.html lib/librte_mempool/rte_mempool.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 54f7f4b..80bf941 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -408,12 +408,18 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr, } /* not enough room to store one object */ - if (i == 0) - return -EINVAL; + if (i == 0) { + ret = -EINVAL; + goto fail; + } STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next); mp->nb_mem_chunks++; return i; + +fail: + rte_free(memhdr); + return ret; } int -- 2.7.4
[dpdk-dev] [PATCH v3 03/11] mempool: ensure the mempool is initialized before populating
From: "Artem V. Andreev" Callback to calculate required memory area size may require mempool driver data to be already allocated and initialized. Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - add init check to mempool_ops_alloc_once() - move ealier in the patch series since it is required when driver ops are called and it is better to have it before new ops are added RFCv2 -> v1: - rename helper function as mempool_ops_alloc_once() lib/librte_mempool/rte_mempool.c | 33 ++--- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 6ffa795..d8e3720 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -323,6 +323,21 @@ rte_mempool_free_memchunks(struct rte_mempool *mp) } } +static int +mempool_ops_alloc_once(struct rte_mempool *mp) +{ + int ret; + + /* create the internal ring if not already done */ + if ((mp->flags & MEMPOOL_F_POOL_CREATED) == 0) { + ret = rte_mempool_ops_alloc(mp); + if (ret != 0) + return ret; + mp->flags |= MEMPOOL_F_POOL_CREATED; + } + return 0; +} + /* Add objects in the pool, using a physically contiguous memory * zone. Return the number of objects added, or a negative value * on error. @@ -339,13 +354,9 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr, struct rte_mempool_memhdr *memhdr; int ret; - /* create the internal ring if not already done */ - if ((mp->flags & MEMPOOL_F_POOL_CREATED) == 0) { - ret = rte_mempool_ops_alloc(mp); - if (ret != 0) - return ret; - mp->flags |= MEMPOOL_F_POOL_CREATED; - } + ret = mempool_ops_alloc_once(mp); + if (ret != 0) + return ret; /* Notify memory area to mempool */ ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len); @@ -556,6 +567,10 @@ rte_mempool_populate_default(struct rte_mempool *mp) unsigned int mp_flags; int ret; + ret = mempool_ops_alloc_once(mp); + if (ret != 0) + return ret; + /* mempool must not be populated */ if (mp->nb_mem_chunks != 0) return -EEXIST; @@ -667,6 +682,10 @@ rte_mempool_populate_anon(struct rte_mempool *mp) return 0; } + ret = mempool_ops_alloc_once(mp); + if (ret != 0) + return ret; + /* get chunk of virtually continuous memory */ size = get_anon_size(mp); addr = mmap(NULL, size, PROT_READ | PROT_WRITE, -- 2.7.4
[dpdk-dev] [PATCH v3 10/11] mempool: remove callback to register memory area
The callback is not required any more since there is a new callback to populate objects using provided memory area which provides the same information. Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - none RFCv2 -> v1: - advertise ABI changes in release notes doc/guides/rel_notes/deprecation.rst | 1 - doc/guides/rel_notes/release_18_05.rst | 2 ++ lib/librte_mempool/rte_mempool.c | 5 - lib/librte_mempool/rte_mempool.h | 31 -- lib/librte_mempool/rte_mempool_ops.c | 14 -- lib/librte_mempool/rte_mempool_version.map | 1 - 6 files changed, 2 insertions(+), 52 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 473330d..5301259 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -63,7 +63,6 @@ Deprecation Notices The following changes are planned: - - substitute ``register_memory_area`` with ``populate`` ops. - addition of new op to allocate contiguous block of objects if underlying driver supports it. diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst index 6a8db54..016c4ed 100644 --- a/doc/guides/rel_notes/release_18_05.rst +++ b/doc/guides/rel_notes/release_18_05.rst @@ -108,6 +108,8 @@ ABI Changes Callback ``get_capabilities`` has been removed from ``rte_mempool_ops`` since its features are covered by ``calc_mem_size`` and ``populate`` callbacks. + Callback ``register_memory_area`` has been removed from ``rte_mempool_ops`` + since the new callback ``populate`` may be used instead of it. Removed Items diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 8c3b0b1..c58bcc6 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -355,11 +355,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr, if (ret != 0) return ret; - /* Notify memory area to mempool */ - ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len); - if (ret != -ENOTSUP && ret < 0) - return ret; - /* mempool is already populated */ if (mp->populated_size >= mp->size) return -ENOSPC; diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 9107f5a..314f909 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -371,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp, typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp); /** - * Notify new memory area to mempool. - */ -typedef int (*rte_mempool_ops_register_memory_area_t) -(const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len); - -/** * Calculate memory size required to store given number of objects. * * If mempool objects are not required to be IOVA-contiguous @@ -514,10 +508,6 @@ struct rte_mempool_ops { rte_mempool_dequeue_t dequeue; /**< Dequeue an object. */ rte_mempool_get_count get_count; /**< Get qty of available objs. */ /** -* Notify new memory area to mempool -*/ - rte_mempool_ops_register_memory_area_t register_memory_area; - /** * Optional callback to calculate memory size required to * store specified number of objects. */ @@ -639,27 +629,6 @@ unsigned rte_mempool_ops_get_count(const struct rte_mempool *mp); /** - * @internal wrapper for mempool_ops register_memory_area callback. - * API to notify the mempool handler when a new memory area is added to pool. - * - * @param mp - * Pointer to the memory pool. - * @param vaddr - * Pointer to the buffer virtual address. 
- * @param iova - * Pointer to the buffer IO address. - * @param len - * Pool size. - * @return - * - 0: Success; - * - -ENOTSUP - doesn't support register_memory_area ops (valid error case). - * - Otherwise, rte_mempool_populate_phys fails thus pool create fails. - */ -int -rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, - char *vaddr, rte_iova_t iova, size_t len); - -/** * @internal wrapper for mempool_ops calc_mem_size callback. * API to calculate size of memory required to store specified number of * object. diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c index 6ac669a..ea9be1e 100644 --- a/lib/librte_mempool/rte_mempool_ops.c +++ b/lib/librte_mempool/rte_mempool_ops.c @@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h) ops->enqueue = h->enqueue; ops->dequeue = h->dequeue; ops->get_count = h->get_count; - ops->register_memory_area = h->register_memory_area; ops->calc_mem_size = h->calc_mem_size; ops->populate = h->populate; @@ -99,19 +98,6 @@ rte_mempool_ops_get_c
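For illustration, a driver that previously implemented register_memory_area only to record the memory area can fold that bookkeeping into a populate callback and then delegate object placement to the default helper. This is a minimal sketch, not code from the patch; the mydrv_ prefix and mydrv_record_memory_area() are hypothetical.

#include <rte_mempool.h>

static int
mydrv_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr,
               rte_iova_t iova, size_t len,
               rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
{
        /* Bookkeeping formerly done in register_memory_area(). */
        mydrv_record_memory_area(mp->pool_data, vaddr, iova, len);

        /* Lay out the objects exactly as the default implementation would. */
        return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
                                               obj_cb, obj_cb_arg);
}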
[dpdk-dev] [PATCH v3 11/11] mempool: support flushing the default cache of the mempool
From: "Artem V. Andreev" Mempool get/put API cares about cache itself, but sometimes it is required to flush the cache explicitly. The function is moved in the file since it now requires rte_mempool_default_cache(). Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - none lib/librte_mempool/rte_mempool.h | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 314f909..3e06ae0 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -1169,22 +1169,6 @@ void rte_mempool_cache_free(struct rte_mempool_cache *cache); /** - * Flush a user-owned mempool cache to the specified mempool. - * - * @param cache - * A pointer to the mempool cache. - * @param mp - * A pointer to the mempool. - */ -static __rte_always_inline void -rte_mempool_cache_flush(struct rte_mempool_cache *cache, - struct rte_mempool *mp) -{ - rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len); - cache->len = 0; -} - -/** * Get a pointer to the per-lcore default mempool cache. * * @param mp @@ -1207,6 +1191,26 @@ rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id) } /** + * Flush a user-owned mempool cache to the specified mempool. + * + * @param cache + * A pointer to the mempool cache. + * @param mp + * A pointer to the mempool. + */ +static __rte_always_inline void +rte_mempool_cache_flush(struct rte_mempool_cache *cache, + struct rte_mempool *mp) +{ + if (cache == NULL) + cache = rte_mempool_default_cache(mp, rte_lcore_id()); + if (cache == NULL || cache->len == 0) + return; + rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len); + cache->len = 0; +} + +/** * @internal Put several objects back in the mempool; used internally. * @param mp * A pointer to the mempool structure. -- 2.7.4
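A short usage sketch of the reworked function (illustrative only, not part of the patch): with this change an application can drain the per-lcore default cache without looking it up first, which is convenient before inspecting rte_mempool_avail_count().

#include <rte_mempool.h>

/* Drain this lcore's cached objects back into the pool. Passing NULL now
 * selects the per-lcore default cache and the call is a no-op if the pool
 * has no cache or the cache is empty. */
static void
example_flush_local_cache(struct rte_mempool *mp)
{
        rte_mempool_cache_flush(NULL, mp);
}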
[dpdk-dev] [PATCH v1 2/6] mempool: implement abstract mempool info API
From: "Artem V. Andreev" Primarily, it is intended as a way for the mempool driver to provide additional information on how it lays up objects inside the mempool. Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- lib/librte_mempool/rte_mempool.h | 41 ++ lib/librte_mempool/rte_mempool_ops.c | 15 +++ lib/librte_mempool/rte_mempool_version.map | 7 + 3 files changed, 63 insertions(+) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 3e06ae0..1ac2f57 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -190,6 +190,14 @@ struct rte_mempool_memhdr { }; /** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Additional information about the mempool + */ +struct rte_mempool_info; + +/** * The RTE mempool structure. */ struct rte_mempool { @@ -499,6 +507,16 @@ int rte_mempool_op_populate_default(struct rte_mempool *mp, void *vaddr, rte_iova_t iova, size_t len, rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Get some additional information about a mempool. + */ +typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp, + struct rte_mempool_info *info); + + /** Structure defining mempool operations structure */ struct rte_mempool_ops { char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */ @@ -517,6 +535,10 @@ struct rte_mempool_ops { * provided memory chunk. */ rte_mempool_populate_t populate; + /** +* Get mempool info +*/ + rte_mempool_get_info_t get_info; } __rte_cache_aligned; #define RTE_MEMPOOL_MAX_OPS_IDX 16 /**< Max registered ops structs */ @@ -680,6 +702,25 @@ int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs, void *obj_cb_arg); /** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Wrapper for mempool_ops get_info callback. + * + * @param[in] mp + * Pointer to the memory pool. + * @param[out] info + * Pointer to the rte_mempool_info structure + * @return + * - 0: Success; The mempool driver supports retrieving supplementary + *mempool information + * - -ENOTSUP - doesn't support get_info ops (valid case). + */ +__rte_experimental +int rte_mempool_ops_get_info(const struct rte_mempool *mp, +struct rte_mempool_info *info); + +/** * @internal wrapper for mempool_ops free callback. * * @param mp diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c index ea9be1e..efc1c08 100644 --- a/lib/librte_mempool/rte_mempool_ops.c +++ b/lib/librte_mempool/rte_mempool_ops.c @@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h) ops->get_count = h->get_count; ops->calc_mem_size = h->calc_mem_size; ops->populate = h->populate; + ops->get_info = h->get_info; rte_spinlock_unlock(&rte_mempool_ops_table.sl); @@ -134,6 +135,20 @@ rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs, obj_cb_arg); } +/* wrapper to get additional mempool info */ +int +rte_mempool_ops_get_info(const struct rte_mempool *mp, +struct rte_mempool_info *info) +{ + struct rte_mempool_ops *ops; + + ops = rte_mempool_get_ops(mp->ops_index); + + RTE_FUNC_PTR_OR_ERR_RET(ops->get_info, -ENOTSUP); + return ops->get_info(mp, info); +} + + /* sets mempool ops previously registered by rte_mempool_register_ops. 
*/ int rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name, diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map index cf375db..c9d16ec 100644 --- a/lib/librte_mempool/rte_mempool_version.map +++ b/lib/librte_mempool/rte_mempool_version.map @@ -57,3 +57,10 @@ DPDK_18.05 { rte_mempool_op_populate_default; } DPDK_17.11; + +EXPERIMENTAL { + global: + + rte_mempool_ops_get_info; + +} DPDK_18.05; -- 2.7.4
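A minimal caller-side sketch (not from the patch, and assuming the structure definition that later patches in the series fill in): callers probe the optional callback and treat -ENOTSUP as "no extra information", as the wrapper's documentation above describes. The example_ name is hypothetical.

#include <stdbool.h>
#include <rte_mempool.h>

/* Returns true if the mempool driver implements get_info. */
static bool
example_has_mempool_info(const struct rte_mempool *mp)
{
        struct rte_mempool_info info;

        /* -ENOTSUP simply means the driver has nothing extra to report. */
        return rte_mempool_ops_get_info(mp, &info) == 0;
}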
[dpdk-dev] [PATCH v1 4/6] mempool/bucket: implement block dequeue operation
From: "Artem V. Andreev" Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- drivers/mempool/bucket/rte_mempool_bucket.c | 52 + 1 file changed, 52 insertions(+) diff --git a/drivers/mempool/bucket/rte_mempool_bucket.c b/drivers/mempool/bucket/rte_mempool_bucket.c index 5a1bd79..0365671 100644 --- a/drivers/mempool/bucket/rte_mempool_bucket.c +++ b/drivers/mempool/bucket/rte_mempool_bucket.c @@ -294,6 +294,46 @@ bucket_dequeue(struct rte_mempool *mp, void **obj_table, unsigned int n) return rc; } +static int +bucket_dequeue_contig_blocks(struct rte_mempool *mp, void **first_obj_table, +unsigned int n) +{ + struct bucket_data *bd = mp->pool_data; + const uint32_t header_size = bd->header_size; + struct bucket_stack *cur_stack = bd->buckets[rte_lcore_id()]; + unsigned int n_buckets_from_stack = RTE_MIN(n, cur_stack->top); + struct bucket_header *hdr; + void **first_objp = first_obj_table; + + bucket_adopt_orphans(bd); + + n -= n_buckets_from_stack; + while (n_buckets_from_stack-- > 0) { + hdr = bucket_stack_pop_unsafe(cur_stack); + *first_objp++ = (uint8_t *)hdr + header_size; + } + if (n > 0) { + if (unlikely(rte_ring_dequeue_bulk(bd->shared_bucket_ring, + first_objp, n, NULL) != n)) { + /* Return the already dequeued buckets */ + while (first_objp-- != first_obj_table) { + bucket_stack_push(cur_stack, + (uint8_t *)*first_objp - + header_size); + } + rte_errno = ENOBUFS; + return -rte_errno; + } + while (n-- > 0) { + hdr = (struct bucket_header *)*first_objp; + hdr->lcore_id = rte_lcore_id(); + *first_objp++ = (uint8_t *)hdr + header_size; + } + } + + return 0; +} + static void count_underfilled_buckets(struct rte_mempool *mp, void *opaque, @@ -547,6 +587,16 @@ bucket_populate(struct rte_mempool *mp, unsigned int max_objs, return n_objs; } +static int +bucket_get_info(const struct rte_mempool *mp, struct rte_mempool_info *info) +{ + struct bucket_data *bd = mp->pool_data; + + info->contig_block_size = bd->obj_per_bucket; + return 0; +} + + static const struct rte_mempool_ops ops_bucket = { .name = "bucket", .alloc = bucket_alloc, @@ -556,6 +606,8 @@ static const struct rte_mempool_ops ops_bucket = { .get_count = bucket_get_count, .calc_mem_size = bucket_calc_mem_size, .populate = bucket_populate, + .get_info = bucket_get_info, + .dequeue_contig_blocks = bucket_dequeue_contig_blocks, }; -- 2.7.4
[dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated
Size of memory chunk required to populate mempool objects depends on how objects are stored in the memory. Different mempool drivers may have different requirements and a new operation allows to calculate memory size in accordance with driver requirements and advertise requirements on minimum memory chunk size and alignment in a generic way. Bump ABI version since the patch breaks it. Suggested-by: Olivier Matz Signed-off-by: Andrew Rybchenko --- v2 -> v3: - none v1 -> v2: - clarify min_chunk_size meaning - rebase on top of patch series which fixes library version in meson build RFCv2 -> v1: - move default calc_mem_size callback to rte_mempool_ops_default.c - add ABI changes to release notes - name default callback consistently: rte_mempool_op__default() - bump ABI version since it is the first patch which breaks ABI - describe default callback behaviour in details - avoid introduction of internal function to cope with deprecation (keep it to deprecation patch) - move cache-line or page boundary chunk alignment to default callback - highlight that min_chunk_size and align parameters are output only doc/guides/rel_notes/deprecation.rst | 3 +- doc/guides/rel_notes/release_18_05.rst | 7 ++- lib/librte_mempool/Makefile | 3 +- lib/librte_mempool/meson.build | 5 +- lib/librte_mempool/rte_mempool.c | 43 +++--- lib/librte_mempool/rte_mempool.h | 86 +++- lib/librte_mempool/rte_mempool_ops.c | 18 ++ lib/librte_mempool/rte_mempool_ops_default.c | 38 lib/librte_mempool/rte_mempool_version.map | 7 +++ 9 files changed, 182 insertions(+), 28 deletions(-) create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 6594585..e02d4ca 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -72,8 +72,7 @@ Deprecation Notices - removal of ``get_capabilities`` mempool ops and related flags. - substitute ``register_memory_area`` with ``populate`` ops. - - addition of new ops to customize required memory chunk calculation, -customize objects population and allocate contiguous + - addition of new ops to customize objects population and allocate contiguous block of objects if underlying driver supports it. * mbuf: The control mbuf API will be removed in v18.05. The impacted diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst index f2525bb..59583ea 100644 --- a/doc/guides/rel_notes/release_18_05.rst +++ b/doc/guides/rel_notes/release_18_05.rst @@ -80,6 +80,11 @@ ABI Changes Also, make sure to start the actual text at the margin. = +* **Changed rte_mempool_ops structure.** + + A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops`` + to allow to customize required memory size calculation. + Removed Items - @@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version. 
librte_latencystats.so.1 librte_lpm.so.2 librte_mbuf.so.3 - librte_mempool.so.3 + + librte_mempool.so.4 + librte_meter.so.2 librte_metrics.so.1 librte_net.so.1 diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile index 24e735a..072740f 100644 --- a/lib/librte_mempool/Makefile +++ b/lib/librte_mempool/Makefile @@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring EXPORT_MAP := rte_mempool_version.map -LIBABIVER := 3 +LIBABIVER := 4 # all source are stored in SRCS-y SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool.c SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_ops.c +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_ops_default.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build index 712720f..9e3b527 100644 --- a/lib/librte_mempool/meson.build +++ b/lib/librte_mempool/meson.build @@ -1,7 +1,8 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation -version = 3 -sources = files('rte_mempool.c', 'rte_mempool_ops.c') +version = 4 +sources = files('rte_mempool.c', 'rte_mempool_ops.c', + 'rte_mempool_ops_default.c') headers = files('rte_mempool.h') deps += ['ring'] diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index d8e3720..dd2d0fe 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -561,10 +561,10 @@ rte_mempool_populate_default(struct rte_mempool *mp) unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY; char mz_name[RTE_MEMZONE_NAMESIZE]; const struct rte_memzone *mz; - size_t size, total_elt_sz, align, pg_sz, pg_shift; + ssize_t mem_s
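As an illustration of the new hook (a sketch under assumptions, not code from the patch; the mydrv_ prefix is hypothetical), a driver with no special layout needs can delegate to the default helper and only tighten the reported constraints. min_chunk_size and align are output-only parameters.

#include <rte_common.h>
#include <rte_mempool.h>

static ssize_t
mydrv_calc_mem_size(const struct rte_mempool *mp, uint32_t obj_num,
                    uint32_t pg_shift, size_t *min_chunk_size, size_t *align)
{
        ssize_t mem_size;

        /* Default estimate based on total element size and page shift. */
        mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num, pg_shift,
                                                        min_chunk_size, align);
        if (mem_size < 0)
                return mem_size;

        /* Hypothetical driver constraint: each chunk spans at least one page. */
        if (pg_shift != 0)
                *min_chunk_size = RTE_MAX(*min_chunk_size,
                                          (size_t)1 << pg_shift);
        return mem_size;
}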
[dpdk-dev] [PATCH v1 5/6] mempool/bucket: do not allow one lcore to grab all buckets
From: "Artem V. Andreev" Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- drivers/mempool/bucket/rte_mempool_bucket.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/mempool/bucket/rte_mempool_bucket.c b/drivers/mempool/bucket/rte_mempool_bucket.c index 0365671..6c2da1c 100644 --- a/drivers/mempool/bucket/rte_mempool_bucket.c +++ b/drivers/mempool/bucket/rte_mempool_bucket.c @@ -42,6 +42,7 @@ struct bucket_data { unsigned int header_size; unsigned int total_elt_size; unsigned int obj_per_bucket; + unsigned int bucket_stack_thresh; uintptr_t bucket_page_mask; struct rte_ring *shared_bucket_ring; struct bucket_stack *buckets[RTE_MAX_LCORE]; @@ -139,6 +140,7 @@ bucket_enqueue(struct rte_mempool *mp, void * const *obj_table, unsigned int n) { struct bucket_data *bd = mp->pool_data; + struct bucket_stack *local_stack = bd->buckets[rte_lcore_id()]; unsigned int i; int rc = 0; @@ -146,6 +148,15 @@ bucket_enqueue(struct rte_mempool *mp, void * const *obj_table, rc = bucket_enqueue_single(bd, obj_table[i]); RTE_ASSERT(rc == 0); } + if (local_stack->top > bd->bucket_stack_thresh) { + rte_ring_enqueue_bulk(bd->shared_bucket_ring, + &local_stack->objects + [bd->bucket_stack_thresh], + local_stack->top - + bd->bucket_stack_thresh, + NULL); + local_stack->top = bd->bucket_stack_thresh; + } return rc; } @@ -408,6 +419,8 @@ bucket_alloc(struct rte_mempool *mp) bd->obj_per_bucket = (bd->bucket_mem_size - bucket_header_size) / bd->total_elt_size; bd->bucket_page_mask = ~(rte_align64pow2(bd->bucket_mem_size) - 1); + /* eventually this should be a tunable parameter */ + bd->bucket_stack_thresh = (mp->size / bd->obj_per_bucket) * 4 / 3; if (mp->flags & MEMPOOL_F_SP_PUT) rg_flags |= RING_F_SP_ENQ; -- 2.7.4
[dpdk-dev] [PATCH v1 6/6] doc: advertise bucket mempool driver
Signed-off-by: Andrew Rybchenko --- doc/guides/rel_notes/release_18_05.rst | 9 + 1 file changed, 9 insertions(+) diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst index 016c4ed..c578364 100644 --- a/doc/guides/rel_notes/release_18_05.rst +++ b/doc/guides/rel_notes/release_18_05.rst @@ -52,6 +52,15 @@ New Features * Added support for NVGRE, VXLAN and GENEVE filters in flow API. * Added support for DROP action in flow API. +* **Added bucket mempool driver.** + + Added bucket mempool driver which provide a way to allocate contiguous + block of objects. + Number of objects in the block depends on how many objects fit in + RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB memory chunk which is build time option. + The number may be obtained using rte_mempool_ops_get_info() API. + Contiguous blocks may be allocated using rte_mempool_get_contig_blocks() API. + API Changes --- -- 2.7.4
[dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver
The initial patch series [1] (RFCv1 is [2]) is split into two to simplify processing. This is the second part, which relies on the first one [3]. It should be applied on top of [4] and [3].

The patch series adds a bucket mempool driver which allows allocation of (both physically and virtually) contiguous blocks of objects and adds a mempool API to do it. It is still capable of providing separate objects, but it is definitely more heavy-weight than the ring/stack drivers. The driver will be used by future Solarflare driver enhancements which make it possible to utilize physically contiguous blocks in the NIC firmware.

The target use case is to dequeue in blocks and enqueue separate objects back (which are collected in buckets to be dequeued). So the memory pool with the bucket driver is created by an application and provided to the networking PMD receive queue. The choice of the bucket driver is made using rte_eth_dev_pool_ops_supported(). A PMD that relies upon contiguous block allocation should report the bucket driver as the only supported and preferred one.

The benefit of the contiguous block dequeue operation is demonstrated by performance measurements using the autotest with minor enhancements:
- in the original test bulks are powers of two, which is unacceptable for us, so they are changed to multiples of contig_block_size;
- the test code is duplicated to support plain dequeue and dequeue_contig_blocks;
- all the extra test variations (with/without cache etc.) are eliminated;
- a fake read from the dequeued buffer is added (in both cases) to simulate mbuf access.

start performance test for bucket (without cache)
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Srate_persec= 111935488
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Srate_persec= 115290931
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Srate_persec= 353055539
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Srate_persec= 353330790
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Srate_persec= 224657407
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Srate_persec= 230411468
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Srate_persec= 706700902
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Srate_persec= 703673139
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Srate_persec= 425236887
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Srate_persec= 437295512
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Srate_persec= 1343409356
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Srate_persec= 1336567397

start performance test for bucket (without cache + contiguous dequeue)
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Crate_persec= 122945536
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Crate_persec= 126458265
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Crate_persec= 374262988
mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Crate_persec= 377316966
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Crate_persec= 244842496
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Crate_persec= 251618917
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Crate_persec= 751226060
mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Crate_persec= 756233010
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Crate_persec= 462068120
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Crate_persec= 476997221
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Crate_persec= 1432171313
mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Crate_persec= 1438829771

The number of objects in a contiguous block is a function of the bucket memory size (a .config option) and the total element size. In the future, an additional API with the possibility to pass parameters on mempool allocation may be added.

It breaks the ABI since it changes rte_mempool_ops. The ABI version is already bumped in [4].

[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2017-November/082335.html
[3] https://dpdk.org/ml/archives/dev/2018-March/093807.html
[4] https://dpdk.org/ml/archives/dev/2018-March/093196.html

RFCv2 -> v1:
- rebased on top of [3]
- cleanup deprecation notice when it is done
- mark a new API experimental
- move contig blocks dequeue debug ch
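To make the application flow described in the cover letter concrete, here is a minimal sketch (names and pool sizes are illustrative, and it assumes the rte_pktmbuf_pool_create_by_ops() helper is used for mbuf pools): the application asks the port whether the bucket ops are supported and only then creates the pool with them, otherwise it falls back to the default ops.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static struct rte_mempool *
example_create_rx_pool(uint16_t port_id, int socket_id)
{
        const char *ops = "bucket";

        /* A PMD relying on contiguous blocks reports "bucket" via
         * rte_eth_dev_pool_ops_supported(); a negative return means the
         * ops are not supported by this port. */
        if (rte_eth_dev_pool_ops_supported(port_id, ops) < 0)
                ops = NULL;     /* NULL selects the platform default ops */

        return rte_pktmbuf_pool_create_by_ops("example_rx_pool", 8192, 256, 0,
                                              RTE_MBUF_DEFAULT_BUF_SIZE,
                                              socket_id, ops);
}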
[dpdk-dev] [PATCH v1 1/6] mempool/bucket: implement bucket mempool manager
From: "Artem V. Andreev" The manager provides a way to allocate physically and virtually contiguous set of objects. Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- MAINTAINERS| 9 + config/common_base | 2 + drivers/mempool/Makefile | 1 + drivers/mempool/bucket/Makefile| 27 + drivers/mempool/bucket/meson.build | 9 + drivers/mempool/bucket/rte_mempool_bucket.c| 562 + .../mempool/bucket/rte_mempool_bucket_version.map | 4 + mk/rte.app.mk | 1 + 8 files changed, 615 insertions(+) create mode 100644 drivers/mempool/bucket/Makefile create mode 100644 drivers/mempool/bucket/meson.build create mode 100644 drivers/mempool/bucket/rte_mempool_bucket.c create mode 100644 drivers/mempool/bucket/rte_mempool_bucket_version.map diff --git a/MAINTAINERS b/MAINTAINERS index aa30bd9..db903b3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -326,6 +326,15 @@ F: test/test/test_rawdev.c F: doc/guides/prog_guide/rawdev.rst +Memory Pool Drivers +--- + +Bucket memory pool +M: Artem V. Andreev +M: Andrew Rybchenko +F: drivers/mempool/bucket/ + + Bus Drivers --- diff --git a/config/common_base b/config/common_base index ee10b44..dd6d420 100644 --- a/config/common_base +++ b/config/common_base @@ -606,6 +606,8 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n # # Compile Mempool drivers # +CONFIG_RTE_DRIVER_MEMPOOL_BUCKET=y +CONFIG_RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB=64 CONFIG_RTE_DRIVER_MEMPOOL_RING=y CONFIG_RTE_DRIVER_MEMPOOL_STACK=y diff --git a/drivers/mempool/Makefile b/drivers/mempool/Makefile index fc8b73b..28c2e83 100644 --- a/drivers/mempool/Makefile +++ b/drivers/mempool/Makefile @@ -3,6 +3,7 @@ include $(RTE_SDK)/mk/rte.vars.mk +DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += bucket ifeq ($(CONFIG_RTE_LIBRTE_DPAA_BUS),y) DIRS-$(CONFIG_RTE_LIBRTE_DPAA_MEMPOOL) += dpaa endif diff --git a/drivers/mempool/bucket/Makefile b/drivers/mempool/bucket/Makefile new file mode 100644 index 000..7364916 --- /dev/null +++ b/drivers/mempool/bucket/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# +# Copyright (c) 2017-2018 Solarflare Communications Inc. +# All rights reserved. +# +# This software was jointly developed between OKTET Labs (under contract +# for Solarflare) and Solarflare Communications, Inc. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_mempool_bucket.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +LDLIBS += -lrte_eal -lrte_mempool -lrte_ring + +EXPORT_MAP := rte_mempool_bucket_version.map + +LIBABIVER := 1 + +SRCS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += rte_mempool_bucket.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/mempool/bucket/meson.build b/drivers/mempool/bucket/meson.build new file mode 100644 index 000..618d791 --- /dev/null +++ b/drivers/mempool/bucket/meson.build @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: BSD-3-Clause +# +# Copyright (c) 2017-2018 Solarflare Communications Inc. +# All rights reserved. +# +# This software was jointly developed between OKTET Labs (under contract +# for Solarflare) and Solarflare Communications, Inc. + +sources = files('rte_mempool_bucket.c') diff --git a/drivers/mempool/bucket/rte_mempool_bucket.c b/drivers/mempool/bucket/rte_mempool_bucket.c new file mode 100644 index 000..5a1bd79 --- /dev/null +++ b/drivers/mempool/bucket/rte_mempool_bucket.c @@ -0,0 +1,562 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * + * Copyright (c) 2017-2018 Solarflare Communications Inc. + * All rights reserved. 
+ * + * This software was jointly developed between OKTET Labs (under contract + * for Solarflare) and Solarflare Communications, Inc. + */ + +#include +#include +#include + +#include +#include +#include +#include + +/* + * The general idea of the bucket mempool driver is as follows. + * We keep track of physically contiguous groups (buckets) of objects + * of a certain size. Every such a group has a counter that is + * incremented every time an object from that group is enqueued. + * Until the bucket is full, no objects from it are eligible for allocation. + * If a request is made to dequeue a multiply of bucket size, it is + * satisfied by returning the whole buckets, instead of separate objects. + */ + + +struct bucket_header { + unsigned int lcore_id; + uint8_t fill_cnt; +}; + +struct bucket_stack { + unsigned int top; + unsigned int limit; + void *objects[]; +}; + +struct bucket_data { + unsigned int header_size; + unsigned int total_elt_size; + unsigned int obj_per_bucket; + uintptr_t bucket_page_mask; + struct rte_ring *shared_bucket_ring; + struct bucket_stack *buckets[RTE_MAX_LCORE]; + /* +* Multi-producer single-consumer ring to hold objects that are +
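To illustrate the bookkeeping described in the driver comment above (a simplified sketch only; the example_ structures and names are hypothetical, not the driver's actual internals): each object maps back to its bucket header through an address mask, and a per-bucket fill counter decides when the bucket becomes eligible for allocation again.

#include <stdint.h>

struct example_bucket_header {
        unsigned int lcore_id;  /* lcore that owns the bucket */
        uint8_t fill_cnt;       /* objects returned to this bucket so far */
};

/* Locate the bucket header from an object pointer. */
static struct example_bucket_header *
example_obj_to_bucket(void *obj, uintptr_t bucket_page_mask)
{
        return (struct example_bucket_header *)
                ((uintptr_t)obj & bucket_page_mask);
}

/* Count a returned object; returns 1 when the bucket is full again and may
 * be pushed to the lcore-local stack or the shared ring. */
static int
example_enqueue_one(struct example_bucket_header *hdr,
                    unsigned int obj_per_bucket)
{
        if (++hdr->fill_cnt < obj_per_bucket)
                return 0;       /* bucket still only partially refilled */
        hdr->fill_cnt = 0;
        return 1;
}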
[dpdk-dev] [PATCH v1 3/6] mempool: support block dequeue operation
From: "Artem V. Andreev" If mempool manager supports object blocks (physically and virtual contiguous set of objects), it is sufficient to get the first object only and the function allows to avoid filling in of information about each block member. Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- doc/guides/rel_notes/deprecation.rst | 7 -- lib/librte_mempool/Makefile| 1 + lib/librte_mempool/meson.build | 2 + lib/librte_mempool/rte_mempool.c | 39 lib/librte_mempool/rte_mempool.h | 151 - lib/librte_mempool/rte_mempool_ops.c | 1 + lib/librte_mempool/rte_mempool_version.map | 1 + 7 files changed, 194 insertions(+), 8 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 5301259..8249638 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -59,13 +59,6 @@ Deprecation Notices - ``rte_eal_mbuf_default_mempool_ops`` -* mempool: several API and ABI changes are planned in v18.05. - - The following changes are planned: - - - addition of new op to allocate contiguous -block of objects if underlying driver supports it. - * mbuf: The control mbuf API will be removed in v18.05. The impacted functions and macros are: diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile index 2c46fdd..62dd1a4 100644 --- a/lib/librte_mempool/Makefile +++ b/lib/librte_mempool/Makefile @@ -10,6 +10,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 # Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab() # from earlier deprecated rte_mempool_populate_phys_tab() CFLAGS += -Wno-deprecated-declarations +CFLAGS += -DALLOW_EXPERIMENTAL_API LDLIBS += -lrte_eal -lrte_ring EXPORT_MAP := rte_mempool_version.map diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build index 22e912a..8ef88e3 100644 --- a/lib/librte_mempool/meson.build +++ b/lib/librte_mempool/meson.build @@ -1,6 +1,8 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation +allow_experimental_apis = true + extra_flags = [] # Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab() diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index c58bcc6..79f8429 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -1125,6 +1125,36 @@ void rte_mempool_check_cookies(const struct rte_mempool *mp, #endif } +void +rte_mempool_contig_blocks_check_cookies(const struct rte_mempool *mp, + void * const *first_obj_table_const, unsigned int n, int free) +{ +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + struct rte_mempool_info info; + const size_t total_elt_sz = + mp->header_size + mp->elt_size + mp->trailer_size; + unsigned int i, j; + + rte_mempool_ops_get_info(mp, &info); + + for (i = 0; i < n; ++i) { + void *first_obj = first_obj_table_const[i]; + + for (j = 0; j < info.contig_block_size; ++j) { + void *obj; + + obj = (void *)((uintptr_t)first_obj + j * total_elt_sz); + rte_mempool_check_cookies(mp, &obj, 1, free); + } + } +#else + RTE_SET_USED(mp); + RTE_SET_USED(first_obj_table_const); + RTE_SET_USED(n); + RTE_SET_USED(free); +#endif +} + #ifdef RTE_LIBRTE_MEMPOOL_DEBUG static void mempool_obj_audit(struct rte_mempool *mp, __rte_unused void *opaque, @@ -1190,6 +1220,7 @@ void rte_mempool_dump(FILE *f, struct rte_mempool *mp) { #ifdef RTE_LIBRTE_MEMPOOL_DEBUG + struct rte_mempool_info info; struct rte_mempool_debug_stats sum; unsigned lcore_id; #endif @@ -1231,6 +1262,7 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp) /* sum 
and dump statistics */ #ifdef RTE_LIBRTE_MEMPOOL_DEBUG + rte_mempool_ops_get_info(mp, &info); memset(&sum, 0, sizeof(sum)); for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { sum.put_bulk += mp->stats[lcore_id].put_bulk; @@ -1239,6 +1271,8 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp) sum.get_success_objs += mp->stats[lcore_id].get_success_objs; sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk; sum.get_fail_objs += mp->stats[lcore_id].get_fail_objs; + sum.get_success_blks += mp->stats[lcore_id].get_success_blks; + sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks; } fprintf(f, " stats:\n"); fprintf(f, "put_bulk=%"PRIu64"\n", sum.put_bulk); @@ -1247,6 +1281,11 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp) fprintf(f, "get_success_objs=%"PRIu64"\n", sum.get_success_objs); fprintf(f, "get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk); fprint
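A usage sketch of the new experimental API (illustrative only, with a hypothetical example_ name; experimental symbols require ALLOW_EXPERIMENTAL_API as the build changes above show): the caller discovers the block size through get_info and then dequeues whole blocks, receiving one pointer per block rather than one per object.

#include <errno.h>
#include <rte_mempool.h>

static int
example_get_blocks(struct rte_mempool *mp, void **first_obj_table,
                   unsigned int n_blocks)
{
        struct rte_mempool_info info;

        if (rte_mempool_ops_get_info(mp, &info) != 0 ||
            info.contig_block_size == 0)
                return -ENOTSUP; /* driver has no contiguous block support */

        /* On success, first_obj_table[i] points to the first object of a
         * physically and virtually contiguous block of
         * info.contig_block_size objects. */
        return rte_mempool_get_contig_blocks(mp, first_obj_table, n_blocks);
}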
Re: [dpdk-dev] [PATCH] kni: optimize the kni release speed
On 2/6/2018 10:33 AM, zhouyangchao wrote: > Physical addresses in the fifo named alloc_q need to be traversed to > release in user space. The physical address to the virtual address > conversion in kernel space is much better. Yes current approach should be slower but this is not in data path, this is when a kni interface released, I expect no recognizable difference. > > Signed-off-by: Yangchao Zhou > --- > lib/librte_eal/linuxapp/kni/kni_dev.h | 1 + > lib/librte_eal/linuxapp/kni/kni_misc.c | 1 + > lib/librte_eal/linuxapp/kni/kni_net.c | 15 +++ > lib/librte_kni/rte_kni.c | 26 +- > 4 files changed, 18 insertions(+), 25 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/kni/kni_dev.h > b/lib/librte_eal/linuxapp/kni/kni_dev.h > index c9393d8..7cd9bf8 100644 > --- a/lib/librte_eal/linuxapp/kni/kni_dev.h > +++ b/lib/librte_eal/linuxapp/kni/kni_dev.h > @@ -92,6 +92,7 @@ struct kni_dev { > void *alloc_va[MBUF_BURST_SZ]; > }; > > +void kni_net_fifo_pa2va(struct kni_dev *kni); > void kni_net_rx(struct kni_dev *kni); > void kni_net_init(struct net_device *dev); > void kni_net_config_lo_mode(char *lo_str); > diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c > b/lib/librte_eal/linuxapp/kni/kni_misc.c > index 01574ec..668488b 100644 > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c > @@ -507,6 +507,7 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num, > dev->pthread = NULL; > } > > + kni_net_fifo_pa2va(dev); > kni_dev_remove(dev); > list_del(&dev->list); > ret = 0; > diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c > b/lib/librte_eal/linuxapp/kni/kni_net.c > index 9f9b798..662a527 100644 > --- a/lib/librte_eal/linuxapp/kni/kni_net.c > +++ b/lib/librte_eal/linuxapp/kni/kni_net.c > @@ -73,6 +73,21 @@ va2pa(void *va, struct rte_kni_mbuf *m) > return pa; > } > > +/* convert physical addresses to virtual addresses in fifo for kni release */ > +void > +kni_net_fifo_pa2va(struct kni_dev *kni) > +{ > + void *fifo = kni->alloc_q; > + int i, count = kni_fifo_count(fifo); > + void *pa = NULL, *kva, *va; > + for (i = 0; i < count; ++i) { > + (void)kni_fifo_get(fifo, &pa, 1); > + kva = pa2kva(pa); > + va = pa2va(pa, kva); > + (void)kni_fifo_put(fifo, &va, 1); kni fifo are single producer, single consumer. For alloc_q kernel side is consumer, I aware at this stage applications should stop the traffic, but still I am not comfortable mixing producer/consumer roles here. Also alloc_q should have physical addresses this logic stores virtual addresses in it and not sure about this either to mix addressing logic in the queue. Instead of this conversion, what about moving from alloc_q to free_q? free_q already has virtual addresses and freed by userspace, so this will be safe. I suggest keeping alloc_q free logic in the userspace in any case, if alloc_q is free it won't cost anyway. And while checking for this I may found something else. We have same problem with rx_q, it has physical addresses which makes hard to free in userspace. The existing intention is to give some time to kernel to consume the rx_q so that it won't be an issue for userspace. But that logic can be wrong. During the time userspace waits the netdev may be already destroyed and there is nothing to receive the packet, perhaps we should move wait above the ioctl. Since you are already checking these parts perhaps you would like to comment :) > + } > +} > + > /* > * It can be called to process the request. 
> */ > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c > index 2867411..f8398a9 100644 > --- a/lib/librte_kni/rte_kni.c > +++ b/lib/librte_kni/rte_kni.c > @@ -435,30 +435,6 @@ va2pa(struct rte_mbuf *m) >(unsigned long)m->buf_iova)); > } > > -static void > -obj_free(struct rte_mempool *mp __rte_unused, void *opaque, void *obj, > - unsigned obj_idx __rte_unused) > -{ > - struct rte_mbuf *m = obj; > - void *mbuf_phys = opaque; > - > - if (va2pa(m) == mbuf_phys) > - rte_pktmbuf_free(m); > -} > - > -static void > -kni_free_fifo_phy(struct rte_mempool *mp, struct rte_kni_fifo *fifo) > -{ > - void *mbuf_phys; > - int ret; > - > - do { > - ret = kni_fifo_get(fifo, &mbuf_phys, 1); > - if (ret) > - rte_mempool_obj_iter(mp, obj_free, mbuf_phys); > - } while (ret); > -} > - > int > rte_kni_release(struct rte_kni *kni) > { > @@ -484,7 +460,7 @@ rte_kni_release(struct rte_kni *kni) > if (kni_fifo_count(kni->rx_q)) > RTE_LOG(ERR, KNI, "Fail to free all Rx-q items\n"); > > - kni_free_fifo_phy(kni->pktmbuf_pool, kni->alloc_q); > + kni_free_fifo(kni->alloc_q); > kni_free_fifo(kni->tx_q); > kni_f
Re: [dpdk-dev] [PATCH 4/4] igb_uio: bind error if pcie bridge
On 3/21/2018 6:06 PM, Ajit Khaparde wrote: > From: Darren Edamura > > Probe function should exit immediately if pcie bridge detected > > Signed-off-by: Darren Edamura > Signed-off-by: Rahul Gupta > Signed-off-by: Scott Branden > Signed-off-by: Ajit Khaparde > --- > lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > index 4cae4dd27..3fabbfc4d 100644 > --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > @@ -473,6 +473,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct > pci_device_id *id) > void *map_addr; > int err; > > + if (pci_is_bridge(dev)) > + return -ENODEV; What do you think printing a log here? > + > udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL); > if (!udev) > return -ENOMEM; >
Re: [dpdk-dev] [PATCH 4/4] igb_uio: bind error if pcie bridge
Hi Ferruh, On 18-03-26 10:24 AM, Ferruh Yigit wrote: On 3/21/2018 6:06 PM, Ajit Khaparde wrote: From: Darren Edamura Probe function should exit immediately if pcie bridge detected Signed-off-by: Darren Edamura Signed-off-by: Rahul Gupta Signed-off-by: Scott Branden Signed-off-by: Ajit Khaparde --- lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c index 4cae4dd27..3fabbfc4d 100644 --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c @@ -473,6 +473,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id) void *map_addr; int err; + if (pci_is_bridge(dev)) + return -ENODEV; What do you think printing a log here? I think it brings little value. ENODEV is already returned? + udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL); if (!udev) return -ENOMEM; Regards, Scott
Re: [dpdk-dev] [PATCH v2] net/mlx5: setup RSS regardless of queue count
> -Original Message- > From: Yongseok Koh [mailto:ys...@mellanox.com] > Sent: Thursday, March 22, 2018 7:38 PM <..> > > > > Signed-off-by: Allain Legacy > > Signed-off-by: Nelio Laranjeiro > > --- > > Dahir, Allain > > Did you get a chance to test this patch? It would be good to have 'tested-by' > tag from you. > > > Acked-by: Yongseok Koh Tested-by: Allain Legacy
Re: [dpdk-dev] [PATCH 4/4] igb_uio: bind error if pcie bridge
On 3/26/2018 7:05 PM, Scott Branden wrote: > Hi Ferruh, > > > On 18-03-26 10:24 AM, Ferruh Yigit wrote: >> On 3/21/2018 6:06 PM, Ajit Khaparde wrote: >>> From: Darren Edamura >>> >>> Probe function should exit immediately if pcie bridge detected >>> >>> Signed-off-by: Darren Edamura >>> Signed-off-by: Rahul Gupta >>> Signed-off-by: Scott Branden >>> Signed-off-by: Ajit Khaparde >>> --- >>> lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> index 4cae4dd27..3fabbfc4d 100644 >>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> @@ -473,6 +473,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct >>> pci_device_id *id) >>> void *map_addr; >>> int err; >>> >>> + if (pci_is_bridge(dev)) >>> + return -ENODEV; >> What do you think printing a log here? > I think it brings little value. ENODEV is already returned? User should not provide bridge address at first place, I guess this is a protection in case user provides bridge address by mistake. In that case no device will be probed and user won't have any idea why. I think a log in dmesg saying bridge device is provided may help to the user. >> >>> + >>> udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL); >>> if (!udev) >>> return -ENOMEM; >>> > Regards, > Scott >
Re: [dpdk-dev] [PATCH V2] igb_uio: fix uevent montior issue
On 2/27/2018 7:20 AM, Jeff Guo wrote: > udev could not detect remove and add event of device when hotplug in > and out devices, that related with the fix about using pointer of > rte_uio_pci_dev as dev_id instead of uio_device for irq device handler, > that would result igb uio irq failure when kernel version after than 3.17. > > The root cause is that the older version of Linux kernel don't expose the > uio_device structure, only for the kernel version after than 3.17 use > uio_device. so this patch correct it by use a macro to check before handle > the pci interrupt. > > Fixes: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17") > Signed-off-by: Jeff Guo > --- > v2->v1: > use macro in compat.h to replace of version check in .c file, benifit for > future backport and make more readable. > --- > lib/librte_eal/linuxapp/igb_uio/compat.h | 4 > lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 16 > 2 files changed, 20 insertions(+) > > diff --git a/lib/librte_eal/linuxapp/igb_uio/compat.h > b/lib/librte_eal/linuxapp/igb_uio/compat.h > index ce456d4..2c61190 100644 > --- a/lib/librte_eal/linuxapp/igb_uio/compat.h > +++ b/lib/librte_eal/linuxapp/igb_uio/compat.h > @@ -132,3 +132,7 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev) > #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0) > #define HAVE_PCI_MSI_MASK_IRQ 1 > #endif > + > +#if LINUX_VERSION_CODE > KERNEL_VERSION(3, 17, 0) > +#define HAVE_UIO_DEVICE_STRUCTURE 1 > +#endif > diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > index 4cae4dd..99018f4 100644 > --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c > @@ -192,8 +192,14 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 > irq_state) > static irqreturn_t > igbuio_pci_irqhandler(int irq, void *dev_id) > { > +#ifndef HAVE_UIO_DEVICE_STRUCTURE > struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id; > struct uio_info *info = &udev->info; > +#else > + struct uio_device *idev = (struct uio_device *)dev_id; > + struct uio_info *info = idev->info; > + struct rte_uio_pci_dev *udev = info->priv; > +#endif > > /* Legacy mode need to mask in hardware */ > if (udev->mode == RTE_INTR_MODE_LEGACY && > @@ -279,9 +285,15 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev > *udev) > } > > if (udev->info.irq != UIO_IRQ_NONE) > +#ifndef HAVE_UIO_DEVICE_STRUCTURE > err = request_irq(udev->info.irq, igbuio_pci_irqhandler, > udev->info.irq_flags, udev->info.name, > udev); > +#else > + err = request_irq(udev->info.irq, igbuio_pci_irqhandler, > + udev->info.irq_flags, udev->info.name, > + udev->info.uio_dev); > +#endif Hi Jeff, Can you please describe how this is solving the problem. Isn't only requirement for dev_id to be unique? Why it differs to pass uio_dev instead of udev pointer? > dev_info(&udev->pdev->dev, "uio device registered with irq %ld\n", >udev->info.irq); > > @@ -292,7 +304,11 @@ static void > igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev) > { > if (udev->info.irq) { > +#ifndef HAVE_UIO_DEVICE_STRUCTURE > free_irq(udev->info.irq, udev); > +#else > + free_irq(udev->info.irq, udev->info.uio_dev); > +#endif > udev->info.irq = 0; > } > >
Re: [dpdk-dev] [PATCH] ethdev: return diagnostic when setting MAC address
On 2/27/2018 3:11 PM, Olivier Matz wrote: > Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a > return code is added to notify the caller (librte_ether) if an error > occurred in the PMD. > > The new default MAC address is now copied in dev->data->mac_addrs[0] > only if the operation is successful. > > The patch also updates all the PMDs accordingly. > > Signed-off-by: Olivier Matz > --- > > Hi, > > This patch is the following of the discussion we had in this thread: > https://dpdk.org/dev/patchwork/patch/32284/ > > I did my best to keep the consistency inside the PMDs. The behavior > of eth_mac_addr_set() is inspired from other fonctions in the same > PMD, usually eth_mac_addr_add(). For instance: > - dpaa and dpaa2 return 0 on error. > - some PMDs (bnxt, mlx5, ...?) do not return a -errno code (-1 or > positive values). > - some PMDs (avf, tap) check if the address is the same and return 0 > in that case. This could go in generic code? > > I tried to use the following errors when relevant: > - -EPERM when a VF is not allowed to do a change > - -ENOTSUP if the function is not supported > - -EIO if this is an unknown error from lower layer (hw or sdk) > - -EINVAL for other unknown errors > > Please, PMD maintainers, feel free to comment if you ahve specific > needs for your driver. > > Thanks > Olivier > > > doc/guides/rel_notes/deprecation.rst| 8 > drivers/net/ark/ark_ethdev.c| 9 ++--- > drivers/net/avf/avf_ethdev.c| 12 > drivers/net/bnxt/bnxt_ethdev.c | 10 ++ > drivers/net/bonding/rte_eth_bond_pmd.c | 8 ++-- > drivers/net/dpaa/dpaa_ethdev.c | 4 +++- > drivers/net/dpaa2/dpaa2_ethdev.c| 6 -- > drivers/net/e1000/igb_ethdev.c | 12 +++- > drivers/net/failsafe/failsafe_ops.c | 16 +--- > drivers/net/i40e/i40e_ethdev.c | 24 ++- > drivers/net/i40e/i40e_ethdev_vf.c | 12 +++- > drivers/net/ixgbe/ixgbe_ethdev.c| 13 - > drivers/net/mlx4/mlx4.h | 2 +- > drivers/net/mlx4/mlx4_ethdev.c | 7 +-- > drivers/net/mlx5/mlx5.h | 2 +- > drivers/net/mlx5/mlx5_mac.c | 7 +-- > drivers/net/mrvl/mrvl_ethdev.c | 7 ++- > drivers/net/null/rte_eth_null.c | 3 ++- > drivers/net/octeontx/octeontx_ethdev.c | 4 +++- > drivers/net/qede/qede_ethdev.c | 7 +++ > drivers/net/sfc/sfc_ethdev.c| 14 +- > drivers/net/szedata2/rte_eth_szedata2.c | 3 ++- > drivers/net/tap/rte_eth_tap.c | 34 > + > drivers/net/virtio/virtio_ethdev.c | 15 ++- > drivers/net/vmxnet3/vmxnet3_ethdev.c| 5 +++-- > lib/librte_ether/rte_ethdev.c | 7 +-- > lib/librte_ether/rte_ethdev_core.h | 2 +- > test/test/virtual_pmd.c | 3 ++- > 28 files changed, 159 insertions(+), 97 deletions(-) ethdev part looks good to me. Are you planning to have another version for mrvl and sfc comments? PMD maintainers, please check and provide feedback for your PMD, otherwise the patch will go in as it is. Thanks, ferruh
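For PMD maintainers reviewing the conversion, this is roughly the shape each callback takes after the change (a sketch with a hypothetical mydrv_ driver, not code from the patch): the callback reports failure with a negative errno value, and ethdev copies the new address into dev->data->mac_addrs[0] only on success.

#include <errno.h>
#include <rte_ethdev_driver.h>

struct mydrv_hw;                /* hypothetical device context */
int mydrv_hw_write_mac(struct mydrv_hw *hw, const uint8_t *mac); /* hypothetical HW/SDK call */

static int
mydrv_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
{
        struct mydrv_hw *hw = dev->data->dev_private;

        if (mydrv_hw_write_mac(hw, mac_addr->addr_bytes) != 0)
                return -EIO;    /* unknown error from the lower layer */

        return 0;               /* librte_ether stores the new address */
}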
Re: [dpdk-dev] [PATCH 4/4] igb_uio: bind error if pcie bridge
On 18-03-26 11:20 AM, Ferruh Yigit wrote: On 3/26/2018 7:05 PM, Scott Branden wrote: Hi Ferruh, On 18-03-26 10:24 AM, Ferruh Yigit wrote: On 3/21/2018 6:06 PM, Ajit Khaparde wrote: From: Darren Edamura Probe function should exit immediately if pcie bridge detected Signed-off-by: Darren Edamura Signed-off-by: Rahul Gupta Signed-off-by: Scott Branden Signed-off-by: Ajit Khaparde --- lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c index 4cae4dd27..3fabbfc4d 100644 --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c @@ -473,6 +473,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id) void *map_addr; int err; + if (pci_is_bridge(dev)) + return -ENODEV; What do you think printing a log here? I think it brings little value. ENODEV is already returned? User should not provide bridge address at first place, I guess this is a protection in case user provides bridge address by mistake. In that case no device will be probed and user won't have any idea why. I think a log in dmesg saying bridge device is provided may help to the user. I'll add a dev_warn as we actually encounter this issue on old silicon revisions due to bridge address equaling a PF. It's not a user error in such case and just needs to be ignored. So adding this generic check allows such to occur. For other use cases like you mentioned it would be a user mistake. + udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL); if (!udev) return -ENOMEM; Regards, Scott
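Based on the agreement above, the check would become something like the following (a sketch of the discussed follow-up, not the submitted patch):

        if (pci_is_bridge(dev)) {
                dev_warn(&dev->dev, "Ignoring PCI bridge device\n");
                return -ENODEV;
        }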
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/bnxt: fix an erorr with vnic_tpa_cfg command
On 2/28/2018 10:12 PM, Ajit Khaparde wrote: > When the vnic_tpa_cfg HWRM command is sent to the FW, > we are not passing the VNIC ID in case of disable. > This can cause the FW to return an error. > Correct VNIC ID needs to be passed for both enable and disable. Hi Ajit, Patch title doesn't tell what is actually fixed, after reading commit log, will you be agree on following: "net/bnxt: fix LRO disable" > > Fixes: 0958d8b6435d ("net/bnxt: support LRO") > Cc: sta...@dpdk.org > > Signed-off-by: Ajit Khaparde > --- > drivers/net/bnxt/bnxt_hwrm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c > index b7843afe6..05663fedd 100644 > --- a/drivers/net/bnxt/bnxt_hwrm.c > +++ b/drivers/net/bnxt/bnxt_hwrm.c > @@ -1517,12 +1517,12 @@ int bnxt_hwrm_vnic_tpa_cfg(struct bnxt *bp, > HWRM_VNIC_TPA_CFG_INPUT_FLAGS_GRO | > HWRM_VNIC_TPA_CFG_INPUT_FLAGS_AGG_WITH_ECN | > HWRM_VNIC_TPA_CFG_INPUT_FLAGS_AGG_WITH_SAME_GRE_SEQ); > - req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); > req.max_agg_segs = rte_cpu_to_le_16(5); > req.max_aggs = > rte_cpu_to_le_16(HWRM_VNIC_TPA_CFG_INPUT_MAX_AGGS_MAX); > req.min_agg_len = rte_cpu_to_le_32(512); > } > + req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); > > rc = bnxt_hwrm_send_message(bp, &req, sizeof(req)); > >
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/bnxt: fix an erorr with vnic_tpa_cfg command
On Mon, Mar 26, 2018 at 1:20 PM, Ferruh Yigit wrote: > On 2/28/2018 10:12 PM, Ajit Khaparde wrote: > > When the vnic_tpa_cfg HWRM command is sent to the FW, > > we are not passing the VNIC ID in case of disable. > > This can cause the FW to return an error. > > Correct VNIC ID needs to be passed for both enable and disable. > > Hi Ajit, > > Patch title doesn't tell what is actually fixed, after reading commit log, > will > you be agree on following: > > "net/bnxt: fix LRO disable" > Yes, Ferruh. This is fine. Thanks > > > > > Fixes: 0958d8b6435d ("net/bnxt: support LRO") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Ajit Khaparde > > --- > > drivers/net/bnxt/bnxt_hwrm.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c > > index b7843afe6..05663fedd 100644 > > --- a/drivers/net/bnxt/bnxt_hwrm.c > > +++ b/drivers/net/bnxt/bnxt_hwrm.c > > @@ -1517,12 +1517,12 @@ int bnxt_hwrm_vnic_tpa_cfg(struct bnxt *bp, > > HWRM_VNIC_TPA_CFG_INPUT_FLAGS_GRO | > > HWRM_VNIC_TPA_CFG_INPUT_FLAGS_AGG_WITH_ECN > | > > HWRM_VNIC_TPA_CFG_INPUT_FLAGS_ > AGG_WITH_SAME_GRE_SEQ); > > - req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); > > req.max_agg_segs = rte_cpu_to_le_16(5); > > req.max_aggs = > > rte_cpu_to_le_16(HWRM_VNIC_ > TPA_CFG_INPUT_MAX_AGGS_MAX); > > req.min_agg_len = rte_cpu_to_le_32(512); > > } > > + req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); > > > > rc = bnxt_hwrm_send_message(bp, &req, sizeof(req)); > > > > > >