[dpdk-dev] [PATCH 0/8] net/mvpp2: add new features
This patch series introduces fixes and adds support for traffic metering and traffic manager.

Natalie Samsonov (2):
  net/mvpp2: initialize ppio only once
  net/mvpp2: update MTU and MRU related calculations

Tomasz Duszynski (5):
  net/mvpp2: move common code
  net/mvpp2: add metering support
  net/mvpp2: change default policer configuration
  net/mvpp2: add init and deinit to flow
  net/mvpp2: add traffic manager support

Yuval Caduri (1):
  net/mvpp2: detach tx_qos from rx cls/qos config

 doc/guides/nics/mvpp2.rst       |   31 +-
 drivers/net/mvpp2/Makefile      |    2 +
 drivers/net/mvpp2/meson.build   |    4 +-
 drivers/net/mvpp2/mrvl_ethdev.c |  188 ++--
 drivers/net/mvpp2/mrvl_ethdev.h |  122 -
 drivers/net/mvpp2/mrvl_flow.c   |  129 +++--
 drivers/net/mvpp2/mrvl_flow.h   |   15 +
 drivers/net/mvpp2/mrvl_mtr.c    |  512 
 drivers/net/mvpp2/mrvl_mtr.h    |   15 +
 drivers/net/mvpp2/mrvl_qos.c    |  244 +-
 drivers/net/mvpp2/mrvl_qos.h    |    2 +-
 drivers/net/mvpp2/mrvl_tm.c     | 1009 +++
 drivers/net/mvpp2/mrvl_tm.h     |   15 +
 13 files changed, 2064 insertions(+), 224 deletions(-)
 create mode 100644 drivers/net/mvpp2/mrvl_flow.h
 create mode 100644 drivers/net/mvpp2/mrvl_mtr.c
 create mode 100644 drivers/net/mvpp2/mrvl_mtr.h
 create mode 100644 drivers/net/mvpp2/mrvl_tm.c
 create mode 100644 drivers/net/mvpp2/mrvl_tm.h

-- 
2.7.4
[dpdk-dev] [PATCH 5/8] net/mvpp2: add init and deinit to flow
Add init and deinit functionality to flow implementation. Init puts structures used by flow in a sane state. Deinit deallocates all resources used by flow. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi Reviewed-by: Shlomi Gridish --- drivers/net/mvpp2/mrvl_ethdev.c | 3 +++ drivers/net/mvpp2/mrvl_flow.c | 33 - drivers/net/mvpp2/mrvl_flow.h | 15 +++ 3 files changed, 50 insertions(+), 1 deletion(-) create mode 100644 drivers/net/mvpp2/mrvl_flow.h diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 1464385..5e3a106 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -23,6 +23,7 @@ #include #include "mrvl_ethdev.h" #include "mrvl_qos.h" +#include "mrvl_flow.h" #include "mrvl_mtr.h" /* bitmask with reserved hifs */ @@ -628,6 +629,7 @@ mrvl_dev_start(struct rte_eth_dev *dev) goto out; } + mrvl_flow_init(dev); mrvl_mtr_init(dev); return 0; @@ -768,6 +770,7 @@ mrvl_dev_close(struct rte_eth_dev *dev) mrvl_flush_rx_queues(dev); mrvl_flush_tx_shadow_queues(dev); + mrvl_flow_deinit(dev); mrvl_mtr_deinit(dev); for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) { diff --git a/drivers/net/mvpp2/mrvl_flow.c b/drivers/net/mvpp2/mrvl_flow.c index e6953e4..065b1aa 100644 --- a/drivers/net/mvpp2/mrvl_flow.c +++ b/drivers/net/mvpp2/mrvl_flow.c @@ -11,7 +11,7 @@ #include -#include "mrvl_ethdev.h" +#include "mrvl_flow.h" #include "mrvl_qos.h" /** Number of rules in the classifier table. */ @@ -2790,3 +2790,34 @@ const struct rte_flow_ops mrvl_flow_ops = { .flush = mrvl_flow_flush, .isolate = mrvl_flow_isolate }; + +/** + * Initialize flow resources. + * + * @param dev Pointer to the device. + */ +void +mrvl_flow_init(struct rte_eth_dev *dev) +{ + struct mrvl_priv *priv = dev->data->dev_private; + + LIST_INIT(&priv->flows); +} + +/** + * Cleanup flow resources. + * + * @param dev Pointer to the device. + */ +void +mrvl_flow_deinit(struct rte_eth_dev *dev) +{ + struct mrvl_priv *priv = dev->data->dev_private; + + mrvl_flow_flush(dev, NULL); + + if (priv->cls_tbl) { + pp2_cls_tbl_deinit(priv->cls_tbl); + priv->cls_tbl = NULL; + } +} diff --git a/drivers/net/mvpp2/mrvl_flow.h b/drivers/net/mvpp2/mrvl_flow.h new file mode 100644 index 000..f63747c --- /dev/null +++ b/drivers/net/mvpp2/mrvl_flow.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Marvell International Ltd. + * Copyright(c) 2018 Semihalf. + * All rights reserved. + */ + +#ifndef _MRVL_FLOW_H_ +#define _MRVL_FLOW_H_ + +#include "mrvl_ethdev.h" + +void mrvl_flow_init(struct rte_eth_dev *dev); +void mrvl_flow_deinit(struct rte_eth_dev *dev); + +#endif /* _MRVL_FLOW_H_ */ -- 2.7.4
[dpdk-dev] [PATCH 4/8] net/mvpp2: change default policer configuration
Change QoS configuration file syntax for port's default policer setup. Since default policer configuration is performed before any other policer configuration we can pick a default id. This simplifies default policer configuration since user no longer has to choose ids from range [0, PP2_CLS_PLCR_NUM]. Explicitly document values for rate_limit_enable field. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- doc/guides/nics/mvpp2.rst | 31 --- drivers/net/mvpp2/mrvl_ethdev.c | 6 +- drivers/net/mvpp2/mrvl_ethdev.h | 2 +- drivers/net/mvpp2/mrvl_qos.c| 198 ++-- drivers/net/mvpp2/mrvl_qos.h| 2 +- 5 files changed, 134 insertions(+), 105 deletions(-) diff --git a/doc/guides/nics/mvpp2.rst b/doc/guides/nics/mvpp2.rst index 0408752..a452c8a 100644 --- a/doc/guides/nics/mvpp2.rst +++ b/doc/guides/nics/mvpp2.rst @@ -152,20 +152,23 @@ Configuration syntax .. code-block:: console - [port default] - default_tc = - mapping_priority = - policer_enable = + [policer ] token_unit = color = cir = ebs = cbs = + [port default] + default_tc = + mapping_priority = + rate_limit_enable = rate_limit = burst_size = + default_policer = + [port tc ] rxq = pcp = @@ -201,7 +204,9 @@ Where: - : List of DSCP values to handle in particular TC (e.g. 0-12 32-48 63). -- : Enable ingress policer. +- : Id of the policer configuration section to be used as default. + +- : Id of the policer configuration section (0..31). - : Policer token unit (`bytes` or `packets`). @@ -215,7 +220,7 @@ Where: - : Default color for specific tc. -- : Enables per port or per txq rate limiting. +- : Enables per port or per txq rate limiting (`0`/`1` to disable/enable). - : Committed information rate, in kilo bits per second. @@ -234,6 +239,13 @@ Configuration file example .. 
code-block:: console + [policer 0] + token_unit = bytes + color = blind + cir = 10 + ebs = 64 + cbs = 64 + [port 0 default] default_tc = 0 mapping_priority = ip @@ -265,12 +277,7 @@ Configuration file example default_tc = 0 mapping_priority = vlan/ip - policer_enable = 1 - token_unit = bytes - color = blind - cir = 10 - ebs = 64 - cbs = 64 + default_policer = 0 [port 1 tc 0] rxq = 0 diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index a4951d3..1464385 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -798,9 +798,9 @@ mrvl_dev_close(struct rte_eth_dev *dev) } /* policer must be released after ppio deinitialization */ - if (priv->policer) { - pp2_cls_plcr_deinit(priv->policer); - priv->policer = NULL; + if (priv->default_policer) { + pp2_cls_plcr_deinit(priv->default_policer); + priv->default_policer = NULL; } } diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index ecb8fdc..de423a9 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -168,7 +168,7 @@ struct mrvl_priv { uint32_t cls_tbl_pattern; LIST_HEAD(mrvl_flows, rte_flow) flows; - struct pp2_cls_plcr *policer; + struct pp2_cls_plcr *default_policer; LIST_HEAD(profiles, mrvl_mtr_profile) profiles; LIST_HEAD(mtrs, mrvl_mtr) mtrs; diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index eeb46f8..e039635 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -42,7 +42,8 @@ #define MRVL_TOK_WRR_WEIGHT "wrr_weight" /* policer specific configuration tokens */ -#define MRVL_TOK_PLCR_ENABLE "policer_enable" +#define MRVL_TOK_PLCR "policer" +#define MRVL_TOK_PLCR_DEFAULT "default_policer" #define MRVL_TOK_PLCR_UNIT "token_unit" #define MRVL_TOK_PLCR_UNIT_BYTES "bytes" #define MRVL_TOK_PLCR_UNIT_PACKETS "packets" @@ -368,6 +369,9 @@ parse_tc_cfg(struct rte_cfgfile *file, int port, int tc, cfg->port[port].tc[tc].dscps = n; } + if (!cfg->port[port].setup_policer) + return 0; + entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_PLCR_DEFAULT_COLOR); if (entry) { @@ -390,6 +394,85 @@ parse_tc_cfg(struct rte_cfgfile *file, int port, int tc, } /** + * Parse default port policer. + * + * @param file Config file handle. + * @param sec_name Section name with policer configuration + * @param port Port number. + * @param cfg[out] Parsing results. + * @returns 0 in case of success, negative value otherwise. + */ +static int +parse_policer(struct rte_cfgfile *file, int port, const char *sec_name, + struct mrvl_qos_cfg *cfg) +{ + const char *entry; + uint32_t val; + + /* Read policer token uni
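For reference, the runtime side is unchanged: the QoS file, now containing "[policer <id>]" sections referenced via "default_policer", is still handed to the PMD through its configuration devarg. A minimal invocation sketch, assuming the eth_mvpp2 vdev name and "cfg=" devarg already documented in mvpp2.rst:

    ./testpmd --vdev=eth_mvpp2,iface=eth0,iface=eth2,cfg=/home/user/mrvl.conf -c 3 -- -i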
[dpdk-dev] [PATCH 2/8] net/mvpp2: move common code
Cleanup sources by moving common code to the pmd header file. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- drivers/net/mvpp2/mrvl_ethdev.c | 9 - drivers/net/mvpp2/mrvl_ethdev.h | 11 +++ drivers/net/mvpp2/mrvl_flow.c | 5 - drivers/net/mvpp2/mrvl_qos.c| 9 - 4 files changed, 11 insertions(+), 23 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index f022cad..adb07d0 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -10,15 +10,6 @@ #include #include -/* Unluckily, container_of is defined by both DPDK and MUSDK, - * we'll declare only one version. - * - * Note that it is not used in this PMD anyway. - */ -#ifdef container_of -#undef container_of -#endif - #include #include #include diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index 3726f78..2204be2 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -10,12 +10,23 @@ #include #include +/* + * container_of is defined by both DPDK and MUSDK, + * we'll declare only one version. + * + * Note that it is not used in this PMD anyway. + */ +#ifdef container_of +#undef container_of +#endif + #include #include #include #include #include #include +#include "env/mv_common.h" /* for BIT() */ /** Maximum number of rx queues per port */ #define MRVL_PP2_RXQ_MAX 32 diff --git a/drivers/net/mvpp2/mrvl_flow.c b/drivers/net/mvpp2/mrvl_flow.c index 13295e6..db750f4 100644 --- a/drivers/net/mvpp2/mrvl_flow.c +++ b/drivers/net/mvpp2/mrvl_flow.c @@ -11,13 +11,8 @@ #include -#ifdef container_of -#undef container_of -#endif - #include "mrvl_ethdev.h" #include "mrvl_qos.h" -#include "env/mv_common.h" /* for BIT() */ /** Number of rules in the classifier table. */ #define MRVL_CLS_MAX_NUM_RULES 20 diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index 71856c1..eeb46f8 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -15,15 +15,6 @@ #include #include -/* Unluckily, container_of is defined by both DPDK and MUSDK, - * we'll declare only one version. - * - * Note that it is not used in this PMD anyway. - */ -#ifdef container_of -#undef container_of -#endif - #include "mrvl_qos.h" /* Parsing tokens. Defined conveniently, so that any correction is easy. */ -- 2.7.4
[dpdk-dev] [PATCH 1/8] net/mvpp2: initialize ppio only once
From: Natalie Samsonov This changes stop/start/configure behavior due to issue in MUSDK library itself. From now on, ppio can be reconfigured only after interface is closed. Signed-off-by: Natalie Samsonov Reviewed-by: Yuval Caduri --- drivers/net/mvpp2/mrvl_ethdev.c | 53 + 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 6824445..f022cad 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -304,6 +304,11 @@ mrvl_dev_configure(struct rte_eth_dev *dev) struct mrvl_priv *priv = dev->data->dev_private; int ret; + if (priv->ppio) { + MRVL_LOG(INFO, "Device reconfiguration is not supported"); + return -EINVAL; + } + if (dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_NONE && dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_RSS) { MRVL_LOG(INFO, "Unsupported rx multi queue mode %d", @@ -525,6 +530,9 @@ mrvl_dev_start(struct rte_eth_dev *dev) char match[MRVL_MATCH_LEN]; int ret = 0, i, def_init_size; + if (priv->ppio) + return mrvl_dev_set_link_up(dev); + snprintf(match, sizeof(match), "ppio-%d:%d", priv->pp_id, priv->ppio_id); priv->ppio_params.match = match; @@ -749,28 +757,7 @@ mrvl_flush_bpool(struct rte_eth_dev *dev) static void mrvl_dev_stop(struct rte_eth_dev *dev) { - struct mrvl_priv *priv = dev->data->dev_private; - mrvl_dev_set_link_down(dev); - mrvl_flush_rx_queues(dev); - mrvl_flush_tx_shadow_queues(dev); - if (priv->cls_tbl) { - pp2_cls_tbl_deinit(priv->cls_tbl); - priv->cls_tbl = NULL; - } - if (priv->qos_tbl) { - pp2_cls_qos_tbl_deinit(priv->qos_tbl); - priv->qos_tbl = NULL; - } - if (priv->ppio) - pp2_ppio_deinit(priv->ppio); - priv->ppio = NULL; - - /* policer must be released after ppio deinitialization */ - if (priv->policer) { - pp2_cls_plcr_deinit(priv->policer); - priv->policer = NULL; - } } /** @@ -785,6 +772,9 @@ mrvl_dev_close(struct rte_eth_dev *dev) struct mrvl_priv *priv = dev->data->dev_private; size_t i; + mrvl_flush_rx_queues(dev); + mrvl_flush_tx_shadow_queues(dev); + for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) { struct pp2_ppio_tc_params *tc_params = &priv->ppio_params.inqs_params.tcs_params[i]; @@ -795,7 +785,28 @@ mrvl_dev_close(struct rte_eth_dev *dev) } } + if (priv->cls_tbl) { + pp2_cls_tbl_deinit(priv->cls_tbl); + priv->cls_tbl = NULL; + } + + if (priv->qos_tbl) { + pp2_cls_qos_tbl_deinit(priv->qos_tbl); + priv->qos_tbl = NULL; + } + mrvl_flush_bpool(dev); + + if (priv->ppio) { + pp2_ppio_deinit(priv->ppio); + priv->ppio = NULL; + } + + /* policer must be released after ppio deinitialization */ + if (priv->policer) { + pp2_cls_plcr_deinit(priv->policer); + priv->policer = NULL; + } } /** -- 2.7.4
[dpdk-dev] [PATCH 3/8] net/mvpp2: add metering support
Add support for configuring plcr via DPDK generic metering API. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- drivers/net/mvpp2/Makefile | 1 + drivers/net/mvpp2/meson.build | 3 +- drivers/net/mvpp2/mrvl_ethdev.c | 24 ++ drivers/net/mvpp2/mrvl_ethdev.h | 71 ++ drivers/net/mvpp2/mrvl_flow.c | 91 +++ drivers/net/mvpp2/mrvl_mtr.c| 512 drivers/net/mvpp2/mrvl_mtr.h| 15 ++ 7 files changed, 673 insertions(+), 44 deletions(-) create mode 100644 drivers/net/mvpp2/mrvl_mtr.c create mode 100644 drivers/net/mvpp2/mrvl_mtr.h diff --git a/drivers/net/mvpp2/Makefile b/drivers/net/mvpp2/Makefile index 211d398..4848d65 100644 --- a/drivers/net/mvpp2/Makefile +++ b/drivers/net/mvpp2/Makefile @@ -39,5 +39,6 @@ LDLIBS += -lrte_bus_vdev -lrte_common_mvep SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_ethdev.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_qos.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_flow.c +SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_mtr.c include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/mvpp2/meson.build b/drivers/net/mvpp2/meson.build index 3620659..f475511 100644 --- a/drivers/net/mvpp2/meson.build +++ b/drivers/net/mvpp2/meson.build @@ -19,7 +19,8 @@ endif sources = files( 'mrvl_ethdev.c', 'mrvl_flow.c', - 'mrvl_qos.c' + 'mrvl_qos.c', + 'mrvl_mtr.c' ) deps += ['cfgfile', 'common_mvep'] diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index adb07d0..a4951d3 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -23,6 +23,7 @@ #include #include "mrvl_ethdev.h" #include "mrvl_qos.h" +#include "mrvl_mtr.h" /* bitmask with reserved hifs */ #define MRVL_MUSDK_HIFS_RESERVED 0x0F @@ -627,6 +628,8 @@ mrvl_dev_start(struct rte_eth_dev *dev) goto out; } + mrvl_mtr_init(dev); + return 0; out: MRVL_LOG(ERR, "Failed to start device"); @@ -765,6 +768,7 @@ mrvl_dev_close(struct rte_eth_dev *dev) mrvl_flush_rx_queues(dev); mrvl_flush_tx_shadow_queues(dev); + mrvl_mtr_deinit(dev); for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) { struct pp2_ppio_tc_params *tc_params = @@ -1868,6 +1872,25 @@ mrvl_eth_filter_ctrl(struct rte_eth_dev *dev __rte_unused, } } +/** + * DPDK callback to get rte_mtr callbacks. + * + * @param dev + * Pointer to the device structure. + * @param ops + * Pointer to pass the mtr ops. + * + * @return + * Always 0. + */ +static int +mrvl_mtr_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops) +{ + *(const void **)ops = &mrvl_mtr_ops; + + return 0; +} + static const struct eth_dev_ops mrvl_ops = { .dev_configure = mrvl_dev_configure, .dev_start = mrvl_dev_start, @@ -1905,6 +1928,7 @@ static const struct eth_dev_ops mrvl_ops = { .rss_hash_update = mrvl_rss_hash_update, .rss_hash_conf_get = mrvl_rss_hash_conf_get, .filter_ctrl = mrvl_eth_filter_ctrl, + .mtr_ops_get = mrvl_mtr_ops_get, }; /** diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index 2204be2..ecb8fdc 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -9,6 +9,7 @@ #include #include +#include /* * container_of is defined by both DPDK and MUSDK, @@ -70,6 +71,69 @@ /** Minimum number of sent buffers to release from shadow queue to BM */ #define MRVL_PP2_BUF_RELEASE_BURST_SIZE64 +/** Maximum length of a match string */ +#define MRVL_MATCH_LEN 16 + +/** Parsed fields in processed rte_flow_item. 
*/ +enum mrvl_parsed_fields { + /* eth flags */ + F_DMAC = BIT(0), + F_SMAC = BIT(1), + F_TYPE = BIT(2), + /* vlan flags */ + F_VLAN_PRI = BIT(3), + F_VLAN_ID = BIT(4), + F_VLAN_TCI = BIT(5), /* not supported by MUSDK yet */ + /* ip4 flags */ + F_IP4_TOS = BIT(6), + F_IP4_SIP = BIT(7), + F_IP4_DIP = BIT(8), + F_IP4_PROTO =BIT(9), + /* ip6 flags */ + F_IP6_TC = BIT(10), /* not supported by MUSDK yet */ + F_IP6_SIP = BIT(11), + F_IP6_DIP = BIT(12), + F_IP6_FLOW = BIT(13), + F_IP6_NEXT_HDR = BIT(14), + /* tcp flags */ + F_TCP_SPORT =BIT(15), + F_TCP_DPORT =BIT(16), + /* udp flags */ + F_UDP_SPORT =BIT(17), + F_UDP_DPORT =BIT(18), +}; + +/** PMD-specific definition of a flow rule handle. */ +struct mrvl_mtr; +struct rte_flow { + LIST_ENTRY(rte_flow) next; + struct mrvl_mtr *mtr; + + enum mrvl_parsed_fields pattern; + + struct pp2_cls_tbl_rule rule; + struct pp2_cls_cos_desc cos; + struct pp2_cls_tbl_action action; +}; + +struct mrvl_mtr_profile { + L
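The new .mtr_ops_get callback exposes the policer through the generic rte_mtr API. Below is a minimal application-side sketch, not part of the patch: the profile/meter ids and srTCM parameters are illustrative, and which profile algorithms are actually accepted depends on the MUSDK policer capabilities.

#include <rte_mtr.h>

/* Sketch: add an srTCM profile and create one meter on a port.
 * All numeric values are illustrative.
 */
static int
setup_meter(uint16_t port_id)
{
	struct rte_mtr_error error;
	struct rte_mtr_meter_profile profile = {
		.alg = RTE_MTR_SRTCM_RFC2697,
		.srtcm_rfc2697 = {
			.cir = 1000000,	/* bytes per second */
			.cbs = 2048,
			.ebs = 2048,
		},
	};
	struct rte_mtr_params params = {
		.meter_profile_id = 0,
		.use_prev_mtr_color = 0,
		.meter_enable = 1,
	};
	int ret;

	ret = rte_mtr_meter_profile_add(port_id, 0, &profile, &error);
	if (ret)
		return ret;

	/* Non-shared meter with id 0. */
	return rte_mtr_create(port_id, 0, &params, 0, &error);
}

The meter is then typically bound to traffic with an rte_flow METER action, which is what the mrvl_flow.c changes in this patch appear to wire up.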
[dpdk-dev] [PATCH 6/8] net/mvpp2: add traffic manager support
Add traffic manager support. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- drivers/net/mvpp2/Makefile |1 + drivers/net/mvpp2/meson.build |3 +- drivers/net/mvpp2/mrvl_ethdev.c | 26 + drivers/net/mvpp2/mrvl_ethdev.h | 31 ++ drivers/net/mvpp2/mrvl_tm.c | 1009 +++ drivers/net/mvpp2/mrvl_tm.h | 15 + 6 files changed, 1084 insertions(+), 1 deletion(-) create mode 100644 drivers/net/mvpp2/mrvl_tm.c create mode 100644 drivers/net/mvpp2/mrvl_tm.h diff --git a/drivers/net/mvpp2/Makefile b/drivers/net/mvpp2/Makefile index 4848d65..661d2cd 100644 --- a/drivers/net/mvpp2/Makefile +++ b/drivers/net/mvpp2/Makefile @@ -40,5 +40,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_ethdev.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_qos.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_flow.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_mtr.c +SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_tm.c include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/mvpp2/meson.build b/drivers/net/mvpp2/meson.build index f475511..70ef2d6 100644 --- a/drivers/net/mvpp2/meson.build +++ b/drivers/net/mvpp2/meson.build @@ -20,7 +20,8 @@ sources = files( 'mrvl_ethdev.c', 'mrvl_flow.c', 'mrvl_qos.c', - 'mrvl_mtr.c' + 'mrvl_mtr.c', + 'mrvl_tm.c' ) deps += ['cfgfile', 'common_mvep'] diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 5e3a106..a1dc6b1 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -25,6 +25,7 @@ #include "mrvl_qos.h" #include "mrvl_flow.h" #include "mrvl_mtr.h" +#include "mrvl_tm.h" /* bitmask with reserved hifs */ #define MRVL_MUSDK_HIFS_RESERVED 0x0F @@ -340,6 +341,10 @@ mrvl_dev_configure(struct rte_eth_dev *dev) priv->ppio_params.maintain_stats = 1; priv->nb_rx_queues = dev->data->nb_rx_queues; + ret = mrvl_tm_init(dev); + if (ret < 0) + return ret; + if (dev->data->nb_rx_queues == 1 && dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_RSS) { MRVL_LOG(WARNING, "Disabling hash for 1 rx queue"); @@ -794,6 +799,7 @@ mrvl_dev_close(struct rte_eth_dev *dev) } mrvl_flush_bpool(dev); + mrvl_tm_deinit(dev); if (priv->ppio) { pp2_ppio_deinit(priv->ppio); @@ -1894,6 +1900,25 @@ mrvl_mtr_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops) return 0; } +/** + * DPDK callback to get rte_tm callbacks. + * + * @param dev + * Pointer to the device structure. + * @param ops + * Pointer to pass the tm ops. + * + * @return + * Always 0. 
+ */ +static int +mrvl_tm_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops) +{ + *(const void **)ops = &mrvl_tm_ops; + + return 0; +} + static const struct eth_dev_ops mrvl_ops = { .dev_configure = mrvl_dev_configure, .dev_start = mrvl_dev_start, @@ -1932,6 +1957,7 @@ static const struct eth_dev_ops mrvl_ops = { .rss_hash_conf_get = mrvl_rss_hash_conf_get, .filter_ctrl = mrvl_eth_filter_ctrl, .mtr_ops_get = mrvl_mtr_ops_get, + .tm_ops_get = mrvl_tm_ops_get, }; /** diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index de423a9..984f31e 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -10,6 +10,7 @@ #include #include #include +#include /* * container_of is defined by both DPDK and MUSDK, @@ -134,6 +135,29 @@ struct mrvl_mtr { struct pp2_cls_plcr *plcr; }; +struct mrvl_tm_shaper_profile { + LIST_ENTRY(mrvl_tm_shaper_profile) next; + uint32_t id; + int refcnt; + struct rte_tm_shaper_params params; +}; + +enum { + MRVL_NODE_PORT, + MRVL_NODE_QUEUE, +}; + +struct mrvl_tm_node { + LIST_ENTRY(mrvl_tm_node) next; + uint32_t id; + uint32_t type; + int refcnt; + struct mrvl_tm_node *parent; + struct mrvl_tm_shaper_profile *profile; + uint8_t weight; + uint64_t stats_mask; +}; + struct mrvl_priv { /* Hot fields, used in fast path. */ struct pp2_bpool *bpool; /**< BPool pointer */ @@ -173,6 +197,10 @@ struct mrvl_priv { LIST_HEAD(profiles, mrvl_mtr_profile) profiles; LIST_HEAD(mtrs, mrvl_mtr) mtrs; uint32_t used_plcrs; + + LIST_HEAD(shaper_profiles, mrvl_tm_shaper_profile) shaper_profiles; + LIST_HEAD(nodes, mrvl_tm_node) nodes; + uint64_t rate_max; }; /** Flow operations forward declaration. */ @@ -181,6 +209,9 @@ extern const struct rte_flow_ops mrvl_flow_ops; /** Meter operations forward declaration. */ extern const struct rte_mtr_ops mrvl_mtr_ops; +/** Traffic manager operations forward declaration. */ +extern const struct rte_tm_ops mrvl_tm_ops; + /** Current log type. */ extern int mrvl_logtype; diff --git a/drivers/net/mvp
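With .tm_ops_get in place the port can also be shaped through the generic rte_tm API. A minimal sketch of a two-level hierarchy (one port node, one leaf node per Tx queue) follows; the node ids, the rate and the exact point at which the hierarchy has to be committed are assumptions for illustration, not taken from this patch.

#include <rte_tm.h>

/* Sketch: one shaped port node with a single leaf (Tx queue) below it. */
static int
setup_tm(uint16_t port_id, uint32_t txq_id)
{
	struct rte_tm_error error;
	struct rte_tm_shaper_params sp = {
		/* rate is in bytes per second: ~100 Mbit/s here */
		.peak = { .rate = 12500000, .size = 4096 },
	};
	struct rte_tm_node_params np = { .shaper_profile_id = 0 };
	int ret;

	ret = rte_tm_shaper_profile_add(port_id, 0, &sp, &error);
	if (ret)
		return ret;

	/* Port-level (root) node, arbitrary id 1000. */
	ret = rte_tm_node_add(port_id, 1000, RTE_TM_NODE_ID_NULL, 0, 1, 0,
			      &np, &error);
	if (ret)
		return ret;

	/* Leaf node for one Tx queue, no shaper of its own. */
	np.shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE;
	ret = rte_tm_node_add(port_id, txq_id, 1000, 0, 1, 1, &np, &error);
	if (ret)
		return ret;

	return rte_tm_hierarchy_commit(port_id, 1, &error);
}

Since mrvl_tm_init() is hooked into dev_configure in this patch, the hierarchy would presumably be built between configure and start.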
[dpdk-dev] [PATCH 8/8] net/mvpp2: update MTU and MRU related calculations
From: Natalie Samsonov This commit updates MTU and MRU related calculations. Signed-off-by: Natalie Samsonov Reviewed-by: Yelena Krivosheev Reviewed-by: Dmitri Epshtein --- drivers/net/mvpp2/mrvl_ethdev.c | 70 +++-- drivers/net/mvpp2/mrvl_ethdev.h | 7 + 2 files changed, 61 insertions(+), 16 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 5643e7d..035ee81 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -325,7 +325,7 @@ mrvl_dev_configure(struct rte_eth_dev *dev) if (dev->data->dev_conf.rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len - -ETHER_HDR_LEN - ETHER_CRC_LEN; +MRVL_PP2_ETH_HDRS_LEN; ret = mrvl_configure_rxqs(priv, dev->data->port_id, dev->data->nb_rx_queues); @@ -375,21 +375,55 @@ static int mrvl_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { struct mrvl_priv *priv = dev->data->dev_private; - /* extra MV_MH_SIZE bytes are required for Marvell tag */ - uint16_t mru = mtu + MV_MH_SIZE + ETHER_HDR_LEN + ETHER_CRC_LEN; + uint16_t mru; + uint16_t mbuf_data_size = 0; /* SW buffer size */ int ret; - if (mtu < ETHER_MIN_MTU || mru > MRVL_PKT_SIZE_MAX) + mru = MRVL_PP2_MTU_TO_MRU(mtu); + /* +* min_rx_buf_size is equal to mbuf data size +* if pmd didn't set it differently +*/ + mbuf_data_size = dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM; + /* Prevent PMD from: +* - setting mru greater than the mbuf size resulting in +* hw and sw buffer size mismatch +* - setting mtu that requires the support of scattered packets +* when this feature has not been enabled/supported so far +* (TODO check scattered_rx flag here once scattered RX is supported). +*/ + if (mru + MRVL_PKT_OFFS > mbuf_data_size) { + mru = mbuf_data_size - MRVL_PKT_OFFS; + mtu = MRVL_PP2_MRU_TO_MTU(mru); + MRVL_LOG(WARNING, "MTU too big, max MTU possible limitted " + "by current mbuf size: %u. Set MTU to %u, MRU to %u", + mbuf_data_size, mtu, mru); + } + + if (mtu < ETHER_MIN_MTU || mru > MRVL_PKT_SIZE_MAX) { + MRVL_LOG(ERR, "Invalid MTU [%u] or MRU [%u]", mtu, mru); return -EINVAL; + } + + dev->data->mtu = mtu; + dev->data->dev_conf.rxmode.max_rx_pkt_len = mru - MV_MH_SIZE; if (!priv->ppio) return 0; ret = pp2_ppio_set_mru(priv->ppio, mru); - if (ret) + if (ret) { + MRVL_LOG(ERR, "Failed to change MRU"); return ret; + } + + ret = pp2_ppio_set_mtu(priv->ppio, mtu); + if (ret) { + MRVL_LOG(ERR, "Failed to change MTU"); + return ret; + } - return pp2_ppio_set_mtu(priv->ppio, mtu); + return 0; } /** @@ -600,6 +634,9 @@ mrvl_dev_start(struct rte_eth_dev *dev) } priv->vlan_flushed = 1; } + ret = mrvl_mtu_set(dev, dev->data->mtu); + if (ret) + MRVL_LOG(ERR, "Failed to set MTU to %d", dev->data->mtu); /* For default QoS config, don't start classifier. 
*/ if (mrvl_qos_cfg && @@ -1552,8 +1589,8 @@ mrvl_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, { struct mrvl_priv *priv = dev->data->dev_private; struct mrvl_rxq *rxq; - uint32_t min_size, -max_rx_pkt_len = dev->data->dev_conf.rxmode.max_rx_pkt_len; + uint32_t frame_size, buf_size = rte_pktmbuf_data_room_size(mp); + uint32_t max_rx_pkt_len = dev->data->dev_conf.rxmode.max_rx_pkt_len; int ret, tc, inq; uint64_t offloads; @@ -1568,15 +1605,16 @@ mrvl_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, return -EFAULT; } - min_size = rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM - - MRVL_PKT_EFFEC_OFFS; - if (min_size < max_rx_pkt_len) { - MRVL_LOG(ERR, - "Mbuf size must be increased to %u bytes to hold up to %u bytes of data.", - max_rx_pkt_len + RTE_PKTMBUF_HEADROOM + - MRVL_PKT_EFFEC_OFFS, + frame_size = buf_size - RTE_PKTMBUF_HEADROOM - MRVL_PKT_EFFEC_OFFS; + if (frame_size < max_rx_pkt_len) { + MRVL_LOG(WARNING, + "Mbuf size must be increased to %u bytes to hold up " + "to %u bytes of data.", + buf_size + max_rx_pkt_len - frame_size, max_rx_pkt_len); - return -EINVAL; +
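The arithmetic behind the new helpers can be read off the hunks above: the MRU adds the Marvell header (MV_MH_SIZE) on top of the Ethernet header and CRC that already separate the MTU from max_rx_pkt_len. A reconstruction of the macros, for illustration only (the applied mrvl_ethdev.h is authoritative; constants come from rte_ether.h and the MUSDK headers):

/* Illustrative reconstruction of the MTU/MRU helpers used above. */
#define MRVL_PP2_ETH_HDRS_LEN		(ETHER_HDR_LEN + ETHER_CRC_LEN)
#define MRVL_PP2_MTU_TO_MRU(mtu)	((mtu) + MV_MH_SIZE + MRVL_PP2_ETH_HDRS_LEN)
#define MRVL_PP2_MRU_TO_MTU(mru)	((mru) - MV_MH_SIZE - MRVL_PP2_ETH_HDRS_LEN)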
[dpdk-dev] [PATCH 7/8] net/mvpp2: detach tx_qos from rx cls/qos config
From: Yuval Caduri Functional change: Open receive cls/qos related features, only if the config file contains an rx_related configuration entry. This allows to configure tx_related entries, w/o unintentionally opening rx cls/qos. Code: 'use_global_defaults' is by default set to '1'. Only if an rx_related entry was configured, it is updated to '0'. rx cls/qos is performed only if 'use_global_defaults' is '0'. Default TC configuration is now only mandatory when 'use_global_defaults' is '0'. Signed-off-by: Yuval Caduri Reviewed-by: Natalie Samsonov Tested-by: Natalie Samsonov --- drivers/net/mvpp2/mrvl_ethdev.c | 3 ++- drivers/net/mvpp2/mrvl_qos.c| 41 +++-- 2 files changed, 25 insertions(+), 19 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index a1dc6b1..5643e7d 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -602,7 +602,8 @@ mrvl_dev_start(struct rte_eth_dev *dev) } /* For default QoS config, don't start classifier. */ - if (mrvl_qos_cfg) { + if (mrvl_qos_cfg && + mrvl_qos_cfg->port[dev->data->port_id].use_global_defaults == 0) { ret = mrvl_start_qos_mapping(priv); if (ret) { MRVL_LOG(ERR, "Failed to setup QoS mapping"); diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index e039635..5d80c3e 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -324,6 +324,7 @@ parse_tc_cfg(struct rte_cfgfile *file, int port, int tc, if (rte_cfgfile_num_sections(file, sec_name, strlen(sec_name)) <= 0) return 0; + cfg->port[port].use_global_defaults = 0; entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_RXQ); if (entry) { n = get_entry_values(entry, @@ -421,7 +422,7 @@ parse_policer(struct rte_cfgfile *file, int port, const char *sec_name, cfg->port[port].policer_params.token_unit = PP2_CLS_PLCR_PACKETS_TOKEN_UNIT; } else { - RTE_LOG(ERR, PMD, "Unknown token: %s\n", entry); + MRVL_LOG(ERR, "Unknown token: %s", entry); return -1; } } @@ -438,7 +439,7 @@ parse_policer(struct rte_cfgfile *file, int port, const char *sec_name, cfg->port[port].policer_params.color_mode = PP2_CLS_PLCR_COLOR_AWARE_MODE; } else { - RTE_LOG(ERR, PMD, "Error in parsing: %s\n", entry); + MRVL_LOG(ERR, "Error in parsing: %s", entry); return -1; } } @@ -518,28 +519,15 @@ mrvl_get_qoscfg(const char *key __rte_unused, const char *path, snprintf(sec_name, sizeof(sec_name), "%s %d %s", MRVL_TOK_PORT, n, MRVL_TOK_DEFAULT); + /* Use global defaults, unless an override occurs */ + (*cfg)->port[n].use_global_defaults = 1; + /* Skip ports non-existing in configuration. */ if (rte_cfgfile_num_sections(file, sec_name, strlen(sec_name)) <= 0) { - (*cfg)->port[n].use_global_defaults = 1; - (*cfg)->port[n].mapping_priority = - PP2_CLS_QOS_TBL_VLAN_IP_PRI; continue; } - entry = rte_cfgfile_get_entry(file, sec_name, - MRVL_TOK_DEFAULT_TC); - if (entry) { - if (get_val_securely(entry, &val) < 0 || - val > USHRT_MAX) - return -1; - (*cfg)->port[n].default_tc = (uint8_t)val; - } else { - MRVL_LOG(ERR, - "Default Traffic Class required in custom configuration!"); - return -1; - } - /* * Read per-port rate limiting. Setting that will * disable per-queue rate limiting. 
@@ -573,6 +561,7 @@ mrvl_get_qoscfg(const char *key __rte_unused, const char *path, entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_MAPPING_PRIORITY); if (entry) { + (*cfg)->port[n].use_global_defaults = 0; if (!strncmp(entry, MRVL_TOK_VLAN_IP, sizeof(MRVL_TOK_VLAN_IP))) (*cfg)->port[n].mapping_priority = @@ -602,6 +591,7 @@ mrvl_get_qoscfg(const char *key __rte_unused, const char *path, entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_PLCR_DEFAULT);
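As a concrete example of what this enables, a configuration file that only carries Tx-side entries no longer switches the port away from the global Rx defaults. The tokens below are already documented in mvpp2.rst; the values are illustrative:

    [port 0 default]
    rate_limit_enable = 1
    rate_limit = 10000
    burst_size = 2000

With such a file, per-port rate limiting is applied while use_global_defaults stays at 1, so the Rx classifier/QoS mapping should not be started.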
Re: [dpdk-dev] [PATCH v1 4/7] examples/power: add host channel to power manager
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of David Hunt > Sent: Thursday, August 30, 2018 6:54 PM > To: dev@dpdk.org > Cc: Mcnamara, John ; Hunt, David > > Subject: [dpdk-dev] [PATCH v1 4/7] examples/power: add host channel to > power manager > > This patch adds a fifo channel to the vm_power_manager app through which > we can send commands and polices. Intended for sending JSON strings. > The fifo is at /tmp/powermonitor/fifo.0 > > Signed-off-by: David Hunt > --- > examples/vm_power_manager/channel_manager.c | 108 > +++ > examples/vm_power_manager/channel_manager.h | 17 ++- > examples/vm_power_manager/channel_monitor.c | 146 > +++- > examples/vm_power_manager/main.c| 2 + > 4 files changed, 238 insertions(+), 35 deletions(-) > > diff --git a/examples/vm_power_manager/channel_manager.c > b/examples/vm_power_manager/channel_manager.c > index 2bb8641d3..bcd106be1 100644 > --- a/examples/vm_power_manager/channel_manager.c > +++ b/examples/vm_power_manager/channel_manager.c > @@ -13,6 +13,7 @@ > > #include > #include > +#include > #include > #include > > @@ -284,6 +285,38 @@ open_non_blocking_channel(struct channel_info > *info) > return 0; > } > > +static int > +open_host_channel(struct channel_info *info) > +{ > + int flags; > + > + info->fd = open(info->channel_path, O_RDWR | O_RSYNC); > + if (info->fd == -1) { > + RTE_LOG(ERR, CHANNEL_MANAGER, "Error(%s) opening fifo > for '%s'\n", > + strerror(errno), > + info->channel_path); > + return -1; > + } > + > + /* Get current flags */ > + flags = fcntl(info->fd, F_GETFL, 0); > + if (flags < 0) { > + RTE_LOG(WARNING, CHANNEL_MANAGER, "Error(%s) fcntl > get flags socket for" > + "'%s'\n", strerror(errno), info- > >channel_path); > + return 1; > + } > + /* Set to Non Blocking */ > + flags |= O_NONBLOCK; > + if (fcntl(info->fd, F_SETFL, flags) < 0) { > + RTE_LOG(WARNING, CHANNEL_MANAGER, > + "Error(%s) setting non-blocking " > + "socket for '%s'\n", > + strerror(errno), info->channel_path); > + return -1; > + } > + return 0; > +} > + > static int > setup_channel_info(struct virtual_machine_info **vm_info_dptr, > struct channel_info **chan_info_dptr, unsigned > channel_num) > @@ -294,6 +327,7 @@ setup_channel_info(struct virtual_machine_info > **vm_info_dptr, > chan_info->channel_num = channel_num; > chan_info->priv_info = (void *)vm_info; > chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED; > + chan_info->type = CHANNEL_TYPE_BINARY; > if (open_non_blocking_channel(chan_info) < 0) { > RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open > channel: " > "'%s' for VM '%s'\n", > @@ -316,6 +350,35 @@ setup_channel_info(struct virtual_machine_info > **vm_info_dptr, > return 0; > } > > +static int > +setup_host_channel_info(struct channel_info **chan_info_dptr, > + unsigned int channel_num) > +{ > + struct channel_info *chan_info = *chan_info_dptr; > + > + chan_info->channel_num = channel_num; > + chan_info->priv_info = (void *)0; > + chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED; > + chan_info->type = CHANNEL_TYPE_JSON; > + sprintf(chan_info->channel_path, "%sfifo.0", > CHANNEL_MGR_SOCKET_PATH); > + > + if (open_host_channel(chan_info) < 0) { > + RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open host > channel: " > + "'%s'\n", > + chan_info->channel_path); > + return -1; > + } > + if (add_channel_to_monitor(&chan_info) < 0) { > + RTE_LOG(ERR, CHANNEL_MANAGER, "Could add channel: " > + "'%s' to epoll ctl\n", > + chan_info->channel_path); > + return -1; > + > + } > + chan_info->status = CHANNEL_MGR_CHANNEL_CONNECTED; > + return 
0; > +} > + > int > add_all_channels(const char *vm_name) > { > @@ -470,6 +533,51 @@ add_channels(const char *vm_name, unsigned > *channel_list, > return num_channels_enabled; > } > > +int > +add_host_channel(void) > +{ > + struct channel_info *chan_info; > + char socket_path[PATH_MAX]; > + int num_channels_enabled = 0; > + int ret; > + > + snprintf(socket_path, sizeof(socket_path), "%sfifo.%u", > + CHANNEL_MGR_SOCKET_PATH, 0); > + > + errno = 0; > + ret = mkfifo(socket_path, 0666); > + if ((errno != EEXIST) && (ret < 0)) { > + printf(" %d %d, %d\n", ret, EEXIST, errno); > + RTE_LOG(ERR, CHANNEL_MANAGER, "Cannot create fifo '%s' > error: " > +
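For a quick manual test once the manager is running, a JSON string can be pushed into the new fifo from the host shell. The payload below is only a placeholder; the accepted schema is defined by the JSON parsing added elsewhere in this series:

    # fifo created by vm_power_manager at /tmp/powermonitor/fifo.0
    echo '{ "policy": { "name": "example" } }' > /tmp/powermonitor/fifo.0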
Re: [dpdk-dev] [PATCH] ethdev: make default behavior CRC strip on Rx
On 9/4/2018 6:54 AM, Andrew Rybchenko wrote: > On 09/04/2018 08:17 AM, Shahaf Shuler wrote: >> Hi Ferruh, >> >> Monday, September 3, 2018 5:45 PM, Ferruh Yigit: >>> Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag. >>> Without any specific Rx offload flag, default behavior by PMDs is to >>> strip CRC. >>> >>> PMDs that support keeping CRC should advertise >>> DEV_RX_OFFLOAD_KEEP_CRC >>> Rx offload capability. >>> >>> Applications that require keeping CRC should check PMD capability first >>> and if it is supported can enable this feature by setting >>> DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure() >>> >>> Signed-off-by: Ferruh Yigit >>> --- >> [...] >> >>> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c >>> index 1f7bfd441..718f4b1d9 100644 >>> --- a/drivers/net/mlx5/mlx5_rxq.c >>> +++ b/drivers/net/mlx5/mlx5_rxq.c >>> @@ -388,7 +388,6 @@ mlx5_get_rx_queue_offloads(struct rte_eth_dev >>> *dev) >>> DEV_RX_OFFLOAD_TIMESTAMP | >>> DEV_RX_OFFLOAD_JUMBO_FRAME); >>> >>> - offloads |= DEV_RX_OFFLOAD_CRC_STRIP; >>> if (config->hw_fcs_strip) >>> offloads |= DEV_RX_OFFLOAD_KEEP_CRC; >>> >>> @@ -1438,7 +1437,7 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t >>> idx, uint16_t desc, >>> tmpl->rxq.vlan_strip = !!(offloads & DEV_RX_OFFLOAD_VLAN_STRIP); >>> /* By default, FCS (CRC) is stripped by hardware. */ >>> tmpl->rxq.crc_present = 0; >>> - if (rte_eth_dev_must_keep_crc(offloads)) { >>> + if (offloads | DEV_RX_OFFLOAD_KEEP_CRC) { >> I don't understand this logic, and it exists on many other location in the >> patch. >> Shouldn't it be (offloads & DEV_RX_OFFLOAD_KEEP_CRC) ? > > OMG, how can I overlook it on my review. Really good catch. Same here, new version coming soon.
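For clarity, the gap Shahaf spotted is a bitwise OR versus the intended bitmask test: "offloads | DEV_RX_OFFLOAD_KEEP_CRC" is always non-zero, so every port would keep the CRC. The intended check, written as a small helper for illustration:

#include <rte_ethdev.h>

/* True only when the application actually requested KEEP_CRC. */
static inline int
keep_crc_requested(uint64_t offloads)
{
	return (offloads & DEV_RX_OFFLOAD_KEEP_CRC) != 0;
}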
Re: [dpdk-dev] [PATCH 2/2] app/test-eventdev: add Tx adapter support
On Tue, Sep 04, 2018 at 11:07:15AM +0530, Rao, Nikhil wrote: > Hi Pavan, > > Few comments below. > > On 8/31/2018 4:10 PM, Pavan Nikhilesh wrote: > > Convert existing Tx service based pipeline to Tx adapter based APIs and > > simplify worker functions. > > > > Signed-off-by: Pavan Nikhilesh > > --- > > app/test-eventdev/test_pipeline_atq.c| 216 +++- > > app/test-eventdev/test_pipeline_common.c | 193 ++ > > app/test-eventdev/test_pipeline_common.h | 43 ++-- > > app/test-eventdev/test_pipeline_queue.c | 238 --- > > 4 files changed, 322 insertions(+), 368 deletions(-) > > > > diff --git a/app/test-eventdev/test_pipeline_atq.c > > b/app/test-eventdev/test_pipeline_atq.c > > > -static int > > +static __rte_noinline int > > pipeline_atq_worker_multi_stage_burst_fwd(void *arg) > > { > > PIPELINE_WORKER_MULTI_STAGE_BURST_INIT; > > - const uint8_t nb_stages = t->opt->nb_stages; > > - const uint8_t tx_queue = t->tx_service.queue_id; > > + const uint8_t *tx_queue = t->tx_evqueue_id; > > > > while (t->done == false) { > > uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev, > > @@ -253,9 +235,10 @@ pipeline_atq_worker_multi_stage_burst_fwd(void *arg) > > > > if (cq_id == last_queue) { > > w->processed_pkts++; > > - ev[i].queue_id = tx_queue; > > + ev[i].queue_id = tx_queue[ev[i].mbuf->port]; > > pipeline_fwd_event(&ev[i], > > RTE_SCHED_TYPE_ATOMIC); > > + > Unintentional newline ? Will remove in next version. > > } else { > > ev[i].sub_event_type++; > > pipeline_fwd_event(&ev[i], > > static int > > > > @@ -317,23 +296,25 @@ pipeline_atq_eventdev_setup(struct evt_test *test, > > struct evt_options *opt) > > int nb_ports; > > int nb_queues; > > uint8_t queue; > > - struct rte_event_dev_info info; > > - struct test_pipeline *t = evt_test_priv(test); > > - uint8_t tx_evqueue_id = 0; > > + uint8_t tx_evqueue_id[RTE_MAX_ETHPORTS] = {0}; > > uint8_t queue_arr[RTE_EVENT_MAX_QUEUES_PER_DEV]; > > uint8_t nb_worker_queues = 0; > > + uint8_t tx_evport_id = 0; > > + uint16_t prod = 0; > > + struct rte_event_dev_info info; > > + struct test_pipeline *t = evt_test_priv(test); > > > > nb_ports = evt_nr_active_lcores(opt->wlcores); > > nb_queues = rte_eth_dev_count_avail(); > > > > - /* One extra port and queueu for Tx service */ > > - if (t->mt_unsafe) { > > - tx_evqueue_id = nb_queues; > > - nb_ports++; > > - nb_queues++; > > + /* One queue for Tx service */ > > + if (!t->internal_port) { > See comment about struct test_pipeline::internal_port in the > test_pipeline_common.h review below. > > > + RTE_ETH_FOREACH_DEV(prod) { > > + tx_evqueue_id[prod] = nb_queues; > > + nb_queues++; > > + } > > } > > > > > > @@ -388,14 +371,11 @@ pipeline_atq_eventdev_setup(struct evt_test *test, > > struct evt_options *opt) > > .new_event_threshold = info.max_num_events, > > }; > > > > - if (t->mt_unsafe) { > > + if (!t->internal_port) { > > ret = pipeline_event_port_setup(test, opt, queue_arr, > > nb_worker_queues, p_conf); > > if (ret) > > return ret; > > - > > - ret = pipeline_event_tx_service_setup(test, opt, > > tx_evqueue_id, > > - nb_ports - 1, p_conf); > > } else > > ret = pipeline_event_port_setup(test, opt, NULL, nb_queues, > > p_conf); > > @@ -424,14 +404,17 @@ pipeline_atq_eventdev_setup(struct evt_test *test, > > struct evt_options *opt) > >* stride = 1 > >* > >* event queue pipelines: > > - * eth0 -> q0 > > - *} (q3->tx) Tx service > > - * eth1 -> q1 > > + * eth0 -> q0 \ > > + * q3->tx > > + * eth1 -> q1 / > >* > >* q0,q1 are configured as stated above. > >* q3 configured as SINGLE_LINK|ATOMIC. 
> >*/ > > ret = pipeline_event_rx_adapter_setup(opt, 1, p_conf); > > + if (ret) > > + return ret; > > + ret = pipeline_event_tx_adapter_setup(opt, p_conf); > pipeline_event_tx_adapter_setup() creates a tx adapter per eth port, > that doesn't match the preceding diagram. I will fix in next version. > > > > > diff --git a/app/test-eventdev/test_pipeline_common.c > > b/app/test-eventdev/test_pipeline_common.c > > index a54068df3..7f858e23f 100644 > > --- a/app/test-eventdev/test_pipeline_
[dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module
When binding the devices used by DPDK to the "uio-pci" kernel module, the IOMMU should be disabled in order not to break the IO transmission because of the virtual / physical address mapping. The patch clarifies the IOMMU configuration on both x86_64 and arm64 systems. Signed-off-by: tone.zhang --- doc/guides/linux_gsg/linux_drivers.rst | 7 +++ 1 file changed, 7 insertions(+) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 371a817..8f9ec8f 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -48,6 +48,13 @@ be loaded as shown below: ``vfio-pci`` kernel module rather than ``igb_uio`` or ``uio_pci_generic``. For more details see :ref:`linux_gsg_binding_kernel` below. +.. note:: + + If the devices for used DPDK bound to the ``uio-pci`` kernel module, please make + sure that the IOMMU is disabled. We can add ``intel_iommu=off`` or ``amd_iommu=off`` + in ``GRUB_CMDLINE_LINUX`` in grub on x86_64 systems, or add ``iommu.passthrough=1`` + on arm64 system. + Since DPDK release 1.7 onward provides VFIO support, use of UIO is optional for platforms that support using VFIO. -- 2.7.4
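A concrete sketch of the kernel command line change described in the note; exact file locations and the config regeneration command differ per distribution, and keeping the IOMMU enabled in passthrough mode (as pointed out later in this thread) is an alternative to disabling it:

    # /etc/default/grub, x86_64 examples
    GRUB_CMDLINE_LINUX="... intel_iommu=off"             # or amd_iommu=off
    GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on"     # passthrough alternative
    # arm64 example
    GRUB_CMDLINE_LINUX="... iommu.passthrough=1"
    # then regenerate the grub configuration, e.g.
    sudo update-grub                                     # Debian/Ubuntu
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg          # RHEL/Fedora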
Re: [dpdk-dev] [PATCH] ethdev: make default behavior CRC strip on Rx
On Mon, Sep 03, 2018 at 03:45:01PM +0100, Ferruh Yigit wrote: > Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag. > Without any specific Rx offload flag, default behavior by PMDs is to > strip CRC. > > PMDs that support keeping CRC should advertise DEV_RX_OFFLOAD_KEEP_CRC > Rx offload capability. > > Applications that require keeping CRC should check PMD capability first > and if it is supported can enable this feature by setting > DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure() > > Signed-off-by: Ferruh Yigit > --- [...] > diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c > index 682444590..fa4af49af 100644 > --- a/drivers/net/mvpp2/mrvl_ethdev.c > +++ b/drivers/net/mvpp2/mrvl_ethdev.c > @@ -67,7 +67,6 @@ > /** Port Rx offload capabilities */ > #define MRVL_RX_OFFLOADS (DEV_RX_OFFLOAD_VLAN_FILTER | \ > DEV_RX_OFFLOAD_JUMBO_FRAME | \ > - DEV_RX_OFFLOAD_CRC_STRIP | \ > DEV_RX_OFFLOAD_CHECKSUM) > > /** Port Tx offloads capabilities */ > @@ -311,14 +310,6 @@ mrvl_dev_configure(struct rte_eth_dev *dev) > return -EINVAL; > } > > - /* KEEP_CRC offload flag is not supported by PMD > - * can remove the below block when DEV_RX_OFFLOAD_CRC_STRIP removed > - */ > - if (rte_eth_dev_must_keep_crc(dev->data->dev_conf.rxmode.offloads)) { > - MRVL_LOG(INFO, "L2 CRC stripping is always enabled in hw"); > - dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_CRC_STRIP; > - } > - > if (dev->data->dev_conf.rxmode.split_hdr_size) { > MRVL_LOG(INFO, "Split headers not supported"); > return -EINVAL; > @@ -1334,7 +1325,6 @@ mrvl_dev_infos_get(struct rte_eth_dev *dev __rte_unused, > > /* By default packets are dropped if no descriptors are available */ > info->default_rxconf.rx_drop_en = 1; > - info->default_rxconf.offloads = DEV_RX_OFFLOAD_CRC_STRIP; > > info->max_rx_pktlen = MRVL_PKT_SIZE_MAX; > } As for mvpp2: Acked-by: Tomasz Duszynski > diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c > index 2200ee319..7f7ac155e 100644 > --- a/drivers/net/netvsc/hn_ethdev.c > +++ b/drivers/net/netvsc/hn_ethdev.c > @@ -40,8 +40,7 @@ > DEV_TX_OFFLOAD_VLAN_INSERT) > > #define HN_RX_OFFLOAD_CAPS (DEV_RX_OFFLOAD_CHECKSUM | \ > - DEV_RX_OFFLOAD_VLAN_STRIP | \ > - DEV_RX_OFFLOAD_CRC_STRIP) > + DEV_RX_OFFLOAD_VLAN_STRIP) > > int hn_logtype_init; > int hn_logtype_driver; > diff --git a/drivers/net/netvsc/hn_rndis.c b/drivers/net/netvsc/hn_rndis.c > index f44add726..9de99e16a 100644 > --- a/drivers/net/netvsc/hn_rndis.c > +++ b/drivers/net/netvsc/hn_rndis.c > @@ -892,8 +892,7 @@ int hn_rndis_get_offload(struct hn_data *hv, > == HN_NDIS_LSOV2_CAP_IP6) > dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; > > - dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP | > - DEV_RX_OFFLOAD_CRC_STRIP; > + dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP; > > if (hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_IP4) > dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_IPV4_CKSUM; > diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c > index ee743e975..168088c6d 100644 > --- a/drivers/net/nfp/nfp_net.c > +++ b/drivers/net/nfp/nfp_net.c > @@ -411,12 +411,6 @@ nfp_net_configure(struct rte_eth_dev *dev) > return -EINVAL; > } > > - /* KEEP_CRC offload flag is not supported by PMD > - * can remove the below block when DEV_RX_OFFLOAD_CRC_STRIP removed > - */ > - if (rte_eth_dev_must_keep_crc(rxmode->offloads)) > - PMD_INIT_LOG(INFO, "HW does strip CRC. 
No configurable!"); > - > return 0; > } > > @@ -1168,8 +1162,7 @@ nfp_net_infos_get(struct rte_eth_dev *dev, struct > rte_eth_dev_info *dev_info) >DEV_RX_OFFLOAD_UDP_CKSUM | >DEV_RX_OFFLOAD_TCP_CKSUM; > > - dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_JUMBO_FRAME | > - DEV_RX_OFFLOAD_KEEP_CRC; > + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_JUMBO_FRAME; > > if (hw->cap & NFP_NET_CFG_CTRL_TXVLAN) > dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT; > diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c > index 244f86545..de10b5bdf 100644 > --- a/drivers/net/null/rte_eth_null.c > +++ b/drivers/net/null/rte_eth_null.c > @@ -305,7 +305,6 @@ eth_dev_info(struct rte_eth_dev *dev, > dev_info->min_rx_bufsize = 0; > dev_info->reta_size = internals->reta_size; > dev_info->flow_type_rss_offloads = internals->flow_type_rss_offloads; > - dev_info->rx_offload_capa = DEV_RX_OFFLOAD_CRC_STRIP; > } > > stati
[dpdk-dev] [PATCH v2] ethdev: make default behavior CRC strip on Rx
Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag. Without any specific Rx offload flag, default behavior by PMDs is to strip CRC. PMDs that support keeping CRC should advertise DEV_RX_OFFLOAD_KEEP_CRC Rx offload capability. Applications that require keeping CRC should check PMD capability first and if it is supported can enable this feature by setting DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure() Signed-off-by: Ferruh Yigit Acked-by: Tomasz Duszynski --- v2: * fix flag check * add KEEP_CRC flag into "show port cap #" Note "show port cap #" and "show port # [r/t]x_offload capabilities/configuration" does same thing, in long term I suggest removing "show port cap" one --- app/test-eventdev/test_perf_common.c | 1 - app/test-eventdev/test_pipeline_common.c | 1 - app/test-pmd/cmdline.c| 2 -- app/test-pmd/config.c | 25 -- app/test-pmd/parameters.c | 4 +-- app/test-pmd/testpmd.c| 5 doc/guides/nics/features.rst | 3 ++- doc/guides/nics/fm10k.rst | 3 +-- doc/guides/rel_notes/deprecation.rst | 6 - doc/guides/sample_app_ug/flow_filtering.rst | 2 -- doc/guides/sample_app_ug/link_status_intr.rst | 1 - drivers/net/af_packet/rte_eth_af_packet.c | 1 - drivers/net/avf/avf_ethdev.c | 1 - drivers/net/avp/avp_ethdev.c | 1 - drivers/net/axgbe/axgbe_ethdev.c | 1 - drivers/net/axgbe/axgbe_rxtx.c| 2 +- drivers/net/bnxt/bnxt_ethdev.c| 1 - drivers/net/bnxt/bnxt_rxq.c | 6 +++-- drivers/net/cxgbe/cxgbe_ethdev.c | 12 - drivers/net/dpaa/dpaa_ethdev.c| 1 - drivers/net/dpaa2/dpaa2_ethdev.c | 1 - drivers/net/e1000/em_rxtx.c | 7 +++-- drivers/net/e1000/igb_ethdev.c| 8 +++--- drivers/net/e1000/igb_rxtx.c | 7 +++-- drivers/net/enic/enic_res.c | 1 - drivers/net/failsafe/failsafe_ops.c | 2 -- drivers/net/fm10k/fm10k_ethdev.c | 7 - drivers/net/i40e/i40e_ethdev.c| 1 - drivers/net/i40e/i40e_ethdev_vf.c | 3 +-- drivers/net/i40e/i40e_rxtx.c | 2 +- drivers/net/ixgbe/ixgbe_ethdev.c | 8 +++--- drivers/net/ixgbe/ixgbe_ipsec.c | 2 +- drivers/net/ixgbe/ixgbe_rxtx.c| 15 ++- drivers/net/kni/rte_eth_kni.c | 1 - drivers/net/mlx4/mlx4_rxq.c | 3 +-- drivers/net/mlx5/mlx5_rxq.c | 3 +-- drivers/net/mvpp2/mrvl_ethdev.c | 10 --- drivers/net/netvsc/hn_ethdev.c| 3 +-- drivers/net/netvsc/hn_rndis.c | 3 +-- drivers/net/nfp/nfp_net.c | 9 +-- drivers/net/null/rte_eth_null.c | 1 - drivers/net/octeontx/octeontx_ethdev.c| 8 -- drivers/net/octeontx/octeontx_ethdev.h| 3 +-- drivers/net/pcap/rte_eth_pcap.c | 1 - drivers/net/qede/qede_ethdev.c| 1 - drivers/net/ring/rte_eth_ring.c | 1 - drivers/net/sfc/sfc_rx.c | 9 --- drivers/net/softnic/rte_eth_softnic.c | 1 - drivers/net/szedata2/rte_eth_szedata2.c | 3 +-- drivers/net/tap/rte_eth_tap.c | 3 +-- drivers/net/thunderx/nicvf_ethdev.c | 9 --- drivers/net/thunderx/nicvf_ethdev.h | 1 - drivers/net/vhost/rte_eth_vhost.c | 3 +-- drivers/net/virtio/virtio_ethdev.c| 3 +-- drivers/net/vmxnet3/vmxnet3_ethdev.c | 3 +-- examples/bbdev_app/main.c | 1 - examples/bond/main.c | 1 - examples/exception_path/main.c| 3 --- examples/flow_filtering/main.c| 1 - examples/ip_fragmentation/main.c | 3 +-- examples/ip_pipeline/link.c | 1 - examples/ip_reassembly/main.c | 3 +-- examples/ipsec-secgw/ipsec-secgw.c| 3 +-- examples/ipv4_multicast/main.c| 3 +-- examples/kni/main.c | 3 --- examples/l2fwd-crypto/main.c | 1 - examples/l2fwd-jobstats/main.c| 1 - examples/l2fwd-keepalive/main.c | 1 - examples/l2fwd/main.c | 1 - examples/l3fwd-acl/main.c | 3 +-- examples/l3fwd-power/main.c | 3 +-- examples/l3fwd-vf/main.c | 3 +-- examples/l3fwd/main.c | 3 +-- examples/link_status_interrupt/main.c | 1 - examples/load_balancer/init.c 
| 3 +-- examples/multi_process/symmetric_mp/main.c| 3 +-- examples/netmap_compat/brid
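From the application side the new model is: CRC stripping needs no flag, and keeping the CRC is opt-in after a capability check. A minimal sketch (port id, queue counts and the surrounding configuration are illustrative):

#include <rte_ethdev.h>

/* Sketch: request KEEP_CRC only if the PMD advertises it. */
static int
configure_port(uint16_t port_id, struct rte_eth_conf *conf)
{
	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_KEEP_CRC)
		conf->rxmode.offloads |= DEV_RX_OFFLOAD_KEEP_CRC;
	/* Otherwise leave the flag clear: the PMD strips CRC by default. */

	return rte_eth_dev_configure(port_id, 1, 1, conf);
}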
Re: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module
> -Original Message- > From: dev On Behalf Of tone.zhang > Sent: Tuesday, September 4, 2018 4:59 PM > To: dev@dpdk.org > Cc: nd > Subject: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel > module > > When binding the devices used by DPDK to the "uio-pci" kernel module, the > IOMMU should be disabled in order not to break the IO transmission > because of the virtual / physical address mapping. > > The patch clarifies the IOMMU configuration on both x86_64 and arm64 > systems. > > Signed-off-by: tone.zhang Acked-by: Gavin Hu > --- > doc/guides/linux_gsg/linux_drivers.rst | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/doc/guides/linux_gsg/linux_drivers.rst > b/doc/guides/linux_gsg/linux_drivers.rst > index 371a817..8f9ec8f 100644 > --- a/doc/guides/linux_gsg/linux_drivers.rst > +++ b/doc/guides/linux_gsg/linux_drivers.rst > @@ -48,6 +48,13 @@ be loaded as shown below: > ``vfio-pci`` kernel module rather than ``igb_uio`` or ``uio_pci_generic``. > For more details see :ref:`linux_gsg_binding_kernel` below. > > +.. note:: > + > + If the devices for used DPDK bound to the ``uio-pci`` kernel module, > please > make > + sure that the IOMMU is disabled. We can add ``intel_iommu=off`` or > ``amd_iommu=off`` > + in ``GRUB_CMDLINE_LINUX`` in grub on x86_64 systems, or add > ``iommu.passthrough=1`` > + on arm64 system. > + > Since DPDK release 1.7 onward provides VFIO support, use of UIO is optional > for platforms that support using VFIO. > > -- > 2.7.4
Re: [dpdk-dev] [PATCH] net/ixgbe: Strip SR-IOV transparent VLANs in VF
Hi Qi, On 04/09/2018 03:16, Zhang, Qi Z wrote: -Original Message- From: Robert Shearman [mailto:robertshear...@gmail.com] Sent: Monday, September 3, 2018 9:14 PM To: Zhang, Qi Z ; dev@dpdk.org Cc: Lu, Wenzhuo ; Ananyev, Konstantin ; Robert Shearman Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: Strip SR-IOV transparent VLANs in VF Hi Qi, On 03/09/2018 12:45, Zhang, Qi Z wrote: Hi Robert: -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of robertshear...@gmail.com Sent: Saturday, August 25, 2018 12:35 AM To: dev@dpdk.org Cc: Lu, Wenzhuo ; Ananyev, Konstantin ; Robert Shearman Subject: [dpdk-dev] [PATCH] net/ixgbe: Strip SR-IOV transparent VLANs in VF From: Robert Shearman SR-IOV VFs support "transparent" VLANs. Traffic from/to a VM associated with a VF has a VLAN tag inserted/stripped in a manner intended to be totally transparent to the VM. On a Linux hypervisor the vlan can be specified by "ip link set vf vlan ". The VM VF driver is not configured to use any VLAN and the VM should never see the transparent VLAN for that reason. However, in practice these VLAN headers are being received by the VM which discards the packets as that VLAN is unknown to it. The Linux kernel ixbge driver explicitly removes the VLAN in this case (presumably due to the hardware not being able to do this) but the DPDK driver does not. I'm not quite understand this part. What does explicitly remove the VLAN means?, DPDK also discard unmatched VLAN and strip vlan if vlan_strip is enabled what is the gap? It will be better if you can give same examples Sure. Typical use case for this is a hypervisor where it is necessary to provide L2 access into the guests, but there are insufficient, and so the hypervisor is using the PF and VFs are assigned to guests. In order to avoid having to configure each guest to use the VLAN and to not send any untagged traffic it is desirable to use transparent VLANs. For example: Guest 1 = VLAN 10 Guest 2 = VLAN 20 ip link set eth0 vf 1 vlan 10 ip link set eth0 vf 2 vlan 20 Now this means that packets arriving tagged on the physical port should be delivered to the guest and arrive in the guest untagged. Similarly, packets transmitted untagged by the guest should gain a tag before they go out of the physical port. What you get when using the Linux VF ixgbe driver inside the VMs is exactly this since the driver knows that for this hardware the transparent stripping isn't done in hardware and is done inside the driver. What you get currently when using the DPDK VF ixgbe driver inside the VMs is that packets arrive tagged (e.g. with VLAN tag 10) and these are then dropped because the VM doesn't know about VLAN 10. Transparent VLAN insertion works currently with both Linux and DPDK VF drivers. What do you mean "stripping isn't done in hardware" and "packets arrived tagged"? Let me explain how PMD driver works. (or it is expected) if we enable vlan_strip, the VLAN header is expected to be stripped from packet data by hardware. And in rx descriptor, it still keep the stripped vlan information, so driver will set mbuf->ol_flags with PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED and also set stripped vlan tag to mbuf->vlan_tci So in my review, it is "stripping is done and packets arrived with untagged", and application also know what exactly happened and make decision based on the requirement So do you mean ixgbevf does not support vlan_strip as a hardware offload?, and it should be done with software? But in your code, I didn't see the part that vlan header is stripped from the packet data. 
( set mbuf->ol_flag and mbuf->vlan_tci does not mean the vlan is stripped) I understand how the VLAN stripping hardware offload is supposed to work, but this use case is distinct from VLAN stripping and it was my mistake to use that loaded term in my explanation of the use case. The expectation in this case is that the packet arrive completely untagged, i.e. whether the VLAN has been stripped and placed in metadata or not. The application running inside the VM expects the packet to arrive is if the VLAN tag was never there. The application cannot do the removal of the VLAN tag itself because in this use case it is implicit that it shouldn't know about the tag and the presence of the tag is driver/hardware specific. Thanks for highlighting the PKT_RX_VLAN_STRIPPED flag - I should remove that as well when the transparent VLAN filter triggers. Thanks, Rob
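For readers following the thread: the behaviour Rob describes, i.e. the tag disappearing from the packet data rather than only being flagged, is what a software strip does. The snippet below only illustrates the mechanics using the existing rte_vlan_strip() helper; it is not the ixgbe change under discussion:

#include <rte_ether.h>
#include <rte_mbuf.h>

/* Illustration: remove the outer VLAN header from packet data and keep the
 * tag in mbuf metadata, the way a software fallback would.
 */
static inline void
strip_transparent_vlan(struct rte_mbuf *m)
{
	/* rte_vlan_strip() is a no-op (returns -1) for non-VLAN frames; on
	 * success it moves DA/SA over the tag, trims 4 bytes and sets
	 * PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED plus m->vlan_tci.
	 */
	(void)rte_vlan_strip(m);
}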
Re: [dpdk-dev] [PATCH v2] ethdev: make default behavior CRC strip on Rx
Tuesday, September 4, 2018 1:13 PM, Ferruh Yigit: > Subject: [PATCH v2] ethdev: make default behavior CRC strip on Rx > > Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag. > Without any specific Rx offload flag, default behavior by PMDs is to > strip CRC. > > PMDs that support keeping CRC should advertise > DEV_RX_OFFLOAD_KEEP_CRC > Rx offload capability. > > Applications that require keeping CRC should check PMD capability first > and if it is supported can enable this feature by setting > DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure() > > Signed-off-by: Ferruh Yigit > Acked-by: Tomasz Duszynski > --- > v2: > * fix flag check > * add KEEP_CRC flag into "show port cap #" > > Note "show port cap #" and > "show port # [r/t]x_offload capabilities/configuration" > does same thing, in long term I suggest removing "show port cap" one Acked-by: Shahaf Shuler
Re: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module
On Tue, Sep 04, 2018 at 04:59:07PM +0800, tone.zhang wrote: > When binding the devices used by DPDK to the "uio-pci" kernel module, > the IOMMU should be disabled in order not to break the IO transmission > because of the virtual / physical address mapping. > > The patch clarifies the IOMMU configuration on both x86_64 and arm64 > systems. > > Signed-off-by: tone.zhang > --- > doc/guides/linux_gsg/linux_drivers.rst | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/doc/guides/linux_gsg/linux_drivers.rst > b/doc/guides/linux_gsg/linux_drivers.rst > index 371a817..8f9ec8f 100644 > --- a/doc/guides/linux_gsg/linux_drivers.rst > +++ b/doc/guides/linux_gsg/linux_drivers.rst > @@ -48,6 +48,13 @@ be loaded as shown below: > ``vfio-pci`` kernel module rather than ``igb_uio`` or ``uio_pci_generic``. > For more details see :ref:`linux_gsg_binding_kernel` below. > > +.. note:: > + > + If the devices for used DPDK bound to the ``uio-pci`` kernel module, > please make > + sure that the IOMMU is disabled. We can add ``intel_iommu=off`` or > ``amd_iommu=off`` > + in ``GRUB_CMDLINE_LINUX`` in grub on x86_64 systems, or add > ``iommu.passthrough=1`` > + on arm64 system. > + I think passthrough mode should work on x86 too. I remember running with iommu=pt setting in the kernel in the past. /Bruce
Re: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module
On Tue, 2018-09-04 at 11:06 +0100, Bruce Richardson wrote: > On Tue, Sep 04, 2018 at 04:59:07PM +0800, tone.zhang wrote: > > When binding the devices used by DPDK to the "uio-pci" kernel > > module, > > the IOMMU should be disabled in order not to break the IO > > transmission > > because of the virtual / physical address mapping. > > > > The patch clarifies the IOMMU configuration on both x86_64 and > > arm64 > > systems. > > > > Signed-off-by: tone.zhang > > --- > > doc/guides/linux_gsg/linux_drivers.rst | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/doc/guides/linux_gsg/linux_drivers.rst > > b/doc/guides/linux_gsg/linux_drivers.rst > > index 371a817..8f9ec8f 100644 > > --- a/doc/guides/linux_gsg/linux_drivers.rst > > +++ b/doc/guides/linux_gsg/linux_drivers.rst > > @@ -48,6 +48,13 @@ be loaded as shown below: > > ``vfio-pci`` kernel module rather than ``igb_uio`` or > > ``uio_pci_generic``. > > For more details see :ref:`linux_gsg_binding_kernel` below. > > > > +.. note:: > > + > > + If the devices for used DPDK bound to the ``uio-pci`` kernel > > module, please make > > + sure that the IOMMU is disabled. We can add ``intel_iommu=off`` > > or ``amd_iommu=off`` > > + in ``GRUB_CMDLINE_LINUX`` in grub on x86_64 systems, or add > > ``iommu.passthrough=1`` > > + on arm64 system. > > + > > I think passthrough mode should work on x86 too. I remember running > with > iommu=pt setting in the kernel in the past. > > /Bruce It does, can confirm. -- Kind regards, Luca Boccassi
Re: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module
-Original Message- From: Luca Boccassi Sent: Tuesday, September 4, 2018 6:15 PM To: Bruce Richardson ; Tone Zhang (Arm Technology China) Cc: dev@dpdk.org; nd Subject: Re: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module On Tue, 2018-09-04 at 11:06 +0100, Bruce Richardson wrote: > On Tue, Sep 04, 2018 at 04:59:07PM +0800, tone.zhang wrote: > > When binding the devices used by DPDK to the "uio-pci" kernel > > module, the IOMMU should be disabled in order not to break the IO > > transmission because of the virtual / physical address mapping. > > > > The patch clarifies the IOMMU configuration on both x86_64 and > > arm64 > > systems. > > > > Signed-off-by: tone.zhang > > --- > > doc/guides/linux_gsg/linux_drivers.rst | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/doc/guides/linux_gsg/linux_drivers.rst > > b/doc/guides/linux_gsg/linux_drivers.rst > > index 371a817..8f9ec8f 100644 > > --- a/doc/guides/linux_gsg/linux_drivers.rst > > +++ b/doc/guides/linux_gsg/linux_drivers.rst > > @@ -48,6 +48,13 @@ be loaded as shown below: > > ``vfio-pci`` kernel module rather than ``igb_uio`` or > > ``uio_pci_generic``. > > For more details see :ref:`linux_gsg_binding_kernel` below. > > > > +.. note:: > > + > > + If the devices for used DPDK bound to the ``uio-pci`` kernel > > module, please make > > + sure that the IOMMU is disabled. We can add ``intel_iommu=off`` > > or ``amd_iommu=off`` > > + in ``GRUB_CMDLINE_LINUX`` in grub on x86_64 systems, or add > > ``iommu.passthrough=1`` > > + on arm64 system. > > + > > I think passthrough mode should work on x86 too. I remember running > with iommu=pt setting in the kernel in the past. > > /Bruce It does, can confirm. -- Kind regards, Luca Boccassi @Luca, @Bruce, Thanks for the comments. I will update the change and push V2. Thanks! Br, Tone
Re: [dpdk-dev] Tx vlan offload problem with igb and DPDK v17.11
Hi all,

I have solved the issue with PKT_TX_VLAN_PKT by using the SW rte_vlan_insert() function. However, I would like to tell you what I have seen during my tests; I hope it can shed some light on the issue that you, the developers, should correct.

When I use m->ol_flags |= PKT_TX_VLAN_PKT, my Wireshark captures reveal that the 802.1Q header and the VLAN tag are attached to the output packet. The problem is the 'ether_proto' field of the VLAN header, which is set again to 0x8100 (VLAN) instead of 0x0800 (IPv4). Apart from this, the rest of the packet is correct. So if this is corrected in the driver it will work, I think.

Regards,

On Mon, 3 Sep 2018 at 19:32, Victor Huertas () wrote: > Hi all, > > I have realized that the PKT_TX_VLAN_PKT flag for Tx Vlan Offload doesn't > work in my application. > > According to the NICs I have (IGB) there seems to be a problem with this > vlan offload tx feature and this version of DPDK according to the Bug 17 : > https://bugs.dpdk.org/show_bug.cgi?id=17 > > I have tested it using vfio_pci and igb_uio drivers as well as SW vlan > insertion (rte_vlan_insert) and the result is exactly the same. > > Has this bug been solved so far? > > These are my NICs: > 04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network > Connection (rev 01) > Subsystem: Super Micro Computer Inc Device 10c9 > Flags: fast devsel, IRQ 17 > Memory at fafe (32-bit, non-prefetchable) [disabled] [size=128K] > Memory at fafc (32-bit, non-prefetchable) [disabled] [size=128K] > I/O ports at ec00 [disabled] [size=32] > Memory at fafbc000 (32-bit, non-prefetchable) [disabled] [size=16K] > [virtual] Expansion ROM at faf8 [disabled] [size=128K] > Capabilities: [40] Power Management version 3 > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ > Capabilities: [70] MSI-X: Enable- Count=10 Masked- > Capabilities: [a0] Express Endpoint, MSI 00 > Capabilities: [100] Advanced Error Reporting > Capabilities: [140] Device Serial Number 00-30-48-ff-ff-bb-17-02 > Capabilities: [150] Alternative Routing-ID Interpretation (ARI) > Capabilities: [160] Single Root I/O Virtualization (SR-IOV) > Kernel driver in use: vfio-pci > Kernel modules: igb > > 04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network > Connection (rev 01) > Subsystem: Super Micro Computer Inc Device 10c9 > Flags: fast devsel, IRQ 16 > Memory at faf6 (32-bit, non-prefetchable) [disabled] [size=128K] > Memory at faf4 (32-bit, non-prefetchable) [disabled] [size=128K] > I/O ports at e880 [disabled] [size=32] > Memory at faf3c000 (32-bit, non-prefetchable) [disabled] [size=16K] > [virtual] Expansion ROM at faf0 [disabled] [size=128K] > Capabilities: [40] Power Management version 3 > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ > Capabilities: [70] MSI-X: Enable- Count=10 Masked- > Capabilities: [a0] Express Endpoint, MSI 00 > Capabilities: [100] Advanced Error Reporting > Capabilities: [140] Device Serial Number 00-30-48-ff-ff-bb-17-02 > Capabilities: [150] Alternative Routing-ID Interpretation (ARI) > Capabilities: [160] Single Root I/O Virtualization (SR-IOV) > Kernel driver in use: vfio-pci > Kernel modules: igb > > Thanks for your attention > > Regards, > > PS: BTW, I have observed that when capturing, for example, an ARP message on > an rx queue where the VLAN was stripped, the answer is sent correctly if I set > the PKT_TX_VLAN_PKT flag and the VLAN_TCI is the same... However, if I try > to set the VLAN header on a non-VLAN-stripped frame then it doesn't work. > > > > -- > Victor > -- Victor
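As a point of reference for other readers hitting the same issue, below is a minimal sketch of the software insertion path described above, using the existing rte_vlan_insert() helper. The wrapper name and parameter are made up for illustration, and this is a workaround rather than a fix for the reported offload bug.

#include <rte_ether.h>
#include <rte_mbuf.h>

/* Hypothetical wrapper: set the TCI in the mbuf and let rte_vlan_insert()
 * prepend the 4-byte 802.1Q header in software. The helper leaves the
 * encapsulated EtherType in place, which is exactly the field reported as
 * corrupted (0x8100 instead of 0x0800) with the PKT_TX_VLAN_PKT HW path. */
static int
tx_insert_vlan_sw(struct rte_mbuf **m, uint16_t vlan_tci)
{
	(*m)->vlan_tci = vlan_tci;
	return rte_vlan_insert(m); /* 0 on success, negative errno otherwise */
}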
Re: [dpdk-dev] [PATCH] ethdev: make default behavior CRC strip on Rx
Hi Ferruh,

Monday, September 3, 2018 5:45 PM, Ferruh Yigit: > Subject: [PATCH] ethdev: make default behavior CRC strip on Rx > > Removed DEV_RX_OFFLOAD_CRC_STRIP offload flag. > Without any specific Rx offload flag, default behavior by PMDs is to > strip CRC. > > PMDs that support keeping CRC should advertise > DEV_RX_OFFLOAD_KEEP_CRC > Rx offload capability. > > Applications that require keeping CRC should check PMD capability first > and if it is supported can enable this feature by setting > DEV_RX_OFFLOAD_KEEP_CRC in Rx offload flag in rte_eth_dev_configure() > > Signed-off-by: Ferruh Yigit [...] > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c > index 1f7bfd441..718f4b1d9 100644 > --- a/drivers/net/mlx5/mlx5_rxq.c > +++ b/drivers/net/mlx5/mlx5_rxq.c > @@ -388,7 +388,6 @@ mlx5_get_rx_queue_offloads(struct rte_eth_dev > *dev) >DEV_RX_OFFLOAD_TIMESTAMP | >DEV_RX_OFFLOAD_JUMBO_FRAME); > > - offloads |= DEV_RX_OFFLOAD_CRC_STRIP; > if (config->hw_fcs_strip) > offloads |= DEV_RX_OFFLOAD_KEEP_CRC; > > @@ -1438,7 +1437,7 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t > idx, uint16_t desc, > tmpl->rxq.vlan_strip = !!(offloads & > DEV_RX_OFFLOAD_VLAN_STRIP); > /* By default, FCS (CRC) is stripped by hardware. */ > tmpl->rxq.crc_present = 0; > - if (rte_eth_dev_must_keep_crc(offloads)) { > + if (offloads | DEV_RX_OFFLOAD_KEEP_CRC) {

I don't understand this logic, and it exists in many other locations in the patch. Shouldn't it be (offloads & DEV_RX_OFFLOAD_KEEP_CRC) ?

> if (config->hw_fcs_strip) { > tmpl->rxq.crc_present = 1; > } else {

Also I think the CRC offload should have an entry on the port caps printed by testpmd "show port caps 0" (see port_offload_cap_display()).
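For clarity, here is a tiny standalone illustration of the point raised above: a bit-OR with a non-zero flag is always non-zero, so the condition would be taken even when KEEP_CRC was never requested, whereas a bit-AND actually tests the flag. The flag value below is illustrative only, not the real rte_ethdev definition.

#include <stdio.h>
#include <stdint.h>

#define KEEP_CRC_FLAG 0x00010000ULL	/* illustrative value only */

int main(void)
{
	uint64_t offloads = 0;	/* KEEP_CRC not requested by the app */

	/* Bit-OR: always non-zero here, so the branch is always taken. */
	printf("offloads | KEEP_CRC -> %s\n",
	       (offloads | KEEP_CRC_FLAG) ? "true" : "false");

	/* Bit-AND: non-zero only when the flag is actually set. */
	printf("offloads & KEEP_CRC -> %s\n",
	       (offloads & KEEP_CRC_FLAG) ? "true" : "false");

	return 0;
}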
[dpdk-dev] [PATCH v3] net/i40e: add interface to choose latest vector path
Right now, the vector path is limited to use only on later platforms. This patch adds a devarg enable-latest-vec to allow users to use the latest vector path that the platform supports. Namely, using the AVX2 vector path on Broadwell becomes possible. Signed-off-by: Xiaoyun Li --- v3: * Polish the doc and commit log. v2: * Correct the calling of the wrong function last time. * Fix seg fault bug. doc/guides/nics/i40e.rst | 8 ++ doc/guides/rel_notes/release_18_11.rst | 4 +++ drivers/net/i40e/i40e_ethdev.c | 38 ++ drivers/net/i40e/i40e_ethdev.h | 1 + drivers/net/i40e/i40e_rxtx.c | 27 ++ 5 files changed, 78 insertions(+) diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index 65d87f869..6158e7c34 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -163,6 +163,14 @@ Runtime Config Options Currently hot-plugging of representor ports is not supported so all required representors must be specified on the creation of the PF. +- ``Enable latest vector`` (default ``disable``) + + Vector path was limited to use only on later platform. But users may want the + latest vector path. For example, VPP users may want to use AVX2 vector path on HSW/BDW + because it can get better perf. So ``devargs`` parameter ``enable-latest-vec`` + is introduced, for example:: +-w 84:00.0,enable-latest-vec=1 + Driver compilation and testing -- diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst index 3ae6b3f58..f8b0f3189 100644 --- a/doc/guides/rel_notes/release_18_11.rst +++ b/doc/guides/rel_notes/release_18_11.rst @@ -54,6 +54,10 @@ New Features Also, make sure to start the actual text at the margin. = +* **Added a devarg to enable the latest vector path.** + A new devarg ``enable-latest-vec`` was introduced to allow users to choose + the latest vector path that the platform supported. For example, VPP users + can use AVX2 vector path on BDW/HSW to get better performance. 
API Changes --- diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 85a6a867f..16b5345fb 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -12513,6 +12513,44 @@ i40e_config_rss_filter(struct i40e_pf *pf, return 0; } +#define ETH_I40E_ENABLE_LATEST_VEC "enable-latest-vec" + +bool +i40e_parse_latest_vec(struct rte_eth_dev *dev) +{ + static const char *const valid_keys[] = { + ETH_I40E_ENABLE_LATEST_VEC, NULL}; + int enable_latest_vec; + struct rte_kvargs *kvlist; + + if (!dev->device->devargs) + return 0; + + kvlist = rte_kvargs_parse(dev->device->devargs->args, valid_keys); + if (!kvlist) + return -EINVAL; + + if (!rte_kvargs_count(kvlist, ETH_I40E_ENABLE_LATEST_VEC)) + return 0; + + if (rte_kvargs_count(kvlist, ETH_I40E_ENABLE_LATEST_VEC) > 1) + PMD_DRV_LOG(WARNING, "More than one argument \"%s\" and only " + "the first one is used !", + ETH_I40E_ENABLE_LATEST_VEC); + + enable_latest_vec = atoi((&kvlist->pairs[0])->value); + + rte_kvargs_free(kvlist); + + if (enable_latest_vec != 0 && enable_latest_vec != 1) + PMD_DRV_LOG(WARNING, "Value should be 0 or 1, set it as 1!"); + + if (enable_latest_vec) + return true; + else + return false; +} + RTE_INIT(i40e_init_log) { i40e_logtype_init = rte_log_register("pmd.net.i40e.init"); diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h index 3fffe5a55..cdf68cd93 100644 --- a/drivers/net/i40e/i40e_ethdev.h +++ b/drivers/net/i40e/i40e_ethdev.h @@ -1243,6 +1243,7 @@ int i40e_config_rss_filter(struct i40e_pf *pf, struct i40e_rte_flow_rss_conf *conf, bool add); int i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params); int i40e_vf_representor_uninit(struct rte_eth_dev *ethdev); +bool i40e_parse_latest_vec(struct rte_eth_dev *dev); #define I40E_DEV_TO_PCI(eth_dev) \ RTE_DEV_TO_PCI((eth_dev)->device) diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 2a28ee348..75f8ec284 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -2960,6 +2960,15 @@ i40e_set_rx_function(struct rte_eth_dev *dev) if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F)) dev->rx_pkt_burst = i40e_recv_scattered_pkts_vec_avx2; + /* +* Give users chance to use the latest vector path +* that the platform supported. +*/ + i
[dpdk-dev] [PATCH] app/testpmd: add check for rx offload security flag
Add a check for the DEV_RX_OFFLOAD_SECURITY flag to the port_offload_cap_display(). Signed-off-by: Kevin Laatz --- app/test-pmd/config.c | 9 + 1 file changed, 9 insertions(+) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 14ccd68..4c60c7e 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -594,6 +594,15 @@ port_offload_cap_display(portid_t port_id) printf("off\n"); } + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_SECURITY) { + printf("RX offload security: "); + if (ports[port_id].dev_conf.rxmode.offloads & + DEV_RX_OFFLOAD_SECURITY) + printf("on\n"); + else + printf("off\n"); + } + if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) { printf("VLAN insert: "); if (ports[port_id].dev_conf.txmode.offloads & -- 2.9.5
[dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
This is a proposal to enable using externally allocated memory in DPDK. In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by the allocator to decide from which heap (internal or external) to allocate the requested amount of memory
- Allow creating named heaps and adding/removing memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is on the shoulders of the user - there is no checking done with regards to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory into it. For any other process wishing to use the same memory, said memory must first be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a manual process. Due to underlying issues with attaching to fbarrays in secondary processes, this design was deemed to be better because we don't want to fail to create an external heap in the primary because something in the secondary has failed, when in fact we may not even have wanted this memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*, because not only does the memory space need to be preallocated, it also needs to be attached in each process to allow other processes to access the page table. The attach API call may or may not succeed, depending on memory layout, for reasons similar to other multiprocess failures. This is treated as a "known issue" for this release. 
RFC -> v1 changes: - Removed the "named heaps" API, allocate using fake socket ID instead - Added multiprocess support - Everything is now thread-safe - Numerous bugfixes and API improvements Anatoly Burakov (16): mem: add length to memseg list mem: allow memseg lists to be marked as external malloc: index heaps using heap ID rather than NUMA node mem: do not check for invalid socket ID flow_classify: do not check for invalid socket ID pipeline: do not check for invalid socket ID sched: do not check for invalid socket ID malloc: add name to malloc heaps malloc: add function to query socket ID of named heap malloc: allow creating malloc heaps malloc: allow destroying heaps malloc: allow adding memory to named heaps malloc: allow removing memory from named heaps malloc: allow attaching to external memory chunks malloc: allow detaching from external memory test: add unit tests for external memory support config/common_base| 1 + config/rte_config.h | 1 + drivers/bus/fslmc/fslmc_vfio.c| 7 +- drivers/bus/pci/linux/pci.c | 2 +- drivers/net/mlx4/mlx4_mr.c| 3 + drivers/net/mlx5/mlx5.c | 5 +- drivers/net/mlx5/mlx5_mr.c| 3 + drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +- lib/librte_eal/bsdapp/eal/eal.c | 3 + lib/librte_eal/bsdapp/eal/eal_memory.c| 9 +- lib/librte_eal/common/eal_common_memory.c | 9 +- lib/librte_eal/common/eal_common_memzone.c| 8 +- .../common/include/rte_eal_memconfig.h| 6 +- lib/librte_eal/common/include/rte_malloc.h| 181 + .../common/include/rte_malloc_heap.h | 3 + lib/librte_eal/common/include/rte_memory.h| 9 + lib/librte_eal/common/malloc_heap.c | 287 +++-- lib/librte_eal/common/malloc_heap.h | 17 + lib/librte_eal/common/rte_malloc.c| 383 - lib/librte_eal/linuxapp/eal/eal.c | 3 + lib/librte_eal/linuxapp/eal/eal_memalloc.c| 12 +- lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +- lib/librte_eal/linuxapp/eal/eal_vfio.c| 17 +- lib/librte_eal/rte_eal_version.map| 7 + lib/librte_flow_classify/rte_flow_classify.c | 3 +- lib/librte_mempool/rte_mempool.c | 31 +- lib/librte_pipeline/rte_pipeline.c| 3 +- lib/librte_sched/rte_sched.c | 2 +- test/test/Makefile| 1 + test/test/autotest_data.py| 14 +- test/test/meson.build | 1 + test/test/test_external_mem.c | 384 ++ test/test/test_malloc.c | 3 + test/test/test_memzone.c | 3 + 34 files changed, 1346 insertions(+), 84 deletions(-) create mode 100644 test/
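To make the intended workflow easier to follow, here is a hedged single-process sketch built only from the APIs introduced in this series (rte_malloc_heap_create, rte_malloc_heap_memory_add, rte_malloc_heap_get_socket, rte_malloc_heap_memory_remove, rte_malloc_heap_destroy). The heap name, sizes and the anonymous mapping are placeholders; passing a NULL IOVA table means the pages are recorded as RTE_BAD_IOVA, so such memory would not be usable for DMA.

#include <sys/mman.h>
#include <rte_malloc.h>
#include <rte_memory.h>

#define EXT_HEAP_NAME "ext_heap_example"	/* placeholder name */
#define EXT_PG_SZ     RTE_PGSIZE_4K
#define EXT_MEM_SZ    (EXT_PG_SZ << 10)		/* 4M, as in the unit test */

/* Hedged usage sketch for a primary process: create a named heap, donate an
 * externally allocated area to it, allocate through the heap's "fake"
 * socket ID, then tear everything down again. */
static int
external_heap_example(void)
{
	void *area, *obj;
	int socket_id;

	area = mmap(NULL, EXT_MEM_SZ, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1;

	if (rte_malloc_heap_create(EXT_HEAP_NAME) != 0)
		goto fail;

	/* NULL IOVA table: addresses are recorded as RTE_BAD_IOVA */
	if (rte_malloc_heap_memory_add(EXT_HEAP_NAME, area, EXT_MEM_SZ,
			NULL, EXT_MEM_SZ / EXT_PG_SZ, EXT_PG_SZ) != 0)
		goto fail;

	socket_id = rte_malloc_heap_get_socket(EXT_HEAP_NAME);
	obj = rte_malloc_socket("ext_obj", 4096, 0, socket_id);
	/* ... obj now lives inside 'area' ... */
	rte_free(obj);

	rte_malloc_heap_memory_remove(EXT_HEAP_NAME, area, EXT_MEM_SZ);
	rte_malloc_heap_destroy(EXT_HEAP_NAME);
	munmap(area, EXT_MEM_SZ);
	return 0;

fail:
	munmap(area, EXT_MEM_SZ);
	return -1;
}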
[dpdk-dev] [PATCH 01/16] mem: add length to memseg list
Previously, to calculate length of memory area covered by a memseg list, we would've needed to multiply page size by length of fbarray backing that memseg list. This is not obvious and unnecessarily low level, so store length in the memseg list itself. Signed-off-by: Anatoly Burakov --- drivers/bus/pci/linux/pci.c | 2 +- lib/librte_eal/bsdapp/eal/eal_memory.c| 2 ++ lib/librte_eal/common/eal_common_memory.c | 5 ++--- lib/librte_eal/common/include/rte_eal_memconfig.h | 1 + lib/librte_eal/linuxapp/eal/eal_memalloc.c| 3 ++- lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++- 6 files changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 04648ac93..d6e1027ab 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev) static int find_max_end_va(const struct rte_memseg_list *msl, void *arg) { - size_t sz = msl->memseg_arr.len * msl->page_sz; + size_t sz = msl->len; void *end_va = RTE_PTR_ADD(msl->base_va, sz); void **max_va = arg; diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c index 16d2bc7c3..65ea670f9 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memory.c +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c @@ -79,6 +79,7 @@ rte_eal_hugepage_init(void) } msl->base_va = addr; msl->page_sz = page_sz; + msl->len = internal_config.memory; msl->socket_id = 0; /* populate memsegs. each memseg is 1 page long */ @@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl) return -1; } msl->base_va = addr; + msl->len = mem_sz; return 0; } diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index fbfb1b055..0868bf681 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl) /* a memseg list was specified, check if it's the right one */ start = msl->base_va; - end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len); + end = RTE_PTR_ADD(start, msl->len); if (addr < start || addr >= end) return NULL; @@ -194,8 +194,7 @@ virt2memseg_list(const void *addr) msl = &mcfg->memsegs[msl_idx]; start = msl->base_va; - end = RTE_PTR_ADD(start, - (size_t)msl->page_sz * msl->memseg_arr.len); + end = RTE_PTR_ADD(start, msl->len); if (addr >= start && addr < end) break; } diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index aff0688dd..1d8b0a6fe 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -30,6 +30,7 @@ struct rte_memseg_list { uint64_t addr_64; /**< Makes sure addr is always 64-bits */ }; + size_t len; /**< Length of memory area covered by this memseg list. */ int socket_id; /**< Socket ID for all memsegs in this list. */ uint64_t page_sz; /**< Page size for all memsegs in this list. */ volatile uint32_t version; /**< version number for multiprocess sync. 
*/ diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index aa95551a8..d040a2f71 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -828,7 +828,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg) int msl_idx, seg_idx, ret, dir_fd = -1; start_addr = (uintptr_t) msl->base_va; - end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz; + end_addr = start_addr + msl->len; if ((uintptr_t)wa->ms->addr < start_addr || (uintptr_t)wa->ms->addr >= end_addr) @@ -1314,6 +1314,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl, return -1; } local_msl->base_va = primary_msl->base_va; + local_msl->len = primary_msl->len; return 0; } diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index dbf19499e..c522538bf 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -857,6 +857,7 @@ alloc_va_space(struct rte_memseg_list *msl) return -1; } msl->base_va = addr; + msl->len = mem_sz; return 0; } @@ -1365,6 +1366,7 @@ eal_legacy_hugepage_init(void) msl->base_va = addr;
[dpdk-dev] [PATCH 06/16] pipeline: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. Signed-off-by: Anatoly Burakov --- lib/librte_pipeline/rte_pipeline.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c index 0cb8b804e..2c047a8a4 100644 --- a/lib/librte_pipeline/rte_pipeline.c +++ b/lib/librte_pipeline/rte_pipeline.c @@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params) } /* socket */ - if ((params->socket_id < 0) || - (params->socket_id >= RTE_MAX_NUMA_NODES)) { + if (params->socket_id < 0) { RTE_LOG(ERR, PIPELINE, "%s: Incorrect value for parameter socket_id\n", __func__); -- 2.17.1
[dpdk-dev] [PATCH 10/16] malloc: allow creating malloc heaps
Add API to allow creating new malloc heaps. They will be created with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing with internal heaps. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 19 lib/librte_eal/common/malloc_heap.c| 30 + lib/librte_eal/common/malloc_heap.h| 3 ++ lib/librte_eal/common/rte_malloc.c | 52 ++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 105 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 8870732a6..182afab1c 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,25 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Creates a new empty malloc heap with a specified name. + * + * @note Heaps created via this call will automatically get assigned a unique + * socket ID, which can be found using ``rte_malloc_heap_get_socket()`` + * + * @param heap_name + * Name of the heap to create. + * + * @return + * - 0 on successful creation + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``heap_name`` was NULL, empty or too long + * EEXIST - heap by name of ``heap_name`` already exists + * ENOSPC - no more space in internal config to store a new heap + */ +int __rte_experimental +rte_malloc_heap_create(const char *heap_name); + /** * Find socket ID corresponding to a named heap. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 813961f0c..2742f7b11 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -29,6 +29,10 @@ #include "malloc_heap.h" #include "malloc_mp.h" +/* start external socket ID's at a very high number */ +#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */ +#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES)) + static unsigned check_hugepage_sz(unsigned flags, uint64_t hugepage_sz) { @@ -1006,6 +1010,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +int +malloc_heap_create(struct malloc_heap *heap, const char *heap_name) +{ + static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID; + + /* prevent overflow. did you really create 2 billion heaps??? 
*/ + if (next_socket_id > INT32_MAX) { + RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n"); + rte_errno = ENOSPC; + return -1; + } + + /* initialize empty heap */ + heap->alloc_count = 0; + heap->first = NULL; + heap->last = NULL; + LIST_INIT(heap->free_head); + rte_spinlock_init(&heap->lock); + heap->total_size = 0; + heap->socket_id = next_socket_id++; + + /* set up name */ + strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN); + return 0; +} + int rte_eal_malloc_heap_init(void) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 61b844b6f..eebee16dc 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -33,6 +33,9 @@ void * malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags, size_t align, bool contig); +int +malloc_heap_create(struct malloc_heap *heap, const char *heap_name); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index b789333b3..ade5af406 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -286,3 +287,54 @@ rte_malloc_virt2iova(const void *addr) return ms->iova + RTE_PTR_DIFF(addr, ms->addr); } + +int +rte_malloc_heap_create(const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + int i, ret; + + if (heap_name == NULL || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + /* check if there is space in the heap list, or if heap with this name +* already exists. +*/ + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[i]; + /* existing heap */ + if (strncmp(heap_name, tmp->name, + RTE_HEAP_NAME_MAX_LEN) == 0) { +
[dpdk-dev] [PATCH 16/16] test: add unit tests for external memory support
Add simple unit tests to test external memory support. The tests are pretty basic and mostly consist of checking if invalid API calls are handled correctly, plus a simple allocation/deallocation test for malloc and memzone. Signed-off-by: Anatoly Burakov --- test/test/Makefile| 1 + test/test/autotest_data.py| 14 +- test/test/meson.build | 1 + test/test/test_external_mem.c | 384 ++ 4 files changed, 396 insertions(+), 4 deletions(-) create mode 100644 test/test/test_external_mem.c diff --git a/test/test/Makefile b/test/test/Makefile index e6967bab6..074ac6e03 100644 --- a/test/test/Makefile +++ b/test/test/Makefile @@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c SRCS-y += test_reciprocal_division.c SRCS-y += test_reciprocal_division_perf.c SRCS-y += test_fbarray.c +SRCS-y += test_external_mem.c SRCS-y += test_ring.c SRCS-y += test_ring_perf.c diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py index f68d9b111..51f8e1689 100644 --- a/test/test/autotest_data.py +++ b/test/test/autotest_data.py @@ -477,10 +477,16 @@ "Report": None, }, { -"Name":"Fbarray autotest", -"Command": "fbarray_autotest", -"Func":default_autotest, -"Report": None, + "Name":"Fbarray autotest", + "Command": "fbarray_autotest", + "Func":default_autotest, + "Report": None, +}, +{ + "Name":"External memory autotest", + "Command": "external_mem_autotest", + "Func":default_autotest, + "Report": None, }, # #Please always keep all dump tests at the end and together! diff --git a/test/test/meson.build b/test/test/meson.build index b1dd6eca2..3abf02b71 100644 --- a/test/test/meson.build +++ b/test/test/meson.build @@ -155,6 +155,7 @@ test_names = [ 'eventdev_common_autotest', 'eventdev_octeontx_autotest', 'eventdev_sw_autotest', + 'external_mem_autotest', 'func_reentrancy_autotest', 'flow_classify_autotest', 'hash_scaling_autotest', diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c new file mode 100644 index 0..5edb5c348 --- /dev/null +++ b/test/test/test_external_mem.c @@ -0,0 +1,384 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "test.h" + +#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */ + +static int +test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, + int n_pages) +{ + static const char * const names[] = { + NULL, /* NULL name */ + "", /* empty name */ + "this heap name is definitely way too long to be valid" + }; + const char *valid_name = "valid heap name"; + unsigned int i; + + /* check invalid name handling */ + for (i = 0; i < RTE_DIM(names); i++) { + const char *name = names[i]; + + /* these calls may fail for other reasons, so check errno */ + if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Created heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Destroyed heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_get_socket(name) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Found socket for heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_add(name, addr, len, + NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) { + printf("%s():%i: Added memory to heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + if 
(rte_malloc_heap_memory_remove(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Removed memory from heap with invalid name\n", + __func__, __LINE__); + goto fail; + } + + if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 || + rte_errno != EINVAL) { + printf("%s():%i: Attached memory to heap with inv
[dpdk-dev] [PATCH 02/16] mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to differentiate between DPDK hugepage segments and segments that were made part of DPDK but are externally allocated. Add such a property to memseg lists. All current calls for memseg walk functions were adjusted to ignore external segments where it made sense. Mempools is a special case, because we may be asked to allocate a mempool on a specific socket, and we need to ignore all page sizes on other heaps or other sockets. Previously, this assumption of knowing all page sizes was not a problem, but it will be now, so we have to match socket ID with page size when calculating minimum page size for a mempool. Signed-off-by: Anatoly Burakov --- Notes: v1: - Adjust all calls to memseg walk functions to ignore external segments where it made sense to do so drivers/bus/fslmc/fslmc_vfio.c| 7 +++-- drivers/net/mlx4/mlx4_mr.c| 3 ++ drivers/net/mlx5/mlx5.c | 5 ++- drivers/net/mlx5/mlx5_mr.c| 3 ++ drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++- lib/librte_eal/bsdapp/eal/eal.c | 3 ++ lib/librte_eal/bsdapp/eal/eal_memory.c| 7 +++-- lib/librte_eal/common/eal_common_memory.c | 4 +++ .../common/include/rte_eal_memconfig.h| 1 + lib/librte_eal/common/include/rte_memory.h| 9 ++ lib/librte_eal/common/malloc_heap.c | 9 -- lib/librte_eal/linuxapp/eal/eal.c | 3 ++ lib/librte_eal/linuxapp/eal/eal_memalloc.c| 9 ++ lib/librte_eal/linuxapp/eal/eal_vfio.c| 17 +++--- lib/librte_mempool/rte_mempool.c | 31 ++- test/test/test_malloc.c | 3 ++ test/test/test_memzone.c | 3 ++ 17 files changed, 102 insertions(+), 20 deletions(-) diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c index 4c2cd2a87..2e9244fb7 100644 --- a/drivers/bus/fslmc/fslmc_vfio.c +++ b/drivers/bus/fslmc/fslmc_vfio.c @@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len) } static int -fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused, -const struct rte_memseg *ms, void *arg) +fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, + void *arg) { int *n_segs = arg; int ret; + if (msl->external) + return 0; + ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len); if (ret) DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)", diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c index d23d3c613..9f5d790b6 100644 --- a/drivers/net/mlx4/mlx4_mr.c +++ b/drivers/net/mlx4/mlx4_mr.c @@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl, { struct mr_find_contig_memsegs_data *data = arg; + if (msl->external) + return 0; + if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len) return 0; /* Found, save it and stop walking. 
*/ diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index ec63bc6e2..d9ed15880 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver; static void *uar_base; static int -find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused, +find_lower_va_bound(const struct rte_memseg_list *msl, const struct rte_memseg *ms, void *arg) { void **addr = arg; + if (msl->external) + return 0; + if (*addr == NULL) *addr = ms->addr; else diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index 1d1bcb5fe..fd4345f9c 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl, { struct mr_find_contig_memsegs_data *data = arg; + if (msl->external) + return 0; + if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len) return 0; /* Found, save it and stop walking. */ diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c index b2444096c..885c59c8a 100644 --- a/drivers/net/virtio/virtio_user/vhost_kernel.c +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c @@ -75,13 +75,16 @@ struct walk_arg { uint32_t region_nr; }; static int -add_memory_region(const struct rte_memseg_list *msl __rte_unused, +add_memory_region(const struct rte_memseg_list *msl, const struct rte_memseg *ms, size_t len, void *arg) { struct walk_arg *wa = arg; struct vhost_memory_region *mr; void *start_addr; + if (msl->external) + return 0; + if (wa->region_nr >= max_re
[dpdk-dev] [PATCH 07/16] sched: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. Signed-off-by: Anatoly Burakov --- lib/librte_sched/rte_sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c index 9269e5c71..d4e2189c7 100644 --- a/lib/librte_sched/rte_sched.c +++ b/lib/librte_sched/rte_sched.c @@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params) return -1; /* socket */ - if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES)) + if (params->socket < 0) return -3; /* rate */ -- 2.17.1
[dpdk-dev] [PATCH 11/16] malloc: allow destroying heaps
Add an API to destroy specified heap. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 23 + lib/librte_eal/common/malloc_heap.c| 22 lib/librte_eal/common/malloc_heap.h| 3 ++ lib/librte_eal/common/rte_malloc.c | 58 ++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 107 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 182afab1c..8a8cc1e6d 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket, int __rte_experimental rte_malloc_heap_create(const char *heap_name); +/** + * Destroys a previously created malloc heap with specified name. + * + * @note This function will return a failure result if not all memory allocated + * from the heap has been freed back to the heap + * + * @note This function will return a failure result if not all memory segments + * were removed from the heap prior to its destruction + * + * @param heap_name + * Name of the heap to create. + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``heap_name`` was NULL, empty or too long + * ENOENT - heap by the name of ``heap_name`` was not found + * EPERM - attempting to destroy reserved heap + * EBUSY - heap still contains data + */ +int __rte_experimental +rte_malloc_heap_destroy(const char *heap_name); + /** * Find socket ID corresponding to a named heap. * diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 2742f7b11..471094cd1 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1036,6 +1036,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name) return 0; } +int +malloc_heap_destroy(struct malloc_heap *heap) +{ + if (heap->alloc_count != 0) { + RTE_LOG(ERR, EAL, "Heap is still in use\n"); + rte_errno = EBUSY; + return -1; + } + if (heap->first != NULL || heap->last != NULL) { + RTE_LOG(ERR, EAL, "Heap still contains memory segments\n"); + rte_errno = EBUSY; + return -1; + } + if (heap->total_size != 0) + RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n"); + + /* after this, the lock will be dropped */ + memset(heap, 0, sizeof(*heap)); + + return 0; +} + int rte_eal_malloc_heap_init(void) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index eebee16dc..75278da3c 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags, int malloc_heap_create(struct malloc_heap *heap, const char *heap_name); +int +malloc_heap_destroy(struct malloc_heap *heap); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index ade5af406..d135f9730 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -288,6 +288,21 @@ rte_malloc_virt2iova(const void *addr) return ms->iova + RTE_PTR_DIFF(addr, ms->addr); } +static struct malloc_heap * +find_named_heap(const char *name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int i; + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + + if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN)) + return heap; + } + return NULL; +} + int 
rte_malloc_heap_create(const char *heap_name) { @@ -338,3 +353,46 @@ rte_malloc_heap_create(const char *heap_name) return ret; } + +int +rte_malloc_heap_destroy(const char *heap_name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + int ret; + + if (heap_name == NULL || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock); + + /* start from non-socket heaps */ + heap = find_named_heap(heap_name); + if (heap == NULL) { + RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name); + rte_errno = ENOENT; + ret = -1; + goto unlock; + } + /* we shouldn't be able to destroy internal heaps */ + if (heap->socket
[dpdk-dev] [PATCH 14/16] malloc: allow attaching to external memory chunks
In order to use external memory in multiple processes, we need to attach to primary process's memseg lists, so add a new API to do that. It is the responsibility of the user to ensure that memory is accessible and that it has been previously added to the malloc heap by another process. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 32 + lib/librte_eal/common/rte_malloc.c | 83 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 116 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 9bbe8e3af..37af0e481 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket, * * @note Multiple memory chunks can be added to the same heap * + * @note Before accessing this memory in other processes, it needs to be + * attached in each of those processes by calling + * ``rte_malloc_heap_memory_attach`` in each other process. + * * @note Memory must be previously allocated for DPDK to be able to use it as a * malloc heap. Failing to do so will result in undefined behavior, up to and * including segmentation faults. @@ -329,12 +333,38 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, int __rte_experimental rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); +/** + * Attach to an already existing chunk of external memory in another process. + * + * @note This function must be called before any attempt is made to use an + * already existing external memory chunk. This function does *not* need to + * be called if a call to ``rte_malloc_heap_memory_add`` was made in the + * current process. + * + * @param heap_name + * Heap name to which this chunk of memory belongs + * @param va_addr + * Start address of memory chunk to attach to + * @param len + * Length of memory chunk to attach to + * @return + * 0 on successful attach + * -1 on unsuccessful attach, with rte_errno set to indicate cause for error: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to attach memory to a reserved heap + * ENOENT - heap or memory chunk was not found + */ +int __rte_experimental +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. * * @note Heaps created via this call will automatically get assigned a unique * socket ID, which can be found using ``rte_malloc_heap_get_socket()`` * + * @note This function has to only be called in one process. + * * @param heap_name * Name of the heap to create. * @@ -357,6 +387,8 @@ rte_malloc_heap_create(const char *heap_name); * @note This function will return a failure result if not all memory segments * were removed from the heap prior to its destruction * + * @note This function has to only be called in one process. + * * @param heap_name * Name of the heap to create. 
* diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 5093c4a46..2ed173466 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -393,6 +393,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len) return ret; } +struct sync_mem_walk_arg { + void *va_addr; + size_t len; + int result; +}; + +static int +attach_mem_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct sync_mem_walk_arg *wa = arg; + size_t len = msl->page_sz * msl->memseg_arr.len; + + if (msl->base_va == wa->va_addr && + len == wa->len) { + struct rte_memseg_list *found_msl; + int msl_idx, ret; + + /* msl is const */ + msl_idx = msl - mcfg->memsegs; + found_msl = &mcfg->memsegs[msl_idx]; + + ret = rte_fbarray_attach(&found_msl->memseg_arr); + + if (ret < 0) + wa->result = -rte_errno; + else + wa->result = 0; + return 1; + } + return 0; +} + +int +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + struct sync_mem_walk_arg wa; + int ret; + + if (heap_name == NULL || va_addr == NULL || len == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; +
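A hedged sketch of the secondary-process side of this API follows. It assumes the primary has already created the heap and added the chunk, and that the heap name, base address and length were shared with the secondary out of band; the matching detach call added by the companion patch is only referenced in a comment here.

#include <rte_malloc.h>

/* Hedged sketch: before touching memory that a primary process added to a
 * named external heap, a secondary process must attach to that chunk so
 * the underlying page table (fbarray) is mapped locally. */
static int
secondary_use_external_heap(const char *heap_name, void *base, size_t len)
{
	void *obj;
	int socket_id;

	if (rte_malloc_heap_memory_attach(heap_name, base, len) != 0)
		return -1;

	socket_id = rte_malloc_heap_get_socket(heap_name);
	obj = rte_malloc_socket("ext_obj_secondary", 1024, 0, socket_id);
	/* ... use obj, hand it to rings/mempools, etc. ... */
	rte_free(obj);

	/* The companion "allow detaching from external memory" patch adds a
	 * matching detach call to undo the mapping once it is no longer
	 * needed. */
	return 0;
}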
[dpdk-dev] [PATCH 12/16] malloc: allow adding memory to named heaps
Add an API to add externally allocated memory to malloc heap. The memory will be stored in memseg lists like regular DPDK memory. Multiple segments are allowed within a heap. If IOVA table is not provided, IOVA addresses are filled in with RTE_BAD_IOVA. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 39 lib/librte_eal/common/malloc_heap.c| 74 ++ lib/librte_eal/common/malloc_heap.h| 4 ++ lib/librte_eal/common/rte_malloc.c | 51 +++ lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 169 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 8a8cc1e6d..47f867a05 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,45 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Add memory chunk to a heap with specified name. + * + * @note Multiple memory chunks can be added to the same heap + * + * @note Memory must be previously allocated for DPDK to be able to use it as a + * malloc heap. Failing to do so will result in undefined behavior, up to and + * including segmentation faults. + * + * @note Calling this function will erase any contents already present at the + * supplied memory address. + * + * @param heap_name + * Name of the heap to add memory chunk to + * @param va_addr + * Start of virtual area to add to the heap + * @param len + * Length of virtual area to add to the heap + * @param iova_addrs + * Array of page IOVA addresses corresponding to each page in this memory + * area. Can be NULL, in which case page IOVA addresses will be set to + * RTE_BAD_IOVA. + * @param n_pages + * Number of elements in the iova_addrs array. Ignored if ``iova_addrs`` + * is NULL. + * @param page_sz + * Page size of the underlying memory + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to add memory to a reserved heap + * ENOSPC - no more space in internal config to store a new memory chunk + */ +int __rte_experimental +rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); + /** * Creates a new empty malloc heap with a specified name. 
* diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 471094cd1..af2476504 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1010,6 +1010,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +int +malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, + rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + char fbarray_name[RTE_FBARRAY_NAME_LEN]; + struct rte_memseg_list *msl = NULL; + struct rte_fbarray *arr; + size_t seg_len = n_pages * page_sz; + unsigned int i; + + /* first, find a free memseg list */ + for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { + struct rte_memseg_list *tmp = &mcfg->memsegs[i]; + if (tmp->base_va == NULL) { + msl = tmp; + break; + } + } + if (msl == NULL) { + RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n"); + rte_errno = ENOSPC; + return -1; + } + + snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p", + heap->name, va_addr); + + /* create the backing fbarray */ + if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages, + sizeof(struct rte_memseg)) < 0) { + RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n"); + return -1; + } + arr = &msl->memseg_arr; + + /* fbarray created, fill it up */ + for (i = 0; i < n_pages; i++) { + struct rte_memseg *ms; + + rte_fbarray_set_used(arr, i); + ms = rte_fbarray_get(arr, i); + ms->addr = RTE_PTR_ADD(va_addr, n_pages * page_sz); + ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i]; + ms->hugepage_sz = page_sz; + ms->len = page_sz; + ms->nchannel = rte_memory_get_nchannel(); + ms->nrank = rte_memory_get_nrank(); + ms->socket_id = heap->socket_id; + } + + /* set up the memseg list */ + msl->base_va = va_addr; + msl->page_sz = page_sz; + msl->socket_id = heap->socket_id; +
[dpdk-dev] [PATCH 09/16] malloc: add function to query socket ID of named heap
When we will be creating external heaps, they will have their own "fake" socket ID, so add a function that will map the heap name to its socket ID. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 14 lib/librte_eal/common/rte_malloc.c | 37 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 52 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index a9fb7e452..8870732a6 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -263,6 +263,20 @@ int rte_malloc_get_socket_stats(int socket, struct rte_malloc_socket_stats *socket_stats); +/** + * Find socket ID corresponding to a named heap. + * + * @param name + * Heap name to find socket ID for + * @return + * Socket ID in case of success (a non-negative number) + * -1 in case of error, with rte_errno set to one of the following: + * EINVAL - ``name`` was NULL + * ENOENT - heap identified by the name ``name`` was not found + */ +int __rte_experimental +rte_malloc_heap_get_socket(const char *name); + /** * Dump statistics. * diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 0515d47f3..b789333b3 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f) rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); } +int +rte_malloc_heap_get_socket(const char *name) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = NULL; + unsigned int idx; + int ret; + + if (name == NULL || + strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 || + strnlen(name, RTE_HEAP_NAME_MAX_LEN) == + RTE_HEAP_NAME_MAX_LEN) { + rte_errno = EINVAL; + return -1; + } + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + for (idx = 0; idx < RTE_MAX_HEAPS; idx++) { + struct malloc_heap *tmp = &mcfg->malloc_heaps[idx]; + + if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN)) { + heap = tmp; + break; + } + } + + if (heap != NULL) { + ret = heap->socket_id; + } else { + rte_errno = ENOENT; + ret = -1; + } + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* * Print stats on memory type. If type is NULL, info on all types is printed */ diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 344a43d32..6fd729b8b 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -311,6 +311,7 @@ EXPERIMENTAL { rte_fbarray_set_used; rte_log_register_type_and_pick_level; rte_malloc_dump_heaps; + rte_malloc_heap_get_socket; rte_mem_alloc_validator_register; rte_mem_alloc_validator_unregister; rte_mem_event_callback_register; -- 2.17.1
[dpdk-dev] [PATCH 13/16] malloc: allow removing memory from named heaps
Add an API to remove memory from specified heaps. This will first check if all elements within the region are free, and that the region is the original region that was added to the heap (by comparing its length to length of memory addressed by the underlying memseg list). Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 27 +++ lib/librte_eal/common/malloc_heap.c| 54 ++ lib/librte_eal/common/malloc_heap.h| 4 ++ lib/librte_eal/common/rte_malloc.c | 39 lib/librte_eal/rte_eal_version.map | 1 + 5 files changed, 125 insertions(+) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 47f867a05..9bbe8e3af 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -302,6 +302,33 @@ int __rte_experimental rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); +/** + * Remove memory chunk from heap with specified name. + * + * @note Memory chunk being removed must be the same as one that was added; + * partially removing memory chunks is not supported + * + * @note Memory area must not contain any allocated elements to allow its + * removal from the heap + * + * @param heap_name + * Name of the heap to remove memory from + * @param va_addr + * Virtual address to remove from the heap + * @param len + * Length of virtual area to remove from the heap + * + * @return + * - 0 on success + * - -1 in case of error, with rte_errno set to one of the following: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to remove memory from a reserved heap + * ENOENT - heap or memory chunk was not found + * EBUSY - memory chunk still contains data + */ +int __rte_experimental +rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. 
* diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index af2476504..7d1d4a290 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -1010,6 +1010,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f) rte_spinlock_unlock(&heap->lock); } +static int +destroy_seg(struct malloc_elem *elem, size_t len) +{ + struct malloc_heap *heap = elem->heap; + struct rte_memseg_list *msl; + + msl = elem->msl; + + /* this element can be removed */ + malloc_elem_free_list_remove(elem); + malloc_elem_hide_region(elem, elem, len); + + heap->total_size -= len; + + memset(elem, 0, sizeof(*elem)); + + /* destroy the fbarray backing this memory */ + if (rte_fbarray_destroy(&msl->memseg_arr) < 0) + return -1; + + /* reset the memseg list */ + memset(msl, 0, sizeof(*msl)); + + return 0; +} + int malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz) @@ -1084,6 +1110,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, return 0; } +int +malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr, + size_t len) +{ + struct malloc_elem *elem = heap->first; + + /* find element with specified va address */ + while (elem != NULL && elem != va_addr) { + elem = elem->next; + /* stop if we've blown past our VA */ + if (elem > (struct malloc_elem *)va_addr) { + rte_errno = ENOENT; + return -1; + } + } + /* check if element was found */ + if (elem == NULL || elem->msl->len != len) { + rte_errno = ENOENT; + return -1; + } + /* if element's size is not equal to segment len, segment is busy */ + if (elem->state == ELEM_BUSY || elem->size != len) { + rte_errno = EBUSY; + return -1; + } + return destroy_seg(elem, len); +} + int malloc_heap_create(struct malloc_heap *heap, const char *heap_name) { diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h index 237ce9dc2..e48996d52 100644 --- a/lib/librte_eal/common/malloc_heap.h +++ b/lib/librte_eal/common/malloc_heap.h @@ -43,6 +43,10 @@ int malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr, rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz); +int +malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr, + size_t len); + int malloc_heap_free(struct malloc_elem *elem); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c in
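A minimal add/remove round trip built on the two experimental calls above may help; the heap name, the externally allocated area and the page geometry below are assumptions made for illustration, and the heap is presumed to have been created with rte_malloc_heap_create() beforehand.

#include <rte_malloc.h>
#include <rte_memory.h>
#include <rte_errno.h>

#define EXT_PAGE_SZ	(2 * 1024 * 1024)	/* hypothetical 2M pages */
#define EXT_N_PAGES	8

static int
ext_mem_round_trip(void *va, rte_iova_t iova[EXT_N_PAGES])
{
	size_t len = (size_t)EXT_N_PAGES * EXT_PAGE_SZ;

	if (rte_malloc_heap_memory_add("ext_heap", va, len, iova,
			EXT_N_PAGES, EXT_PAGE_SZ) < 0)
		return -rte_errno;

	/* ... allocate from and free back to the heap here ... */

	/* must remove exactly the chunk that was added, and the area must
	 * contain no live allocations (EBUSY otherwise)
	 */
	if (rte_malloc_heap_memory_remove("ext_heap", va, len) < 0)
		return -rte_errno;

	return 0;
}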
[dpdk-dev] [PATCH 03/16] malloc: index heaps using heap ID rather than NUMA node
Switch over all parts of EAL to use heap ID instead of NUMA node ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA node's index within the detected NUMA node list. Heap ID for external heaps will be order of their creation. Signed-off-by: Anatoly Burakov --- config/common_base| 1 + config/rte_config.h | 1 + .../common/include/rte_eal_memconfig.h| 4 +- .../common/include/rte_malloc_heap.h | 1 + lib/librte_eal/common/malloc_heap.c | 85 +-- lib/librte_eal/common/malloc_heap.h | 3 + lib/librte_eal/common/rte_malloc.c| 41 ++--- 7 files changed, 94 insertions(+), 42 deletions(-) diff --git a/config/common_base b/config/common_base index 4bcbaf923..e96c52054 100644 --- a/config/common_base +++ b/config/common_base @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64 CONFIG_RTE_LIBRTE_EAL=y CONFIG_RTE_MAX_LCORE=128 CONFIG_RTE_MAX_NUMA_NODES=8 +CONFIG_RTE_MAX_HEAPS=32 CONFIG_RTE_MAX_MEMSEG_LISTS=64 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller diff --git a/config/rte_config.h b/config/rte_config.h index a8e479774..1f330c24e 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -21,6 +21,7 @@ /** library defines / /* EAL defines */ +#define RTE_MAX_HEAPS 32 #define RTE_MAX_MEMSEG_LISTS 128 #define RTE_MAX_MEMSEG_PER_LIST 8192 #define RTE_MAX_MEM_MB_PER_LIST 32768 diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index 76faf9a4a..5c6bd4bc3 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -72,8 +72,8 @@ struct rte_mem_config { struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */ - /* Heaps of Malloc per socket */ - struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES]; + /* Heaps of Malloc */ + struct malloc_heap malloc_heaps[RTE_MAX_HEAPS]; /* address of mem_config in primary process. used to map shared config into * exact same address the primary process maps it. diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h index d43fa9097..e7ac32d42 100644 --- a/lib/librte_eal/common/include/rte_malloc_heap.h +++ b/lib/librte_eal/common/include/rte_malloc_heap.h @@ -27,6 +27,7 @@ struct malloc_heap { unsigned alloc_count; size_t total_size; + unsigned int socket_id; } __rte_cache_aligned; #endif /* _RTE_MALLOC_HEAP_H_ */ diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 8c37b9d7c..0a868f61d 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz) return check_flag & flags; } +int +malloc_socket_to_heap_id(unsigned int socket_id) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int i; + + for (i = 0; i < RTE_MAX_HEAPS; i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + + if (heap->socket_id == socket_id) + return i; + } + return -1; +} + /* * Expand the heap with a memory area. 
*/ @@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl, struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct rte_memseg_list *found_msl; struct malloc_heap *heap; - int msl_idx; + int msl_idx, heap_idx; if (msl->external) return 0; - heap = &mcfg->malloc_heaps[msl->socket_id]; + heap_idx = malloc_socket_to_heap_id(msl->socket_id); + heap = &mcfg->malloc_heaps[heap_idx]; /* msl is const, so find it */ msl_idx = msl - mcfg->memsegs; @@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl, malloc_heap_add_memory(heap, found_msl, ms->addr, len); heap->total_size += len; + heap->socket_id = msl->socket_id; RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20, msl->socket_id); @@ -563,12 +580,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket, /* this will try lower page sizes first */ static void * -heap_alloc_on_socket(const char *type, size_t size, int socket, - unsigned int flags, size_t align, size_t bound, bool contig) +malloc_heap_alloc_on_heap_id(const char *type, size_t size, + unsigned int heap_id, unsigned int flags, size_t align, + size_t bound, bool contig) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - struct malloc_heap *heap = &mcfg->mallo
[dpdk-dev] [PATCH 05/16] flow_classify: do not check for invalid socket ID
We will be assigning "invalid" socket IDs to external heaps, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. Signed-off-by: Anatoly Burakov --- lib/librte_flow_classify/rte_flow_classify.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c index 4c3469da1..fb652a2b7 100644 --- a/lib/librte_flow_classify/rte_flow_classify.c +++ b/lib/librte_flow_classify/rte_flow_classify.c @@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params) } /* socket */ - if ((params->socket_id < 0) || - (params->socket_id >= RTE_MAX_NUMA_NODES)) { + if (params->socket_id < 0) { RTE_FLOW_CLASSIFY_LOG(ERR, "%s: Incorrect value for parameter socket_id\n", __func__); -- 2.17.1
[dpdk-dev] [PATCH 08/16] malloc: add name to malloc heaps
We will need to refer to external heaps in some way. While we use heap ID's internally, for external API use it has to be something more user-friendly. So, we will be using a string to uniquely identify a heap. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++ lib/librte_eal/common/malloc_heap.c | 15 ++- lib/librte_eal/common/rte_malloc.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h index e7ac32d42..1c08ef3e0 100644 --- a/lib/librte_eal/common/include/rte_malloc_heap.h +++ b/lib/librte_eal/common/include/rte_malloc_heap.h @@ -12,6 +12,7 @@ /* Number of free lists per heap, grouped by size. */ #define RTE_HEAP_NUM_FREELISTS 13 +#define RTE_HEAP_NAME_MAX_LEN 32 /* dummy definition, for pointers */ struct malloc_elem; @@ -28,6 +29,7 @@ struct malloc_heap { unsigned alloc_count; size_t total_size; unsigned int socket_id; + char name[RTE_HEAP_NAME_MAX_LEN]; } __rte_cache_aligned; #endif /* _RTE_MALLOC_HEAP_H_ */ diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 0a868f61d..813961f0c 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl, malloc_heap_add_memory(heap, found_msl, ms->addr, len); heap->total_size += len; - heap->socket_id = msl->socket_id; RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20, msl->socket_id); @@ -1011,6 +1010,20 @@ int rte_eal_malloc_heap_init(void) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + unsigned int i; + + /* assign names to default DPDK heaps */ + for (i = 0; i < rte_socket_count(); i++) { + struct malloc_heap *heap = &mcfg->malloc_heaps[i]; + char heap_name[RTE_HEAP_NAME_MAX_LEN]; + int socket_id = rte_socket_id_by_idx(i); + + snprintf(heap_name, sizeof(heap_name) - 1, + "socket_%i", socket_id); + strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN); + heap->socket_id = socket_id; + } + if (register_mp_requests()) { RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n"); diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 458c44ba6..0515d47f3 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type) malloc_heap_get_stats(heap, &sock_stats); fprintf(f, "Heap id:%u\n", heap_id); + fprintf(f, "\tHeap name:%s\n", heap->name); fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes); fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes); fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes); -- 2.17.1
[dpdk-dev] [PATCH 15/16] malloc: allow detaching from external memory
Add API to detach from existing chunk of external memory in a process. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_malloc.h | 27 ++ lib/librte_eal/common/rte_malloc.c | 27 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 50 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h index 37af0e481..0794f58e5 100644 --- a/lib/librte_eal/common/include/rte_malloc.h +++ b/lib/librte_eal/common/include/rte_malloc.h @@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len, * @note Memory area must not contain any allocated elements to allow its * removal from the heap * + * @note All other processes must detach from the memory chunk prior to it being + * removed from the heap. + * * @param heap_name * Name of the heap to remove memory from * @param va_addr @@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len); int __rte_experimental rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len); +/** + * Detach from a chunk of external memory in secondary process. + * + * @note This function must be called in before any attempt is made to remove + * external memory from the heap in another process. This function does *not* + * need to be called if a call to ``rte_malloc_heap_memory_remove`` will be + * called in current process. + * + * @param heap_name + * Heap name to which this chunk of memory belongs + * @param va_addr + * Start address of memory chunk to attach to + * @param len + * Length of memory chunk to attach to + * @return + * 0 on successful detach + * -1 on unsuccessful detach, with rte_errno set to indicate cause for error: + * EINVAL - one of the parameters was invalid + * EPERM - attempted to detach memory from a reserved heap + * ENOENT - heap or memory chunk was not found + */ +int __rte_experimental +rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len); + /** * Creates a new empty malloc heap with a specified name. 
* diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index 2ed173466..08571e5a0 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -397,10 +397,11 @@ struct sync_mem_walk_arg { void *va_addr; size_t len; int result; + bool attach; }; static int -attach_mem_walk(const struct rte_memseg_list *msl, void *arg) +sync_mem_walk(const struct rte_memseg_list *msl, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct sync_mem_walk_arg *wa = arg; @@ -415,7 +416,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg) msl_idx = msl - mcfg->memsegs; found_msl = &mcfg->memsegs[msl_idx]; - ret = rte_fbarray_attach(&found_msl->memseg_arr); + if (wa->attach) + ret = rte_fbarray_attach(&found_msl->memseg_arr); + else + ret = rte_fbarray_detach(&found_msl->memseg_arr); if (ret < 0) wa->result = -rte_errno; @@ -426,8 +430,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg) return 0; } -int -rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +static int +sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; struct malloc_heap *heap = NULL; @@ -461,9 +465,10 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) wa.va_addr = va_addr; wa.len = len; wa.result = -ENOENT; /* fail unless explicitly told to succeed */ + wa.attach = attach; /* we're already holding a read lock */ - rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa); + rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa); if (wa.result < 0) { rte_errno = -wa.result; @@ -476,6 +481,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) return ret; } +int +rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len) +{ + return sync_memory(heap_name, va_addr, len, true); +} + +int +rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len) +{ + return sync_memory(heap_name, va_addr, len, false); +} + int rte_malloc_heap_create(const char *heap_name) { diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 822c5693a..73fecb912 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -316,6 +316,7 @@ EXPERIMENTAL
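To make the intended ordering concrete, here is a sketch of the secondary-process side; "ext_heap", va and len are placeholders and must match what the primary process added with rte_malloc_heap_memory_add().

#include <rte_malloc.h>

static int
use_external_memory_in_secondary(void *va, size_t len)
{
	/* map the memseg list backing the chunk into this process */
	if (rte_malloc_heap_memory_attach("ext_heap", va, len) < 0)
		return -1;

	/* ... the chunk is now usable for allocations in this process ... */

	/* detach before any other process removes the chunk from the heap */
	return rte_malloc_heap_memory_detach("ext_heap", va, len);
}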
[dpdk-dev] [PATCH 04/16] mem: do not check for invalid socket ID
We will be assigning "invalid" socket IDs to external heaps, and malloc will now be able to verify if a supplied socket ID is in fact a valid one, rendering parameter checks for sockets obsolete. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memzone.c | 8 +--- lib/librte_eal/common/rte_malloc.c | 4 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c index 7300fe05d..b7081afbf 100644 --- a/lib/librte_eal/common/eal_common_memzone.c +++ b/lib/librte_eal/common/eal_common_memzone.c @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, return NULL; } - if ((socket_id != SOCKET_ID_ANY) && - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) { + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) { rte_errno = EINVAL; return NULL; } - if (!rte_eal_has_hugepages()) + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an +* external heap. +*/ + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES) socket_id = SOCKET_ID_ANY; contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0; diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c index dfcdf380a..458c44ba6 100644 --- a/lib/librte_eal/common/rte_malloc.c +++ b/lib/librte_eal/common/rte_malloc.c @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align, if (!rte_eal_has_hugepages()) socket_arg = SOCKET_ID_ANY; - /* Check socket parameter */ - if (socket_arg >= RTE_MAX_NUMA_NODES) - return NULL; - return malloc_heap_alloc(type, size, socket_arg, 0, align == 0 ? 1 : align, 0, false); } -- 2.17.1
Re: [dpdk-dev] [PATCH] net/bonding: don't ignore RSS key on device configuration
On Wed, Aug 29, 2018 at 3:51 AM Andrew Rybchenko wrote: > From: Igor Romanov > > Bonding driver ignores the value of RSS key (that is set in the port RSS > configuration) in bond_ethdev_configure(). So the only way to set > non-default RSS key is by using rss_hash_update(). This is not an > expected behaviour. > > Make the bond_ethdev_configure() set default RSS key only if > requested key is set to NULL. > > Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration") > Cc: sta...@dpdk.org > > Signed-off-by: Igor Romanov > Signed-off-by: Andrew Rybchenko > Acked-by: Chas Williams > --- > drivers/net/bonding/rte_eth_bond_pmd.c | 27 ++ > 1 file changed, 19 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c > b/drivers/net/bonding/rte_eth_bond_pmd.c > index b84f32263..ad670cc20 100644 > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > @@ -1778,12 +1778,11 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev, > > /* If RSS is enabled for bonding, try to enable it for slaves */ > if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode & > ETH_MQ_RX_RSS_FLAG) { > - if > (bonded_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len > - != 0) { > + if (internals->rss_key_len != 0) { > > slave_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len = > - > bonded_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len; > + internals->rss_key_len; > > slave_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key = > - > bonded_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key; > + internals->rss_key; > } else { > > slave_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key = NULL; > } > @@ -3284,11 +3283,23 @@ bond_ethdev_configure(struct rte_eth_dev *dev) > > unsigned i, j; > > - /* If RSS is enabled, fill table and key with default values */ > + /* > +* If RSS is enabled, fill table with default values and > +* set key to the the value specified in port RSS configuration. > +* Fall back to default RSS key if the key is not specified > +*/ > if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS) { > - dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key = > internals->rss_key; > - dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len = 0; > - memcpy(internals->rss_key, default_rss_key, 40); > + if (dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key != > NULL) { > + internals->rss_key_len = > + > dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len; > + memcpy(internals->rss_key, > + > dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key, > + internals->rss_key_len); > + } else { > + internals->rss_key_len = sizeof(default_rss_key); > + memcpy(internals->rss_key, default_rss_key, > + internals->rss_key_len); > + } > > for (i = 0; i < RTE_DIM(internals->reta_conf); i++) { > internals->reta_conf[i].mask = ~0LL; > -- > 2.17.1 > >
Re: [dpdk-dev] [PATCH] net/bonding: use evenly distributed default RSS RETA
On Wed, Aug 29, 2018 at 3:48 AM Andrew Rybchenko wrote: > From: Igor Romanov > > Default Redirection Table that is set in bonding driver is distributed > evenly over all Rx queues only within every RETA group (the first RETA > entries in every group are always start with zero). But in the most > drivers, default RETA is distributed over all Rx queues without sequence > resets in the beginning of a new group, which implies more balanced > per-core load. > > Change the default RETA to be evenly distributed over all Rx queues > considering the whole table. > > Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration") > Cc: sta...@dpdk.org > > Signed-off-by: Igor Romanov > Signed-off-by: Andrew Rybchenko > Acked-by: Chas Williams > --- > drivers/net/bonding/rte_eth_bond_pmd.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c > b/drivers/net/bonding/rte_eth_bond_pmd.c > index b84f32263..0f5ab09e3 100644 > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > @@ -3293,7 +3293,9 @@ bond_ethdev_configure(struct rte_eth_dev *dev) > for (i = 0; i < RTE_DIM(internals->reta_conf); i++) { > internals->reta_conf[i].mask = ~0LL; > for (j = 0; j < RTE_RETA_GROUP_SIZE; j++) > - internals->reta_conf[i].reta[j] = j % > dev->data->nb_rx_queues; > + internals->reta_conf[i].reta[j] = > + (i * RTE_RETA_GROUP_SIZE + > j) % > + dev->data->nb_rx_queues; > } > } > > -- > 2.17.1 > >
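For illustration (the numbers are examples, not from the patch): with RTE_RETA_GROUP_SIZE = 64 and 3 Rx queues, the old fill mapped entry 63 of group 0 to queue 63 % 3 = 0 and then restarted group 1 at queue 0, so queue 0 received back-to-back entries at every group boundary. With the whole-table index, the first entry of group 1 becomes (1 * 64 + 0) % 3 = 1, so the 0, 1, 2, 0, 1, 2, ... sequence continues across group boundaries and the per-queue (and hence per-core) load stays even over the entire table.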
Re: [dpdk-dev] Tx vlan offload problem with igb and DPDK v17.11
Forget about it, I found a bug in my software. Once solved, no problem with PKT_TX_VLAN_PKT at all. Regards, El lun., 3 sept. 2018 a las 19:32, Victor Huertas () escribió: > Hi all, > > I have realized that the PKT_TX_VLAN_PKT flag for Tx Vlan Offload doesn't > work in my application. > > According to the NICs I have (IGB) there seems to be a problem with this > vlan offload tx feature and this version of DPDK according to the Bug 17 : > https://bugs.dpdk.org/show_bug.cgi?id=17 > > I have tested it using vfio_pci and igb_uio drivers as well as SW vlan > insertion (rte_vlan_insert) and the result is exactly the same. > > Have this bug been solved so far? > > These are my NICs: > 04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network > Connection (rev 01) > Subsystem: Super Micro Computer Inc Device 10c9 > Flags: fast devsel, IRQ 17 > Memory at fafe (32-bit, non-prefetchable) [disabled] [size=128K] > Memory at fafc (32-bit, non-prefetchable) [disabled] [size=128K] > I/O ports at ec00 [disabled] [size=32] > Memory at fafbc000 (32-bit, non-prefetchable) [disabled] [size=16K] > [virtual] Expansion ROM at faf8 [disabled] [size=128K] > Capabilities: [40] Power Management version 3 > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ > Capabilities: [70] MSI-X: Enable- Count=10 Masked- > Capabilities: [a0] Express Endpoint, MSI 00 > Capabilities: [100] Advanced Error Reporting > Capabilities: [140] Device Serial Number 00-30-48-ff-ff-bb-17-02 > Capabilities: [150] Alternative Routing-ID Interpretation (ARI) > Capabilities: [160] Single Root I/O Virtualization (SR-IOV) > Kernel driver in use: vfio-pci > Kernel modules: igb > > 04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network > Connection (rev 01) > Subsystem: Super Micro Computer Inc Device 10c9 > Flags: fast devsel, IRQ 16 > Memory at faf6 (32-bit, non-prefetchable) [disabled] [size=128K] > Memory at faf4 (32-bit, non-prefetchable) [disabled] [size=128K] > I/O ports at e880 [disabled] [size=32] > Memory at faf3c000 (32-bit, non-prefetchable) [disabled] [size=16K] > [virtual] Expansion ROM at faf0 [disabled] [size=128K] > Capabilities: [40] Power Management version 3 > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ > Capabilities: [70] MSI-X: Enable- Count=10 Masked- > Capabilities: [a0] Express Endpoint, MSI 00 > Capabilities: [100] Advanced Error Reporting > Capabilities: [140] Device Serial Number 00-30-48-ff-ff-bb-17-02 > Capabilities: [150] Alternative Routing-ID Interpretation (ARI) > Capabilities: [160] Single Root I/O Virtualization (SR-IOV) > Kernel driver in use: vfio-pci > Kernel modules: igb > > Thanks for your attention > > Regards, > > PD: BTW, I have observed that capturing a, for example, an ARP message in > an rx queue which the VLAN stripped the answer is sent correctly if I set > the PKT_TX_VLAN_PKT flag and the VLAN_TCI is the same... However, if I try > to set the VLAN header from a non-VLAN stripped frame then it doesnt work. > > > > -- > Victor > -- Victor
[dpdk-dev] [PATCH v2 00/12] net/mvpp2: add new features
This patch series introduces fixes and adds support for traffic metering, traffic manager and Tx S/G. Additionally, it aligns with MUSDK 18.09. Changes since v1: * Align with MUSDK 18.09 library * Add support for Tx Gather. * Add documentation related to MTR and TM. * Align documentation with MUSDK 18.09 Natalie Samsonov (4): net/mvpp2: initialize ppio only once net/mvpp2: update MTU and MRU related calculations net/mvpp2: align documentation with MUSDK 18.09 net/mvpp2: document MTR and TM usage Tomasz Duszynski (6): net/mvpp2: move common code net/mvpp2: add metering support net/mvpp2: change default policer configuration net/mvpp2: add init and deinit to flow net/mvpp2: add traffic manager support net/mvpp2: align with MUSDK 18.09 Yuval Caduri (1): net/mvpp2: detach tx_qos from rx cls/qos config Zyta Szpak (1): net/mvpp2: add Tx S/G support doc/guides/nics/img/mvpp2_tm.png | Bin 0 -> 5355 bytes doc/guides/nics/mvpp2.rst| 433 +--- drivers/net/mvpp2/Makefile |2 + drivers/net/mvpp2/meson.build|4 +- drivers/net/mvpp2/mrvl_ethdev.c | 427 +--- drivers/net/mvpp2/mrvl_ethdev.h | 123 - drivers/net/mvpp2/mrvl_flow.c| 132 +++-- drivers/net/mvpp2/mrvl_flow.h| 15 + drivers/net/mvpp2/mrvl_mtr.c | 512 +++ drivers/net/mvpp2/mrvl_mtr.h | 15 + drivers/net/mvpp2/mrvl_qos.c | 246 +- drivers/net/mvpp2/mrvl_qos.h |2 +- drivers/net/mvpp2/mrvl_tm.c | 1009 ++ drivers/net/mvpp2/mrvl_tm.h | 15 + 14 files changed, 2624 insertions(+), 311 deletions(-) create mode 100644 doc/guides/nics/img/mvpp2_tm.png create mode 100644 drivers/net/mvpp2/mrvl_flow.h create mode 100644 drivers/net/mvpp2/mrvl_mtr.c create mode 100644 drivers/net/mvpp2/mrvl_mtr.h create mode 100644 drivers/net/mvpp2/mrvl_tm.c create mode 100644 drivers/net/mvpp2/mrvl_tm.h -- 2.7.4
[dpdk-dev] [PATCH v2 01/12] net/mvpp2: initialize ppio only once
From: Natalie Samsonov This changes stop/start/configure behavior due to issue in MUSDK library itself. From now on, ppio can be reconfigured only after interface is closed. Signed-off-by: Natalie Samsonov Reviewed-by: Yuval Caduri --- drivers/net/mvpp2/mrvl_ethdev.c | 53 + 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 6824445..f022cad 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -304,6 +304,11 @@ mrvl_dev_configure(struct rte_eth_dev *dev) struct mrvl_priv *priv = dev->data->dev_private; int ret; + if (priv->ppio) { + MRVL_LOG(INFO, "Device reconfiguration is not supported"); + return -EINVAL; + } + if (dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_NONE && dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_RSS) { MRVL_LOG(INFO, "Unsupported rx multi queue mode %d", @@ -525,6 +530,9 @@ mrvl_dev_start(struct rte_eth_dev *dev) char match[MRVL_MATCH_LEN]; int ret = 0, i, def_init_size; + if (priv->ppio) + return mrvl_dev_set_link_up(dev); + snprintf(match, sizeof(match), "ppio-%d:%d", priv->pp_id, priv->ppio_id); priv->ppio_params.match = match; @@ -749,28 +757,7 @@ mrvl_flush_bpool(struct rte_eth_dev *dev) static void mrvl_dev_stop(struct rte_eth_dev *dev) { - struct mrvl_priv *priv = dev->data->dev_private; - mrvl_dev_set_link_down(dev); - mrvl_flush_rx_queues(dev); - mrvl_flush_tx_shadow_queues(dev); - if (priv->cls_tbl) { - pp2_cls_tbl_deinit(priv->cls_tbl); - priv->cls_tbl = NULL; - } - if (priv->qos_tbl) { - pp2_cls_qos_tbl_deinit(priv->qos_tbl); - priv->qos_tbl = NULL; - } - if (priv->ppio) - pp2_ppio_deinit(priv->ppio); - priv->ppio = NULL; - - /* policer must be released after ppio deinitialization */ - if (priv->policer) { - pp2_cls_plcr_deinit(priv->policer); - priv->policer = NULL; - } } /** @@ -785,6 +772,9 @@ mrvl_dev_close(struct rte_eth_dev *dev) struct mrvl_priv *priv = dev->data->dev_private; size_t i; + mrvl_flush_rx_queues(dev); + mrvl_flush_tx_shadow_queues(dev); + for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) { struct pp2_ppio_tc_params *tc_params = &priv->ppio_params.inqs_params.tcs_params[i]; @@ -795,7 +785,28 @@ mrvl_dev_close(struct rte_eth_dev *dev) } } + if (priv->cls_tbl) { + pp2_cls_tbl_deinit(priv->cls_tbl); + priv->cls_tbl = NULL; + } + + if (priv->qos_tbl) { + pp2_cls_qos_tbl_deinit(priv->qos_tbl); + priv->qos_tbl = NULL; + } + mrvl_flush_bpool(dev); + + if (priv->ppio) { + pp2_ppio_deinit(priv->ppio); + priv->ppio = NULL; + } + + /* policer must be released after ppio deinitialization */ + if (priv->policer) { + pp2_cls_plcr_deinit(priv->policer); + priv->policer = NULL; + } } /** -- 2.7.4
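From the application's point of view, the resulting lifecycle looks roughly like the sketch below (port_id, queue counts and port_conf are placeholders; queue setup is omitted):

#include <rte_ethdev.h>

/* behavioural sketch only, based on the commit message and diff above */
static void
mvpp2_lifecycle_sketch(uint16_t port_id)
{
	struct rte_eth_conf port_conf = { 0 };

	rte_eth_dev_configure(port_id, 1, 1, &port_conf);
	/* ... rx/tx queue setup omitted ... */
	rte_eth_dev_start(port_id);	/* ppio is created and started here */
	rte_eth_dev_stop(port_id);	/* now only brings the link down */
	rte_eth_dev_start(port_id);	/* re-uses the already created ppio */

	/* while the ppio exists, reconfiguration is refused with -EINVAL */
	rte_eth_dev_configure(port_id, 2, 2, &port_conf);

	/* queues, tables and the ppio itself are released only on close */
	rte_eth_dev_stop(port_id);
	rte_eth_dev_close(port_id);
}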
[dpdk-dev] [PATCH v2 02/12] net/mvpp2: move common code
Cleanup sources by moving common code to the pmd header file. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- drivers/net/mvpp2/mrvl_ethdev.c | 9 - drivers/net/mvpp2/mrvl_ethdev.h | 11 +++ drivers/net/mvpp2/mrvl_flow.c | 5 - drivers/net/mvpp2/mrvl_qos.c| 9 - 4 files changed, 11 insertions(+), 23 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index f022cad..adb07d0 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -10,15 +10,6 @@ #include #include -/* Unluckily, container_of is defined by both DPDK and MUSDK, - * we'll declare only one version. - * - * Note that it is not used in this PMD anyway. - */ -#ifdef container_of -#undef container_of -#endif - #include #include #include diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index 3726f78..2204be2 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -10,12 +10,23 @@ #include #include +/* + * container_of is defined by both DPDK and MUSDK, + * we'll declare only one version. + * + * Note that it is not used in this PMD anyway. + */ +#ifdef container_of +#undef container_of +#endif + #include #include #include #include #include #include +#include "env/mv_common.h" /* for BIT() */ /** Maximum number of rx queues per port */ #define MRVL_PP2_RXQ_MAX 32 diff --git a/drivers/net/mvpp2/mrvl_flow.c b/drivers/net/mvpp2/mrvl_flow.c index 13295e6..db750f4 100644 --- a/drivers/net/mvpp2/mrvl_flow.c +++ b/drivers/net/mvpp2/mrvl_flow.c @@ -11,13 +11,8 @@ #include -#ifdef container_of -#undef container_of -#endif - #include "mrvl_ethdev.h" #include "mrvl_qos.h" -#include "env/mv_common.h" /* for BIT() */ /** Number of rules in the classifier table. */ #define MRVL_CLS_MAX_NUM_RULES 20 diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index 71856c1..eeb46f8 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -15,15 +15,6 @@ #include #include -/* Unluckily, container_of is defined by both DPDK and MUSDK, - * we'll declare only one version. - * - * Note that it is not used in this PMD anyway. - */ -#ifdef container_of -#undef container_of -#endif - #include "mrvl_qos.h" /* Parsing tokens. Defined conveniently, so that any correction is easy. */ -- 2.7.4
[dpdk-dev] [PATCH v2 03/12] net/mvpp2: add metering support
Add support for configuring plcr via DPDK generic metering API. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- drivers/net/mvpp2/Makefile | 1 + drivers/net/mvpp2/meson.build | 3 +- drivers/net/mvpp2/mrvl_ethdev.c | 24 ++ drivers/net/mvpp2/mrvl_ethdev.h | 71 ++ drivers/net/mvpp2/mrvl_flow.c | 91 +++ drivers/net/mvpp2/mrvl_mtr.c| 512 drivers/net/mvpp2/mrvl_mtr.h| 15 ++ 7 files changed, 673 insertions(+), 44 deletions(-) create mode 100644 drivers/net/mvpp2/mrvl_mtr.c create mode 100644 drivers/net/mvpp2/mrvl_mtr.h diff --git a/drivers/net/mvpp2/Makefile b/drivers/net/mvpp2/Makefile index 211d398..4848d65 100644 --- a/drivers/net/mvpp2/Makefile +++ b/drivers/net/mvpp2/Makefile @@ -39,5 +39,6 @@ LDLIBS += -lrte_bus_vdev -lrte_common_mvep SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_ethdev.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_qos.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_flow.c +SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_mtr.c include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/mvpp2/meson.build b/drivers/net/mvpp2/meson.build index 3620659..f475511 100644 --- a/drivers/net/mvpp2/meson.build +++ b/drivers/net/mvpp2/meson.build @@ -19,7 +19,8 @@ endif sources = files( 'mrvl_ethdev.c', 'mrvl_flow.c', - 'mrvl_qos.c' + 'mrvl_qos.c', + 'mrvl_mtr.c' ) deps += ['cfgfile', 'common_mvep'] diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index adb07d0..a4951d3 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -23,6 +23,7 @@ #include #include "mrvl_ethdev.h" #include "mrvl_qos.h" +#include "mrvl_mtr.h" /* bitmask with reserved hifs */ #define MRVL_MUSDK_HIFS_RESERVED 0x0F @@ -627,6 +628,8 @@ mrvl_dev_start(struct rte_eth_dev *dev) goto out; } + mrvl_mtr_init(dev); + return 0; out: MRVL_LOG(ERR, "Failed to start device"); @@ -765,6 +768,7 @@ mrvl_dev_close(struct rte_eth_dev *dev) mrvl_flush_rx_queues(dev); mrvl_flush_tx_shadow_queues(dev); + mrvl_mtr_deinit(dev); for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) { struct pp2_ppio_tc_params *tc_params = @@ -1868,6 +1872,25 @@ mrvl_eth_filter_ctrl(struct rte_eth_dev *dev __rte_unused, } } +/** + * DPDK callback to get rte_mtr callbacks. + * + * @param dev + * Pointer to the device structure. + * @param ops + * Pointer to pass the mtr ops. + * + * @return + * Always 0. + */ +static int +mrvl_mtr_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops) +{ + *(const void **)ops = &mrvl_mtr_ops; + + return 0; +} + static const struct eth_dev_ops mrvl_ops = { .dev_configure = mrvl_dev_configure, .dev_start = mrvl_dev_start, @@ -1905,6 +1928,7 @@ static const struct eth_dev_ops mrvl_ops = { .rss_hash_update = mrvl_rss_hash_update, .rss_hash_conf_get = mrvl_rss_hash_conf_get, .filter_ctrl = mrvl_eth_filter_ctrl, + .mtr_ops_get = mrvl_mtr_ops_get, }; /** diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index 2204be2..ecb8fdc 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -9,6 +9,7 @@ #include #include +#include /* * container_of is defined by both DPDK and MUSDK, @@ -70,6 +71,69 @@ /** Minimum number of sent buffers to release from shadow queue to BM */ #define MRVL_PP2_BUF_RELEASE_BURST_SIZE64 +/** Maximum length of a match string */ +#define MRVL_MATCH_LEN 16 + +/** Parsed fields in processed rte_flow_item. 
*/ +enum mrvl_parsed_fields { + /* eth flags */ + F_DMAC = BIT(0), + F_SMAC = BIT(1), + F_TYPE = BIT(2), + /* vlan flags */ + F_VLAN_PRI = BIT(3), + F_VLAN_ID = BIT(4), + F_VLAN_TCI = BIT(5), /* not supported by MUSDK yet */ + /* ip4 flags */ + F_IP4_TOS = BIT(6), + F_IP4_SIP = BIT(7), + F_IP4_DIP = BIT(8), + F_IP4_PROTO =BIT(9), + /* ip6 flags */ + F_IP6_TC = BIT(10), /* not supported by MUSDK yet */ + F_IP6_SIP = BIT(11), + F_IP6_DIP = BIT(12), + F_IP6_FLOW = BIT(13), + F_IP6_NEXT_HDR = BIT(14), + /* tcp flags */ + F_TCP_SPORT =BIT(15), + F_TCP_DPORT =BIT(16), + /* udp flags */ + F_UDP_SPORT =BIT(17), + F_UDP_DPORT =BIT(18), +}; + +/** PMD-specific definition of a flow rule handle. */ +struct mrvl_mtr; +struct rte_flow { + LIST_ENTRY(rte_flow) next; + struct mrvl_mtr *mtr; + + enum mrvl_parsed_fields pattern; + + struct pp2_cls_tbl_rule rule; + struct pp2_cls_cos_desc cos; + struct pp2_cls_tbl_action action; +}; + +struct mrvl_mtr_profile { + L
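For orientation, a hypothetical application-side sequence using the generic metering API that this patch wires up through .mtr_ops_get; the IDs, rates and burst sizes are placeholders, and attaching the meter to traffic additionally requires an rte_flow rule with a METER action (not shown).

#include <rte_mtr.h>

static int
setup_srtcm_meter(uint16_t port_id)
{
	struct rte_mtr_error error;
	struct rte_mtr_meter_profile profile = {
		.alg = RTE_MTR_SRTCM_RFC2697,
		.srtcm_rfc2697 = {
			.cir = 1000000,		/* committed rate */
			.cbs = 64 * 1024,	/* committed burst */
			.ebs = 64 * 1024,	/* excess burst */
		},
	};
	struct rte_mtr_params params = {
		.meter_profile_id = 0,
		.meter_enable = 1,
	};

	if (rte_mtr_meter_profile_add(port_id, 0, &profile, &error))
		return -1;

	return rte_mtr_create(port_id, 0, &params, 1, &error);
}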
[dpdk-dev] [PATCH v2 04/12] net/mvpp2: change default policer configuration
Change QoS configuration file syntax for port's default policer setup. Since default policer configuration is performed before any other policer configuration we can pick a default id. This simplifies default policer configuration since user no longer has to choose ids from range [0, PP2_CLS_PLCR_NUM]. Explicitly document values for rate_limit_enable field. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- doc/guides/nics/mvpp2.rst | 31 --- drivers/net/mvpp2/mrvl_ethdev.c | 6 +- drivers/net/mvpp2/mrvl_ethdev.h | 2 +- drivers/net/mvpp2/mrvl_qos.c| 198 ++-- drivers/net/mvpp2/mrvl_qos.h| 2 +- 5 files changed, 134 insertions(+), 105 deletions(-) diff --git a/doc/guides/nics/mvpp2.rst b/doc/guides/nics/mvpp2.rst index 0408752..a452c8a 100644 --- a/doc/guides/nics/mvpp2.rst +++ b/doc/guides/nics/mvpp2.rst @@ -152,20 +152,23 @@ Configuration syntax .. code-block:: console - [port default] - default_tc = - mapping_priority = - policer_enable = + [policer ] token_unit = color = cir = ebs = cbs = + [port default] + default_tc = + mapping_priority = + rate_limit_enable = rate_limit = burst_size = + default_policer = + [port tc ] rxq = pcp = @@ -201,7 +204,9 @@ Where: - : List of DSCP values to handle in particular TC (e.g. 0-12 32-48 63). -- : Enable ingress policer. +- : Id of the policer configuration section to be used as default. + +- : Id of the policer configuration section (0..31). - : Policer token unit (`bytes` or `packets`). @@ -215,7 +220,7 @@ Where: - : Default color for specific tc. -- : Enables per port or per txq rate limiting. +- : Enables per port or per txq rate limiting (`0`/`1` to disable/enable). - : Committed information rate, in kilo bits per second. @@ -234,6 +239,13 @@ Configuration file example .. 
code-block:: console + [policer 0] + token_unit = bytes + color = blind + cir = 10 + ebs = 64 + cbs = 64 + [port 0 default] default_tc = 0 mapping_priority = ip @@ -265,12 +277,7 @@ Configuration file example default_tc = 0 mapping_priority = vlan/ip - policer_enable = 1 - token_unit = bytes - color = blind - cir = 10 - ebs = 64 - cbs = 64 + default_policer = 0 [port 1 tc 0] rxq = 0 diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index a4951d3..1464385 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -798,9 +798,9 @@ mrvl_dev_close(struct rte_eth_dev *dev) } /* policer must be released after ppio deinitialization */ - if (priv->policer) { - pp2_cls_plcr_deinit(priv->policer); - priv->policer = NULL; + if (priv->default_policer) { + pp2_cls_plcr_deinit(priv->default_policer); + priv->default_policer = NULL; } } diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index ecb8fdc..de423a9 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -168,7 +168,7 @@ struct mrvl_priv { uint32_t cls_tbl_pattern; LIST_HEAD(mrvl_flows, rte_flow) flows; - struct pp2_cls_plcr *policer; + struct pp2_cls_plcr *default_policer; LIST_HEAD(profiles, mrvl_mtr_profile) profiles; LIST_HEAD(mtrs, mrvl_mtr) mtrs; diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index eeb46f8..e039635 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -42,7 +42,8 @@ #define MRVL_TOK_WRR_WEIGHT "wrr_weight" /* policer specific configuration tokens */ -#define MRVL_TOK_PLCR_ENABLE "policer_enable" +#define MRVL_TOK_PLCR "policer" +#define MRVL_TOK_PLCR_DEFAULT "default_policer" #define MRVL_TOK_PLCR_UNIT "token_unit" #define MRVL_TOK_PLCR_UNIT_BYTES "bytes" #define MRVL_TOK_PLCR_UNIT_PACKETS "packets" @@ -368,6 +369,9 @@ parse_tc_cfg(struct rte_cfgfile *file, int port, int tc, cfg->port[port].tc[tc].dscps = n; } + if (!cfg->port[port].setup_policer) + return 0; + entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_PLCR_DEFAULT_COLOR); if (entry) { @@ -390,6 +394,85 @@ parse_tc_cfg(struct rte_cfgfile *file, int port, int tc, } /** + * Parse default port policer. + * + * @param file Config file handle. + * @param sec_name Section name with policer configuration + * @param port Port number. + * @param cfg[out] Parsing results. + * @returns 0 in case of success, negative value otherwise. + */ +static int +parse_policer(struct rte_cfgfile *file, int port, const char *sec_name, + struct mrvl_qos_cfg *cfg) +{ + const char *entry; + uint32_t val; + + /* Read policer token uni
[dpdk-dev] [PATCH v2 05/12] net/mvpp2: add init and deinit to flow
Add init and deinit functionality to flow implementation. Init puts structures used by flow in a sane state. Deinit deallocates all resources used by flow. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi Reviewed-by: Shlomi Gridish --- drivers/net/mvpp2/mrvl_ethdev.c | 3 +++ drivers/net/mvpp2/mrvl_flow.c | 33 - drivers/net/mvpp2/mrvl_flow.h | 15 +++ 3 files changed, 50 insertions(+), 1 deletion(-) create mode 100644 drivers/net/mvpp2/mrvl_flow.h diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 1464385..5e3a106 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -23,6 +23,7 @@ #include #include "mrvl_ethdev.h" #include "mrvl_qos.h" +#include "mrvl_flow.h" #include "mrvl_mtr.h" /* bitmask with reserved hifs */ @@ -628,6 +629,7 @@ mrvl_dev_start(struct rte_eth_dev *dev) goto out; } + mrvl_flow_init(dev); mrvl_mtr_init(dev); return 0; @@ -768,6 +770,7 @@ mrvl_dev_close(struct rte_eth_dev *dev) mrvl_flush_rx_queues(dev); mrvl_flush_tx_shadow_queues(dev); + mrvl_flow_deinit(dev); mrvl_mtr_deinit(dev); for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) { diff --git a/drivers/net/mvpp2/mrvl_flow.c b/drivers/net/mvpp2/mrvl_flow.c index e6953e4..065b1aa 100644 --- a/drivers/net/mvpp2/mrvl_flow.c +++ b/drivers/net/mvpp2/mrvl_flow.c @@ -11,7 +11,7 @@ #include -#include "mrvl_ethdev.h" +#include "mrvl_flow.h" #include "mrvl_qos.h" /** Number of rules in the classifier table. */ @@ -2790,3 +2790,34 @@ const struct rte_flow_ops mrvl_flow_ops = { .flush = mrvl_flow_flush, .isolate = mrvl_flow_isolate }; + +/** + * Initialize flow resources. + * + * @param dev Pointer to the device. + */ +void +mrvl_flow_init(struct rte_eth_dev *dev) +{ + struct mrvl_priv *priv = dev->data->dev_private; + + LIST_INIT(&priv->flows); +} + +/** + * Cleanup flow resources. + * + * @param dev Pointer to the device. + */ +void +mrvl_flow_deinit(struct rte_eth_dev *dev) +{ + struct mrvl_priv *priv = dev->data->dev_private; + + mrvl_flow_flush(dev, NULL); + + if (priv->cls_tbl) { + pp2_cls_tbl_deinit(priv->cls_tbl); + priv->cls_tbl = NULL; + } +} diff --git a/drivers/net/mvpp2/mrvl_flow.h b/drivers/net/mvpp2/mrvl_flow.h new file mode 100644 index 000..f63747c --- /dev/null +++ b/drivers/net/mvpp2/mrvl_flow.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Marvell International Ltd. + * Copyright(c) 2018 Semihalf. + * All rights reserved. + */ + +#ifndef _MRVL_FLOW_H_ +#define _MRVL_FLOW_H_ + +#include "mrvl_ethdev.h" + +void mrvl_flow_init(struct rte_eth_dev *dev); +void mrvl_flow_deinit(struct rte_eth_dev *dev); + +#endif /* _MRVL_FLOW_H_ */ -- 2.7.4
[dpdk-dev] [PATCH v2 09/12] net/mvpp2: align with MUSDK 18.09
This patch introduces necessary changes required by MUSDK 18.09 library. * As of MUSDK 18.09, pp2_cookie_t is no longer available. Now RX descriptor cookie is defined as plain u64 so existing cast is no longer valid. * MUSDK 18.09 increased number of available bpools (buffer hw pools) by introducing dma regions support. Update mvpp2 driver accordingly. * replace MV_NET_IP4_F_TOS with MV_NET_IP4_F_DSCP Before this patch, API allowed to configure a classification rule according to IPv4 TOS, which was not supported in classifier. This patch fixes this by using proper field. * use 48 bit address mask We cannot get pointers exceeding 48 bits thus using 48 bit mask for extracting higher IOVA address bits is enough. Signed-off-by: Natalie Samsonov Signed-off-by: Yuval Caduri Signed-off-by: Tomasz Duszynski Reviewed-by: Shlomi Gridish Reviewed-by: Alan Winkowski Reviewed-by: Liron Himi --- drivers/net/mvpp2/mrvl_ethdev.c | 10 -- drivers/net/mvpp2/mrvl_flow.c | 3 ++- drivers/net/mvpp2/mrvl_qos.c| 2 +- 3 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 035ee81..899a9e4 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -54,9 +54,7 @@ #define MRVL_ARP_LENGTH 28 #define MRVL_COOKIE_ADDR_INVALID ~0ULL - -#define MRVL_COOKIE_HIGH_ADDR_SHIFT(sizeof(pp2_cookie_t) * 8) -#define MRVL_COOKIE_HIGH_ADDR_MASK (~0ULL << MRVL_COOKIE_HIGH_ADDR_SHIFT) +#define MRVL_COOKIE_HIGH_ADDR_MASK 0xff00 /** Port Rx offload capabilities */ #define MRVL_RX_OFFLOADS (DEV_RX_OFFLOAD_VLAN_FILTER | \ @@ -1544,7 +1542,7 @@ mrvl_fill_bpool(struct mrvl_rxq *rxq, int num) entries[i].buff.addr = rte_mbuf_data_iova_default(mbufs[i]); - entries[i].buff.cookie = (pp2_cookie_t)(uint64_t)mbufs[i]; + entries[i].buff.cookie = (uint64_t)mbufs[i]; entries[i].bpool = bpool; } @@ -2180,7 +2178,7 @@ mrvl_rx_pkt_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) if (unlikely(status != PP2_DESC_ERR_OK)) { struct pp2_buff_inf binf = { .addr = rte_mbuf_data_iova_default(mbuf), - .cookie = (pp2_cookie_t)(uint64_t)mbuf, + .cookie = (uint64_t)mbuf, }; pp2_bpool_put_buff(hif, bpool, &binf); @@ -2441,7 +2439,7 @@ mrvl_tx_pkt_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) rte_mbuf_prefetch_part2(pref_pkt_hdr); } - sq->ent[sq->head].buff.cookie = (pp2_cookie_t)(uint64_t)mbuf; + sq->ent[sq->head].buff.cookie = (uint64_t)mbuf; sq->ent[sq->head].buff.addr = rte_mbuf_data_iova_default(mbuf); sq->ent[sq->head].bpool = diff --git a/drivers/net/mvpp2/mrvl_flow.c b/drivers/net/mvpp2/mrvl_flow.c index 065b1aa..ffd1dab 100644 --- a/drivers/net/mvpp2/mrvl_flow.c +++ b/drivers/net/mvpp2/mrvl_flow.c @@ -2437,7 +2437,8 @@ mrvl_create_cls_table(struct rte_eth_dev *dev, struct rte_flow *first_flow) if (first_flow->pattern & F_IP4_TOS) { key->proto_field[key->num_fields].proto = MV_NET_PROTO_IP4; - key->proto_field[key->num_fields].field.ipv4 = MV_NET_IP4_F_TOS; + key->proto_field[key->num_fields].field.ipv4 = + MV_NET_IP4_F_DSCP; key->key_size += 1; key->num_fields += 1; } diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index 5d80c3e..7fd9703 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -654,7 +654,7 @@ setup_tc(struct pp2_ppio_tc_params *param, uint8_t inqs, struct pp2_ppio_inq_params *inq_params; param->pkt_offset = MRVL_PKT_OFFS; - param->pools[0] = bpool; + param->pools[0][0] = bpool; param->default_color = color; inq_params = rte_zmalloc_socket("inq_params", -- 2.7.4
[dpdk-dev] [PATCH v2 08/12] net/mvpp2: update MTU and MRU related calculations
From: Natalie Samsonov This commit updates MTU and MRU related calculations. Signed-off-by: Natalie Samsonov Reviewed-by: Yelena Krivosheev Reviewed-by: Dmitri Epshtein --- drivers/net/mvpp2/mrvl_ethdev.c | 70 +++-- drivers/net/mvpp2/mrvl_ethdev.h | 7 + 2 files changed, 61 insertions(+), 16 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 5643e7d..035ee81 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -325,7 +325,7 @@ mrvl_dev_configure(struct rte_eth_dev *dev) if (dev->data->dev_conf.rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len - -ETHER_HDR_LEN - ETHER_CRC_LEN; +MRVL_PP2_ETH_HDRS_LEN; ret = mrvl_configure_rxqs(priv, dev->data->port_id, dev->data->nb_rx_queues); @@ -375,21 +375,55 @@ static int mrvl_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { struct mrvl_priv *priv = dev->data->dev_private; - /* extra MV_MH_SIZE bytes are required for Marvell tag */ - uint16_t mru = mtu + MV_MH_SIZE + ETHER_HDR_LEN + ETHER_CRC_LEN; + uint16_t mru; + uint16_t mbuf_data_size = 0; /* SW buffer size */ int ret; - if (mtu < ETHER_MIN_MTU || mru > MRVL_PKT_SIZE_MAX) + mru = MRVL_PP2_MTU_TO_MRU(mtu); + /* +* min_rx_buf_size is equal to mbuf data size +* if pmd didn't set it differently +*/ + mbuf_data_size = dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM; + /* Prevent PMD from: +* - setting mru greater than the mbuf size resulting in +* hw and sw buffer size mismatch +* - setting mtu that requires the support of scattered packets +* when this feature has not been enabled/supported so far +* (TODO check scattered_rx flag here once scattered RX is supported). +*/ + if (mru + MRVL_PKT_OFFS > mbuf_data_size) { + mru = mbuf_data_size - MRVL_PKT_OFFS; + mtu = MRVL_PP2_MRU_TO_MTU(mru); + MRVL_LOG(WARNING, "MTU too big, max MTU possible limitted " + "by current mbuf size: %u. Set MTU to %u, MRU to %u", + mbuf_data_size, mtu, mru); + } + + if (mtu < ETHER_MIN_MTU || mru > MRVL_PKT_SIZE_MAX) { + MRVL_LOG(ERR, "Invalid MTU [%u] or MRU [%u]", mtu, mru); return -EINVAL; + } + + dev->data->mtu = mtu; + dev->data->dev_conf.rxmode.max_rx_pkt_len = mru - MV_MH_SIZE; if (!priv->ppio) return 0; ret = pp2_ppio_set_mru(priv->ppio, mru); - if (ret) + if (ret) { + MRVL_LOG(ERR, "Failed to change MRU"); return ret; + } + + ret = pp2_ppio_set_mtu(priv->ppio, mtu); + if (ret) { + MRVL_LOG(ERR, "Failed to change MTU"); + return ret; + } - return pp2_ppio_set_mtu(priv->ppio, mtu); + return 0; } /** @@ -600,6 +634,9 @@ mrvl_dev_start(struct rte_eth_dev *dev) } priv->vlan_flushed = 1; } + ret = mrvl_mtu_set(dev, dev->data->mtu); + if (ret) + MRVL_LOG(ERR, "Failed to set MTU to %d", dev->data->mtu); /* For default QoS config, don't start classifier. 
*/ if (mrvl_qos_cfg && @@ -1552,8 +1589,8 @@ mrvl_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, { struct mrvl_priv *priv = dev->data->dev_private; struct mrvl_rxq *rxq; - uint32_t min_size, -max_rx_pkt_len = dev->data->dev_conf.rxmode.max_rx_pkt_len; + uint32_t frame_size, buf_size = rte_pktmbuf_data_room_size(mp); + uint32_t max_rx_pkt_len = dev->data->dev_conf.rxmode.max_rx_pkt_len; int ret, tc, inq; uint64_t offloads; @@ -1568,15 +1605,16 @@ mrvl_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, return -EFAULT; } - min_size = rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM - - MRVL_PKT_EFFEC_OFFS; - if (min_size < max_rx_pkt_len) { - MRVL_LOG(ERR, - "Mbuf size must be increased to %u bytes to hold up to %u bytes of data.", - max_rx_pkt_len + RTE_PKTMBUF_HEADROOM + - MRVL_PKT_EFFEC_OFFS, + frame_size = buf_size - RTE_PKTMBUF_HEADROOM - MRVL_PKT_EFFEC_OFFS; + if (frame_size < max_rx_pkt_len) { + MRVL_LOG(WARNING, + "Mbuf size must be increased to %u bytes to hold up " + "to %u bytes of data.", + buf_size + max_rx_pkt_len - frame_size, max_rx_pkt_len); - return -EINVAL; +
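As a worked example of the accounting involved (constants assumed from the removed open-coded formula, not restated in this excerpt): with the standard 1500-byte MTU, MRU = 1500 + MV_MH_SIZE (2 bytes for the Marvell MH tag) + ETHER_HDR_LEN (14) + ETHER_CRC_LEN (4) = 1520 bytes. The new MRVL_PP2_MTU_TO_MRU()/MRVL_PP2_MRU_TO_MTU() helpers centralize this conversion, and mrvl_mtu_set() now additionally clamps the resulting MRU so that, together with MRVL_PKT_OFFS, it never exceeds the mbuf data size, warning and lowering the MTU instead of programming a mismatched hardware buffer size.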
[dpdk-dev] [PATCH v2 07/12] net/mvpp2: detach tx_qos from rx cls/qos config
From: Yuval Caduri Functional change: Open receive cls/qos related features, only if the config file contains an rx_related configuration entry. This allows to configure tx_related entries, w/o unintentionally opening rx cls/qos. Code: 'use_global_defaults' is by default set to '1'. Only if an rx_related entry was configured, it is updated to '0'. rx cls/qos is performed only if 'use_global_defaults' is '0'. Default TC configuration is now only mandatory when 'use_global_defaults' is '0'. Signed-off-by: Yuval Caduri Reviewed-by: Natalie Samsonov Tested-by: Natalie Samsonov --- drivers/net/mvpp2/mrvl_ethdev.c | 3 ++- drivers/net/mvpp2/mrvl_qos.c| 41 +++-- 2 files changed, 25 insertions(+), 19 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index a1dc6b1..5643e7d 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -602,7 +602,8 @@ mrvl_dev_start(struct rte_eth_dev *dev) } /* For default QoS config, don't start classifier. */ - if (mrvl_qos_cfg) { + if (mrvl_qos_cfg && + mrvl_qos_cfg->port[dev->data->port_id].use_global_defaults == 0) { ret = mrvl_start_qos_mapping(priv); if (ret) { MRVL_LOG(ERR, "Failed to setup QoS mapping"); diff --git a/drivers/net/mvpp2/mrvl_qos.c b/drivers/net/mvpp2/mrvl_qos.c index e039635..5d80c3e 100644 --- a/drivers/net/mvpp2/mrvl_qos.c +++ b/drivers/net/mvpp2/mrvl_qos.c @@ -324,6 +324,7 @@ parse_tc_cfg(struct rte_cfgfile *file, int port, int tc, if (rte_cfgfile_num_sections(file, sec_name, strlen(sec_name)) <= 0) return 0; + cfg->port[port].use_global_defaults = 0; entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_RXQ); if (entry) { n = get_entry_values(entry, @@ -421,7 +422,7 @@ parse_policer(struct rte_cfgfile *file, int port, const char *sec_name, cfg->port[port].policer_params.token_unit = PP2_CLS_PLCR_PACKETS_TOKEN_UNIT; } else { - RTE_LOG(ERR, PMD, "Unknown token: %s\n", entry); + MRVL_LOG(ERR, "Unknown token: %s", entry); return -1; } } @@ -438,7 +439,7 @@ parse_policer(struct rte_cfgfile *file, int port, const char *sec_name, cfg->port[port].policer_params.color_mode = PP2_CLS_PLCR_COLOR_AWARE_MODE; } else { - RTE_LOG(ERR, PMD, "Error in parsing: %s\n", entry); + MRVL_LOG(ERR, "Error in parsing: %s", entry); return -1; } } @@ -518,28 +519,15 @@ mrvl_get_qoscfg(const char *key __rte_unused, const char *path, snprintf(sec_name, sizeof(sec_name), "%s %d %s", MRVL_TOK_PORT, n, MRVL_TOK_DEFAULT); + /* Use global defaults, unless an override occurs */ + (*cfg)->port[n].use_global_defaults = 1; + /* Skip ports non-existing in configuration. */ if (rte_cfgfile_num_sections(file, sec_name, strlen(sec_name)) <= 0) { - (*cfg)->port[n].use_global_defaults = 1; - (*cfg)->port[n].mapping_priority = - PP2_CLS_QOS_TBL_VLAN_IP_PRI; continue; } - entry = rte_cfgfile_get_entry(file, sec_name, - MRVL_TOK_DEFAULT_TC); - if (entry) { - if (get_val_securely(entry, &val) < 0 || - val > USHRT_MAX) - return -1; - (*cfg)->port[n].default_tc = (uint8_t)val; - } else { - MRVL_LOG(ERR, - "Default Traffic Class required in custom configuration!"); - return -1; - } - /* * Read per-port rate limiting. Setting that will * disable per-queue rate limiting. 
@@ -573,6 +561,7 @@ mrvl_get_qoscfg(const char *key __rte_unused, const char *path, entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_MAPPING_PRIORITY); if (entry) { + (*cfg)->port[n].use_global_defaults = 0; if (!strncmp(entry, MRVL_TOK_VLAN_IP, sizeof(MRVL_TOK_VLAN_IP))) (*cfg)->port[n].mapping_priority = @@ -602,6 +591,7 @@ mrvl_get_qoscfg(const char *key __rte_unused, const char *path, entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_PLCR_DEFAULT);
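As an illustration of the new behaviour, a hypothetical QoS file that carries only Tx rate limiting no longer needs a default_tc entry and no longer triggers the Rx classifier/QoS setup (section name and values are examples, following the syntax documented in the policer configuration patch earlier in this series):

[port 0 default]
rate_limit_enable = 1
rate_limit = 100000
burst_size = 2048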
[dpdk-dev] [PATCH v2 06/12] net/mvpp2: add traffic manager support
Add traffic manager support. Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov Reviewed-by: Liron Himi --- drivers/net/mvpp2/Makefile |1 + drivers/net/mvpp2/meson.build |3 +- drivers/net/mvpp2/mrvl_ethdev.c | 26 + drivers/net/mvpp2/mrvl_ethdev.h | 31 ++ drivers/net/mvpp2/mrvl_tm.c | 1009 +++ drivers/net/mvpp2/mrvl_tm.h | 15 + 6 files changed, 1084 insertions(+), 1 deletion(-) create mode 100644 drivers/net/mvpp2/mrvl_tm.c create mode 100644 drivers/net/mvpp2/mrvl_tm.h diff --git a/drivers/net/mvpp2/Makefile b/drivers/net/mvpp2/Makefile index 4848d65..661d2cd 100644 --- a/drivers/net/mvpp2/Makefile +++ b/drivers/net/mvpp2/Makefile @@ -40,5 +40,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_ethdev.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_qos.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_flow.c SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_mtr.c +SRCS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mrvl_tm.c include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/mvpp2/meson.build b/drivers/net/mvpp2/meson.build index f475511..70ef2d6 100644 --- a/drivers/net/mvpp2/meson.build +++ b/drivers/net/mvpp2/meson.build @@ -20,7 +20,8 @@ sources = files( 'mrvl_ethdev.c', 'mrvl_flow.c', 'mrvl_qos.c', - 'mrvl_mtr.c' + 'mrvl_mtr.c', + 'mrvl_tm.c' ) deps += ['cfgfile', 'common_mvep'] diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 5e3a106..a1dc6b1 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -25,6 +25,7 @@ #include "mrvl_qos.h" #include "mrvl_flow.h" #include "mrvl_mtr.h" +#include "mrvl_tm.h" /* bitmask with reserved hifs */ #define MRVL_MUSDK_HIFS_RESERVED 0x0F @@ -340,6 +341,10 @@ mrvl_dev_configure(struct rte_eth_dev *dev) priv->ppio_params.maintain_stats = 1; priv->nb_rx_queues = dev->data->nb_rx_queues; + ret = mrvl_tm_init(dev); + if (ret < 0) + return ret; + if (dev->data->nb_rx_queues == 1 && dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_RSS) { MRVL_LOG(WARNING, "Disabling hash for 1 rx queue"); @@ -794,6 +799,7 @@ mrvl_dev_close(struct rte_eth_dev *dev) } mrvl_flush_bpool(dev); + mrvl_tm_deinit(dev); if (priv->ppio) { pp2_ppio_deinit(priv->ppio); @@ -1894,6 +1900,25 @@ mrvl_mtr_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops) return 0; } +/** + * DPDK callback to get rte_tm callbacks. + * + * @param dev + * Pointer to the device structure. + * @param ops + * Pointer to pass the tm ops. + * + * @return + * Always 0. 
+ */ +static int +mrvl_tm_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops) +{ + *(const void **)ops = &mrvl_tm_ops; + + return 0; +} + static const struct eth_dev_ops mrvl_ops = { .dev_configure = mrvl_dev_configure, .dev_start = mrvl_dev_start, @@ -1932,6 +1957,7 @@ static const struct eth_dev_ops mrvl_ops = { .rss_hash_conf_get = mrvl_rss_hash_conf_get, .filter_ctrl = mrvl_eth_filter_ctrl, .mtr_ops_get = mrvl_mtr_ops_get, + .tm_ops_get = mrvl_tm_ops_get, }; /** diff --git a/drivers/net/mvpp2/mrvl_ethdev.h b/drivers/net/mvpp2/mrvl_ethdev.h index de423a9..984f31e 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.h +++ b/drivers/net/mvpp2/mrvl_ethdev.h @@ -10,6 +10,7 @@ #include #include #include +#include /* * container_of is defined by both DPDK and MUSDK, @@ -134,6 +135,29 @@ struct mrvl_mtr { struct pp2_cls_plcr *plcr; }; +struct mrvl_tm_shaper_profile { + LIST_ENTRY(mrvl_tm_shaper_profile) next; + uint32_t id; + int refcnt; + struct rte_tm_shaper_params params; +}; + +enum { + MRVL_NODE_PORT, + MRVL_NODE_QUEUE, +}; + +struct mrvl_tm_node { + LIST_ENTRY(mrvl_tm_node) next; + uint32_t id; + uint32_t type; + int refcnt; + struct mrvl_tm_node *parent; + struct mrvl_tm_shaper_profile *profile; + uint8_t weight; + uint64_t stats_mask; +}; + struct mrvl_priv { /* Hot fields, used in fast path. */ struct pp2_bpool *bpool; /**< BPool pointer */ @@ -173,6 +197,10 @@ struct mrvl_priv { LIST_HEAD(profiles, mrvl_mtr_profile) profiles; LIST_HEAD(mtrs, mrvl_mtr) mtrs; uint32_t used_plcrs; + + LIST_HEAD(shaper_profiles, mrvl_tm_shaper_profile) shaper_profiles; + LIST_HEAD(nodes, mrvl_tm_node) nodes; + uint64_t rate_max; }; /** Flow operations forward declaration. */ @@ -181,6 +209,9 @@ extern const struct rte_flow_ops mrvl_flow_ops; /** Meter operations forward declaration. */ extern const struct rte_mtr_ops mrvl_mtr_ops; +/** Traffic manager operations forward declaration. */ +extern const struct rte_tm_ops mrvl_tm_ops; + /** Current log type. */ extern int mrvl_logtype; diff --git a/drivers/net/mvp
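For readers coming to rte_tm through this patch, the application side of the new callback looks roughly as sketched below. This is a hedged example, not code from the series: node and profile identifiers, rates and levels are arbitrary, and whether a given level accepts a shaper depends on the capabilities the PMD reports.

#include <rte_tm.h>

/* Sketch: two-level (port -> queue) hierarchy with one port-level shaper. */
static int
setup_tm_hierarchy(uint16_t port_id, uint16_t nb_txq)
{
	struct rte_tm_error error;
	struct rte_tm_shaper_params sp = {
		/* rate is in bytes per second: ~100 Mbit/s, illustrative only */
		.peak = { .rate = 100 * 1000 * 1000 / 8, .size = 4096 },
	};
	struct rte_tm_node_params root_params = {
		.shaper_profile_id = 1,
		.nonleaf = { .n_sp_priorities = 1 },
	};
	struct rte_tm_node_params leaf_params = {
		.shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE,
	};
	uint16_t q;
	int ret;

	ret = rte_tm_shaper_profile_add(port_id, 1, &sp, &error);
	if (ret != 0)
		return ret;

	/* root (port-level) node, id 1000 chosen arbitrarily */
	ret = rte_tm_node_add(port_id, 1000, RTE_TM_NODE_ID_NULL, 0, 1, 0,
			      &root_params, &error);
	if (ret != 0)
		return ret;

	/* one leaf (queue-level) node per Tx queue */
	for (q = 0; q < nb_txq; q++) {
		ret = rte_tm_node_add(port_id, q, 1000, 0, 1, 1,
				      &leaf_params, &error);
		if (ret != 0)
			return ret;
	}

	return rte_tm_hierarchy_commit(port_id, 1, &error);
}

rte_tm_capabilities_get() can be queried first to learn how many levels and nodes the port supports; for mvpp2 such a hierarchy would have to be committed before the port is started, given the run-time reconfiguration limitation noted elsewhere in the series.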
[dpdk-dev] [PATCH v2 11/12] net/mvpp2: document MTR and TM usage
From: Natalie Samsonov Document MTR (metering) and TM (traffic management) usage plus do some small updates here and there. Signed-off-by: Natalie Samsonov --- doc/guides/nics/img/mvpp2_tm.png | Bin 0 -> 5355 bytes doc/guides/nics/mvpp2.rst| 386 +-- 2 files changed, 335 insertions(+), 51 deletions(-) create mode 100644 doc/guides/nics/img/mvpp2_tm.png diff --git a/doc/guides/nics/img/mvpp2_tm.png b/doc/guides/nics/img/mvpp2_tm.png new file mode 100644 index ..3c709bab43950c39578a6726ab570008bc722e2f GIT binary patch literal 5355 zcmeHL=U-FX(%vBudXXkgk$yk~Q3z53C{4tG2UHXUA}B>dITY!TXatcaBA^CDq^Ssr zCm^CUrK6#X3B82Sh0sfYyYXJ{?|ttda6jD-N!EJy+B5T;HS^3&;x1U3^KwaW0RX^j zan{5J03dj9oy`FQzty7x?ZF>NfQ|VXprlP|8r-nDpGKYrfOpB<%Y8zcaP zDFOfn2LM|jgqZ?>pyL2Ag8~5ECjfBJFZ1J{`T!v8X<>5uVu;gxmWHpCwopu(RsOlO zTRdWBT=}-w#kM&8XB8m>H1-e8(tgE_MyK7NFETZ_#dTX7dceJvyv!n4WfgP@VK0ZDLsi{Kr==N21k2U?z zh;Odm{jBgv8E*7eTXazv8@$#ngcrb9yRn0SArZn-XZ+&Qf!{!Mk#1cFrf*RKfJc(~ zC@WdfC@5=8tWlBdZ(#3oP)rV_Y*Y(CZ#_FioYvc`g>mrkV=tTj8z`RPF^1l%^8zR< zT6(DN;9V@QfY@P{;s^f*9KMUkzA^7TLel(_bQNZI8>u+HAP1|wJ z7w}3L2I)+K!Xqc8DFAjYN{<@(BvW>q6N*HmjqQ`NEc zh1S`Rut$KW-A~TUfg;xlU)L~+hKQd<=76v+epG8j1n#0MBfx6HA{5$e@O8GU`0RSF z<3ppHe5d-TI08h{s9O+SCyD&p;^trw-!FuAstODG;oWSUUe7AnmB``YR2>n@E*5%> z8Of38ot0Q==k#Mj6E$v|ygegS8`B2fZpa;ENVQq*i%AqK=1Ba0U@0$dqG=_t7o zn+p(0)PE>A;qKw^7R>FD#yx*o}`^4G8lk9kP z=W6D#DN7d_r>Mx_=#uH^KsJfY4clk`rLa1AuA z(yEfCIPOB3@|Up7Eq1Q)V!Y6!H23*N@J1!`A2GjSqaR@IHoAK@XDY1BDXrTzno5rn zeshqL@J6ZHog^Jpnm4RRnL5~<4s)ncGrwwCT;-ILyRACDs&Z&{M{YMtl#Muzx7!aj zht0kjl-xtkQQ2V)fsu~U1eq)GU%=QcgM#5+V~Qjrh9);4`67gt>Vdy3K8)_$0nPE{F|CO z*+SVp=^T<0#-TqiNmM*T$9pgGo(Rv-}&^;F(HNv^JAsx?r7m-F-zisCuB$Z zH#Ugt{j!TR%LOpgR`wYZr%B1r$CKgj<##Y z1fK6hPFJ@YFskqxtk-e{!0aprZ(!LuH^#yGRu%k+DNMl;W_O-`;hnzZ z9xy!P{2<5f!?8hv@=)@I^=Q|0g3ORGwlDz#8o;t|^|{0zvD8R8D{PA(Px|e*W;|DI z`hyJ9*HV8p{Q>fbcKbg)Qh}zF_G!F783Q0_;^1{i>|aOmEs$W>RsU-o#R@k!H}Bru ztY|T=#xZx+hw=64k?)iC=BT9UmQ*FT?daRL;aQoP5zCbt=YxCNbt*d(o`o2!8@mdP zTtTR*s~2{6cYklJ`S|f!QgSkR&6&QlwIxBB(?Mg&&y~bptr*(BcPbzywV$wy=AAb_ z1eHYHg(YaAc%ic!wJ=H#=8%~yT2t$hNkY)UPg*>0$D#P3+4lHo;6Kg*qW?Sh-}DRZ zSmWyIsywVo&YN%lmY0{8eLYJxj(JfAhK8(SV`GD}u}xc(BV%JOTU)C|PY-KPOHstc zF=_8Y?#DA|l1V_5N^mbRWoLJTA;*3&+#eGZ5utBUek?o&W{FO92qcgo0)df0N5zs#%HZ0pROPe-^d99RuSCitaEr_-%D=LMwW)(&-HyBtrA3uK~iA35S-dXGSMV5~B zjvN;^H1{emFHZ&Bx{aNboV}M7FvE|N$H!flARa%fc4mJNg4yNpD@(gO+lpZj`27LWoTEGoL{N1HEa9J_ObFQ4^*mw6VJymlyo8#O^Jd_5)IC;TjMpLG2BrH9m5|F$&1x-U82A9-4 z&O!Z`SYF9r?qYW->LD&<3q_TLcx zGX0>>3=0pcI|Y8_EN@h1A2i%!cI*2hX4dKSf`~_zqD+o$g%@!P_e6$@(Aj(H>p@GD zOOB3?gFy$Y=(|*s`{4UyI^sD$RSm`)8XDx+-_PdbMf=mXiHIns8088_vo5_kcym6e zsV_gXmCQ3S(9-oGG9v2}CS&{6lbz~puhbH=8F)B3N;>|gcQz+w;KRuv)ZL878yv)| zjEaOuIKn&vEQKLms#!=K4=gY7uR#XrLMXbGcc^>d*DCWCCIG&BKE16#U#Yh7b!UjK z>~w{CAUqI`SUu<~2|CR?jK+c$fr?N-!n5^Rd`?elb89f0vp9Qsn+1(YOnZNBtc&{i zEG}+Ma*q1idrYTx~7f)7h(r4?OLSoEHS`!P9P*mVE{%9qr&32U5Z@ zG_`y7OGEa=PAi*bf?>8{bT!9z{!Y~D%y;>}^0u!@;Fq_mosS&pOIns`_bS$}jNFEY zP8`uvuPs2JRRdo}9eGzb5WUOP@oG@+CEs13=6x~4sla5%8_ripZ2H&Of-}?es&G%{ zwzr0ju4J6j3r;1yT|fMqLRm;rH;&*Wk`-dA#ND7dw@p0?V>BEuRdC;Bd^Y?W*^UJaqu?bHqo}}vA zjWo~xl_8@j=ZZ`CTgXzw1|=jqsCc#&ep5Pt%MI!3^~KDv{M*e_?k?xA+sP6vN+x9X zGGb}6u(-o)p4URF%nI`uYlq-De6GhI(@k6+T%W?q5|v|(y|Z8a)uz}RN3)N#nfX&s zWHYO$SU=+<51tLNc+vV&d;0L~!nvSSHc6_6j00NrdN{%Lj2I3$7nj0W zCb74mw@BY^zIks|)8y@4zk9|Ig@yjrBMRK`kAeB~Rl7Y_rlzm5tT(QuLujGrcu=W| zeWJvaOpBji;*Y!d-9IwUcj7P<-Q}rDVs6T4<^dg)$Kf$6CMb@}XOws91GSPcXl!e3 zZ<5-~9M(@!|E^;wJl)c>K!+qLDZq#fs3ams^-HS0%8iSdzJ@l(`!X;2PK!K850qUn zk0Gn+Ts?`dw@oj3X$i4ted4(KbUTWh;xRwH^fatgt;iy>&CyY`{w9Zl^D0Y0|PmYU$EytdRRNX>>$|6wx(stD*RB3ZO}74JU< 
z@DC)2A66WFTDK~4_H{TXGNVpLAQCRDH8U&B` z?Y4Cq5nae9fn8?2ltcXuaq?9e2l-I426&8-=d6qO?_iNcjc-N3_FKk+fe{4x(Eh6N zHSB`~IlBcU*-2K)DJ=sy1&Rwq^bb3}{%8arj}KXy#o ti<%uDA3wda5=aON0EaxVnuTof6LM+ivt4^5!G9wG3sWnTk~63~{{Y^!D~|vG literal 0 HcmV?d1 diff --git a/doc/guides/nics/mvpp2.rst b/doc/guides/nics/mvpp2.rst index 3b3f8c6..e7f45c3 100644 --- a/doc/guides/nics/mvpp2.rst +++ b/doc/guides/nics/mvpp2.rst @@ -56,7 +56,7 @@ Features of the MVPP2 PMD are: - Speed capabilities - Link status -- Queue start/stop +- Tx Queue start/stop - MTU update - Jumbo frame - Promiscuous mode @@ -70,12 +70,13 @@ Features of the MVPP2 PMD are: - L4 checksum offload - Packet type parsing - Basic stats -- Extended stats -- QoS +- :ref:`Extended stats ` - RX flow control -- TX queue start/stop - Scattered TX frames - +- :ref:`QoS ` +- :ref:`Flow API ` +- :ref:`Traffic metering and policing ` +- :ref:`Traffic Management API ` Limitations --- @@ -89,6 +90,20 @@ Limitations functionality. Current workaround is to reset board so that PPv2 has a chance to start in a sane state. +- MUSDK architecture does not support changing configuration in run time. + All nessesary configurations should be done before first dev_start(). + +- RX queue start/stop is not supported. + +- Current implementation does not support replacement of buffers in the HW buffer pool +
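As a companion to the documentation above, the metering side is typically driven from the application as sketched below; this is a hedged example with arbitrary profile numbers and rates, and the exact policing behaviour remains PMD-specific.

#include <rte_mtr.h>

/* Sketch: srTCM profile plus one meter object bound to it. */
static int
setup_meter(uint16_t port_id)
{
	struct rte_mtr_error error;
	struct rte_mtr_meter_profile mp = {
		.alg = RTE_MTR_SRTCM_RFC2697,
		.srtcm_rfc2697 = {
			.cir = 1250000,	/* bytes/s, ~10 Mbit/s, illustrative */
			.cbs = 2048,
			.ebs = 2048,
		},
	};
	struct rte_mtr_params params = {
		.meter_profile_id = 0,
		.meter_enable = 1,
	};
	int ret;

	ret = rte_mtr_meter_profile_add(port_id, 0, &mp, &error);
	if (ret != 0)
		return ret;

	/* non-shared meter with id 0 */
	return rte_mtr_create(port_id, 0, &params, 0, &error);
}

The meter is then usually attached to traffic through an rte_flow rule carrying a METER action; which colour and policer actions are honoured afterwards is up to the driver.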
[dpdk-dev] [PATCH v2 10/12] net/mvpp2: align documentation with MUSDK 18.09
From: Natalie Samsonov Update documentation to align with MUSDK 18.09. Signed-off-by: Natalie Samsonov --- doc/guides/nics/mvpp2.rst | 26 -- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/doc/guides/nics/mvpp2.rst b/doc/guides/nics/mvpp2.rst index a452c8a..3b3f8c6 100644 --- a/doc/guides/nics/mvpp2.rst +++ b/doc/guides/nics/mvpp2.rst @@ -74,6 +74,7 @@ Features of the MVPP2 PMD are: - QoS - RX flow control - TX queue start/stop +- Scattered TX frames Limitations @@ -96,19 +97,19 @@ Prerequisites .. code-block:: console - git clone https://github.com/MarvellEmbeddedProcessors/linux-marvell.git -b linux-4.4.52-armada-17.10 + git clone https://github.com/MarvellEmbeddedProcessors/linux-marvell.git -b linux-4.4.120-armada-18.09 - Out of tree `mvpp2x_sysfs` kernel module sources .. code-block:: console - git clone https://github.com/MarvellEmbeddedProcessors/mvpp2x-marvell.git -b mvpp2x-armada-17.10 + git clone https://github.com/MarvellEmbeddedProcessors/mvpp2x-marvell.git -b mvpp2x-armada-18.09 - MUSDK (Marvell User-Space SDK) sources .. code-block:: console - git clone https://github.com/MarvellEmbeddedProcessors/musdk-marvell.git -b musdk-armada-17.10 + git clone https://github.com/MarvellEmbeddedProcessors/musdk-marvell.git -b musdk-armada-18.09 MUSDK is a light-weight library that provides direct access to Marvell's PPv2 (Packet Processor v2). Alternatively prebuilt MUSDK library can be @@ -119,12 +120,6 @@ Prerequisites To get better understanding of the library one can consult documentation available in the ``doc`` top level directory of the MUSDK sources. - MUSDK must be configured with the following features: - - .. code-block:: console - - --enable-bpool-dma=64 - - DPDK environment Follow the DPDK :ref:`Getting Started Guide for Linux ` to setup @@ -140,6 +135,9 @@ The following options can be modified in the ``config`` file. Toggle compilation of the librte mvpp2 driver. +.. Note:: + + When MVPP2 PMD is enabled ``CONFIG_RTE_LIBRTE_MVNETA_PMD`` must be disabled QoS Configuration - @@ -314,7 +312,7 @@ Driver needs precompiled MUSDK library during compilation. export CROSS_COMPILE=/bin/aarch64-linux-gnu- ./bootstrap - ./configure --host=aarch64-linux-gnu --enable-bpool-dma=64 + ./configure --host=aarch64-linux-gnu make install MUSDK will be installed to `usr/local` under current directory. @@ -328,7 +326,8 @@ the path to the MUSDK installation directory needs to be exported. export LIBMUSDK_PATH=/usr/local export CROSS=aarch64-linux-gnu- make config T=arm64-armv8a-linuxapp-gcc - sed -ri 's,(MVPP2_PMD=)n,\1y,' build/.config + sed -i "s/MVNETA_PMD=y/MVNETA_PMD=n/" build/.config + sed -i "s/MVPP2_PMD=n/MVPP2_PMD=y/" build/.config make Flow API @@ -500,15 +499,14 @@ Usage Example - MVPP2 PMD requires extra out of tree kernel modules to function properly. -`musdk_uio` and `mv_pp_uio` sources are part of the MUSDK. Please consult +`musdk_cma` sources are part of the MUSDK. Please consult ``doc/musdk_get_started.txt`` for the detailed build instructions. For `mvpp2x_sysfs` please consult ``Documentation/pp22_sysfs.txt`` for the detailed build instructions. .. code-block:: console - insmod musdk_uio.ko - insmod mv_pp_uio.ko + insmod musdk_cma.ko insmod mvpp2x_sysfs.ko Additionally interfaces used by DPDK application need to be put up: -- 2.7.4
[dpdk-dev] [PATCH v2 12/12] net/mvpp2: add Tx S/G support
From: Zyta Szpak The patch introduces scatter/gather support on transmit path. A separate Tx callback is added and set if the application requests multisegment Tx offload. Multiple descriptors are sent per one packet. Signed-off-by: Zyta Szpak Signed-off-by: Natalie Samsonov Reviewed-by: Yelena Krivosheev --- drivers/net/mvpp2/mrvl_ethdev.c | 231 drivers/net/mvpp2/mrvl_ethdev.h | 1 + 2 files changed, 212 insertions(+), 20 deletions(-) diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c index 899a9e4..56b190e 100644 --- a/drivers/net/mvpp2/mrvl_ethdev.c +++ b/drivers/net/mvpp2/mrvl_ethdev.c @@ -65,7 +65,8 @@ /** Port Tx offloads capabilities */ #define MRVL_TX_OFFLOADS (DEV_TX_OFFLOAD_IPV4_CKSUM | \ DEV_TX_OFFLOAD_UDP_CKSUM | \ - DEV_TX_OFFLOAD_TCP_CKSUM) + DEV_TX_OFFLOAD_TCP_CKSUM | \ + DEV_TX_OFFLOAD_MULTI_SEGS) static const char * const valid_args[] = { MRVL_IFACE_NAME_ARG, @@ -105,7 +106,9 @@ struct mrvl_shadow_txq { int head; /* write index - used when sending buffers */ int tail; /* read index - used when releasing buffers */ u16 size; /* queue occupied size */ - u16 num_to_release; /* number of buffers sent, that can be released */ + u16 num_to_release; /* number of descriptors sent, that can be +* released +*/ struct buff_release_entry ent[MRVL_PP2_TX_SHADOWQ_SIZE]; /* q entries */ }; @@ -137,6 +140,12 @@ static inline void mrvl_free_sent_buffers(struct pp2_ppio *ppio, struct pp2_hif *hif, unsigned int core_id, struct mrvl_shadow_txq *sq, int qid, int force); +static uint16_t mrvl_tx_pkt_burst(void *txq, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); +static uint16_t mrvl_tx_sg_pkt_burst(void *txq,struct rte_mbuf **tx_pkts, +uint16_t nb_pkts); + + #define MRVL_XSTATS_TBL_ENTRY(name) { \ #name, offsetof(struct pp2_ppio_statistics, name), \ sizeof(((struct pp2_ppio_statistics *)0)->name) \ @@ -163,6 +172,31 @@ static struct { MRVL_XSTATS_TBL_ENTRY(tx_errors) }; +static inline void +mrvl_fill_shadowq(struct mrvl_shadow_txq *sq, struct rte_mbuf *buf) +{ + sq->ent[sq->head].buff.cookie = (uint64_t)buf; + sq->ent[sq->head].buff.addr = buf ? + rte_mbuf_data_iova_default(buf) : 0; + + sq->ent[sq->head].bpool = + (unlikely(!buf || buf->port >= RTE_MAX_ETHPORTS || +buf->refcnt > 1)) ? NULL : +mrvl_port_to_bpool_lookup[buf->port]; + + sq->head = (sq->head + 1) & MRVL_PP2_TX_SHADOWQ_MASK; + sq->size++; +} + +static inline void +mrvl_fill_desc(struct pp2_ppio_desc *desc, struct rte_mbuf *buf) +{ + pp2_ppio_outq_desc_reset(desc); + pp2_ppio_outq_desc_set_phys_addr(desc, rte_pktmbuf_iova(buf)); + pp2_ppio_outq_desc_set_pkt_offset(desc, 0); + pp2_ppio_outq_desc_set_pkt_len(desc, rte_pktmbuf_data_len(buf)); +} + static inline int mrvl_get_bpool_size(int pp2_id, int pool_id) { @@ -242,6 +276,27 @@ mrvl_get_hif(struct mrvl_priv *priv, int core_id) } /** + * Set tx burst function according to offload flag + * + * @param dev + * Pointer to Ethernet device structure. + */ +static void +mrvl_set_tx_function(struct rte_eth_dev *dev) +{ + struct mrvl_priv *priv = dev->data->dev_private; + + /* Use a simple Tx queue (no offloads, no multi segs) if possible */ + if (priv->multiseg) { + RTE_LOG(INFO, PMD, "Using multi-segment tx callback\n"); + dev->tx_pkt_burst = mrvl_tx_sg_pkt_burst; + } else { + RTE_LOG(INFO, PMD, "Using single-segment tx callback\n"); + dev->tx_pkt_burst = mrvl_tx_pkt_burst; + } +} + +/** * Configure rss based on dpdk rss configuration. 
* * @param priv @@ -325,6 +380,9 @@ mrvl_dev_configure(struct rte_eth_dev *dev) dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len - MRVL_PP2_ETH_HDRS_LEN; + if (dev->data->dev_conf.txmode.offloads & DEV_TX_OFFLOAD_MULTI_SEGS) + priv->multiseg = 1; + ret = mrvl_configure_rxqs(priv, dev->data->port_id, dev->data->nb_rx_queues); if (ret < 0) @@ -672,6 +730,7 @@ mrvl_dev_start(struct rte_eth_dev *dev) mrvl_flow_init(dev); mrvl_mtr_init(dev); + mrvl_set_tx_function(dev); return 0; out: @@ -2439,22 +2498,8 @@ mrvl_tx_pkt_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) rte_mbuf_prefetch_part2(pref_pkt_hdr); } - sq->ent[sq->head].buff.cookie = (uint64_t)m
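On the application side the new callback only comes into play when the multi-segment Tx offload is requested and chained mbufs are handed to the PMD. A hedged sketch of that, with the mempools assumed to be created elsewhere:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Request multi-segment Tx so the PMD selects its S/G burst function. */
static void
enable_multiseg_tx(struct rte_eth_conf *conf)
{
	conf->txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
}

/* Build a two-segment packet: 64 B of headers plus 1 KB of payload. */
static struct rte_mbuf *
build_chained_pkt(struct rte_mempool *hdr_pool, struct rte_mempool *data_pool)
{
	struct rte_mbuf *hdr = rte_pktmbuf_alloc(hdr_pool);
	struct rte_mbuf *data = rte_pktmbuf_alloc(data_pool);

	if (hdr == NULL || data == NULL) {
		rte_pktmbuf_free(hdr);
		rte_pktmbuf_free(data);
		return NULL;
	}

	rte_pktmbuf_append(hdr, 64);
	rte_pktmbuf_append(data, 1024);

	if (rte_pktmbuf_chain(hdr, data) != 0) {
		rte_pktmbuf_free(hdr);
		rte_pktmbuf_free(data);
		return NULL;
	}

	return hdr;	/* nb_segs == 2, ready for rte_eth_tx_burst() */
}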
Re: [dpdk-dev] [PATCH] test/bpf: use hton instead of __builtin_bswap
Hi, > > Convert host machine endianness to networking endianness for > comparison of incoming packets with BPF filter > > > Signed-off-by: Malvika Gupta > Reviewed-by: Gavin Hu > Reviewed-by: Brian Brooks > Suggested-by: Brian Brooks > --- > test/bpf/t1.c | 7 --- > test/bpf/t3.c | 3 ++- > 2 files changed, 6 insertions(+), 4 deletions(-) > > diff --git a/test/bpf/t1.c b/test/bpf/t1.c > index 60f9434ab..7943fcf34 100644 > --- a/test/bpf/t1.c > +++ b/test/bpf/t1.c > @@ -28,24 +28,25 @@ > #include > #include > #include > +#include > > uint64_t > entry(void *pkt) > { > struct ether_header *ether_header = (void *)pkt; > > - if (ether_header->ether_type != __builtin_bswap16(0x0800)) > + if (ether_header->ether_type != htons(0x0800)) Which version of clang do you use? With my one I get: $ clang -O2 -target bpf -c t1.c t1.c:37:34: error: couldn't allocate output register for constraint 'r' if (ether_header->ether_type != ntohs(0x0800)) ^ /usr/include/netinet/in.h:402:21: note: expanded from macro 'ntohs' # define ntohs(x) __bswap_16 (x) ^ /usr/include/bits/byteswap-16.h:31:14: note: expanded from macro '__bswap_16' __asm__ ("rorw $8, %w0" With '-O0' it compiles ok. $ clang -v clang version 4.0.1 (tags/RELEASE_401/final) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/7 Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/7 Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/7 Candidate multilib: .;@m64 Candidate multilib: 32;@m32 Selected multilib: .;@m64 Konstantin > return 0; > > struct iphdr *iphdr = (void *)(ether_header + 1); > if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1) != 0 || > - iphdr->daddr != __builtin_bswap32(0x1020304)) > + iphdr->daddr != htonl(0x1020304)) > return 0; > > int hlen = iphdr->ihl * 4; > struct udphdr *udphdr = (void *)iphdr + hlen; > > - if (udphdr->dest != __builtin_bswap16(5000)) > + if (udphdr->dest != htons(5000)) > return 0; > > return 1; > diff --git a/test/bpf/t3.c b/test/bpf/t3.c > index 531b9cb8c..24298b7c7 100644 > --- a/test/bpf/t3.c > +++ b/test/bpf/t3.c > @@ -17,6 +17,7 @@ > #include > #include > #include "mbuf.h" > +#include > > extern void rte_pktmbuf_dump(FILE *, const struct rte_mbuf *, unsigned int); > > @@ -29,7 +30,7 @@ entry(const void *pkt) > mb = pkt; > eth = rte_pktmbuf_mtod(mb, const struct ether_header *); > > - if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP)) > + if (eth->ether_type == htons(ETHERTYPE_ARP)) > rte_pktmbuf_dump(stdout, mb, 64); > > return 1; > -- > 2.17.1
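One way around the inline-assembly expansion of glibc's htons/ntohs under clang -O2 -target bpf, offered here only as a possible workaround rather than the fix eventually taken, is to keep the swap in compiler builtins but make it endianness-aware, similar in spirit to libbpf's bpf_endian.h:

/* Constant-friendly byte-order helpers for BPF programs; avoids the
 * asm-based glibc macros that clang cannot allocate registers for at -O2. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define BPF_HTONS(x)	__builtin_bswap16(x)
#define BPF_HTONL(x)	__builtin_bswap32(x)
#else
#define BPF_HTONS(x)	(x)
#define BPF_HTONL(x)	(x)
#endif

/* usage mirroring the test:
 *	if (ether_header->ether_type != BPF_HTONS(0x0800))
 *		return 0;
 */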
[dpdk-dev] [PATCH 2/3] crypto/mvsam: update features list
Add following features to the device's features list and update documentation accordingly: * OOP SGL in LB out * OOP LB in LB out Signed-off-by: Tomasz Duszynski --- doc/guides/cryptodevs/features/mvsam.ini | 2 ++ drivers/crypto/mvsam/rte_mrvl_pmd.c | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/doc/guides/cryptodevs/features/mvsam.ini b/doc/guides/cryptodevs/features/mvsam.ini index dd17a4c..0cc90a5 100644 --- a/doc/guides/cryptodevs/features/mvsam.ini +++ b/doc/guides/cryptodevs/features/mvsam.ini @@ -6,6 +6,8 @@ Symmetric crypto = Y Sym operation chaining = Y HW Accelerated = Y +OOP SGL In LB Out = Y +OOP LB In LB Out = Y ; ; Supported crypto algorithms of a default crypto driver. diff --git a/drivers/crypto/mvsam/rte_mrvl_pmd.c b/drivers/crypto/mvsam/rte_mrvl_pmd.c index 73eff75..21c3a95 100644 --- a/drivers/crypto/mvsam/rte_mrvl_pmd.c +++ b/drivers/crypto/mvsam/rte_mrvl_pmd.c @@ -729,7 +729,9 @@ cryptodev_mrvl_crypto_create(const char *name, dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO | RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING | - RTE_CRYPTODEV_FF_HW_ACCELERATED; + RTE_CRYPTODEV_FF_HW_ACCELERATED | + RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT | + RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT; /* Set vector instructions mode supported */ internals = dev->data->dev_private; -- 2.7.4
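For context, the two flags describe which mbuf layouts an out-of-place operation may use: a chained (scatter-gather) or contiguous (linear buffer) source and a contiguous destination. A hedged sketch of how an application sets such an operation up, with session creation and data filling omitted:

#include <rte_crypto.h>
#include <rte_cryptodev.h>
#include <rte_mbuf.h>

/* Out-of-place op: scattered source, contiguous destination.
 * sess, m_src_sgl and m_dst_lb are assumed to be prepared elsewhere. */
static int
fill_oop_op(struct rte_crypto_op *op, struct rte_cryptodev_sym_session *sess,
	    struct rte_mbuf *m_src_sgl, struct rte_mbuf *m_dst_lb,
	    uint32_t data_len)
{
	if (rte_crypto_op_attach_sym_session(op, sess) != 0)
		return -1;

	op->sym->m_src = m_src_sgl;	/* may have nb_segs > 1 (SGL in) */
	op->sym->m_dst = m_dst_lb;	/* single segment (LB out) */

	op->sym->cipher.data.offset = 0;
	op->sym->cipher.data.length = data_len;

	return 0;
}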
[dpdk-dev] [PATCH 1/3] doc: update mvsam documentation
From: Dmitri Epshtein Update mvsam documentation. Signed-off-by: Dmitri Epshtein Signed-off-by: Tomasz Duszynski Signed-off-by: Natalie Samsonov --- doc/guides/cryptodevs/features/mvsam.ini | 10 +++ doc/guides/cryptodevs/mvsam.rst | 147 ++- 2 files changed, 57 insertions(+), 100 deletions(-) diff --git a/doc/guides/cryptodevs/features/mvsam.ini b/doc/guides/cryptodevs/features/mvsam.ini index b7c105a..dd17a4c 100644 --- a/doc/guides/cryptodevs/features/mvsam.ini +++ b/doc/guides/cryptodevs/features/mvsam.ini @@ -5,17 +5,22 @@ [Features] Symmetric crypto = Y Sym operation chaining = Y +HW Accelerated = Y ; ; Supported crypto algorithms of a default crypto driver. ; [Cipher] +NULL = Y AES CBC (128) = Y AES CBC (192) = Y AES CBC (256) = Y AES CTR (128) = Y AES CTR (192) = Y AES CTR (256) = Y +AES ECB (128) = Y +AES ECB (192) = Y +AES ECB (256) = Y 3DES CBC = Y 3DES CTR = Y @@ -23,10 +28,13 @@ AES CTR (256) = Y ; Supported authentication algorithms of a default crypto driver. ; [Auth] +NULL = Y MD5 = Y MD5 HMAC = Y SHA1 = Y SHA1 HMAC= Y +SHA224 = Y +SHA224 HMAC = Y SHA256 = Y SHA256 HMAC = Y SHA384 = Y @@ -40,3 +48,5 @@ AES GMAC = Y ; [AEAD] AES GCM (128) = Y +AES GCM (192) = Y +AES GCM (256) = Y diff --git a/doc/guides/cryptodevs/mvsam.rst b/doc/guides/cryptodevs/mvsam.rst index fd418c2..7acae19 100644 --- a/doc/guides/cryptodevs/mvsam.rst +++ b/doc/guides/cryptodevs/mvsam.rst @@ -37,32 +37,50 @@ support by utilizing MUSDK library, which provides cryptographic operations acceleration by using Security Acceleration Engine (EIP197) directly from user-space with minimum overhead and high performance. +Detailed information about SoCs that use MVSAM crypto driver can be obtained here: + +* https://www.marvell.com/embedded-processors/armada-70xx/ +* https://www.marvell.com/embedded-processors/armada-80xx/ +* https://www.marvell.com/embedded-processors/armada-3700/ + + Features MVSAM CRYPTO PMD has support for: -* Symmetric crypto -* Sym operation chaining -* AES CBC (128) -* AES CBC (192) -* AES CBC (256) -* AES CTR (128) -* AES CTR (192) -* AES CTR (256) -* 3DES CBC -* 3DES CTR -* MD5 -* MD5 HMAC -* SHA1 -* SHA1 HMAC -* SHA256 -* SHA256 HMAC -* SHA384 -* SHA384 HMAC -* SHA512 -* SHA512 HMAC -* AES GCM (128) +Cipher algorithms: + +* ``RTE_CRYPTO_CIPHER_NULL`` +* ``RTE_CRYPTO_CIPHER_AES_CBC`` +* ``RTE_CRYPTO_CIPHER_AES_CTR`` +* ``RTE_CRYPTO_CIPHER_AES_ECB`` +* ``RTE_CRYPTO_CIPHER_3DES_CBC`` +* ``RTE_CRYPTO_CIPHER_3DES_CTR`` +* ``RTE_CRYPTO_CIPHER_3DES_ECB`` + +Hash algorithms: + +* ``RTE_CRYPTO_AUTH_NULL`` +* ``RTE_CRYPTO_AUTH_MD5`` +* ``RTE_CRYPTO_AUTH_MD5_HMAC`` +* ``RTE_CRYPTO_AUTH_SHA1`` +* ``RTE_CRYPTO_AUTH_SHA1_HMAC`` +* ``RTE_CRYPTO_AUTH_SHA224`` +* ``RTE_CRYPTO_AUTH_SHA224_HMAC`` +* ``RTE_CRYPTO_AUTH_SHA256`` +* ``RTE_CRYPTO_AUTH_SHA256_HMAC`` +* ``RTE_CRYPTO_AUTH_SHA384`` +* ``RTE_CRYPTO_AUTH_SHA384_HMAC`` +* ``RTE_CRYPTO_AUTH_SHA512`` +* ``RTE_CRYPTO_AUTH_SHA512_HMAC`` +* ``RTE_CRYPTO_AUTH_AES_GMAC`` + +AEAD algorithms: + +* ``RTE_CRYPTO_AEAD_AES_GCM`` + +For supported feature flags please consult :doc:`overview`. Limitations --- @@ -77,25 +95,18 @@ MVSAM CRYPTO PMD driver compilation is disabled by default due to external depen Currently there are two driver specific compilation options in ``config/common_base`` available: -- ``CONFIG_RTE_LIBRTE_MVSAM_CRYPTO`` (default ``n``) +- ``CONFIG_RTE_LIBRTE_PMD_MVSAM_CRYPTO`` (default: ``n``) Toggle compilation of the librte_pmd_mvsam driver. -- ``CONFIG_RTE_LIBRTE_MVSAM_CRYPTO_DEBUG`` (default ``n``) - -Toggle display of debugging messages. 
- -For a list of prerequisites please refer to `Prerequisites` section in -:ref:`MVPP2 Poll Mode Driver ` guide. - MVSAM CRYPTO PMD requires MUSDK built with EIP197 support thus following extra option must be passed to the library configuration script: .. code-block:: console - --enable-sam + --enable-sam [--enable-sam-statistics] [--enable-sam-debug] -For `crypto_safexcel.ko` module build instructions please refer +For instructions how to build required kernel modules please refer to `doc/musdk_get_started.txt`. Initialization @@ -106,17 +117,15 @@ loaded: .. code-block:: console - insmod musdk_uio.ko - insmod mvpp2x_sysfs.ko - insmod mv_pp_uio.ko + insmod musdk_cma.ko + insmod crypto_safexcel.ko rings=0,0 insmod mv_sam_uio.ko - insmod crypto_safexcel.ko The following parameters (all optional) are exported by the driver: -* max_nb_queue_pairs: maximum number of queue pairs in the device (8 by default). -* max_nb_sessions: maximum number of sessions that can be created (2048 by default). -* socket_id: socket on which to allocate the device resources on. +- ``max_nb_queue_pairs``: maximum number of queue pairs in the device (default: 8 - A8K, 4 - A7K/
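For completeness, the same device can also be instantiated programmatically instead of via --vdev; a hedged sketch follows, where the "crypto_mvsam" device name and the parameter value are assumptions based on the documentation above rather than something stated in this patch:

#include <rte_bus_vdev.h>

/* Roughly equivalent to: --vdev 'crypto_mvsam0,max_nb_queue_pairs=4' */
static int
create_mvsam_vdev(void)
{
	return rte_vdev_init("crypto_mvsam0", "max_nb_queue_pairs=4");
}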
[dpdk-dev] [PATCH 0/3] crypto/mvsam: align with MUSDK 18.09
This patch series aligns MVSAM PMD with MUSDK 18.09. Dmitri Epshtein (2): doc: update mvsam documentation crypto/mvsam: get number of CIOs dynamically Tomasz Duszynski (1): crypto/mvsam: update features list doc/guides/cryptodevs/features/mvsam.ini | 12 +++ doc/guides/cryptodevs/mvsam.rst | 147 ++- drivers/crypto/mvsam/rte_mrvl_pmd.c | 6 +- 3 files changed, 63 insertions(+), 102 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH 3/3] crypto/mvsam: get number of CIOs dynamically
From: Dmitri Epshtein MUSDK 18.09 introduced an API for getting the number of CIOs dynamically. Use it instead of the predefined constant. Signed-off-by: Dmitri Epshtein Reviewed-by: Natalie Samsonov Tested-by: Natalie Samsonov --- drivers/crypto/mvsam/rte_mrvl_pmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/mvsam/rte_mrvl_pmd.c b/drivers/crypto/mvsam/rte_mrvl_pmd.c index 21c3a95..9a85fd9 100644 --- a/drivers/crypto/mvsam/rte_mrvl_pmd.c +++ b/drivers/crypto/mvsam/rte_mrvl_pmd.c @@ -866,7 +866,7 @@ cryptodev_mrvl_crypto_init(struct rte_vdev_device *vdev) .private_data_size = sizeof(struct mrvl_crypto_private), .max_nb_queue_pairs = - sam_get_num_inst() * SAM_HW_RING_NUM, + sam_get_num_inst() * sam_get_num_cios(0), .socket_id = rte_socket_id() }, .max_nb_sessions = MRVL_PMD_DEFAULT_MAX_NB_SESSIONS -- 2.7.4

[dpdk-dev] [PATCH v7 2/4] examples/l3fwd-power: simple app update for new API
Add the support for new traffic pattern aware power control power management API. Example: ./l3fwd-power -l xxx -n 4 -w :xx:00.0 -w :xx:00.1 -- -p 0x3 -P --config="(0,0,xx),(1,0,xx)" --empty-poll -l 14 -m 9 -h 1 Please Reference l3fwd-power document for all parameter except empty-poll. the option "l", "m", "h" are used to set the power index for LOW, MED, HIGH power state. only is useful after enable empty-poll Once enable empty-poll. The system will start with training phase. There should not has any traffic pass-through during training phase. When training phase complete, system transfer to normal phase. System will running with modest power stat at beginning. If the system busyness percentage above 70%, then system will adjust power state move to High power state. If the traffic become lower(eg. The system busyness percentage drop below 30%), system will fallback to the modest power state. Example code use master thread to monitoring worker thread busyness. the default timer resolution is 10ms. ChangeLog: v2 fix some coding style issues v3 rename the API. v6 re-work the API. v7 no change. Signed-off-by: Liang Ma Reviewed-by: Lei Yao --- examples/l3fwd-power/Makefile| 3 + examples/l3fwd-power/main.c | 253 --- examples/l3fwd-power/meson.build | 1 + 3 files changed, 240 insertions(+), 17 deletions(-) diff --git a/examples/l3fwd-power/Makefile b/examples/l3fwd-power/Makefile index d7e39a3..772ec7b 100644 --- a/examples/l3fwd-power/Makefile +++ b/examples/l3fwd-power/Makefile @@ -23,6 +23,8 @@ CFLAGS += -O3 $(shell pkg-config --cflags libdpdk) LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk) LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk) +CFLAGS += -DALLOW_EXPERIMENTAL_API + build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) @@ -54,6 +56,7 @@ please change the definition of the RTE_TARGET environment variable) all: else +CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index d15cd52..f1e254b 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -55,6 +56,8 @@ /* 100 ms interval */ #define TIMER_NUMBER_PER_SECOND 10 +/* (10ms) */ +#define INTERVALS_PER_SECOND 100 /* 10 us */ #define SCALING_PERIOD(100/TIMER_NUMBER_PER_SECOND) #define SCALING_DOWN_TIME_RATIO_THRESHOLD 0.25 @@ -117,6 +120,9 @@ */ #define RTE_TEST_RX_DESC_DEFAULT 1024 #define RTE_TEST_TX_DESC_DEFAULT 1024 + + + static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT; static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT; @@ -132,6 +138,10 @@ static uint32_t enabled_port_mask = 0; static int promiscuous_on = 0; /* NUMA is enabled by default. */ static int numa_on = 1; +/* emptypoll is disabled by default. 
*/ +static bool empty_poll_on; +volatile bool empty_poll_stop; +static struct ep_params *ep_params; static int parse_ptype; /**< Parse packet type using rx callback, and */ /**< disabled by default */ @@ -331,6 +341,13 @@ static inline uint32_t power_idle_heuristic(uint32_t zero_rx_packet_count); static inline enum freq_scale_hint_t power_freq_scaleup_heuristic( \ unsigned int lcore_id, uint16_t port_id, uint16_t queue_id); +static uint8_t freq_tlb[] = {14, 9, 1}; + +static int is_done(void) +{ + return empty_poll_stop; +} + /* exit signal handler */ static void signal_exit_now(int sigtype) @@ -339,7 +356,15 @@ signal_exit_now(int sigtype) unsigned int portid; int ret; + RTE_SET_USED(lcore_id); + RTE_SET_USED(portid); + RTE_SET_USED(ret); + if (sigtype == SIGINT) { + if (empty_poll_on) + empty_poll_stop = true; + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { if (rte_lcore_is_enabled(lcore_id) == 0) continue; @@ -352,16 +377,19 @@ signal_exit_now(int sigtype) "core%u\n", lcore_id); } - RTE_ETH_FOREACH_DEV(portid) { - if ((enabled_port_mask & (1 << portid)) == 0) - continue; + if (!empty_poll_on) { + RTE_ETH_FOREACH_DEV(portid) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; - rte_eth_dev_stop(portid); - rte_eth_dev_close(portid); + rte_eth_dev_stop(portid); + rte_eth_dev_close(portid); +
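Stripped of the l3fwd specifics, the heart of the new mode is that every poll result feeds the per-lcore counters. A hedged sketch of such a worker loop is shown below; the counter-update calls are those named in this series (the application must be built with ALLOW_EXPERIMENTAL_API, as the Makefile change above shows), and their exact prototypes may differ from this sketch.

#include <stdbool.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_power_empty_poll.h>

/* Worker loop feeding the empty-poll statistics. */
static int
worker_loop(uint16_t port_id, uint16_t queue_id, volatile bool *done)
{
	struct rte_mbuf *pkts[32];
	unsigned int lcore_id = rte_lcore_id();
	uint16_t nb_rx;

	while (!*done) {
		nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

		if (nb_rx == 0)
			rte_power_empty_poll_stat_update(lcore_id);
		else
			rte_power_poll_stat_update(lcore_id, nb_rx);

		/* ... process or forward pkts[] here ... */
	}

	return 0;
}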
[dpdk-dev] [PATCH v7 3/4] doc/guides/proguides/power-man: update the power API
update the document for empty poll API. Signed-off-by: Liang Ma --- doc/guides/prog_guide/power_man.rst | 87 + 1 file changed, 87 insertions(+) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index eba1cc6..d8a4ef7 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -106,6 +106,93 @@ User Cases The power management mechanism is used to save power when performing L3 forwarding. + +Empty Poll API +-- + +Abstract + + +For packet processing workloads such as DPDK polling is continuous. +This means CPU cores always show 100% busy independent of how much work +those cores are doing. It is critical to accurately determine how busy +a core is hugely important for the following reasons: + +* No indication of overload conditions +* User do not know how much real load is on a system meaning + resulted in wasted energy as no power management is utilized + +Compared to the original l3fwd-power design, instead of going to sleep +after detecting an empty poll, the new mechanism just lowers the core frequency. +As a result, the application does not stop polling the device, which leads +to improved handling of bursts of traffic. + +When the system become busy, the empty poll mechanism can also increase the core +frequency (including turbo) to do best effort for intensive traffic. This gives +us more flexible and balanced traffic awareness over the standard l3fwd-power +application. + + +Proposed Solution +~ +The proposed solution focuses on how many times empty polls are executed. +The less the number of empty polls, means current core is busy with processing +workload, therefore, the higher frequency is needed. The high empty poll number +indicates the current core not doing any real work therefore, we can lower the +frequency to safe power. + +In the current implementation, each core has 1 empty-poll counter which assume +1 core is dedicated to 1 queue. This will need to be expanded in the future to +support multiple queues per core. + +Power state definition: +^^^ + +* LOW: Not currently used, reserved for future use. + +* MED: the frequency is used to process modest traffic workload. + +* HIGH: the frequency is used to process busy traffic workload. + +There are two phases to establish the power management system: +^^ +* Initialization/Training phase. The training phase is necessary + in order to figure out the system polling baseline numbers from + idle to busy. The highest poll count will be during idle, where all + polls are empty. These poll counts will be different between + systems due to the many possible processor micro-arch, cache + and device configurations, hence the training phase. + In the training phase, traffic is blocked so the training algorithm + can average the empty-poll numbers for the LOW, MED and + HIGH power states in order to create a baseline. + The core's counter are collected every 10ms, and the Training + phase will take 2 seconds. + +* Normal phase. When the training phase is complete, traffic is + started. The run-time poll counts are compared with the + baseline and the decision will be taken to move to MED power + state or HIGH power state. The counters are calculated every + 10ms. + + +API Overview for Empty Poll Power Management + +* **State Init**: initialize the power management system. + +* **State Free**: free the resource hold by power management system. + +* **Update Empty Poll Counter**: update the empty poll counter. 
+ +* **Update Valid Poll Counter**: update the valid poll counter. + +* **Set the Fequence Index**: update the power state/frequency mapping. + +* **Detect empty poll state change**: empty poll state change detection algorithm. + +User Cases +-- +The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. + References -- -- 2.7.5
[dpdk-dev] [PATCH v7 1/4] lib/librte_power: traffic pattern aware power control
1. Abstract For packet processing workloads such as DPDK polling is continuous. This means CPU cores always show 100% busy independent of how much work those cores are doing. It is critical to accurately determine how busy a core is hugely important for the following reasons: * No indication of overload conditions * User do not know how much real load is on a system meaning resulted in wasted energy as no power management is utilized Compared to the original l3fwd-power design, instead of going to sleep after detecting an empty poll, the new mechanism just lowers the core frequency. As a result, the application does not stop polling the device, which leads to improved handling of bursts of traffic. When the system become busy, the empty poll mechanism can also increase the core frequency (including turbo) to do best effort for intensive traffic. This gives us more flexible and balanced traffic awareness over the standard l3fwd-power application. 2. Proposed solution The proposed solution focuses on how many times empty polls are executed. The less the number of empty polls, means current core is busy with processing workload, therefore, the higher frequency is needed. The high empty poll number indicates the current core not doing any real work therefore, we can lower the frequency to safe power. In the current implementation, each core has 1 empty-poll counter which assume 1 core is dedicated to 1 queue. This will need to be expanded in the future to support multiple queues per core. 2.1 Power state definition: LOW: Not currently used, reserved for future use. MED: the frequency is used to process modest traffic workload. HIGH: the frequency is used to process busy traffic workload. 2.2 There are two phases to establish the power management system: a.Initialization/Training phase. The training phase is necessary in order to figure out the system polling baseline numbers from idle to busy. The highest poll count will be during idle, where all polls are empty. These poll counts will be different between systems due to the many possible processor micro-arch, cache and device configurations, hence the training phase. In the training phase, traffic is blocked so the training algorithm can average the empty-poll numbers for the LOW, MED and HIGH power states in order to create a baseline. The core's counter are collected every 10ms, and the Training phase will take 2 seconds. b.Normal phase. When the training phase is complete, traffic is started. The run-time poll counts are compared with the baseline and the decision will be taken to move to MED power state or HIGH power state. The counters are calculated every 10ms. 3. Proposed API 1. rte_power_empty_poll_stat_init(void); which is used to initialize the power management system. 2. rte_power_empty_poll_stat_free(void); which is used to free the resource hold by power management system. 3. rte_power_empty_poll_stat_update(unsigned int lcore_id); which is used to update specific core empty poll counter, not thread safe 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); which is used to update specific core valid poll counter, not thread safe 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id); which is used to get specific core empty poll counter. 6. rte_power_poll_stat_fetch(unsigned int lcore_id); which is used to get specific core valid poll counter. 7. rte_empty_poll_detection(void); which is used to detect empty poll state changes. ChangeLog: v2: fix some coding style issues. v3: rename the filename, API name. 
v4: no change. v5: no change. v6: re-work the code layout, update API. v7: fix minor typo and lift node num limit. Signed-off-by: Liang Ma Reviewed-by: Lei Yao --- lib/librte_power/Makefile | 6 +- lib/librte_power/meson.build| 5 +- lib/librte_power/rte_power_empty_poll.c | 500 lib/librte_power/rte_power_empty_poll.h | 205 + lib/librte_power/rte_power_version.map | 13 + 5 files changed, 725 insertions(+), 4 deletions(-) create mode 100644 lib/librte_power/rte_power_empty_poll.c create mode 100644 lib/librte_power/rte_power_empty_poll.h diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile index 6f85e88..a8f1301 100644 --- a/lib/librte_power/Makefile +++ b/lib/librte_power/Makefile @@ -6,8 +6,9 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_power.a +CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -fno-strict-aliasing -LDLIBS += -lrte_eal +LDLIBS += -lrte_eal -lrte_timer EXPORT_MAP := rte_power_version.map @@ -16,8 +17,9 @@ LIBABIVER := 1 # all source are stored in SRCS-y SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c power_acpi_cpufreq.c SRCS-$(CONFI
[dpdk-dev] [PATCH v7 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update
add empty poll mode command line example Signed-off-by: Liang Ma --- doc/guides/sample_app_ug/l3_forward_power_man.rst | 21 + 1 file changed, 21 insertions(+) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 795a570..7bea0a8 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -362,3 +362,24 @@ The algorithm has the following sleeping behavior depending on the idle counter: If a thread polls multiple Rx queues and different queue returns different sleep duration values, the algorithm controls the sleep time in a conservative manner by sleeping for the least possible time in order to avoid a potential performance impact. + +Empty Poll Mode +- +There is a new Mode which is added recently. Empty poll mode can be enabled by +command option --empty-poll. + +See "Power Management" chapter in the DPDK Programmer's Guide for empty poll mode details. + +.. code-block:: console + +./l3fwd-power -l xxx -n 4 -w :xx:00.0 -w :xx:00.1 -- -p 0x3 -P --config="(0,0,xx),(1,0,xx)" --empty-poll -l 14 -m 9 -h 1 + +Where, + +--empty-poll: Enable the empty poll mode instead of original algorithm + +-l : optional, set up the LOW power state frequency index + +-m : optional, set up the MED power state frequency index + +-h : optional, set up the HIGH power state frequency index -- 2.7.5
[dpdk-dev] [PATCH v2 1/4] app/test-eventdev: fix minor typos
Fix minor typos. Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions") Signed-off-by: Pavan Nikhilesh --- v2 Changes: - remove stray newlines in logs. - update pipeline description (Nikhil). - add Tx adapter error condition when some ethdev, eventdev pair have internal port capability and others don't (Nikhil). - cleanup code formatting, simplify worker logic. - update doc to reflect new pipeline flow with Tx adapter. app/test-eventdev/test_pipeline_atq.c| 16 app/test-eventdev/test_pipeline_common.h | 8 app/test-eventdev/test_pipeline_queue.c | 16 3 files changed, 20 insertions(+), 20 deletions(-) diff --git a/app/test-eventdev/test_pipeline_atq.c b/app/test-eventdev/test_pipeline_atq.c index 26dc79f90..f0b2f9015 100644 --- a/app/test-eventdev/test_pipeline_atq.c +++ b/app/test-eventdev/test_pipeline_atq.c @@ -18,7 +18,7 @@ pipeline_atq_nb_event_queues(struct evt_options *opt) static int pipeline_atq_worker_single_stage_tx(void *arg) { - PIPELINE_WROKER_SINGLE_STAGE_INIT; + PIPELINE_WORKER_SINGLE_STAGE_INIT; while (t->done == false) { uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0); @@ -43,7 +43,7 @@ pipeline_atq_worker_single_stage_tx(void *arg) static int pipeline_atq_worker_single_stage_fwd(void *arg) { - PIPELINE_WROKER_SINGLE_STAGE_INIT; + PIPELINE_WORKER_SINGLE_STAGE_INIT; const uint8_t tx_queue = t->tx_service.queue_id; while (t->done == false) { @@ -66,7 +66,7 @@ pipeline_atq_worker_single_stage_fwd(void *arg) static int pipeline_atq_worker_single_stage_burst_tx(void *arg) { - PIPELINE_WROKER_SINGLE_STAGE_BURST_INIT; + PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT; while (t->done == false) { uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev, @@ -98,7 +98,7 @@ pipeline_atq_worker_single_stage_burst_tx(void *arg) static int pipeline_atq_worker_single_stage_burst_fwd(void *arg) { - PIPELINE_WROKER_SINGLE_STAGE_BURST_INIT; + PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT; const uint8_t tx_queue = t->tx_service.queue_id; while (t->done == false) { @@ -126,7 +126,7 @@ pipeline_atq_worker_single_stage_burst_fwd(void *arg) static int pipeline_atq_worker_multi_stage_tx(void *arg) { - PIPELINE_WROKER_MULTI_STAGE_INIT; + PIPELINE_WORKER_MULTI_STAGE_INIT; const uint8_t nb_stages = t->opt->nb_stages; @@ -161,7 +161,7 @@ pipeline_atq_worker_multi_stage_tx(void *arg) static int pipeline_atq_worker_multi_stage_fwd(void *arg) { - PIPELINE_WROKER_MULTI_STAGE_INIT; + PIPELINE_WORKER_MULTI_STAGE_INIT; const uint8_t nb_stages = t->opt->nb_stages; const uint8_t tx_queue = t->tx_service.queue_id; @@ -192,7 +192,7 @@ pipeline_atq_worker_multi_stage_fwd(void *arg) static int pipeline_atq_worker_multi_stage_burst_tx(void *arg) { - PIPELINE_WROKER_MULTI_STAGE_BURST_INIT; + PIPELINE_WORKER_MULTI_STAGE_BURST_INIT; const uint8_t nb_stages = t->opt->nb_stages; while (t->done == false) { @@ -234,7 +234,7 @@ pipeline_atq_worker_multi_stage_burst_tx(void *arg) static int pipeline_atq_worker_multi_stage_burst_fwd(void *arg) { - PIPELINE_WROKER_MULTI_STAGE_BURST_INIT; + PIPELINE_WORKER_MULTI_STAGE_BURST_INIT; const uint8_t nb_stages = t->opt->nb_stages; const uint8_t tx_queue = t->tx_service.queue_id; diff --git a/app/test-eventdev/test_pipeline_common.h b/app/test-eventdev/test_pipeline_common.h index 5fb91607d..9cd6b905b 100644 --- a/app/test-eventdev/test_pipeline_common.h +++ b/app/test-eventdev/test_pipeline_common.h @@ -65,14 +65,14 @@ struct test_pipeline { #define BURST_SIZE 16 -#define PIPELINE_WROKER_SINGLE_STAGE_INIT \ +#define PIPELINE_WORKER_SINGLE_STAGE_INIT \ struct 
worker_data *w = arg; \ struct test_pipeline *t = w->t; \ const uint8_t dev = w->dev_id;\ const uint8_t port = w->port_id; \ struct rte_event ev -#define PIPELINE_WROKER_SINGLE_STAGE_BURST_INIT \ +#define PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT \ int i; \ struct worker_data *w = arg; \ struct test_pipeline *t = w->t; \ @@ -80,7 +80,7 @@ struct test_pipeline { const uint8_t port = w->port_id;\ struct rte_event ev[BURST_SIZE + 1] -#define PIPELINE_WROKER_MULTI_STAGE_INIT \ +#define PIPELINE_WORKER_MULTI_STAGE_INIT \ struct worker_data *w = arg;\ struct test_pipeline *t = w->t; \ uint8_t cq_id; \ @@ -90,7 +90,7 @@ struct test_pipeline { uint8_t *const sched_type_list = &t->sched_type_list[0]; \ struct rte_event ev -#define PIPELINE_WROKER_MULTI_STAGE_BURST_INIT
[dpdk-dev] [PATCH v2 3/4] app/test-eventdev: add Tx adapter support
Convert existing Tx service based pipeline to Tx adapter based APIs and simplify worker functions. Signed-off-by: Pavan Nikhilesh --- app/test-eventdev/test_pipeline_atq.c| 269 --- app/test-eventdev/test_pipeline_common.c | 206 + app/test-eventdev/test_pipeline_common.h | 62 +++--- app/test-eventdev/test_pipeline_queue.c | 241 ++-- 4 files changed, 367 insertions(+), 411 deletions(-) diff --git a/app/test-eventdev/test_pipeline_atq.c b/app/test-eventdev/test_pipeline_atq.c index f0b2f9015..01af298f3 100644 --- a/app/test-eventdev/test_pipeline_atq.c +++ b/app/test-eventdev/test_pipeline_atq.c @@ -15,7 +15,7 @@ pipeline_atq_nb_event_queues(struct evt_options *opt) return rte_eth_dev_count_avail(); } -static int +static __rte_noinline int pipeline_atq_worker_single_stage_tx(void *arg) { PIPELINE_WORKER_SINGLE_STAGE_INIT; @@ -28,23 +28,18 @@ pipeline_atq_worker_single_stage_tx(void *arg) continue; } - if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) { - pipeline_tx_pkt(ev.mbuf); - w->processed_pkts++; - continue; - } - pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC); - pipeline_event_enqueue(dev, port, &ev); + pipeline_event_tx(dev, port, &ev); + w->processed_pkts++; } return 0; } -static int +static __rte_noinline int pipeline_atq_worker_single_stage_fwd(void *arg) { PIPELINE_WORKER_SINGLE_STAGE_INIT; - const uint8_t tx_queue = t->tx_service.queue_id; + const uint8_t *tx_queue = t->tx_evqueue_id; while (t->done == false) { uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0); @@ -54,16 +49,16 @@ pipeline_atq_worker_single_stage_fwd(void *arg) continue; } - w->processed_pkts++; - ev.queue_id = tx_queue; + ev.queue_id = tx_queue[ev.mbuf->port]; pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC); pipeline_event_enqueue(dev, port, &ev); + w->processed_pkts++; } return 0; } -static int +static __rte_noinline int pipeline_atq_worker_single_stage_burst_tx(void *arg) { PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT; @@ -79,27 +74,21 @@ pipeline_atq_worker_single_stage_burst_tx(void *arg) for (i = 0; i < nb_rx; i++) { rte_prefetch0(ev[i + 1].mbuf); - if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) { - - pipeline_tx_pkt(ev[i].mbuf); - ev[i].op = RTE_EVENT_OP_RELEASE; - w->processed_pkts++; - } else - pipeline_fwd_event(&ev[i], - RTE_SCHED_TYPE_ATOMIC); + rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0); } - pipeline_event_enqueue_burst(dev, port, ev, nb_rx); + pipeline_event_tx_burst(dev, port, ev, nb_rx); + w->processed_pkts += nb_rx; } return 0; } -static int +static __rte_noinline int pipeline_atq_worker_single_stage_burst_fwd(void *arg) { PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT; - const uint8_t tx_queue = t->tx_service.queue_id; + const uint8_t *tx_queue = t->tx_evqueue_id; while (t->done == false) { uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev, @@ -112,23 +101,22 @@ pipeline_atq_worker_single_stage_burst_fwd(void *arg) for (i = 0; i < nb_rx; i++) { rte_prefetch0(ev[i + 1].mbuf); - ev[i].queue_id = tx_queue; + rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0); + ev[i].queue_id = tx_queue[ev[i].mbuf->port]; pipeline_fwd_event(&ev[i], RTE_SCHED_TYPE_ATOMIC); - w->processed_pkts++; } pipeline_event_enqueue_burst(dev, port, ev, nb_rx); + w->processed_pkts += nb_rx; } return 0; } -static int +static __rte_noinline int pipeline_atq_worker_multi_stage_tx(void *arg) { PIPELINE_WORKER_MULTI_STAGE_INIT; - const uint8_t nb_stages = t->opt->nb_stages; - while (t->done == false) { uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0); @@ -141,29 +129,24 @@ pipeline_atq_worker_multi_stage_tx(void *arg) 
cq_id = ev.sub_event_type % nb_stages; if (cq_id == last_queue) { - if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) { - - pipeline_tx_pkt(ev.mbuf); - w->pr
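For reference, the setup sequence the converted test now exercises looks roughly like the following on the application side; the adapter id and port configuration are placeholders, and the capability check is what decides whether workers can hand events straight to the adapter (internal port) or must go through an extra event queue as in the generic case.

#include <rte_eventdev.h>
#include <rte_event_eth_tx_adapter.h>

/* Create a Tx adapter for one ethdev and start it. */
static int
setup_tx_adapter(uint8_t adptr_id, uint8_t evdev_id, uint16_t eth_port,
		 struct rte_event_port_conf *port_conf)
{
	uint32_t caps = 0;
	int ret;

	ret = rte_event_eth_tx_adapter_create(adptr_id, evdev_id, port_conf);
	if (ret != 0)
		return ret;

	/* queue id -1 links all Tx queues of the ethdev to the adapter */
	ret = rte_event_eth_tx_adapter_queue_add(adptr_id, eth_port, -1);
	if (ret != 0)
		return ret;

	ret = rte_event_eth_tx_adapter_caps_get(evdev_id, eth_port, &caps);
	if (ret == 0 && (caps & RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT)) {
		/* workers may call rte_event_eth_tx_adapter_enqueue() directly */
	}

	return rte_event_eth_tx_adapter_start(adptr_id);
}

Before handing an mbuf-carrying event towards the adapter, the worker tags the target Tx queue with rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0), exactly as the converted workers above do.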
[dpdk-dev] [PATCH v2 2/4] app/test-eventdev: remove redundant newlines
Signed-off-by: Pavan Nikhilesh --- app/test-eventdev/test_pipeline_common.c | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/app/test-eventdev/test_pipeline_common.c b/app/test-eventdev/test_pipeline_common.c index a54068df3..832ab8b6e 100644 --- a/app/test-eventdev/test_pipeline_common.c +++ b/app/test-eventdev/test_pipeline_common.c @@ -65,12 +65,12 @@ pipeline_test_result(struct evt_test *test, struct evt_options *opt) uint64_t total = 0; struct test_pipeline *t = evt_test_priv(test); - printf("Packet distribution across worker cores :\n"); + evt_info("Packet distribution across worker cores :"); for (i = 0; i < t->nb_workers; i++) total += t->worker[i].processed_pkts; for (i = 0; i < t->nb_workers; i++) - printf("Worker %d packets: "CLGRN"%"PRIx64" "CLNRM"percentage:" - CLGRN" %3.2f\n"CLNRM, i, + evt_info("Worker %d packets: "CLGRN"%"PRIx64""CLNRM" percentage:" + CLGRN" %3.2f"CLNRM, i, t->worker[i].processed_pkts, (((double)t->worker[i].processed_pkts)/total) * 100); @@ -234,7 +234,7 @@ pipeline_ethdev_setup(struct evt_test *test, struct evt_options *opt) RTE_SET_USED(opt); if (!rte_eth_dev_count_avail()) { - evt_err("No ethernet ports found.\n"); + evt_err("No ethernet ports found."); return -ENODEV; } @@ -253,7 +253,7 @@ pipeline_ethdev_setup(struct evt_test *test, struct evt_options *opt) if (local_port_conf.rx_adv_conf.rss_conf.rss_hf != port_conf.rx_adv_conf.rss_conf.rss_hf) { evt_info("Port %u modified RSS hash function based on hardware support," - "requested:%#"PRIx64" configured:%#"PRIx64"\n", + "requested:%#"PRIx64" configured:%#"PRIx64"", i, port_conf.rx_adv_conf.rss_conf.rss_hf, local_port_conf.rx_adv_conf.rss_conf.rss_hf); @@ -262,19 +262,19 @@ pipeline_ethdev_setup(struct evt_test *test, struct evt_options *opt) if (rte_eth_dev_configure(i, nb_queues, nb_queues, &local_port_conf) < 0) { - evt_err("Failed to configure eth port [%d]\n", i); + evt_err("Failed to configure eth port [%d]", i); return -EINVAL; } if (rte_eth_rx_queue_setup(i, 0, NB_RX_DESC, rte_socket_id(), &rx_conf, t->pool) < 0) { - evt_err("Failed to setup eth port [%d] rx_queue: %d.\n", + evt_err("Failed to setup eth port [%d] rx_queue: %d.", i, 0); return -EINVAL; } if (rte_eth_tx_queue_setup(i, 0, NB_TX_DESC, rte_socket_id(), NULL) < 0) { - evt_err("Failed to setup eth port [%d] tx_queue: %d.\n", + evt_err("Failed to setup eth port [%d] tx_queue: %d.", i, 0); return -EINVAL; } @@ -380,7 +380,7 @@ pipeline_event_rx_adapter_setup(struct evt_options *opt, uint8_t stride, ret = evt_service_setup(service_id); if (ret) { evt_err("Failed to setup service core" - " for Rx adapter\n"); + " for Rx adapter"); return ret; } } @@ -397,8 +397,7 @@ pipeline_event_rx_adapter_setup(struct evt_options *opt, uint8_t stride, evt_err("Rx adapter[%d] start failed", prod); return ret; } - printf("%s: Port[%d] using Rx adapter[%d] started\n", __func__, - prod, prod); + evt_info("Port[%d] using Rx adapter[%d] started", prod, prod); } return ret; -- 2.18.0
[dpdk-dev] [PATCH v2 4/4] doc: update eventdev application guide
Update eventdev application guide to reflect Tx adapter related changes. Signed-off-by: Pavan Nikhilesh --- .../eventdev_pipeline_atq_test_generic.svg| 848 +++--- ...ntdev_pipeline_atq_test_internal_port.svg} | 26 +- .../eventdev_pipeline_queue_test_generic.svg | 570 +++- ...dev_pipeline_queue_test_internal_port.svg} | 22 +- doc/guides/tools/testeventdev.rst | 42 +- 5 files changed, 930 insertions(+), 578 deletions(-) rename doc/guides/tools/img/{eventdev_pipeline_atq_test_lockfree.svg => eventdev_pipeline_atq_test_internal_port.svg} (99%) rename doc/guides/tools/img/{eventdev_pipeline_queue_test_lockfree.svg => eventdev_pipeline_queue_test_internal_port.svg} (99%) diff --git a/doc/guides/tools/img/eventdev_pipeline_atq_test_generic.svg b/doc/guides/tools/img/eventdev_pipeline_atq_test_generic.svg index e33367989..707b9b56b 100644 --- a/doc/guides/tools/img/eventdev_pipeline_atq_test_generic.svg +++ b/doc/guides/tools/img/eventdev_pipeline_atq_test_generic.svg @@ -20,7 +20,7 @@ height="288.34286" id="svg3868" version="1.1" - inkscape:version="0.92.2 (5c3e80d, 2017-08-06)" + inkscape:version="0.92.2 2405546, 2018-03-11" sodipodi:docname="eventdev_pipeline_atq_test_generic.svg" sodipodi:version="0.32" inkscape:output_extension="org.inkscape.output.svg.inkscape" @@ -42,22 +42,6 @@ d="M 5.77,0 -2.88,5 V -5 Z" id="path39725" /> - - - + gradientTransform="matrix(0.84881476,0,0,0.98593266,86.966576,5.0323108)" /> - - - - + + + + + + + + style="fill:#f78202;fill-opacity:1;fill-rule:evenodd;stroke:#f78202;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" /> + style="fill:#f78202;fill-opacity:1;fill-rule:evenodd;stroke:#f78202;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" /> + style="fill:#f78202;fill-opacity:1;fill-rule:evenodd;stroke:#f78202;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" /> + + + refY="0" + refX="0" + id="marker35935-1-6-5-1-0" + style="overflow:visible" + inkscape:isstock="true" + inkscape:collect="always"> + style="fill:#ac14db;fill-opacity:1;fill-rule:evenodd;stroke:#ac14ff;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" + inkscape:connector-curvature="0" /> - + style="fill:#ac14db;fill-opacity:1;fill-rule:evenodd;stroke:#ac14ff;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" + inkscape:connector-curvature="0" /> + + + style="fill:#ac14db;fill-opacity:1;fill-rule:evenodd;stroke:#ac14ff;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" + inkscape:connector-curvature="0" /> + + + + style="fill:#ac14db;fill-opacity:1;fill-rule:evenodd;stroke:#ac14ff;stroke-width:1.0003pt;stroke-opacity:1" + transform="scale(0.4)" + inkscape:connector-curvature="0" /> - - + + + + + + + + + port n+2 + style="font-size:10px;line-height:1.25">port n+1 port n+3 + style="font-size:10px;line-height:1.25">port n+2 total queues = number of ethernet dev + 1 + style="font-size:10px;line-height:1.25">total queues = 2 * number of ethernet dev +Event ethRx adptr 0 +Event ethRx adptr 1 +Event ethRx adptr q + + + + +(Tx Generic) + transform="translate(69.258261,-194.86398)"> Txq 0 + transform="translate(-12.211349,-3.253112)"> Txq 0 + transform="translate(-10.498979,-2.682322)"> Txq 0 -Event ethRx adptr 0 -Event ethRx adptr 1 -Event ethRx adptr q - - - Tx Serviceport n + 1 - - - - - + + x="502.77109" + y="189.40137" + id="tspan5223-0-9-02" + style="font-size:10px;line-height:1.25">port n+m+1 +Single link + 
style="display:inline;opacity:1;fill:#ff;fill-opacity:1;stroke:url(#linearGradient3995-8-9);stroke-width:1.2090857;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" + id="rect87-6-5-3-79-1" + width="72.081367" + height="32.405426" + x="499.944" + y="226.74811" + rx="16.175425" + ry="16.202713" /> Single port n+m+2 + +Link Q + x="512.51819" + y="301.5791" + id="tspan5223-0-9-0-4-2" + style="font-size:10px;line-height:1.25">port n+o + (Tx Generi
[dpdk-dev] [PATCH] net/pcap: physical interface MAC address support
Support for PCAP physical interface MAC with phy_mac=1 devarg. Signed-off-by: Juhamatti Kuusisaari --- drivers/net/pcap/rte_eth_pcap.c | 64 ++--- 1 file changed, 59 insertions(+), 5 deletions(-) diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c index e8810a171..b693f26e7 100644 --- a/drivers/net/pcap/rte_eth_pcap.c +++ b/drivers/net/pcap/rte_eth_pcap.c @@ -7,6 +7,9 @@ #include #include +#include +#include +#include #include @@ -17,6 +20,7 @@ #include #include #include +#include #define RTE_ETH_PCAP_SNAPSHOT_LEN 65535 #define RTE_ETH_PCAP_SNAPLEN ETHER_MAX_JUMBO_FRAME_LEN @@ -29,6 +33,7 @@ #define ETH_PCAP_RX_IFACE_IN_ARG "rx_iface_in" #define ETH_PCAP_TX_IFACE_ARG "tx_iface" #define ETH_PCAP_IFACE_ARG"iface" +#define ETH_PCAP_PHY_MAC_ARG "phy_mac" #define ETH_PCAP_ARG_MAXLEN64 @@ -87,6 +92,7 @@ static const char *valid_arguments[] = { ETH_PCAP_RX_IFACE_IN_ARG, ETH_PCAP_TX_IFACE_ARG, ETH_PCAP_IFACE_ARG, + ETH_PCAP_PHY_MAC_ARG, NULL }; @@ -909,7 +915,7 @@ eth_from_pcaps_common(struct rte_vdev_device *vdev, struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues, struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues, struct rte_kvargs *kvlist, struct pmd_internals **internals, - struct rte_eth_dev **eth_dev) + const int phy_mac, struct rte_eth_dev **eth_dev) { struct rte_kvargs_pair *pair = NULL; unsigned int k_idx; @@ -955,6 +961,26 @@ eth_from_pcaps_common(struct rte_vdev_device *vdev, else (*internals)->if_index = if_nametoindex(pair->value); + if (phy_mac) { + const unsigned int numa_node = vdev->device.numa_node; + const char *if_name = pair->value; + int if_fd = socket(AF_INET, SOCK_DGRAM, 0); + if (if_fd != -1) { + struct ifreq ifr; + strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name)); + if (!ioctl(if_fd, SIOCGIFHWADDR, &ifr)) { + PMD_LOG(INFO, "Setting phy MAC for %s\n", + if_name); + (*eth_dev)->data->mac_addrs = + rte_zmalloc_socket(NULL, ETHER_ADDR_LEN, + 0, numa_node); + rte_memcpy((*eth_dev)->data->mac_addrs, + ifr.ifr_addr.sa_data, + ETHER_ADDR_LEN); + } + close(if_fd); + } + } return 0; } @@ -962,7 +988,7 @@ static int eth_from_pcaps(struct rte_vdev_device *vdev, struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues, struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues, - struct rte_kvargs *kvlist, int single_iface, + struct rte_kvargs *kvlist, int single_iface, int phy_mac, unsigned int using_dumpers) { struct pmd_internals *internals = NULL; @@ -970,7 +996,7 @@ eth_from_pcaps(struct rte_vdev_device *vdev, int ret; ret = eth_from_pcaps_common(vdev, rx_queues, nb_rx_queues, - tx_queues, nb_tx_queues, kvlist, &internals, ð_dev); + tx_queues, nb_tx_queues, kvlist, &internals, phy_mac, ð_dev); if (ret < 0) return ret; @@ -989,6 +1015,22 @@ eth_from_pcaps(struct rte_vdev_device *vdev, return 0; } +static int +select_phy_mac(const char *key, const char *value, void *extra_args) +{ + if (extra_args && strcmp(key, ETH_PCAP_PHY_MAC_ARG) == 0) { + const int phy_mac = atoi(value); + int *enable_phy_mac = extra_args; + + if (phy_mac != 0 && phy_mac != 1) + PMD_LOG(WARNING, "Value should be 0 or 1, set it as 1!"); + + if (phy_mac) + *enable_phy_mac = 1; + } + return 0; +} + static int pmd_pcap_probe(struct rte_vdev_device *dev) { @@ -999,6 +1041,7 @@ pmd_pcap_probe(struct rte_vdev_device *dev) struct pmd_devargs dumpers = {0}; struct rte_eth_dev *eth_dev; int single_iface = 0; + int phy_mac = 0; int ret; name = rte_vdev_device_name(dev); @@ -1026,6 +1069,16 @@ pmd_pcap_probe(struct rte_vdev_device *dev) if (kvlist == NULL) 
return -1; + /* +* We check whether we want to use phy MAC of pcap interface. +*/ + if (rte_kvargs_count(kvlist, ETH_PCAP_PHY_MAC_ARG)) { + ret = rte_kvargs_process(kvlist, ETH_PCAP_PHY_MAC_ARG, + &select_phy_mac, &phy_mac); +
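For context, a minimal sketch of how an application could request this behaviour once the patch is applied. Only the devarg syntax (iface=...,phy_mac=1) comes from the patch; the interface name eth0 and the surrounding application code are assumptions for illustration.

#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>

int main(int argc, char **argv)
{
	/* hypothetical EAL arguments; eth0 is an assumed host interface */
	char *eal_args[] = {
		argv[0],
		"--vdev=net_pcap0,iface=eth0,phy_mac=1",
	};
	struct ether_addr mac;

	(void)argc;
	if (rte_eal_init(2, eal_args) < 0)
		return 1;

	/* with phy_mac=1 the port should report eth0's real MAC address */
	rte_eth_macaddr_get(0, &mac);
	return 0;
}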
[dpdk-dev] MLX5 should define the timestamp field in the doc
Hi all, As far as I know, MLX5 is the only driver to support hardware timestamping. It would be great to update the doc to explain what the hardware timestamp is supposed to be. If it's in nanoseconds, is a simple offset relative to system time enough? Does it also need a multiplication? Can we query that from hardware? Or provide a piece of code to be used? As it is, the feature is useless... It would be interesting to normalize hardware timestamping. I guess for any driver, an offset and a multiplication (a shift plus a multiplier, eventually) would be enough, and the API should be updated to provide a function to convert a hardware timestamp to a software one (or that should be part of the driver and done automatically if the offload is enabled?) and probably one to initialize the time, much like the Linux one at https://elixir.bootlin.com/linux/v4.18.5/source/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c#L488 . Thanks, Tom
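To make the proposal concrete, here is a minimal, non-authoritative sketch of the kind of conversion helper being asked for. The calibration values (mult, shift, offset_ns) are assumptions that the driver or application would somehow have to supply; no such DPDK API exists at the time of this mail, which is the point of it.

#include <stdint.h>

/* assumed calibration data; how to obtain it is exactly the open question */
struct hw_clock {
	uint32_t mult;       /* device-cycles-to-nanoseconds multiplier */
	uint32_t shift;      /* matching right shift */
	uint64_t offset_ns;  /* device clock origin relative to system time */
};

/* convert a raw HW timestamp (e.g. mbuf->timestamp) to nanoseconds;
 * 64-bit overflow is ignored for brevity - the kernel's timecounter
 * avoids it with wider arithmetic and periodic re-synchronisation */
static inline uint64_t
hw_ts_to_ns(const struct hw_clock *c, uint64_t hw_ts)
{
	return ((hw_ts * c->mult) >> c->shift) + c->offset_ns;
}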
Re: [dpdk-dev] [PATCH v2] bus/vdev: fix wrong error log on secondary device scan
Acked-by: Gage Eads Thanks! Gage > -Original Message- > From: Zhang, Qi Z > Sent: Monday, September 3, 2018 3:50 AM > To: dev@dpdk.org > Cc: tho...@monjalon.net; Burakov, Anatoly ; > Eads, Gage ; Zhang, Qi Z ; > sta...@dpdk.org > Subject: [PATCH v2] bus/vdev: fix wrong error log on secondary device scan > > When a secondary process handles VDEV_SCAN_ONE mp action, it is possible > the device is already be inserted. This happens when we have multiple > secondary > processes which cause multiple broadcasts from primary during > bus->scan. So we don't need to log any error for -EEXIST. > > Bugzilla ID: 84 > Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel") > Cc: sta...@dpdk.org > > Reported-by: Eads Gage > Signed-off-by: Qi Zhang > --- > > v2: > - change log level to DEBUG for the case device already exist. > > drivers/bus/vdev/vdev.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c index > 6139dd551..69dee89a8 100644 > --- a/drivers/bus/vdev/vdev.c > +++ b/drivers/bus/vdev/vdev.c > @@ -346,6 +346,7 @@ vdev_action(const struct rte_mp_msg *mp_msg, const > void *peer) > const struct vdev_param *in = (const struct vdev_param *)mp_msg- > >param; > const char *devname; > int num; > + int ret; > > strlcpy(mp_resp.name, VDEV_MP_KEY, sizeof(mp_resp.name)); > mp_resp.len_param = sizeof(*ou); > @@ -380,7 +381,10 @@ vdev_action(const struct rte_mp_msg *mp_msg, const > void *peer) > break; > case VDEV_SCAN_ONE: > VDEV_LOG(INFO, "receive vdev, %s", in->name); > - if (insert_vdev(in->name, NULL, NULL) < 0) > + ret = insert_vdev(in->name, NULL, NULL); > + if (ret == -EEXIST) > + VDEV_LOG(DEBUG, "device already exist, %s", in- > >name); > + else if (ret < 0) > VDEV_LOG(ERR, "failed to add vdev, %s", in->name); > break; > default: > -- > 2.13.6
[dpdk-dev] [PATCH v2 4/9] memalloc: rename lock list to fd list
Previously, we were only using lock lists to store per-page lock fd's because we cannot use modern fcntl() file description locks to lock parts of the page in single file segments mode. Now, we will be using this list to store either lock fd's (along with memseg list fd) in single file segments mode, or per-page fd's (and set memseg list fd to -1), so rename the list accordingly. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 -- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index aa95551a8..14bc5dce9 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -57,25 +57,33 @@ const int anonymous_hugepages_supported = */ static int fallocate_supported = -1; /* unknown */ -/* for single-file segments, we need some kind of mechanism to keep track of +/* + * we have two modes - single file segments, and file-per-page mode. + * + * for single-file segments, we need some kind of mechanism to keep track of * which hugepages can be freed back to the system, and which cannot. we cannot * use flock() because they don't allow locking parts of a file, and we cannot * use fcntl() due to issues with their semantics, so we will have to rely on a - * bunch of lockfiles for each page. + * bunch of lockfiles for each page. so, we will use 'fds' array to keep track + * of per-page lockfiles. we will store the actual segment list fd in the + * 'memseg_list_fd' field. + * + * for file-per-page mode, each page will have its own fd, so 'memseg_list_fd' + * will be invalid (set to -1), and we'll use 'fds' to keep track of page fd's. * * we cannot know how many pages a system will have in advance, but we do know * that they come in lists, and we know lengths of these lists. so, simply store * a malloc'd array of fd's indexed by list and segment index. * * they will be initialized at startup, and filled as we allocate/deallocate - * segments. also, use this to track memseg list proper fd. + * segments. */ static struct { int *fds; /**< dynamically allocated array of segment lock fd's */ int memseg_list_fd; /**< memseg list fd */ int len; /**< total length of the array */ int count; /**< entries used in an array */ -} lock_fds[RTE_MAX_MEMSEG_LISTS]; +} fd_list[RTE_MAX_MEMSEG_LISTS]; /** local copy of a memory map, used to synchronize memory hotplug in MP */ static struct rte_memseg_list local_memsegs[RTE_MAX_MEMSEG_LISTS]; @@ -209,12 +217,12 @@ static int get_segment_lock_fd(int list_idx, int seg_idx) char path[PATH_MAX] = {0}; int fd; - if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds)) + if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list)) return -1; - if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len) + if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len) return -1; - fd = lock_fds[list_idx].fds[seg_idx]; + fd = fd_list[list_idx].fds[seg_idx]; /* does this lock already exist? 
*/ if (fd >= 0) return fd; @@ -236,8 +244,8 @@ static int get_segment_lock_fd(int list_idx, int seg_idx) return -1; } /* store it for future reference */ - lock_fds[list_idx].fds[seg_idx] = fd; - lock_fds[list_idx].count++; + fd_list[list_idx].fds[seg_idx] = fd; + fd_list[list_idx].count++; return fd; } @@ -245,12 +253,12 @@ static int unlock_segment(int list_idx, int seg_idx) { int fd, ret; - if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds)) + if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list)) return -1; - if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len) + if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len) return -1; - fd = lock_fds[list_idx].fds[seg_idx]; + fd = fd_list[list_idx].fds[seg_idx]; /* upgrade lock to exclusive to see if we can remove the lockfile */ ret = lock(fd, LOCK_EX); @@ -270,8 +278,8 @@ static int unlock_segment(int list_idx, int seg_idx) * and remove it from list anyway. */ close(fd); - lock_fds[list_idx].fds[seg_idx] = -1; - lock_fds[list_idx].count--; + fd_list[list_idx].fds[seg_idx] = -1; + fd_list[list_idx].count--; if (ret < 0) return -1; @@ -288,7 +296,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi, /* create a hugepage file path */ eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx); - fd = lock_fds[list_idx].memseg_list_fd; + fd = fd_list[list_idx].memseg_list_fd; if (fd < 0) { fd = open(path, O_CREAT | O_RDWR, 0600); @@ -304,7 +312,7 @@ get_seg_f
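As an aside, a condensed model of how the renamed array is meant to be consumed in the two modes described in the comment above; this is illustration only, not part of the patch.

#include <stdbool.h>

struct fd_list_entry {
	int *fds;           /* per-segment fd's (lock fd's in single-file mode) */
	int memseg_list_fd; /* whole-list fd, or -1 in file-per-page mode */
	int len;
	int count;
};

static int
lookup_backing_fd(const struct fd_list_entry *e, int seg_idx, bool single_file)
{
	if (single_file)
		return e->memseg_list_fd; /* one file backs the whole list */
	return e->fds[seg_idx];           /* one file (and fd) per page */
}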
[dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mountpoint
This patchset further improves DPDK support for running without hugetlbfs mountpoints. First of all, it enables using memfd-created hugepages in in-memory mode. This way, instead of anonymous hugepages, we can get proper fd's for each page (or for the entire segment, if we're using single-file segments). Memfd will be used automatically if support for it was compiled and is available at runtime, however DPDK will fall back to using anonymous hugepages if such support is not available. The other thing this patchset does is exposing segment fd's through an external API. There is a lot of ugliness in current virtio/vhost code that deals with finding hugepage files through procfs, while all virtio really needs are fd's referring to the pages, and their offsets. Using this API, virtio will be able to access segment fd's directly, without the procfs magic. As a bonus, because we enabled use of memfd (given that sufficiently recent kernel version is used), once virtio support for getting segment fd's using the new API is implemented, virtio will also be able to work without having hugetlbfs mountpoints. Virtio support is not provided in this patchset, coordination and implementation of it is up to virtio maintainers. Once virtio support for this is in place, DPDK will have one less barrier for adoption in container space. v1->v2: - Added a new API to retrieve segment offset into its fd Anatoly Burakov (9): fbarray: fix detach in noshconf mode eal: don't allow legacy mode with in-memory mode mem: raise maximum fd limit unconditionally memalloc: rename lock list to fd list memalloc: track page fd's in non-single file mode memalloc: add EAL-internal API to get and set segment fd's mem: add external API to retrieve page fd from EAL mem: allow querying offset into segment fd mem: support using memfd segments for in-memory mode lib/librte_eal/bsdapp/eal/eal_memalloc.c | 19 + lib/librte_eal/common/eal_common_fbarray.c | 4 + lib/librte_eal/common/eal_common_memory.c | 107 - lib/librte_eal/common/eal_common_options.c | 12 +- lib/librte_eal/common/eal_memalloc.h | 11 + lib/librte_eal/common/include/rte_memory.h | 97 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 449 + lib/librte_eal/linuxapp/eal/eal_memory.c | 64 ++- lib/librte_eal/rte_eal_version.map | 4 + 9 files changed, 669 insertions(+), 98 deletions(-) -- 2.17.1
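As a rough sketch of the intended consumer (e.g. virtio-user replacing its procfs digging), the loop below collects fd and offset for every segment using the APIs added later in this series. The thread-unsafe flavours are used because rte_memseg_walk() already holds the memory hotplug read lock; error handling is trimmed and the logging is purely illustrative.

#include <rte_common.h>
#include <rte_memory.h>
#include <rte_log.h>

static int
collect_seg(const struct rte_memseg_list *msl __rte_unused,
		const struct rte_memseg *ms, void *arg __rte_unused)
{
	size_t offset;
	int fd = rte_memseg_get_fd_thread_unsafe(ms);

	if (fd < 0)
		return 0; /* no fd available (e.g. anonymous memory), skip */
	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0)
		return 0;

	RTE_LOG(INFO, USER1, "seg %p: fd %d, offset %zu, len %zu\n",
		ms->addr, fd, offset, ms->len);
	return 0;
}

/* after rte_eal_init(): rte_memseg_walk(collect_seg, NULL); */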
[dpdk-dev] [PATCH v2 6/9] memalloc: add EAL-internal API to get and set segment fd's
Enable setting and retrieving segment fd's internally. For now, retrieving fd's will not be used anywhere until we get an external API, but it will be useful for things like virtio, where we wish to share segment fd's. Setting segment fd's will not be available as a public API at this time, but internally it is needed for legacy mode, because we're not allocating our hugepages in memalloc in legacy mode case, and we still need to store the fd. Another user of get segment fd API is memseg info dump, to show which pages use which fd's. Not supported on FreeBSD. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 + lib/librte_eal/common/eal_common_memory.c | 8 +-- lib/librte_eal/common/eal_memalloc.h | 6 +++ lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +- lib/librte_eal/linuxapp/eal/eal_memory.c | 44 +--- 5 files changed, 109 insertions(+), 21 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index f7f07abd6..a5fb09f71 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void) return -1; } +int +eal_memalloc_get_seg_fd(int list_idx, int seg_idx) +{ + return -1; +} + +int +eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) +{ + return -1; +} + int eal_memalloc_init(void) { diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index fbfb1b055..034c2026a 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -294,7 +294,7 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int msl_idx, ms_idx; + int msl_idx, ms_idx, fd; FILE *f = arg; msl_idx = msl - mcfg->memsegs; @@ -305,10 +305,11 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, if (ms_idx < 0) return -1; + fd = eal_memalloc_get_seg_fd(msl_idx, ms_idx); fprintf(f, "Segment %i-%i: IOVA:0x%"PRIx64", len:%zu, " "virt:%p, socket_id:%"PRId32", " "hugepage_sz:%"PRIu64", nchannel:%"PRIx32", " - "nrank:%"PRIx32"\n", + "nrank:%"PRIx32" fd:%i\n", msl_idx, ms_idx, ms->iova, ms->len, @@ -316,7 +317,8 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, ms->socket_id, ms->hugepage_sz, ms->nchannel, - ms->nrank); + ms->nrank, + fd); return 0; } diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h index 36bb1a027..a46c69c72 100644 --- a/lib/librte_eal/common/eal_memalloc.h +++ b/lib/librte_eal/common/eal_memalloc.h @@ -76,6 +76,12 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id); int eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len); +int +eal_memalloc_get_seg_fd(int list_idx, int seg_idx); + +int +eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd); + int eal_memalloc_init(void); diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 7d536350e..b820989e9 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -1334,16 +1334,10 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl, } static int -fd_list_create_walk(const struct rte_memseg_list *msl, - void *arg __rte_unused) +alloc_list(int list_idx, int len) { - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - unsigned int i, len; - 
int msl_idx; int *data; - - msl_idx = msl - mcfg->memsegs; - len = msl->memseg_arr.len; + int i; /* ensure we have space to store fd per each possible segment */ data = malloc(sizeof(int) * len); @@ -1355,14 +1349,56 @@ fd_list_create_walk(const struct rte_memseg_list *msl, for (i = 0; i < len; i++) data[i] = -1; - fd_list[msl_idx].fds = data; - fd_list[msl_idx].len = len; - fd_list[msl_idx].count = 0; - fd_list[msl_idx].memseg_list_fd = -1; + fd_list[list_idx].fds = data; + fd_list[list_idx].len = len; + fd_list[list_idx].count = 0; + fd_list[list_idx].memseg_list_fd = -1; return 0; } +static int +fd_list_create_walk(const struct rte_memseg_list *msl, + void *arg __rte_unused) +{ + struct rte_mem_config *mcfg = rte_eal_get_configura
[dpdk-dev] [PATCH v2 3/9] mem: raise maximum fd limit unconditionally
Previously, when we allocated hugepages, we closed the fd's corresponding to them after we've done our mappings. Since we did mmap(), we didn't actually lose the reference, but file descriptors used for mmap() do not count against the fd limit. Since we are going to store all of our fd's, we will hit the fd limit much more often when using smaller page sizes. Fix this to raise the fd limit to maximum unconditionally. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memory.c | 20 1 file changed, 20 insertions(+) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index dbf19499e..dfb537f59 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -2204,6 +2205,25 @@ memseg_secondary_init(void) int rte_eal_memseg_init(void) { + /* increase rlimit to maximum */ + struct rlimit lim; + + if (getrlimit(RLIMIT_NOFILE, &lim) == 0) { + /* set limit to maximum */ + lim.rlim_cur = lim.rlim_max; + + if (setrlimit(RLIMIT_NOFILE, &lim) < 0) { + RTE_LOG(DEBUG, EAL, "Setting maximum number of open files failed: %s\n", + strerror(errno)); + } else { + RTE_LOG(DEBUG, EAL, "Setting maximum number of open files to %" + PRIu64 "\n", + (uint64_t)lim.rlim_cur); + } + } else { + RTE_LOG(ERR, EAL, "Cannot get current resource limits\n"); + } + return rte_eal_process_type() == RTE_PROC_PRIMARY ? #ifndef RTE_ARCH_64 memseg_primary_init_32() : -- 2.17.1
[dpdk-dev] [PATCH v2 5/9] memalloc: track page fd's in non-single file mode
Previously, we were only tracking lock file fd's in single-file segments mode, but did not track fd's in non-single file mode because we didn't need to (mmap() call still kept the lock). Now that we are going to expose these fd's to the world, we need to have access to them, so track them even in non-single file segments mode. We don't need to close fd's after mmap() because we're still tracking them in an fd list. Also, for anonymous hugepages mode, fd will always be -1 so exit early on error. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 44 -- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 14bc5dce9..7d536350e 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -318,18 +318,24 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi, /* create a hugepage file path */ eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx); - fd = open(path, O_CREAT | O_RDWR, 0600); + + fd = fd_list[list_idx].fds[seg_idx]; + if (fd < 0) { - RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", __func__, - strerror(errno)); - return -1; - } - /* take out a read lock */ - if (lock(fd, LOCK_SH) < 0) { - RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n", - __func__, strerror(errno)); - close(fd); - return -1; + fd = open(path, O_CREAT | O_RDWR, 0600); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", + __func__, strerror(errno)); + return -1; + } + /* take out a read lock */ + if (lock(fd, LOCK_SH) < 0) { + RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n", + __func__, strerror(errno)); + close(fd); + return -1; + } + fd_list[list_idx].fds[seg_idx] = fd; } } return fd; @@ -601,10 +607,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, goto mapped; } #endif - /* for non-single file segments that aren't in-memory, we can close fd -* here */ - if (!internal_config.single_file_segments && !internal_config.in_memory) - close(fd); ms->addr = addr; ms->hugepage_sz = alloc_sz; @@ -634,7 +636,10 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n"); } resized: - /* in-memory mode will never be single-file-segments mode */ + /* some codepaths will return negative fd, so exit early */ + if (fd < 0) + return -1; + if (internal_config.single_file_segments) { resize_hugefile(fd, path, list_idx, seg_idx, map_offset, alloc_sz, false); @@ -646,6 +651,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, lock(fd, LOCK_EX) == 1) unlink(path); close(fd); + fd_list[list_idx].fds[seg_idx] = -1; } return -1; } @@ -700,6 +706,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi, } /* closing fd will drop the lock */ close(fd); + fd_list[list_idx].fds[seg_idx] = -1; } memset(ms, 0, sizeof(*ms)); @@ -1364,8 +1371,7 @@ eal_memalloc_init(void) return -1; /* initialize all of the fd lists */ - if (internal_config.single_file_segments) - if (rte_memseg_list_walk(fd_list_create_walk, NULL)) - return -1; + if (rte_memseg_list_walk(fd_list_create_walk, NULL)) + return -1; return 0; } -- 2.17.1
[dpdk-dev] [PATCH v2 9/9] mem: support using memfd segments for in-memory mode
Enable using memfd-created segments if supported by the system. This will allow having real fd's for pages but without hugetlbfs mounts, which will enable in-memory mode to be used with virtio. The implementation is mostly piggy-backing on existing real-fd code, except that we no longer need to unlink any files or track per-page locks in single-file segments mode, because in-memory mode does not support secondary processes anyway. We move some checks from EAL command-line parsing code to memalloc because it is now possible to use single-file segments mode with in-memory mode, but only if memfd is supported. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_options.c | 6 +- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++--- 2 files changed, 235 insertions(+), 36 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 873099acc..ddd624110 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1384,10 +1384,10 @@ eal_check_common_options(struct internal_config *internal_cfg) " is only supported in non-legacy memory mode\n"); } if (internal_cfg->single_file_segments && - internal_cfg->hugepage_unlink) { + internal_cfg->hugepage_unlink && + !internal_cfg->in_memory) { RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " - "not compatible with neither --"OPT_IN_MEMORY" nor " - "--"OPT_HUGE_UNLINK"\n"); + "not compatible with --"OPT_HUGE_UNLINK"\n"); return -1; } if (internal_cfg->legacy_mem && diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 66e1d87b6..0422cbd8d 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -52,6 +52,23 @@ const int anonymous_hugepages_supported = #define RTE_MAP_HUGE_SHIFT 26 #endif +/* + * we don't actually care if memfd itself is supported - we only need to check + * if memfd supports hugetlbfs, as that already implies memfd support. + * + * also, this is not a constant, because while we may be *compiled* with memfd + * hugetlbfs support, we might not be *running* on a system that supports memfd + * and/or memfd with hugetlbfs, so we need to be able to adjust this flag at + * runtime, and fall back to anonymous memory. + */ +int memfd_create_supported = +#ifdef MFD_HUGETLB +#define MEMFD_SUPPORTED + 1; +#else + 0; +#endif + /* * not all kernel version support fallocate on hugetlbfs, so fall back to * ftruncate and disallow deallocation if fallocate is not supported. 
@@ -191,6 +208,31 @@ get_file_size(int fd) return st.st_size; } +static inline uint32_t +bsf64(uint64_t v) +{ + return (uint32_t)__builtin_ctzll(v); +} + +static inline uint32_t +log2_u64(uint64_t v) +{ + if (v == 0) + return 0; + v = rte_align64pow2(v); + return bsf64(v); +} + +static int +pagesz_flags(uint64_t page_sz) +{ + /* as per mmap() manpage, all page sizes are log2 of page size +* shifted by MAP_HUGE_SHIFT +*/ + int log2 = log2_u64(page_sz); + return log2 << RTE_MAP_HUGE_SHIFT; +} + /* returns 1 on successful lock, 0 on unsuccessful lock, -1 on error */ static int lock(int fd, int type) { @@ -287,12 +329,64 @@ static int unlock_segment(int list_idx, int seg_idx) return 0; } +static int +get_seg_memfd(struct hugepage_info *hi __rte_unused, + unsigned int list_idx __rte_unused, + unsigned int seg_idx __rte_unused) +{ +#ifdef MEMFD_SUPPORTED + int fd; + char segname[250]; /* as per manpage, limit is 249 bytes plus null */ + + if (internal_config.single_file_segments) { + fd = fd_list[list_idx].memseg_list_fd; + + if (fd < 0) { + int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz); + + snprintf(segname, sizeof(segname), "seg_%i", list_idx); + fd = memfd_create(segname, flags); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n", + __func__, strerror(errno)); + return -1; + } + fd_list[list_idx].memseg_list_fd = fd; + } + } else { + fd = fd_list[list_idx].fds[seg_idx]; + + if (fd < 0) { + int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz); + + snprintf(segname, sizeof(segname), "seg_%i-%i", + list_idx, seg_idx); +
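Outside of the patch itself, the underlying mechanism can be demonstrated stand-alone: a hugepage-backed memfd gives a real fd without any hugetlbfs mount. This sketch assumes a Linux 4.14+ kernel with free 2 MB hugepages and a libc exposing memfd_create() (glibc 2.27+, otherwise a syscall(SYS_memfd_create, ...) wrapper is needed); the page-size encoding mirrors what pagesz_flags() computes above.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U
#endif
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

int main(void)
{
	const size_t sz = 2 * 1024 * 1024;
	/* log2(2 MB) = 21, shifted as per the mmap()/memfd_create() manpages */
	unsigned int flags = MFD_HUGETLB | (21U << MAP_HUGE_SHIFT);
	int fd = memfd_create("seg_demo", flags);
	void *va;

	if (fd < 0) {
		perror("memfd_create"); /* old kernel/libc, or no hugepages */
		return 1;
	}
	if (ftruncate(fd, sz) < 0) {
		perror("ftruncate");
		return 1;
	}
	va = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("mapped %zu bytes at %p via fd %d, no hugetlbfs mount needed\n",
		sz, va, fd);
	return 0;
}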
[dpdk-dev] [PATCH v2 2/9] eal: don't allow legacy mode with in-memory mode
In-memory mode was never meant to support legacy mode, because we cannot sort anonymous pages anyway. Fixes: 72b49ff623c4 ("mem: support --in-memory mode") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_options.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index dd5f97402..873099acc 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config *internal_cfg) "--"OPT_HUGE_UNLINK"\n"); return -1; } + if (internal_cfg->legacy_mem && + internal_cfg->in_memory) { + RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible " + "with --"OPT_IN_MEMORY"\n"); + return -1; + } return 0; } -- 2.17.1
[dpdk-dev] [PATCH v2 8/9] mem: allow querying offset into segment fd
In a few cases, user may need to query offset into fd for a particular memory segment (for example, to selectively map pages). This commit adds a new API to do that. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++ lib/librte_eal/common/eal_common_memory.c | 50 ++ lib/librte_eal/common/eal_memalloc.h | 3 ++ lib/librte_eal/common/include/rte_memory.h | 49 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 + lib/librte_eal/rte_eal_version.map | 2 + 6 files changed, 131 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index 80e4c3d4f..06afbcc99 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -60,6 +60,12 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) return -ENOTSUP; } +int +eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset) +{ + return -ENOTSUP; +} + int eal_memalloc_init(void) { diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4a80deaf5..0b69804ff 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -599,6 +599,56 @@ rte_memseg_get_fd(const struct rte_memseg *ms) return ret; } +int __rte_experimental +rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms, + size_t *offset) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_memseg_list *msl; + struct rte_fbarray *arr; + int msl_idx, seg_idx, ret; + + if (ms == NULL || offset == NULL) { + rte_errno = EINVAL; + return -1; + } + + msl = rte_mem_virt2memseg_list(ms->addr); + if (msl == NULL) { + rte_errno = EINVAL; + return -1; + } + arr = &msl->memseg_arr; + + msl_idx = msl - mcfg->memsegs; + seg_idx = rte_fbarray_find_idx(arr, ms); + + if (!rte_fbarray_is_used(arr, seg_idx)) { + rte_errno = ENOENT; + return -1; + } + + ret = eal_memalloc_get_seg_fd_offset(msl_idx, seg_idx, offset); + if (ret < 0) { + rte_errno = -ret; + ret = -1; + } + return ret; +} + +int __rte_experimental +rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret; + + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_get_fd_offset_thread_unsafe(ms, offset); + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* init memory subsystem */ int rte_eal_memory_init(void) diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h index 70a214de4..af917c2f9 100644 --- a/lib/librte_eal/common/eal_memalloc.h +++ b/lib/librte_eal/common/eal_memalloc.h @@ -84,6 +84,9 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx); int eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd); +int +eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset); + int eal_memalloc_init(void); diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index 0d2a30056..14bd277a4 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ b/lib/librte_eal/common/include/rte_memory.h @@ -365,6 +365,55 @@ rte_memseg_get_fd(const struct rte_memseg *ms); int __rte_experimental rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms); +/** + * Get offset into segment file descriptor associated with a particular memseg + * (if available). 
+ * + * @note This function read-locks the memory hotplug subsystem, and thus cannot + * be used within memory-related callback functions. + * + * @param ms + * A pointer to memseg for which to get file descriptor. + * @param offset + * A pointer to offset value where the result will be stored. + * + * @return + * Valid file descriptor in case of success. + * -1 in case of error, with ``rte_errno`` set to the following values: + * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg + * - EINVAL - ``offset`` pointer was NULL + * - ENODEV - ``ms`` fd is not available + * - ENOENT - ``ms`` is an unused segment + * - ENOTSUP - segment fd's are not supported + */ +int __rte_experimental +rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset); + +/** + * Get offset into segment file descriptor associated with a particular memseg + * (if available). + * + * @note This function does not perform any locking, and is only safe to call + * from within memory-related callback functions. + * + * @param ms + * A pointer to memseg for which to get file descriptor. + *
[dpdk-dev] [PATCH v2 1/9] fbarray: fix detach in noshconf mode
In noshconf mode, no shared files are created, but we're still trying to unlink them, resulting in detach/destroy failure even though it should have succeeded. Fix it by exiting early in noshconf mode. Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_fbarray.c | 4 1 file changed, 4 insertions(+) diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 43caf3ced..ba6c4ae39 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -878,6 +878,10 @@ rte_fbarray_destroy(struct rte_fbarray *arr) if (ret) return ret; + /* with no shconf, there were never any files to begin with */ + if (internal_config.no_shconf) + return 0; + /* try deleting the file */ eal_get_fbarray_path(path, sizeof(path), arr->name); -- 2.17.1
[dpdk-dev] [PATCH v2 7/9] mem: add external API to retrieve page fd from EAL
Now that we can retrieve page fd's internally, we can expose it as an external API. This will add two flavors of API - thread-safe and non-thread-safe. Fix up internal API's to return values we need without modifying rte_errno internally if called from within EAL. We do not want calling code to accidentally close an internal fd, so we make a duplicate of it before we return it to the user. Caller is therefore responsible for closing this fd. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 5 ++- lib/librte_eal/common/eal_common_memory.c | 49 ++ lib/librte_eal/common/eal_memalloc.h | 2 + lib/librte_eal/common/include/rte_memory.h | 48 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 ++ lib/librte_eal/rte_eal_version.map | 2 + 6 files changed, 118 insertions(+), 9 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index a5fb09f71..80e4c3d4f 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -4,6 +4,7 @@ #include +#include #include #include @@ -50,13 +51,13 @@ eal_memalloc_sync_with_primary(void) int eal_memalloc_get_seg_fd(int list_idx, int seg_idx) { - return -1; + return -ENOTSUP; } int eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) { - return -1; + return -ENOTSUP; } int diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 034c2026a..4a80deaf5 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -550,6 +550,55 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg) return ret; } +int __rte_experimental +rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_memseg_list *msl; + struct rte_fbarray *arr; + int msl_idx, seg_idx, ret; + + if (ms == NULL) { + rte_errno = EINVAL; + return -1; + } + + msl = rte_mem_virt2memseg_list(ms->addr); + if (msl == NULL) { + rte_errno = EINVAL; + return -1; + } + arr = &msl->memseg_arr; + + msl_idx = msl - mcfg->memsegs; + seg_idx = rte_fbarray_find_idx(arr, ms); + + if (!rte_fbarray_is_used(arr, seg_idx)) { + rte_errno = ENOENT; + return -1; + } + + ret = eal_memalloc_get_seg_fd(msl_idx, seg_idx); + if (ret < 0) { + rte_errno = -ret; + ret = -1; + } + return ret; +} + +int __rte_experimental +rte_memseg_get_fd(const struct rte_memseg *ms) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret; + + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_get_fd_thread_unsafe(ms); + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* init memory subsystem */ int rte_eal_memory_init(void) diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h index a46c69c72..70a214de4 100644 --- a/lib/librte_eal/common/eal_memalloc.h +++ b/lib/librte_eal/common/eal_memalloc.h @@ -76,9 +76,11 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id); int eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len); +/* returns fd or -errno */ int eal_memalloc_get_seg_fd(int list_idx, int seg_idx); +/* returns 0 or -errno */ int eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd); diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index c4b7f4cff..0d2a30056 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ 
b/lib/librte_eal/common/include/rte_memory.h @@ -317,6 +317,54 @@ rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg); int __rte_experimental rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg); +/** + * Return file descriptor associated with a particular memseg (if available). + * + * @note This function read-locks the memory hotplug subsystem, and thus cannot + * be used within memory-related callback functions. + * + * @note This returns an internal file descriptor. Performing any operations on + * this file descriptor is inherently dangerous, so it should be treated + * as read-only for all intents and purposes. + * + * @param ms + * A pointer to memseg for which to get file descriptor. + * + * @return + * Valid file descriptor in case of success. + * -1 in case of error, with ``rte_errno`` set to the following values: + * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg + * - ENODEV - ``ms`` fd is not available + * - ENOE
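A small usage sketch for the thread-safe flavour added here: given a buffer that lives in DPDK memory, look up its memseg and ask for the backing fd. Whether the caller must close the returned fd depends on the duplicate-vs-internal-fd question raised above, so that part is deliberately left open; the helper name is made up for illustration.

#include <rte_memory.h>

static int
fd_of_buffer(void *buf)
{
	const struct rte_memseg *ms = rte_mem_virt2memseg(buf, NULL);

	if (ms == NULL)
		return -1; /* not DPDK-managed memory */

	/* safe outside memory callbacks; returns -1 and sets rte_errno on error */
	return rte_memseg_get_fd(ms);
}

/* e.g.: void *p = rte_malloc(NULL, 4096, 0); int fd = fd_of_buffer(p); */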
[dpdk-dev] [PATCH v3 0/9] Improve running DPDK without hugetlbfs mountpoint
This patchset further improves DPDK support for running without hugetlbfs mountpoints. First of all, it enables using memfd-created hugepages in in-memory mode. This way, instead of anonymous hugepages, we can get proper fd's for each page (or for the entire segment, if we're using single-file segments). Memfd will be used automatically if support for it was compiled and is available at runtime, however DPDK will fall back to using anonymous hugepages if such support is not available. The other thing this patchset does is exposing segment fd's through an external API. There is a lot of ugliness in current virtio/vhost code that deals with finding hugepage files through procfs, while all virtio really needs are fd's referring to the pages, and their offsets. Using this API, virtio will be able to access segment fd's directly, without the procfs magic. As a bonus, because we enabled use of memfd (given that sufficiently recent kernel version is used), once virtio support for getting segment fd's using the new API is implemented, virtio will also be able to work without having hugetlbfs mountpoints. Virtio support is not provided in this patchset, coordination and implementation of it is up to virtio maintainers. Once virtio support for this is in place, DPDK will have one less barrier for adoption in container space. v2->v3: - Fixed single file segments mode for fd offsets v1->v2: - Added a new API to retrieve segment offset into its fd Anatoly Burakov (9): fbarray: fix detach in noshconf mode eal: don't allow legacy mode with in-memory mode mem: raise maximum fd limit unconditionally memalloc: rename lock list to fd list memalloc: track page fd's in non-single file mode memalloc: add EAL-internal API to get and set segment fd's mem: add external API to retrieve page fd from EAL mem: allow querying offset into segment fd mem: support using memfd segments for in-memory mode lib/librte_eal/bsdapp/eal/eal_memalloc.c | 19 + lib/librte_eal/common/eal_common_fbarray.c | 4 + lib/librte_eal/common/eal_common_memory.c | 107 - lib/librte_eal/common/eal_common_options.c | 12 +- lib/librte_eal/common/eal_memalloc.h | 11 + lib/librte_eal/common/include/rte_memory.h | 97 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 452 + lib/librte_eal/linuxapp/eal/eal_memory.c | 64 ++- lib/librte_eal/rte_eal_version.map | 4 + 9 files changed, 672 insertions(+), 98 deletions(-) -- 2.17.1
[dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's
Enable setting and retrieving segment fd's internally. For now, retrieving fd's will not be used anywhere until we get an external API, but it will be useful for things like virtio, where we wish to share segment fd's. Setting segment fd's will not be available as a public API at this time, but internally it is needed for legacy mode, because we're not allocating our hugepages in memalloc in legacy mode case, and we still need to store the fd. Another user of get segment fd API is memseg info dump, to show which pages use which fd's. Not supported on FreeBSD. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 + lib/librte_eal/common/eal_common_memory.c | 8 +-- lib/librte_eal/common/eal_memalloc.h | 6 +++ lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +- lib/librte_eal/linuxapp/eal/eal_memory.c | 44 +--- 5 files changed, 109 insertions(+), 21 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index f7f07abd6..a5fb09f71 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void) return -1; } +int +eal_memalloc_get_seg_fd(int list_idx, int seg_idx) +{ + return -1; +} + +int +eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) +{ + return -1; +} + int eal_memalloc_init(void) { diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index fbfb1b055..034c2026a 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -294,7 +294,7 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int msl_idx, ms_idx; + int msl_idx, ms_idx, fd; FILE *f = arg; msl_idx = msl - mcfg->memsegs; @@ -305,10 +305,11 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, if (ms_idx < 0) return -1; + fd = eal_memalloc_get_seg_fd(msl_idx, ms_idx); fprintf(f, "Segment %i-%i: IOVA:0x%"PRIx64", len:%zu, " "virt:%p, socket_id:%"PRId32", " "hugepage_sz:%"PRIu64", nchannel:%"PRIx32", " - "nrank:%"PRIx32"\n", + "nrank:%"PRIx32" fd:%i\n", msl_idx, ms_idx, ms->iova, ms->len, @@ -316,7 +317,8 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms, ms->socket_id, ms->hugepage_sz, ms->nchannel, - ms->nrank); + ms->nrank, + fd); return 0; } diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h index 36bb1a027..a46c69c72 100644 --- a/lib/librte_eal/common/eal_memalloc.h +++ b/lib/librte_eal/common/eal_memalloc.h @@ -76,6 +76,12 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id); int eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len); +int +eal_memalloc_get_seg_fd(int list_idx, int seg_idx); + +int +eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd); + int eal_memalloc_init(void); diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 7d536350e..b820989e9 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -1334,16 +1334,10 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl, } static int -fd_list_create_walk(const struct rte_memseg_list *msl, - void *arg __rte_unused) +alloc_list(int list_idx, int len) { - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - unsigned int i, len; - 
int msl_idx; int *data; - - msl_idx = msl - mcfg->memsegs; - len = msl->memseg_arr.len; + int i; /* ensure we have space to store fd per each possible segment */ data = malloc(sizeof(int) * len); @@ -1355,14 +1349,56 @@ fd_list_create_walk(const struct rte_memseg_list *msl, for (i = 0; i < len; i++) data[i] = -1; - fd_list[msl_idx].fds = data; - fd_list[msl_idx].len = len; - fd_list[msl_idx].count = 0; - fd_list[msl_idx].memseg_list_fd = -1; + fd_list[list_idx].fds = data; + fd_list[list_idx].len = len; + fd_list[list_idx].count = 0; + fd_list[list_idx].memseg_list_fd = -1; return 0; } +static int +fd_list_create_walk(const struct rte_memseg_list *msl, + void *arg __rte_unused) +{ + struct rte_mem_config *mcfg = rte_eal_get_configura
[dpdk-dev] [PATCH v3 1/9] fbarray: fix detach in noshconf mode
In noshconf mode, no shared files are created, but we're still trying to unlink them, resulting in detach/destroy failure even though it should have succeeded. Fix it by exiting early in noshconf mode. Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_fbarray.c | 4 1 file changed, 4 insertions(+) diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 43caf3ced..ba6c4ae39 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -878,6 +878,10 @@ rte_fbarray_destroy(struct rte_fbarray *arr) if (ret) return ret; + /* with no shconf, there were never any files to begin with */ + if (internal_config.no_shconf) + return 0; + /* try deleting the file */ eal_get_fbarray_path(path, sizeof(path), arr->name); -- 2.17.1
[dpdk-dev] [PATCH v3 5/9] memalloc: track page fd's in non-single file mode
Previously, we were only tracking lock file fd's in single-file segments mode, but did not track fd's in non-single file mode because we didn't need to (mmap() call still kept the lock). Now that we are going to expose these fd's to the world, we need to have access to them, so track them even in non-single file segments mode. We don't need to close fd's after mmap() because we're still tracking them in an fd list. Also, for anonymous hugepages mode, fd will always be -1 so exit early on error. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 44 -- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 14bc5dce9..7d536350e 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -318,18 +318,24 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi, /* create a hugepage file path */ eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx); - fd = open(path, O_CREAT | O_RDWR, 0600); + + fd = fd_list[list_idx].fds[seg_idx]; + if (fd < 0) { - RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", __func__, - strerror(errno)); - return -1; - } - /* take out a read lock */ - if (lock(fd, LOCK_SH) < 0) { - RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n", - __func__, strerror(errno)); - close(fd); - return -1; + fd = open(path, O_CREAT | O_RDWR, 0600); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", + __func__, strerror(errno)); + return -1; + } + /* take out a read lock */ + if (lock(fd, LOCK_SH) < 0) { + RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n", + __func__, strerror(errno)); + close(fd); + return -1; + } + fd_list[list_idx].fds[seg_idx] = fd; } } return fd; @@ -601,10 +607,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, goto mapped; } #endif - /* for non-single file segments that aren't in-memory, we can close fd -* here */ - if (!internal_config.single_file_segments && !internal_config.in_memory) - close(fd); ms->addr = addr; ms->hugepage_sz = alloc_sz; @@ -634,7 +636,10 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n"); } resized: - /* in-memory mode will never be single-file-segments mode */ + /* some codepaths will return negative fd, so exit early */ + if (fd < 0) + return -1; + if (internal_config.single_file_segments) { resize_hugefile(fd, path, list_idx, seg_idx, map_offset, alloc_sz, false); @@ -646,6 +651,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, lock(fd, LOCK_EX) == 1) unlink(path); close(fd); + fd_list[list_idx].fds[seg_idx] = -1; } return -1; } @@ -700,6 +706,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi, } /* closing fd will drop the lock */ close(fd); + fd_list[list_idx].fds[seg_idx] = -1; } memset(ms, 0, sizeof(*ms)); @@ -1364,8 +1371,7 @@ eal_memalloc_init(void) return -1; /* initialize all of the fd lists */ - if (internal_config.single_file_segments) - if (rte_memseg_list_walk(fd_list_create_walk, NULL)) - return -1; + if (rte_memseg_list_walk(fd_list_create_walk, NULL)) + return -1; return 0; } -- 2.17.1
[dpdk-dev] [PATCH v3 9/9] mem: support using memfd segments for in-memory mode
Enable using memfd-created segments if supported by the system. This will allow having real fd's for pages but without hugetlbfs mounts, which will enable in-memory mode to be used with virtio. The implementation is mostly piggy-backing on existing real-fd code, except that we no longer need to unlink any files or track per-page locks in single-file segments mode, because in-memory mode does not support secondary processes anyway. We move some checks from EAL command-line parsing code to memalloc because it is now possible to use single-file segments mode with in-memory mode, but only if memfd is supported. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_options.c | 6 +- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++--- 2 files changed, 235 insertions(+), 36 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 873099acc..ddd624110 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1384,10 +1384,10 @@ eal_check_common_options(struct internal_config *internal_cfg) " is only supported in non-legacy memory mode\n"); } if (internal_cfg->single_file_segments && - internal_cfg->hugepage_unlink) { + internal_cfg->hugepage_unlink && + !internal_cfg->in_memory) { RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " - "not compatible with neither --"OPT_IN_MEMORY" nor " - "--"OPT_HUGE_UNLINK"\n"); + "not compatible with --"OPT_HUGE_UNLINK"\n"); return -1; } if (internal_cfg->legacy_mem && diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 6fc9dc0ca..b2e2a9599 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -52,6 +52,23 @@ const int anonymous_hugepages_supported = #define RTE_MAP_HUGE_SHIFT 26 #endif +/* + * we don't actually care if memfd itself is supported - we only need to check + * if memfd supports hugetlbfs, as that already implies memfd support. + * + * also, this is not a constant, because while we may be *compiled* with memfd + * hugetlbfs support, we might not be *running* on a system that supports memfd + * and/or memfd with hugetlbfs, so we need to be able to adjust this flag at + * runtime, and fall back to anonymous memory. + */ +int memfd_create_supported = +#ifdef MFD_HUGETLB +#define MEMFD_SUPPORTED + 1; +#else + 0; +#endif + /* * not all kernel version support fallocate on hugetlbfs, so fall back to * ftruncate and disallow deallocation if fallocate is not supported. 
@@ -191,6 +208,31 @@ get_file_size(int fd) return st.st_size; } +static inline uint32_t +bsf64(uint64_t v) +{ + return (uint32_t)__builtin_ctzll(v); +} + +static inline uint32_t +log2_u64(uint64_t v) +{ + if (v == 0) + return 0; + v = rte_align64pow2(v); + return bsf64(v); +} + +static int +pagesz_flags(uint64_t page_sz) +{ + /* as per mmap() manpage, all page sizes are log2 of page size +* shifted by MAP_HUGE_SHIFT +*/ + int log2 = log2_u64(page_sz); + return log2 << RTE_MAP_HUGE_SHIFT; +} + /* returns 1 on successful lock, 0 on unsuccessful lock, -1 on error */ static int lock(int fd, int type) { @@ -287,12 +329,64 @@ static int unlock_segment(int list_idx, int seg_idx) return 0; } +static int +get_seg_memfd(struct hugepage_info *hi __rte_unused, + unsigned int list_idx __rte_unused, + unsigned int seg_idx __rte_unused) +{ +#ifdef MEMFD_SUPPORTED + int fd; + char segname[250]; /* as per manpage, limit is 249 bytes plus null */ + + if (internal_config.single_file_segments) { + fd = fd_list[list_idx].memseg_list_fd; + + if (fd < 0) { + int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz); + + snprintf(segname, sizeof(segname), "seg_%i", list_idx); + fd = memfd_create(segname, flags); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n", + __func__, strerror(errno)); + return -1; + } + fd_list[list_idx].memseg_list_fd = fd; + } + } else { + fd = fd_list[list_idx].fds[seg_idx]; + + if (fd < 0) { + int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz); + + snprintf(segname, sizeof(segname), "seg_%i-%i", + list_idx, seg_idx); +
[dpdk-dev] [PATCH v3 3/9] mem: raise maximum fd limit unconditionally
Previously, when we allocated hugepages, we closed the fd's corresponding to them after we've done our mappings. Since we did mmap(), we didn't actually lose the reference, but file descriptors used for mmap() do not count against the fd limit. Since we are going to store all of our fd's, we will hit the fd limit much more often when using smaller page sizes. Fix this to raise the fd limit to maximum unconditionally. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memory.c | 20 1 file changed, 20 insertions(+) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index dbf19499e..dfb537f59 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -2204,6 +2205,25 @@ memseg_secondary_init(void) int rte_eal_memseg_init(void) { + /* increase rlimit to maximum */ + struct rlimit lim; + + if (getrlimit(RLIMIT_NOFILE, &lim) == 0) { + /* set limit to maximum */ + lim.rlim_cur = lim.rlim_max; + + if (setrlimit(RLIMIT_NOFILE, &lim) < 0) { + RTE_LOG(DEBUG, EAL, "Setting maximum number of open files failed: %s\n", + strerror(errno)); + } else { + RTE_LOG(DEBUG, EAL, "Setting maximum number of open files to %" + PRIu64 "\n", + (uint64_t)lim.rlim_cur); + } + } else { + RTE_LOG(ERR, EAL, "Cannot get current resource limits\n"); + } + return rte_eal_process_type() == RTE_PROC_PRIMARY ? #ifndef RTE_ARCH_64 memseg_primary_init_32() : -- 2.17.1
[dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode
In-memory mode was never meant to support legacy mode, because we cannot sort anonymous pages anyway. Fixes: 72b49ff623c4 ("mem: support --in-memory mode") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_options.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index dd5f97402..873099acc 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config *internal_cfg) "--"OPT_HUGE_UNLINK"\n"); return -1; } + if (internal_cfg->legacy_mem && + internal_cfg->in_memory) { + RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible " + "with --"OPT_IN_MEMORY"\n"); + return -1; + } return 0; } -- 2.17.1
[dpdk-dev] [PATCH v3 7/9] mem: add external API to retrieve page fd from EAL
Now that we can retrieve page fd's internally, we can expose it as an external API. This will add two flavors of API - thread-safe and non-thread-safe. Fix up internal API's to return values we need without modifying rte_errno internally if called from within EAL. We do not want calling code to accidentally close an internal fd, so we make a duplicate of it before we return it to the user. Caller is therefore responsible for closing this fd. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_memalloc.c | 5 ++- lib/librte_eal/common/eal_common_memory.c | 49 ++ lib/librte_eal/common/eal_memalloc.h | 2 + lib/librte_eal/common/include/rte_memory.h | 48 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 ++ lib/librte_eal/rte_eal_version.map | 2 + 6 files changed, 118 insertions(+), 9 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index a5fb09f71..80e4c3d4f 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -4,6 +4,7 @@ #include +#include #include #include @@ -50,13 +51,13 @@ eal_memalloc_sync_with_primary(void) int eal_memalloc_get_seg_fd(int list_idx, int seg_idx) { - return -1; + return -ENOTSUP; } int eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) { - return -1; + return -ENOTSUP; } int diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 034c2026a..4a80deaf5 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -550,6 +550,55 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg) return ret; } +int __rte_experimental +rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_memseg_list *msl; + struct rte_fbarray *arr; + int msl_idx, seg_idx, ret; + + if (ms == NULL) { + rte_errno = EINVAL; + return -1; + } + + msl = rte_mem_virt2memseg_list(ms->addr); + if (msl == NULL) { + rte_errno = EINVAL; + return -1; + } + arr = &msl->memseg_arr; + + msl_idx = msl - mcfg->memsegs; + seg_idx = rte_fbarray_find_idx(arr, ms); + + if (!rte_fbarray_is_used(arr, seg_idx)) { + rte_errno = ENOENT; + return -1; + } + + ret = eal_memalloc_get_seg_fd(msl_idx, seg_idx); + if (ret < 0) { + rte_errno = -ret; + ret = -1; + } + return ret; +} + +int __rte_experimental +rte_memseg_get_fd(const struct rte_memseg *ms) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret; + + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_get_fd_thread_unsafe(ms); + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* init memory subsystem */ int rte_eal_memory_init(void) diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h index a46c69c72..70a214de4 100644 --- a/lib/librte_eal/common/eal_memalloc.h +++ b/lib/librte_eal/common/eal_memalloc.h @@ -76,9 +76,11 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id); int eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len); +/* returns fd or -errno */ int eal_memalloc_get_seg_fd(int list_idx, int seg_idx); +/* returns 0 or -errno */ int eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd); diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index c4b7f4cff..0d2a30056 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ 
b/lib/librte_eal/common/include/rte_memory.h @@ -317,6 +317,54 @@ rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg); int __rte_experimental rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg); +/** + * Return file descriptor associated with a particular memseg (if available). + * + * @note This function read-locks the memory hotplug subsystem, and thus cannot + * be used within memory-related callback functions. + * + * @note This returns an internal file descriptor. Performing any operations on + * this file descriptor is inherently dangerous, so it should be treated + * as read-only for all intents and purposes. + * + * @param ms + * A pointer to memseg for which to get file descriptor. + * + * @return + * Valid file descriptor in case of success. + * -1 in case of error, with ``rte_errno`` set to the following values: + * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg + * - ENODEV - ``ms`` fd is not available + * - ENOE
[dpdk-dev] [PATCH v3 8/9] mem: allow querying offset into segment fd
In a few cases, user may need to query offset into fd for a particular memory segment (for example, to selectively map pages). This commit adds a new API to do that. Signed-off-by: Anatoly Burakov --- Notes: v3: - Fix single file segments mode not working lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++ lib/librte_eal/common/eal_common_memory.c | 50 ++ lib/librte_eal/common/eal_memalloc.h | 3 ++ lib/librte_eal/common/include/rte_memory.h | 49 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 24 +++ lib/librte_eal/rte_eal_version.map | 2 + 6 files changed, 134 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c index 80e4c3d4f..06afbcc99 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c @@ -60,6 +60,12 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd) return -ENOTSUP; } +int +eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset) +{ + return -ENOTSUP; +} + int eal_memalloc_init(void) { diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4a80deaf5..0b69804ff 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -599,6 +599,56 @@ rte_memseg_get_fd(const struct rte_memseg *ms) return ret; } +int __rte_experimental +rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms, + size_t *offset) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_memseg_list *msl; + struct rte_fbarray *arr; + int msl_idx, seg_idx, ret; + + if (ms == NULL || offset == NULL) { + rte_errno = EINVAL; + return -1; + } + + msl = rte_mem_virt2memseg_list(ms->addr); + if (msl == NULL) { + rte_errno = EINVAL; + return -1; + } + arr = &msl->memseg_arr; + + msl_idx = msl - mcfg->memsegs; + seg_idx = rte_fbarray_find_idx(arr, ms); + + if (!rte_fbarray_is_used(arr, seg_idx)) { + rte_errno = ENOENT; + return -1; + } + + ret = eal_memalloc_get_seg_fd_offset(msl_idx, seg_idx, offset); + if (ret < 0) { + rte_errno = -ret; + ret = -1; + } + return ret; +} + +int __rte_experimental +rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret; + + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_get_fd_offset_thread_unsafe(ms, offset); + rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + + return ret; +} + /* init memory subsystem */ int rte_eal_memory_init(void) diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h index 70a214de4..af917c2f9 100644 --- a/lib/librte_eal/common/eal_memalloc.h +++ b/lib/librte_eal/common/eal_memalloc.h @@ -84,6 +84,9 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx); int eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd); +int +eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset); + int eal_memalloc_init(void); diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index 0d2a30056..14bd277a4 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ b/lib/librte_eal/common/include/rte_memory.h @@ -365,6 +365,55 @@ rte_memseg_get_fd(const struct rte_memseg *ms); int __rte_experimental rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms); +/** + * Get offset into segment file descriptor associated with a particular memseg + * (if available). 
+ * + * @note This function read-locks the memory hotplug subsystem, and thus cannot + * be used within memory-related callback functions. + * + * @param ms + * A pointer to memseg for which to get file descriptor. + * @param offset + * A pointer to offset value where the result will be stored. + * + * @return + * Zero in case of success. + * -1 in case of error, with ``rte_errno`` set to the following values: + * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg + * - EINVAL - ``offset`` pointer was NULL + * - ENODEV - ``ms`` fd is not available + * - ENOENT - ``ms`` is an unused segment + * - ENOTSUP - segment fd's are not supported + */ +int __rte_experimental +rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset); + +/** + * Get offset into segment file descriptor associated with a particular memseg + * (if available). + * + * @note This function does not perform any locking, and is only safe to call + * from within memory-related callback functions. + * + * @param
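A rough sketch of the intended use (helper name and mmap flags are assumptions, error handling trimmed): in single file segments mode several pages share one fd, so selectively mapping one page needs both the fd from the previous patch and the offset added here.

#include <sys/mman.h>
#include <rte_memory.h>

/* Hypothetical helper: create a second, read-only mapping of the file
 * region backing a given memseg. */
static void *
map_seg_readonly(const struct rte_memseg *ms)
{
	size_t offset;
	void *va;
	int fd;

	fd = rte_memseg_get_fd(ms);
	if (fd < 0)
		return NULL;
	if (rte_memseg_get_fd_offset(ms, &offset) < 0)
		return NULL;

	/* in single file segments mode the fd spans the whole memseg list,
	 * so the offset selects this particular page within the file */
	va = mmap(NULL, ms->len, PROT_READ, MAP_SHARED, fd, (off_t)offset);
	return va == MAP_FAILED ? NULL : va;
}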
[dpdk-dev] [PATCH v3 4/9] memalloc: rename lock list to fd list
Previously, we were only using lock lists to store per-page lock fd's because we cannot use modern fcntl() file description locks to lock parts of the page in single file segments mode. Now, we will be using this list to store either lock fd's (along with memseg list fd) in single file segments mode, or per-page fd's (and set memseg list fd to -1), so rename the list accordingly. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 -- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index aa95551a8..14bc5dce9 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -57,25 +57,33 @@ const int anonymous_hugepages_supported = */ static int fallocate_supported = -1; /* unknown */ -/* for single-file segments, we need some kind of mechanism to keep track of +/* + * we have two modes - single file segments, and file-per-page mode. + * + * for single-file segments, we need some kind of mechanism to keep track of * which hugepages can be freed back to the system, and which cannot. we cannot * use flock() because they don't allow locking parts of a file, and we cannot * use fcntl() due to issues with their semantics, so we will have to rely on a - * bunch of lockfiles for each page. + * bunch of lockfiles for each page. so, we will use 'fds' array to keep track + * of per-page lockfiles. we will store the actual segment list fd in the + * 'memseg_list_fd' field. + * + * for file-per-page mode, each page will have its own fd, so 'memseg_list_fd' + * will be invalid (set to -1), and we'll use 'fds' to keep track of page fd's. * * we cannot know how many pages a system will have in advance, but we do know * that they come in lists, and we know lengths of these lists. so, simply store * a malloc'd array of fd's indexed by list and segment index. * * they will be initialized at startup, and filled as we allocate/deallocate - * segments. also, use this to track memseg list proper fd. + * segments. */ static struct { int *fds; /**< dynamically allocated array of segment lock fd's */ int memseg_list_fd; /**< memseg list fd */ int len; /**< total length of the array */ int count; /**< entries used in an array */ -} lock_fds[RTE_MAX_MEMSEG_LISTS]; +} fd_list[RTE_MAX_MEMSEG_LISTS]; /** local copy of a memory map, used to synchronize memory hotplug in MP */ static struct rte_memseg_list local_memsegs[RTE_MAX_MEMSEG_LISTS]; @@ -209,12 +217,12 @@ static int get_segment_lock_fd(int list_idx, int seg_idx) char path[PATH_MAX] = {0}; int fd; - if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds)) + if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list)) return -1; - if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len) + if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len) return -1; - fd = lock_fds[list_idx].fds[seg_idx]; + fd = fd_list[list_idx].fds[seg_idx]; /* does this lock already exist? 
*/ if (fd >= 0) return fd; @@ -236,8 +244,8 @@ static int get_segment_lock_fd(int list_idx, int seg_idx) return -1; } /* store it for future reference */ - lock_fds[list_idx].fds[seg_idx] = fd; - lock_fds[list_idx].count++; + fd_list[list_idx].fds[seg_idx] = fd; + fd_list[list_idx].count++; return fd; } @@ -245,12 +253,12 @@ static int unlock_segment(int list_idx, int seg_idx) { int fd, ret; - if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds)) + if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list)) return -1; - if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len) + if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len) return -1; - fd = lock_fds[list_idx].fds[seg_idx]; + fd = fd_list[list_idx].fds[seg_idx]; /* upgrade lock to exclusive to see if we can remove the lockfile */ ret = lock(fd, LOCK_EX); @@ -270,8 +278,8 @@ static int unlock_segment(int list_idx, int seg_idx) * and remove it from list anyway. */ close(fd); - lock_fds[list_idx].fds[seg_idx] = -1; - lock_fds[list_idx].count--; + fd_list[list_idx].fds[seg_idx] = -1; + fd_list[list_idx].count--; if (ret < 0) return -1; @@ -288,7 +296,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi, /* create a hugepage file path */ eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx); - fd = lock_fds[list_idx].memseg_list_fd; + fd = fd_list[list_idx].memseg_list_fd; if (fd < 0) { fd = open(path, O_CREAT | O_RDWR, 0600); @@ -304,7 +312,7 @@ get_seg_f
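The bookkeeping described in the renamed structure's comment can be summarized with a small sketch (a simplification, not code from this patch; the helper name and the explicit single_file_segments parameter are made up, and it relies on the fd_list array from the diff above):

#include <stdbool.h>

/* How a segment's fd is resolved from fd_list in the two modes. */
static int
resolve_seg_fd(int list_idx, int seg_idx, bool single_file_segments)
{
	if (single_file_segments)
		/* one fd for the whole list; fds[] only holds per-page lock fds */
		return fd_list[list_idx].memseg_list_fd;

	/* file-per-page mode: memseg_list_fd is -1, fds[] holds page fds */
	return fd_list[list_idx].fds[seg_idx];
}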
[dpdk-dev] [PATCH 1/3] lib/librte_table: add hash_func header files
This commit adds rte_table_hash_func.h and rte_table_hash_func_arm64.h to librte_table. This will replace hash_func.h and hash_func_arm64.h in the IP Pipeline application and in SoftNIC. This also adds a scalar implementation of the x86_64 intrinsic for crc32 as a generic fallback. Signed-off-by: Cristian Dumitrescu Signed-off-by: Kevin Laatz --- lib/librte_table/Makefile| 2 + lib/librte_table/meson.build | 2 + lib/librte_table/rte_table_hash_func.h | 244 +++ lib/librte_table/rte_table_hash_func_arm64.h | 21 +++ lib/librte_table/rte_table_version.map | 14 ++ 5 files changed, 283 insertions(+) create mode 100644 lib/librte_table/rte_table_hash_func.h create mode 100644 lib/librte_table/rte_table_hash_func_arm64.h diff --git a/lib/librte_table/Makefile b/lib/librte_table/Makefile index 276d476..f935678 100644 --- a/lib/librte_table/Makefile +++ b/lib/librte_table/Makefile @@ -46,6 +46,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_table_acl.h endif SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_table_hash.h SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_table_hash_cuckoo.h +SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_table_hash_func.h +SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_table_hash_func_arm64.h SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_lru.h ifeq ($(CONFIG_RTE_ARCH_X86),y) SYMLINK-$(CONFIG_RTE_LIBRTE_TABLE)-include += rte_lru_x86.h diff --git a/lib/librte_table/meson.build b/lib/librte_table/meson.build index 8b2f841..6ae3cd6 100644 --- a/lib/librte_table/meson.build +++ b/lib/librte_table/meson.build @@ -19,6 +19,8 @@ headers = files('rte_table.h', 'rte_table_lpm_ipv6.h', 'rte_table_hash.h', 'rte_table_hash_cuckoo.h', + 'rte_table_hash_func.h', + 'rte_table_hash_func_arm64.h', 'rte_lru.h', 'rte_table_array.h', 'rte_table_stub.h') diff --git a/lib/librte_table/rte_table_hash_func.h b/lib/librte_table/rte_table_hash_func.h new file mode 100644 index 000..4eadbfb --- /dev/null +++ b/lib/librte_table/rte_table_hash_func.h @@ -0,0 +1,244 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2018 Intel Corporation + */ + +#ifndef __INCLUDE_RTE_TABLE_HASH_FUNC_H__ +#define __INCLUDE_RTE_TABLE_HASH_FUNC_H__ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include + +static inline uint64_t +rte_crc32_u64_generic(uint64_t crc, uint64_t value) +{ + int i; + + crc = (crc & 0xFFFFFFFFLLU) ^ value; + for (i = 63; i >= 0; i--) { + uint64_t mask; + + mask = -(crc & 1LLU); + crc = (crc >> 1LLU) ^ (0x82F63B78LLU & mask); + } + + return crc; +} + +#if defined(RTE_ARCH_X86_64) + +#include + +static inline uint64_t +rte_crc32_u64(uint64_t crc, uint64_t v) +{ + return _mm_crc32_u64(crc, v); +} + +#elif defined(RTE_ARCH_ARM64) +#include "rte_table_hash_func_arm64.h" +#else + +static inline uint64_t +rte_crc32_u64(uint64_t crc, uint64_t v) +{ + return rte_crc32_u64_generic(crc, v); +} + +#endif + +static inline uint64_t +rte_table_hash_crc_key8(void *key, void *mask, __rte_unused uint32_t key_size, + uint64_t seed) +{ + uint64_t *k = key; + uint64_t *m = mask; + uint64_t crc0; + + crc0 = rte_crc32_u64(seed, k[0] & m[0]); + + return crc0; +} + +static inline uint64_t +rte_table_hash_crc_key16(void *key, void *mask, __rte_unused uint32_t key_size, + uint64_t seed) +{ + uint64_t *k = key; + uint64_t *m = mask; + uint64_t k0, crc0, crc1; + + k0 = k[0] & m[0]; + + crc0 = rte_crc32_u64(k0, seed); + crc1 = rte_crc32_u64(k0 >> 32, k[1] & m[1]); + + crc0 ^= crc1; + + return crc0; +} + +static inline uint64_t +rte_table_hash_crc_key24(void *key, void *mask,
__rte_unused uint32_t key_size, + uint64_t seed) +{ + uint64_t *k = key; + uint64_t *m = mask; + uint64_t k0, k2, crc0, crc1; + + k0 = k[0] & m[0]; + k2 = k[2] & m[2]; + + crc0 = rte_crc32_u64(k0, seed); + crc1 = rte_crc32_u64(k0 >> 32, k[1] & m[1]); + + crc0 = rte_crc32_u64(crc0, k2); + + crc0 ^= crc1; + + return crc0; +} + +static inline uint64_t +rte_table_hash_crc_key32(void *key, void *mask, __rte_unused uint32_t key_size, + uint64_t seed) +{ + uint64_t *k = key; + uint64_t *m = mask; + uint64_t k0, k2, crc0, crc1, crc2, crc3; + + k0 = k[0] & m[0]; + k2 = k[2] & m[2]; + + crc0 = rte_crc32_u64(k0, seed); + crc1 = rte_crc32_u64(k0 >> 32, k[1] & m[1]); + + crc2 = rte_crc32_u64(k2, k[3] & m[3]); + crc3 = k2 >> 32; + + crc0 = rte_crc32_u64(crc0, crc1); + crc1 = rte_crc32_u64(crc2, crc3); + + crc0 ^= crc1; + + return crc0; +} + +static inline uint64_t +rte_table_hash_crc_key40(void *k
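A hedged usage sketch of the new helpers (values are arbitrary, and the key_size argument is ignored by these implementations): the key and mask are passed as 64-bit word arrays whose total size matches the keyNN variant.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_table_hash_func.h>

int
main(void)
{
	/* 16-byte key and mask, i.e. two 64-bit words each */
	uint64_t key[2]  = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL };
	uint64_t mask[2] = { UINT64_MAX, UINT64_MAX };

	uint64_t sig = rte_table_hash_crc_key16(key, mask, 16, 0 /* seed */);

	printf("signature: 0x%" PRIx64 "\n", sig);
	return 0;
}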
[dpdk-dev] [PATCH 2/3] examples/ip_pipeline: modify application to use librte_table headers
This commit modifies the IP Pipeline application to use the new header files in librte_table. As we are now using the new header files, we can remove the old ones from the application directory. Signed-off-by: Cristian Dumitrescu Signed-off-by: Kevin Laatz --- examples/ip_pipeline/action.c | 34 ++-- examples/ip_pipeline/hash_func.h | 357 - examples/ip_pipeline/hash_func_arm64.h | 232 - examples/ip_pipeline/pipeline.c| 19 +- 4 files changed, 26 insertions(+), 616 deletions(-) delete mode 100644 examples/ip_pipeline/hash_func.h delete mode 100644 examples/ip_pipeline/hash_func_arm64.h diff --git a/examples/ip_pipeline/action.c b/examples/ip_pipeline/action.c index a29c2b3..20497c3 100644 --- a/examples/ip_pipeline/action.c +++ b/examples/ip_pipeline/action.c @@ -7,9 +7,9 @@ #include #include +#include #include "action.h" -#include "hash_func.h" /** * Input port @@ -57,35 +57,35 @@ port_in_action_profile_create(const char *name, (params->lb.f_hash == NULL)) { switch (params->lb.key_size) { case 8: - params->lb.f_hash = hash_default_key8; + params->lb.f_hash = rte_table_hash_crc_key8; break; case 16: - params->lb.f_hash = hash_default_key16; + params->lb.f_hash = rte_table_hash_crc_key16; break; case 24: - params->lb.f_hash = hash_default_key24; + params->lb.f_hash = rte_table_hash_crc_key24; break; case 32: - params->lb.f_hash = hash_default_key32; + params->lb.f_hash = rte_table_hash_crc_key32; break; case 40: - params->lb.f_hash = hash_default_key40; + params->lb.f_hash = rte_table_hash_crc_key40; break; case 48: - params->lb.f_hash = hash_default_key48; + params->lb.f_hash = rte_table_hash_crc_key48; break; case 56: - params->lb.f_hash = hash_default_key56; + params->lb.f_hash = rte_table_hash_crc_key56; break; case 64: - params->lb.f_hash = hash_default_key64; + params->lb.f_hash = rte_table_hash_crc_key64; break; default: @@ -192,35 +192,35 @@ table_action_profile_create(const char *name, (params->lb.f_hash == NULL)) { switch (params->lb.key_size) { case 8: - params->lb.f_hash = hash_default_key8; + params->lb.f_hash = rte_table_hash_crc_key8; break; case 16: - params->lb.f_hash = hash_default_key16; + params->lb.f_hash = rte_table_hash_crc_key16; break; case 24: - params->lb.f_hash = hash_default_key24; + params->lb.f_hash = rte_table_hash_crc_key24; break; case 32: - params->lb.f_hash = hash_default_key32; + params->lb.f_hash = rte_table_hash_crc_key32; break; case 40: - params->lb.f_hash = hash_default_key40; + params->lb.f_hash = rte_table_hash_crc_key40; break; case 48: - params->lb.f_hash = hash_default_key48; + params->lb.f_hash = rte_table_hash_crc_key48; break; case 56: - params->lb.f_hash = hash_default_key56; + params->lb.f_hash = rte_table_hash_crc_key56; break; case 64: - params->lb.f_hash = hash_default_key64; + params->lb.f_hash = rte_table_hash_crc_key64; break; default: diff --git a/examples/ip_pipeline/hash_func.h b/examples/ip_pipeline/hash_func.h deleted file mode 100644 index f1b9d94..000 --- a/examples/ip_pipeline/hash_func.h +++ /dev/null @@ -1,357 +0,0 @@ -/* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2018 Intel Corporation - */ - -#ifndef __INCLUDE_HASH_FUNC_H__ -#define __INCLUDE_HASH_FUNC_H__ - -static inline uint64_t -hash_xor_key8(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0; - - xor0 = seed
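The repeated switch above simply maps a key size in bytes to the matching helper. For illustration only, the same selection can be written as a lookup table; this is not how the patch implements it, and it assumes the rte_table_hash_op_hash function-pointer typedef from rte_table_hash.h:

#include <rte_table_hash.h>
#include <rte_table_hash_func.h>

/* Key sizes 8..64 in steps of 8 bytes map to indexes 0..7. */
static rte_table_hash_op_hash
crc_hash_for_key_size(uint32_t key_size)
{
	static const rte_table_hash_op_hash funcs[] = {
		rte_table_hash_crc_key8,  rte_table_hash_crc_key16,
		rte_table_hash_crc_key24, rte_table_hash_crc_key32,
		rte_table_hash_crc_key40, rte_table_hash_crc_key48,
		rte_table_hash_crc_key56, rte_table_hash_crc_key64,
	};

	if (key_size == 0 || key_size > 64 || key_size % 8 != 0)
		return NULL;
	return funcs[key_size / 8 - 1];
}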
[dpdk-dev] [PATCH 3/3] net/softnic: modify softnic to use librte_table headers
This commit modifies SoftNIC to make use of the new header files in librte_table. As we are now using the new header files in librte_table in SoftNIC, we no longer need the old header files so they can be removed. Signed-off-by: Cristian Dumitrescu Signed-off-by: Kevin Laatz --- drivers/net/softnic/hash_func.h| 359 - drivers/net/softnic/hash_func_arm64.h | 261 -- drivers/net/softnic/rte_eth_softnic_action.c | 34 +-- drivers/net/softnic/rte_eth_softnic_pipeline.c | 19 +- 4 files changed, 26 insertions(+), 647 deletions(-) delete mode 100644 drivers/net/softnic/hash_func.h delete mode 100644 drivers/net/softnic/hash_func_arm64.h diff --git a/drivers/net/softnic/hash_func.h b/drivers/net/softnic/hash_func.h deleted file mode 100644 index 198d2b2..000 --- a/drivers/net/softnic/hash_func.h +++ /dev/null @@ -1,359 +0,0 @@ -/* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2018 Intel Corporation - */ - -#ifndef __INCLUDE_HASH_FUNC_H__ -#define __INCLUDE_HASH_FUNC_H__ - -#include - -static inline uint64_t -hash_xor_key8(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0; - - xor0 = seed ^ (k[0] & m[0]); - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key16(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key24(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - - xor0 ^= k[2] & m[2]; - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key32(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0, xor1; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - xor1 = (k[2] & m[2]) ^ (k[3] & m[3]); - - xor0 ^= xor1; - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key40(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0, xor1; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - xor1 = (k[2] & m[2]) ^ (k[3] & m[3]); - - xor0 ^= xor1; - - xor0 ^= k[4] & m[4]; - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key48(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0, xor1, xor2; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - xor1 = (k[2] & m[2]) ^ (k[3] & m[3]); - xor2 = (k[4] & m[4]) ^ (k[5] & m[5]); - - xor0 ^= xor1; - - xor0 ^= xor2; - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key56(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0, xor1, xor2; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - xor1 = (k[2] & m[2]) ^ (k[3] & m[3]); - xor2 = (k[4] & m[4]) ^ (k[5] & m[5]); - - xor0 ^= xor1; - xor2 ^= k[6] & m[6]; - - xor0 ^= xor2; - - return (xor0 >> 32) ^ xor0; -} - -static inline uint64_t -hash_xor_key64(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t xor0, xor1, xor2, xor3; - - xor0 = ((k[0] & m[0]) ^ seed) ^ (k[1] & m[1]); - xor1 = (k[2] & m[2]) ^ 
(k[3] & m[3]); - xor2 = (k[4] & m[4]) ^ (k[5] & m[5]); - xor3 = (k[6] & m[6]) ^ (k[7] & m[7]); - - xor0 ^= xor1; - xor2 ^= xor3; - - xor0 ^= xor2; - - return (xor0 >> 32) ^ xor0; -} - -#if defined(RTE_ARCH_X86_64) - -#include - -static inline uint64_t -hash_crc_key8(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t crc0; - - crc0 = _mm_crc32_u64(seed, k[0] & m[0]); - - return crc0; -} - -static inline uint64_t -hash_crc_key16(void *key, void *mask, __rte_unused uint32_t key_size, - uint64_t seed) -{ - uint64_t *k = key; - uint64_t *m = mask; - uint64_t k0, crc0, crc1; - - k0 = k[0] & m[0]; - - crc0 = _mm_crc32_u64(k0, seed); - crc1 = _mm_crc32_u64(k0 >> 32, k[1] & m[1]); - - crc0 ^= crc1; - - return crc0; -} - -static inline uint64_t -hash_crc_key24(void *key, void *ma
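Purely as a demonstration of what the removed XOR helpers computed and why the key mask matters (not part of the patch): below is a local copy of the key8 arithmetic fed with two keys that differ only in masked-off bytes; both produce the same signature.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local copy of the removed hash_xor_key8() arithmetic. */
static uint64_t
xor_key8(const uint64_t *k, const uint64_t *m, uint64_t seed)
{
	uint64_t xor0 = seed ^ (k[0] & m[0]);

	return (xor0 >> 32) ^ xor0;
}

int
main(void)
{
	uint64_t mask = 0xffffffff00000000ULL; /* only the upper half matters */
	uint64_t a = 0x1122334455667788ULL;
	uint64_t b = 0x11223344deadbeefULL;    /* differs only in masked bytes */

	/* prints the same signature twice */
	printf("0x%" PRIx64 " 0x%" PRIx64 "\n",
	       xor_key8(&a, &mask, 0), xor_key8(&b, &mask, 0));
	return 0;
}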
Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
Thanks for the patch. Do we need both of the state and istate struct? struct rte_hash_iterator_state seems not doing much. How about we only have one "state" struct and just not expose the internals to the public API, similar to the rte_hash struct or rte_member_setsum struct. And in _init function use rte_malloc to allocate the state and add a _free function to free it. Thanks Yipeng >-Original Message- >From: Qiaobin Fu [mailto:qiaob...@bu.edu] >Sent: Friday, August 31, 2018 9:51 AM >To: Richardson, Bruce ; De Lara Guarch, Pablo > >Cc: dev@dpdk.org; douce...@bu.edu; Wiles, Keith ; >Gobriel, Sameh ; Tai, Charlie >; step...@networkplumber.org; n...@arm.com; >honnappa.nagaraha...@arm.com; Wang, Yipeng1 >; mic...@digirati.com.br; qiaob...@bu.edu >Subject: [PATCH v3] hash table: add an iterator over conflicting entries > >Function rte_hash_iterate_conflict_entries() iterates over >the entries that conflict with an incoming entry. > >Iterating over conflicting entries enables one to decide >if the incoming entry is more valuable than the entries already >in the hash table. This is particularly useful after >an insertion failure. > >v3: >* Make the rte_hash_iterate() API similar to > rte_hash_iterate_conflict_entries() > >v2: >* Fix the style issue > >* Make the API more universal > >Signed-off-by: Qiaobin Fu >Reviewed-by: Cody Doucette >Reviewed-by: Michel Machado >Reviewed-by: Keith Wiles >Reviewed-by: Yipeng Wang >Reviewed-by: Honnappa Nagarahalli >--- > lib/librte_hash/Makefile | 1 + > lib/librte_hash/rte_cuckoo_hash.c| 132 +++ > lib/librte_hash/rte_hash.h | 80 ++-- > lib/librte_hash/rte_hash_version.map | 8 ++ > test/test/test_hash.c| 7 +- > test/test/test_hash_multiwriter.c| 8 +- > test/test/test_hash_readwrite.c | 16 ++-- > 7 files changed, 219 insertions(+), 33 deletions(-) > >diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile >index c8c435dfd..9be58a205 100644 >--- a/lib/librte_hash/Makefile >+++ b/lib/librte_hash/Makefile >@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk > # library name > LIB = librte_hash.a > >+CFLAGS += -DALLOW_EXPERIMENTAL_API > CFLAGS += -O3 > CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) > LDLIBS += -lrte_eal -lrte_ring >diff --git a/lib/librte_hash/rte_cuckoo_hash.c >b/lib/librte_hash/rte_cuckoo_hash.c >index f7b86c8c9..cf5b28196 100644 >--- a/lib/librte_hash/rte_cuckoo_hash.c >+++ b/lib/librte_hash/rte_cuckoo_hash.c >@@ -1300,45 +1300,143 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, >const void **keys, > return __builtin_popcountl(*hit_mask); > } > >+/* istate stands for internal state. 
*/ >+struct rte_hash_iterator_istate { >+ const struct rte_hash *h; >+ uint32_t next; >+ uint32_t total_entries; >+}; >+ >+int32_t >+rte_hash_iterator_init(const struct rte_hash *h, >+ struct rte_hash_iterator_state *state) >+{ >+ struct rte_hash_iterator_istate *__state; >+ >+ RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL); >+ >+ __state = (struct rte_hash_iterator_istate *)state; >+ __state->h = h; >+ __state->next = 0; >+ __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES; >+ >+ return 0; >+} >+ > int32_t >-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, >uint32_t *next) >+rte_hash_iterate( >+ struct rte_hash_iterator_state *state, const void **key, void **data) > { >+ struct rte_hash_iterator_istate *__state; > uint32_t bucket_idx, idx, position; > struct rte_hash_key *next_key; > >- RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL); >+ RETURN_IF_TRUE(((state == NULL) || (key == NULL) || >+ (data == NULL)), -EINVAL); >+ >+ __state = (struct rte_hash_iterator_istate *)state; > >- const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES; > /* Out of bounds */ >- if (*next >= total_entries) >+ if (__state->next >= __state->total_entries) > return -ENOENT; > > /* Calculate bucket and index of current iterator */ >- bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES; >- idx = *next % RTE_HASH_BUCKET_ENTRIES; >+ bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES; >+ idx = __state->next % RTE_HASH_BUCKET_ENTRIES; > > /* If current position is empty, go to the next one */ >- while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) { >- (*next)++; >+ while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) { >+ __state->next++; > /* End of table */ >- if (*next == total_entries) >+ if (__state->next == __state->total_entries) > return -ENOENT; >- bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES; >- idx = *next % RTE_HASH_BUCKET_ENTRIES; >+ bucket_i
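For context, the calling pattern implied by the prototypes in this diff would look roughly as follows (a sketch of the proposed, not yet merged, API; the walking function is made up):

#include <stdio.h>
#include <rte_hash.h>

/* Walk every entry using the proposed caller-allocated iterator state. */
static void
dump_all_entries(const struct rte_hash *h)
{
	struct rte_hash_iterator_state state;
	const void *key;
	void *data;

	if (rte_hash_iterator_init(h, &state) < 0)
		return;

	/* rte_hash_iterate() returns the entry position (>= 0) until the
	 * table is exhausted, then -ENOENT */
	while (rte_hash_iterate(&state, &key, &data) >= 0)
		printf("key %p -> data %p\n", key, data);
}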
Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
Hmm, I guess my comment is for code readability. If we don't need the extra state, that would be great. I think "rte_hash" is defined as an internal data structure but exposes its type through the public header. Would this work? I propose to malloc inside the function mostly because I think it is cleaner for the user. But your argument is valid. Depending on the use case, I think it is OK. Another comment: you put total_entries in the state; is that for the performance of rte_hash_iterate? If you use it to iterate conflicting entries, especially if you reuse the same "state" struct and init it again and again for different keys, would this slow down the performance for your specific use case? Also, iterate_conflict_entries may need reader lock protection. Thanks Yipeng >-Original Message- >From: Michel Machado [mailto:mic...@digirati.com.br] >Sent: Tuesday, September 4, 2018 12:08 PM >To: Wang, Yipeng1 ; Qiaobin Fu ; >Richardson, Bruce ; De >Lara Guarch, Pablo >Cc: dev@dpdk.org; douce...@bu.edu; Wiles, Keith ; >Gobriel, Sameh ; Tai, Charlie >; step...@networkplumber.org; n...@arm.com; >honnappa.nagaraha...@arm.com >Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries > >Hi Yipeng, > >On 09/04/2018 02:55 PM, Wang, Yipeng1 wrote: >> Do we need both of the state and istate struct? struct >> rte_hash_iterator_state seems not doing much. >> How about we only have one "state" struct and just not expose the internals >> to the public API, similar to the >> rte_hash struct or rte_member_setsum struct. >> And in _init function use rte_malloc to allocate the state and add a _free >> function to free it. > >The purpose of having the struct state is to enable applications to >allocate iterator states on their execution stack or embed iterator >states in larger structs to avoid an extra malloc()/free(). > >Do you foresee that the upcoming new underlying algorithm of hash >tables will need to dynamically allocate iterator states? > >[ ]'s >Michel Machado
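Michel's stack/embedding argument, as a concrete sketch (struct and field names are made up, and it assumes the API proposed in this thread): because the state has a public, fixed-size definition, a caller can embed it in its own context and pay no allocation per iteration.

#include <stdint.h>
#include <rte_hash.h>

struct scan_ctx {
	struct rte_hash_iterator_state it; /* embedded: no malloc()/free() */
	uint32_t scanned;
};

static void
scan_table(const struct rte_hash *h, struct scan_ctx *ctx)
{
	const void *key;
	void *data;

	if (rte_hash_iterator_init(h, &ctx->it) < 0)
		return;
	while (rte_hash_iterate(&ctx->it, &key, &data) >= 0)
		ctx->scanned++;
}

With Yipeng's alternative, the init call would rte_malloc() the state internally and a matching _free() would release it, trading this embedding for a smaller public surface.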
Re: [dpdk-dev] [PATCH] doc: Clarify IOMMU usage with "uio-pci" kernel module
Hi Tone, >if the devices for used DPDK bound to the ``uio-pci`` kernel module, please make The three kernel modules which can be used for DPDK binding are vfio-pci, uio_pci_generic and igb_uio. Don't you mean uio_pci_generic here? Regards, Rami Rosen On Tue, Sep 4, 2018, 11:59, tone.z wrote: > When binding the devices used by DPDK to the "uio-pci" kernel module, > the IOMMU should be disabled in order not to break the > because of the virtual / physical address mapping. > > The patch clarifies the IOMMU configuration on both x86_64 and arm64 > systems. > > Signed-off-by: tone.zhang > --- > doc/guides/linux_gsg/linux_drivers.rst | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/doc/guides/linux_gsg/linux_drivers.rst > b/doc/guides/linux_gsg/linux_drivers.rst > index 371a817..8f9ec8f 100644 > --- a/doc/guides/linux_gsg/linux_drivers.rst > +++ b/doc/guides/linux_gsg/linux_drivers.rst > @@ -48,6 +48,13 @@ be loaded as shown below: > ``vfio-pci`` > 2.7.4 >
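Not part of this patch, but related to the point under discussion: an application can check at runtime which addressing mode EAL ended up with via the existing rte_eal_iova_mode() call, for example to warn when a uio-bound device forces physical addressing. A minimal sketch, assuming standard EAL initialization:

#include <stdio.h>
#include <rte_eal.h>

int
main(int argc, char **argv)
{
	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* uio_pci_generic / igb_uio imply physical addresses (RTE_IOVA_PA);
	 * RTE_IOVA_VA is only usable with an IOMMU, e.g. through vfio-pci */
	printf("IOVA mode: %s\n",
	       rte_eal_iova_mode() == RTE_IOVA_VA ? "VA" : "PA");
	return 0;
}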