RE: [EXT] Re: [PATCH 1/2] net/cnxk: update meter bpf ID in rq
> -Original Message- > From: Ferruh Yigit > Sent: Tuesday, January 11, 2022 6:16 PM > To: Rakesh Kudurumalla ; Nithin Kumar > Dabilpuram ; Kiran Kumar Kokkilagadda > ; Sunil Kumar Kori ; Satha > Koteswara Rao Kottidi > Cc: dev@dpdk.org > Subject: [EXT] Re: [PATCH 1/2] net/cnxk: update meter bpf ID in rq > > External Email > > -- > On 11/30/2021 6:41 AM, Rakesh Kudurumalla wrote: > > Patch updates configured meter bpf is in rq context during meter > > creation > > RQ is receive queue, right? Can you please use the long version for > clarification? Yes RQ is receive queue. Sure will use long version > > > > > Signed-off-by: Rakesh Kudurumalla > > --- > > drivers/net/cnxk/cn10k_rte_flow.c | 9 - > > drivers/net/cnxk/cnxk_ethdev_mtr.c | 25 ++--- > > 2 files changed, 22 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/net/cnxk/cn10k_rte_flow.c > > b/drivers/net/cnxk/cn10k_rte_flow.c > > index b830abe63e..402bb1c72f 100644 > > --- a/drivers/net/cnxk/cn10k_rte_flow.c > > +++ b/drivers/net/cnxk/cn10k_rte_flow.c > > @@ -36,20 +36,20 @@ cn10k_mtr_configure(struct rte_eth_dev *eth_dev, > > for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) { > > if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) { > > mtr_conf = (const struct rte_flow_action_meter > > - *)(actions->conf); > > + *)(actions[i].conf); > > mtr_id = mtr_conf->mtr_id; > > is_mtr_act = true; > > } > > if (actions[i].type == RTE_FLOW_ACTION_TYPE_QUEUE) { > > q_conf = (const struct rte_flow_action_queue > > - *)(actions->conf); > > + *)(actions[i].conf); > > if (is_mtr_act) > > nix_mtr_rq_update(eth_dev, mtr_id, 1, > > &q_conf->index); > > } > > if (actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) { > > rss_conf = (const struct rte_flow_action_rss > > - *)(actions->conf); > > + *)(actions[i].conf); > > if (is_mtr_act) > > nix_mtr_rq_update(eth_dev, mtr_id, > > rss_conf->queue_num, > > @@ -98,7 +98,7 @@ cn10k_rss_action_validate(struct rte_eth_dev > *eth_dev, > > return -EINVAL; > > } > > > > - if 
(eth_dev->data->dev_conf.rxmode.mq_mode != > RTE_ETH_MQ_RX_RSS) { > > + if (eth_dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_RSS) > { > > This change seems unintended. Please keep the original value. Will keep original value > > > > <...> > > > if (!capa) > > return -rte_mtr_error_set(error, EINVAL, > > - RTE_MTR_ERROR_TYPE_MTR_PARAMS, > NULL, > > - "NULL input parameter"); > > + > RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, > > + "NULL input parameter"); > > > > Previous indentation looks more consistent with DPDK coding guide. Will update to previous indentation
RE: [PATCH] net/mlx5: relax headroom assertion
Hi, > -Original Message- > From: Dmitry Kozlyuk > Sent: Tuesday, December 28, 2021 11:21 AM > To: dev@dpdk.org > Cc: Slava Ovsiienko ; sta...@dpdk.org; Matan > Azrad > Subject: [PATCH] net/mlx5: relax headroom assertion > > A debug assertion in Single-Packet Receive Queue (SPRQ) mode required all > Rx mbufs to have a 128 byte headroom, based on the assumption that > rte_pktmbuf_init() sets it. > However, rte_pktmbuf_init() may set a smaller headroom if the dataroom is > insufficient, e.g. this is a natural case for split buffer segments. The > headroom can also be larger. > Only check the headroom size when vectored Rx routines are used because > they rely on it. Relax the assertion to require sufficient headroom size, not > an > exact one. > > Fixes: a0a45e8af723 ("net/mlx5: configure Rx queue for buffer split") > Cc: viachesl...@nvidia.com > Cc: sta...@dpdk.org > > Signed-off-by: Dmitry Kozlyuk > Acked-by: Matan Azrad Patch rebased and applied to next-net-mlx, Kindest regards, Raslan Darawsheh
[PATCH] app/testpmd: skip stopped queues when forwarding
After "port rxq|txq stop" the stopped queue was used in forwarding nonetheless, which may cause undefined behavior in the PMD. Record the configured queue state and account for it when launching forwarding as follows: ++-+-+---+ |RxQ |TxQ |Configured mode |Launch routine | ++-+-+---+ |stopped |stopped |*|- | |stopped |started |txonly |(configured) | |stopped |started |*|- | |started |stopped |*|rxonly | |started |started |*|(configured) | ++-+-+---+ Display stopped queues on "show port config rxtx". Fixes: 5f4ec54f1d16 ("testpmd: queue start and stop") Cc: jing.d.c...@intel.com Cc: sta...@dpdk.org Signed-off-by: Dmitry Kozlyuk Reviewed-by: Raslan Darawsheh --- app/test-pmd/cmdline.c | 8 app/test-pmd/config.c | 6 ++ app/test-pmd/testpmd.c | 18 -- app/test-pmd/testpmd.h | 10 ++ 4 files changed, 40 insertions(+), 2 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index e626b1c7d9..8b0920e23d 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -2702,6 +2702,14 @@ cmd_config_rxtx_queue_parsed(void *parsed_result, if (ret == -ENOTSUP) fprintf(stderr, "Function not supported in PMD\n"); + if (ret == 0) { + struct rte_port *port; + struct queue_state *states; + + port = &ports[res->portid]; + states = isrx ? 
port->rxq_state : port->txq_state; + states[res->qid].stopped = !isstart; + } } cmdline_parse_token_string_t cmd_config_rxtx_queue_port = diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 1722d6c8f8..7ce9cb483a 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -2817,6 +2817,9 @@ rxtx_config_display(void) rx_conf->share_qid); printf("\n"); } + for (qid = 0; qid < nb_rxq; qid++) + if (ports[pid].rxq_state[qid].stopped) + printf("RX queue %d is stopped\n", qid); /* per tx queue config only for first queue to be less verbose */ for (qid = 0; qid < 1; qid++) { @@ -2850,6 +2853,9 @@ rxtx_config_display(void) printf(" TX offloads=0x%"PRIx64" - TX RS bit threshold=%d\n", offloads_tmp, tx_rs_thresh_tmp); } + for (qid = 0; qid < nb_txq; qid++) + if (ports[pid].txq_state[qid].stopped) + printf("TX queue %d is stopped\n", qid); } } diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 6c387bde84..36ff845181 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -2152,6 +2152,8 @@ flush_fwd_rx_queues(void) for (rxp = 0; rxp < cur_fwd_config.nb_fwd_ports; rxp++) { for (rxq = 0; rxq < nb_rxq; rxq++) { port_id = fwd_ports_ids[rxp]; + if (ports[port_id].rxq_state[rxq].stopped) + continue; /** * testpmd can stuck in the below do while loop * if rte_eth_rx_burst() always returns nonzero @@ -2223,8 +2225,20 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd) static int start_pkt_forward_on_core(void *fwd_arg) { - run_pkt_fwd_on_lcore((struct fwd_lcore *) fwd_arg, -cur_fwd_config.fwd_eng->packet_fwd); + struct fwd_lcore *fc = fwd_arg; + struct fwd_stream *fsm = fwd_streams[fc->stream_idx]; + struct queue_state *rxq = &ports[fsm->rx_port].rxq_state[fsm->rx_queue]; + struct queue_state *txq = &ports[fsm->tx_port].txq_state[fsm->tx_queue]; + struct fwd_engine *fwd_engine = cur_fwd_config.fwd_eng; + packet_fwd_t packet_fwd; + + /* Check if there will ever be any packets to send. 
*/ + if (rxq->stopped && (txq->stopped || fwd_engine != &tx_only_engine)) + return 0; + /* Force rxonly mode if RxQ is started, but TxQ is stopped. */ + packet_fwd = !rxq->stopped && txq->stopped ? rx_only_engine.packet_fwd + : fwd_engine->packet_fwd; + run_pkt_fwd_on_lcore(fc, packet_fwd); return 0; } diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index 2149ecd93a..2744fa4d76 100644 --- a/app/test-pmd/testpmd.h +++ b/app/test-pmd/testpmd.h @@ -216,6 +216,12 @@ struct xstat_display_info { bool allocated; }; +/** Application state of a queue. */ +struct queue_sta
Re: [PATCH v2 1/1] mempool: implement index-based per core cache
On Thu, Jan 13, 2022 at 11:06 AM Dharmik Thakkar wrote: > > Current mempool per core cache implementation stores pointers to mbufs > On 64b architectures, each pointer consumes 8B > This patch replaces it with index-based implementation, > where in each buffer is addressed by (pool base address + index) > It reduces the amount of memory/cache required for per core cache > > L3Fwd performance testing reveals minor improvements in the cache > performance (L1 and L2 misses reduced by 0.60%) > with no change in throughput > > Suggested-by: Honnappa Nagarahalli > Signed-off-by: Dharmik Thakkar > Reviewed-by: Ruifeng Wang > --- > > /* Now fill in the response ... */ > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE Instead of having this #ifdef clutter everywhere for the pair, I think we can define RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE once and have a different implementation, i.e. #ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE void x() { } void y() { } #else void x() { } void y() { } #endif and call x(); y(); in the main code. > diff --git a/lib/mempool/rte_mempool_ops_default.c > b/lib/mempool/rte_mempool_ops_default.c > index 22fccf9d7619..3543cad9d4ce 100644 > --- a/lib/mempool/rte_mempool_ops_default.c > +++ b/lib/mempool/rte_mempool_ops_default.c > @@ -127,6 +127,13 @@ rte_mempool_op_populate_helper(struct rte_mempool *mp, > unsigned int flags, > obj = va + off; > obj_cb(mp, obj_cb_arg, obj, >(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off)); > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE This is the only place it is used in C code. Since we are taking a compile-time approach, can we make this unconditional? That will enable the use of this model in the application without recompiling DPDK. All the application needs to do is #define RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE 1 #include I believe enabling such structuring helps to avoid DPDK recompilation of code. 
> + /* Store pool base value to calculate indices for index-based > +* lcore cache implementation > +*/ > + if (i == 0) > + mp->pool_base_value = obj; > +#endif > rte_mempool_ops_enqueue_bulk(mp, &obj, 1); > off += mp->elt_size + mp->trailer_size; > } > -- > 2.17.1 >
[dpdk-dev] [PATCH v2 1/2] ethdev: support queue-based priority flow control
From: Jerin Jacob Based on device support and use-case need, there are two different ways to enable PFC. The first case is the port-level PFC configuration; in this case, the rte_eth_dev_priority_flow_ctrl_set() API shall be used to configure PFC, and PFC frames will be generated based on the VLAN TC value. The second case is the queue-level PFC configuration; in this case, any packet field content can be used to steer the packet to the specific queue using rte_flow or RSS, and then rte_eth_dev_priority_flow_ctrl_queue_set() is used to set the TC mapping on each queue. Based on congestion detected on the specific queue, the configured TC shall be used to generate PFC frames. Operation of these modes is mutually exclusive: when the driver sets a non-zero value for rte_eth_dev_info::pfc_queue_tc_max, the application must use queue-level PFC configuration via the rte_eth_dev_priority_flow_ctrl_queue_set() API instead of port-level PFC configuration via the rte_eth_dev_priority_flow_ctrl_set() API to realize PFC configuration. This patch enables the configuration for the second case, a.k.a. queue-based PFC, and also updates the rte_eth_dev_priority_flow_ctrl_set() implementation to adhere to rte_eth_dev_info::pfc_queue_tc_max handling. Also updated libabigail.abignore to ignore the update to reserved fields in rte_eth_dev_info. 
Signed-off-by: Jerin Jacob Signed-off-by: Sunil Kumar Kori --- A driver implementation based on this API is at https://patches.dpdk.org/project/dpdk/patch/20220111081831.881374-1-sk...@marvell.com/ RFC..v1 - Added queue based PFC config API instead of port based v1..v2 - Updated libabigail.abignore as has_data_member_inserted_between = {offset_of(reserved_64s), end} - Updated doxygen comments of rte_eth_pfc_queue_conf devtools/libabigail.abignore | 5 ++ doc/guides/nics/features.rst | 5 +- doc/guides/rel_notes/release_22_03.rst | 3 + lib/ethdev/ethdev_driver.h | 6 +- lib/ethdev/rte_ethdev.c| 109 + lib/ethdev/rte_ethdev.h| 83 ++- lib/ethdev/version.map | 3 + 7 files changed, 208 insertions(+), 6 deletions(-) diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore index 4b676f317d..3bdecaaef0 100644 --- a/devtools/libabigail.abignore +++ b/devtools/libabigail.abignore @@ -11,3 +11,8 @@ ; Ignore generated PMD information strings [suppress_variable] name_regexp = _pmd_info$ + +;Ignore fields inserted in place of reserved fields of rte_eth_dev_info +[suppress_type] + name = rte_eth_dev_info + has_data_member_inserted_between = {offset_of(reserved_64s), end} diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index 27be2d2576..277a784f4e 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -379,9 +379,10 @@ Flow control Supports configuring link flow control. * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``, - ``priority_flow_ctrl_set``. + ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_set``. * **[related]API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``, - ``rte_eth_dev_priority_flow_ctrl_set()``. + ``rte_eth_dev_priority_flow_ctrl_set()``, ``rte_eth_dev_priority_flow_ctrl_queue_set()``. +* **[provides] rte_eth_dev_info**: ``pfc_queue_tc_max``. .. 
_nic_features_rate_limitation: diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst index 6d99d1eaa9..b75c0356e6 100644 --- a/doc/guides/rel_notes/release_22_03.rst +++ b/doc/guides/rel_notes/release_22_03.rst @@ -55,6 +55,9 @@ New Features Also, make sure to start the actual text at the margin. === +* **Added an API to enable queue based priority flow ctrl(PFC).** + + A new API, ``rte_eth_dev_priority_flow_ctrl_queue_set()``, was added. Removed Items - diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index d95605a355..e0bbfe89d7 100644 --- a/lib/ethdev/ethdev_driver.h +++ b/lib/ethdev/ethdev_driver.h @@ -532,6 +532,9 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev, /** @internal Setup priority flow control parameter on an Ethernet device. */ typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev, struct rte_eth_pfc_conf *pfc_conf); +/** @internal Queue setup for priority flow control parameter on an Ethernet device. */ +typedef int (*priority_flow_ctrl_queue_set_t)(struct rte_eth_dev *dev, + struct rte_eth_pfc_queue_conf *pfc_queue_conf); /** @internal Update RSS redirection table on an Ethernet device. */ typedef int (*reta_update_t)(struct rte_eth_dev *dev, @@ -1080,7 +1083,8 @@ struct eth_dev_ops { flow_ctrl_set_tflow_ctrl_set; /**< Setup flow control */ /** Setup priority flow control */ priority_flow_ctrl_set_t prio
[dpdk-dev] [PATCH v2 2/2] app/testpmd: add queue based pfc CLI options
From: Sunil Kumar Kori Patch adds command line options to configure queue based priority flow control. - Syntax command is given as below: set pfc_queue_ctrl (port_id) rx (on|off) (tx_qid) (tx_tc) tx (on|off) (rx_qid) (rx_tc) (pause_time) - Example command to configure queue based priority flow control on rx and tx side for port 0, Rx queue 0, Tx queue 0 with pause time 2047 testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx on 0 0 2047 Signed-off-by: Sunil Kumar Kori --- app/test-pmd/cmdline.c | 122 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 22 2 files changed, 144 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index e626b1c7d9..19dbcea4f1 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -544,6 +544,11 @@ static void cmd_help_long_parsed(void *parsed_result, "Set the priority flow control parameter on a" " port.\n\n" + "set pfc_queue_ctrl (port_id) rx (on|off) (tx_qid)" + " (tx_tc) tx (on|off) (rx_qid) (rx_tc) (pause_time)\n" + "Set the queue priority flow control parameter on a" + " given Rx and Tx queues of a port.\n\n" + "set stat_qmap (tx|rx) (port_id) (queue_id) (qmapping)\n" "Set statistics mapping (qmapping 0..15) for RX/TX" " queue on port.\n" @@ -7690,6 +7695,122 @@ cmdline_parse_inst_t cmd_priority_flow_control_set = { }, }; +struct cmd_queue_priority_flow_ctrl_set_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t pfc_queue_ctrl; + portid_t port_id; + cmdline_fixed_string_t rx; + cmdline_fixed_string_t rx_pfc_mode; + uint16_t tx_qid; + uint8_t tx_tc; + cmdline_fixed_string_t tx; + cmdline_fixed_string_t tx_pfc_mode; + uint16_t rx_qid; + uint8_t rx_tc; + uint16_t pause_time; +}; + +static void +cmd_queue_priority_flow_ctrl_set_parsed(void *parsed_result, + __rte_unused struct cmdline *cl, + __rte_unused void *data) +{ + struct cmd_queue_priority_flow_ctrl_set_result *res = parsed_result; + struct rte_eth_pfc_queue_conf pfc_queue_conf; + int rx_fc_enable, tx_fc_enable; + int ret; + + /* +* Rx on/off, flow control is enabled/disabled on RX side. 
This can +* indicate the RTE_ETH_FC_TX_PAUSE, Transmit pause frame at the Rx +* side. Tx on/off, flow control is enabled/disabled on TX side. This +* can indicate the RTE_ETH_FC_RX_PAUSE, Respond to the pause frame at +* the Tx side. +*/ + static enum rte_eth_fc_mode rx_tx_onoff_2_mode[2][2] = { + {RTE_ETH_FC_NONE, RTE_ETH_FC_TX_PAUSE}, + {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL} + }; + + memset(&pfc_queue_conf, 0, sizeof(struct rte_eth_pfc_queue_conf)); + rx_fc_enable = (!strncmp(res->rx_pfc_mode, "on", 2)) ? 1 : 0; + tx_fc_enable = (!strncmp(res->tx_pfc_mode, "on", 2)) ? 1 : 0; + pfc_queue_conf.mode = rx_tx_onoff_2_mode[rx_fc_enable][tx_fc_enable]; + pfc_queue_conf.rx_pause.tc = res->tx_tc; + pfc_queue_conf.rx_pause.tx_qid = res->tx_qid; + pfc_queue_conf.tx_pause.tc = res->rx_tc; + pfc_queue_conf.tx_pause.rx_qid = res->rx_qid; + pfc_queue_conf.tx_pause.pause_time = res->pause_time; + + ret = rte_eth_dev_priority_flow_ctrl_queue_set(res->port_id, + &pfc_queue_conf); + if (ret != 0) { + fprintf(stderr, + "bad queue priority flow control parameter, rc = %d\n", + ret); + } +} + +cmdline_parse_token_string_t cmd_q_pfc_set_set = + TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result, + set, "set"); +cmdline_parse_token_string_t cmd_q_pfc_set_flow_ctrl = + TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result, + pfc_queue_ctrl, "pfc_queue_ctrl"); +cmdline_parse_token_num_t cmd_q_pfc_set_portid = + TOKEN_NUM_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result, + port_id, RTE_UINT16); +cmdline_parse_token_string_t cmd_q_pfc_set_rx = + TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result, + rx, "rx"); +cmdline_parse_token_string_t cmd_q_pfc_set_rx_mode = + TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result, + rx_pfc_mode, "on#off"); +cmdline_parse_token_num_t cmd_q_pfc_set_tx_qid = + TOKEN_NUM_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result, + tx_qid, RTE_UINT16); 
+cmdline_parse_token_num_t cmd_q_pfc_set_tx_tc = + TOKEN_
RE: [PATCH 2/5] crypto/openssl: fix output of RSA verify op
Thank you for the comments. I agree that OpenSSL PMD needs a major refactoring in asym crypto. I have asked Akhil to reject this patch series. -Original Message- From: Kusztal, ArkadiuszX Sent: Tuesday, December 28, 2021 2:41 PM To: Ramkumar Balu ; Akhil Goyal ; Anoob Joseph ; Doherty, Declan ; Zhang, Roy Fan ; Ankur Dwivedi ; Tejasree Kondoj Cc: sta...@dpdk.org; dev@dpdk.org Subject: [EXT] RE: [PATCH 2/5] crypto/openssl: fix output of RSA verify op -- > -Original Message- > From: Ramkumar Balu > Sent: Monday, November 29, 2021 10:52 AM > To: Akhil Goyal ; Anoob Joseph > ; Doherty, Declan ; > Zhang, Roy Fan ; Ankur Dwivedi > ; Tejasree Kondoj > Cc: sta...@dpdk.org; dev@dpdk.org; Ramkumar > Subject: [PATCH 2/5] crypto/openssl: fix output of RSA verify op > > From: Ramkumar > > During RSA verify, the OpenSSL PMD fails to return the plaintext after > public key decryption. > This patch fixes the OpenSSL PMD to return the decrypted plaintext in > cipher.data / cipher.length fields > > Fixes: 3e9d6bd447fb ("crypto/openssl: add RSA and mod asym > operations") > Fixes: fe1606e0138c ("crypto/openssl: fix RSA verify operation") > Cc: sta...@dpdk.org > > Signed-off-by: Ramkumar > --- > drivers/crypto/openssl/rte_openssl_pmd.c | 16 +++- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/drivers/crypto/openssl/rte_openssl_pmd.c > b/drivers/crypto/openssl/rte_openssl_pmd.c > index 5794ed8159..3ab2c3b5c1 100644 > --- a/drivers/crypto/openssl/rte_openssl_pmd.c > +++ b/drivers/crypto/openssl/rte_openssl_pmd.c > @@ -1953,12 +1953,16 @@ process_openssl_rsa_op(struct rte_crypto_op > *cop, > break; > > case RTE_CRYPTO_ASYM_OP_VERIFY: > - tmp = rte_malloc(NULL, op->rsa.sign.length, 0); > + tmp = op->rsa.cipher.data; > if (tmp == NULL) { > - OPENSSL_LOG(ERR, "Memory allocation failed"); > - cop->status = RTE_CRYPTO_OP_STATUS_ERROR; > - break; > + tmp = rte_malloc(NULL, op->rsa.sign.length, 0); > + if (tmp == NULL) { > + OPENSSL_LOG(ERR, "Memory allocation > failed"); > 
+ cop->status = > RTE_CRYPTO_OP_STATUS_ERROR; > + break; > + } > } > + > ret = RSA_public_decrypt(op->rsa.sign.length, > op->rsa.sign.data, > tmp, [Arek] - this function is deprecated and more importantly it properly handles only the NO_PADDING situation (no DER encoding, like pre-TLS 1.2). OpenSSL code needs major refactor in this area soon (mostly in asymmetric crypto). > @@ -1974,7 +1978,9 @@ process_openssl_rsa_op(struct rte_crypto_op *cop, > OPENSSL_LOG(ERR, "RSA sign Verification failed"); > cop->status = RTE_CRYPTO_OP_STATUS_ERROR; > } > - rte_free(tmp); > + op->rsa.cipher.length = ret; > + if (tmp != op->rsa.cipher.data) > + rte_free(tmp); > break; > > default: > -- > 2.17.1
RE: [PATCH 1/1] mempool: implement index-based per core cache
Hi Dharmik, > > > >> Current mempool per core cache implementation stores pointers to mbufs > >> On 64b architectures, each pointer consumes 8B > >> This patch replaces it with index-based implementation, > >> where in each buffer is addressed by (pool base address + index) > >> It reduces the amount of memory/cache required for per core cache > >> > >> L3Fwd performance testing reveals minor improvements in the cache > >> performance (L1 and L2 misses reduced by 0.60%) > >> with no change in throughput > > > > I feel really sceptical about that patch and the whole idea in general: > > - From what I read above there is no real performance improvement observed. > > (In fact on my IA boxes mempool_perf_autotest reports ~20% slowdown, > > see below for more details). > > Currently, the optimizations (loop unroll and vectorization) are only > implemented for ARM64. > Similar optimizations can be implemented for x86 platforms which should close > the performance gap > and in my understanding should give better performance for a bulk size of 32. Might be, but I still don't see the reason for such effort. As you mentioned there is no performance improvement in 'real' apps: l3fwd, etc. on ARM64 even with vectorized version of the code. > > - Space utilization difference looks neglectable too. > > Sorry, I did not understand this point. As I understand one of the expectations from that patch was: reduce memory/cache required, which should improve cache utilization (less misses, etc.). Though I think such improvements would be neglectable and wouldn't cause any real performance gain. > > - The change introduces a new build time config option with a major > > limitation: > > All memzones in a pool have to be within the same 4GB boundary. > > To address it properly, extra changes will be required in init(/populate) > > part of the code. > > I agree to the above mentioned challenges and I am currently working on > resolving these issues. 
I still think that to justify such changes some really noticeable performance improvement needs to be demonstrated: double-digit speedup for l3fwd/ipsec-secgw/... Otherwise it is just not worth the hassle. > > All that will complicate mempool code, will make it more error prone > and harder to maintain. > > But, as there is no real gain in return - no point to add such extra > complexity at all. > > Konstantin > >
RE: [PATCH v3 1/2] eventdev/crypto_adapter: move crypto ops to circular buffer
Hi Ganapati, > -Original Message- > From: Kundapura, Ganapati > Sent: Tuesday, January 11, 2022 4:07 PM > To: jerinjac...@gmail.com; Jayatheerthan, Jay ; > dev@dpdk.org > Cc: Gujjar, Abhinandan S > Subject: [PATCH v3 1/2] eventdev/crypto_adapter: move crypto ops to circular > buffer > > Move crypto ops to circular buffer to retain crypto ops when > cryptodev/eventdev are temporarily full > > --- > v3: > * update eca_ops_buffer_flush() to flush out all the crypto > ops out of circular buffer. > * remove freeing of failed crypto ops from eca_ops_enqueue_burst() > and add to cirular buffer for later processing. > > v2: > * reset cryptp adapter next cdev id before dequeueing from the Cryptp -> crypto > next cdev > --- > > Signed-off-by: Ganapati Kundapura I don't see sign off after applying the patch. Could you take a look? commit 18d9c3b1728f325dba5fe5dbb51df2366b70527a Author: Ganapati Kundapura Date: Tue Jan 11 04:36:30 2022 -0600 eventdev/crypto_adapter: move crypto ops to circular buffer Move crypto ops to circular buffer to retain crypto ops when cryptodev/eventdev are temporarily full > > diff --git a/lib/eventdev/rte_event_crypto_adapter.c > b/lib/eventdev/rte_event_crypto_adapter.c > index d840803..9086368 100644 > --- a/lib/eventdev/rte_event_crypto_adapter.c > +++ b/lib/eventdev/rte_event_crypto_adapter.c > @@ -25,11 +25,27 @@ > #define CRYPTO_ADAPTER_MEM_NAME_LEN 32 > #define CRYPTO_ADAPTER_MAX_EV_ENQ_RETRIES 100 > > +#define CRYPTO_ADAPTER_OPS_BUFFER_SZ (BATCH_SIZE + BATCH_SIZE) > #define > +CRYPTO_ADAPTER_BUFFER_SZ 1024 > + > /* Flush an instance's enqueue buffers every CRYPTO_ENQ_FLUSH_THRESHOLD > * iterations of eca_crypto_adapter_enq_run() > */ > #define CRYPTO_ENQ_FLUSH_THRESHOLD 1024 > > +struct crypto_ops_circular_buffer { > + /* index of head element in circular buffer */ > + uint16_t head; > + /* index of tail element in circular buffer */ > + uint16_t tail; > + /* number elements in buffer */ number elements -> number of elements > + uint16_t 
count; > + /* size of circular buffer */ > + uint16_t size; > + /* Pointer to hold rte_crypto_ops for batching */ > + struct rte_crypto_op **op_buffer; > +} __rte_cache_aligned; > + > struct event_crypto_adapter { > /* Event device identifier */ > uint8_t eventdev_id; > @@ -47,6 +63,8 @@ struct event_crypto_adapter { > struct crypto_device_info *cdevs; > /* Loop counter to flush crypto ops */ > uint16_t transmit_loop_count; > + /* Circular buffer for batching crypto ops to eventdev */ > + struct crypto_ops_circular_buffer ebuf; > /* Per instance stats structure */ > struct rte_event_crypto_adapter_stats crypto_stats; > /* Configuration callback for rte_service configuration */ @@ -93,8 > +111,8 @@ struct crypto_device_info { struct crypto_queue_pair_info { > /* Set to indicate queue pair is enabled */ > bool qp_enabled; > - /* Pointer to hold rte_crypto_ops for batching */ > - struct rte_crypto_op **op_buffer; > + /* Circular buffer for batching crypto ops to cdev */ > + struct crypto_ops_circular_buffer cbuf; > /* No of crypto ops accumulated */ > uint8_t len; > } __rte_cache_aligned; > @@ -141,6 +159,77 @@ eca_init(void) > return 0; > } > > +static inline bool > +eca_circular_buffer_batch_ready(struct crypto_ops_circular_buffer > +*bufp) { > + return bufp->count >= BATCH_SIZE; > +} > + > +static inline void > +eca_circular_buffer_free(struct crypto_ops_circular_buffer *bufp) { > + rte_free(bufp->op_buffer); > +} > + > +static inline int > +eca_circular_buffer_init(const char *name, > + struct crypto_ops_circular_buffer *bufp, > + uint16_t sz) > +{ > + bufp->op_buffer = rte_zmalloc(name, > + sizeof(struct rte_crypto_op *) * sz, > + 0); > + if (bufp->op_buffer == NULL) > + return -ENOMEM; > + > + bufp->size = sz; > + return 0; > +} > + > +static inline int > +eca_circular_buffer_add(struct crypto_ops_circular_buffer *bufp, > + struct rte_crypto_op *op) > +{ > + uint16_t *tailp = &bufp->tail; > + > + bufp->op_buffer[*tailp] = op; > + *tailp = (*tailp + 1) % 
bufp->size; Provide a comment saying you are taking care of buffer rollover condition > + bufp->count++; > + > + return 0; > +} > + > +static inline int > +eca_circular_buffer_flush_to_cdev(struct crypto_ops_circular_buffer *bufp, > + uint8_t cdev_id, uint16_t qp_id, > + uint16_t *nb_ops_flushed) This function is returning a value but caller does not check! > +{ > + uint16_t n = 0; > + uint16_t *headp = &bufp->head; > + uint16_t *tailp = &bufp->tail; > + struct rte_crypto_op **ops = bufp->op_buffer; > + > + if (*tailp > *headp) > +
Re: [dpdk-dev] [PATCH 1/2] devtools: remove ugly workaround from get maintainers
01/11/2021 14:35, Ferruh Yigit: > Linux kernel 'get_maintainer.pl' script supports running out of Linux > tree since commit > 31bb82c9caa9 ("get_maintainer: allow usage outside of kernel tree") > > As commit is a few years old now, integrating it to DPDK and removing > ugly workaround for it. > > Signed-off-by: Ferruh Yigit Applied without patch 2, thanks.
Re: [PATCH 1/1] ci: restrict concurrency
Hi, The explanation should be in the patch, not the cover letter. Actually, you don't need a cover letter for a single patch. Copying it here: " dpdk is fairly expensive to build in GitHub. It's helpful to abandon old builds as soon as there's a new build waiting instead of wasting resources on the previous round. " 12/01/2022 07:50, Josh Soref: > Signed-off-by: Josh Soref > --- > +concurrency: > + group: build-${{ matrix.config.os }}-${{ matrix.config.compiler }}-${{ > matrix.config.library }}-${{ matrix.config.cross }}-${{ matrix.config.mini > }}-${{ github.event.pull_request.number || github.ref }} > + cancel-in-progress: true The goal of the CI is to catch any issue in a submitted patch. Is your change cancelling a test of a patch when another one is submitted?
[PATCH] common/cnxk: enable NIX Tx interrupts errata
An erratum exists whereby NIX may incorrectly overwrite the value in NIX_SQ_CTX_S[SQ_INT]. This may cause set interrupts to get cleared or cause a QINT when no error is outstanding. As a workaround, software should always read all SQ debug registers and not just rely on NIX_SQINT_E bits set in NIX_SQ_CTX_S[SQ_INT]. Also for detecting SQB faults software must read SQ context and check if next_sqb is NULL. Signed-off-by: Harman Kalra --- drivers/common/cnxk/roc_nix_irq.c | 69 ++- 1 file changed, 50 insertions(+), 19 deletions(-) diff --git a/drivers/common/cnxk/roc_nix_irq.c b/drivers/common/cnxk/roc_nix_irq.c index a5cd9d4b02..71971ef261 100644 --- a/drivers/common/cnxk/roc_nix_irq.c +++ b/drivers/common/cnxk/roc_nix_irq.c @@ -196,15 +196,42 @@ nix_lf_sq_irq_get_and_clear(struct nix *nix, uint16_t sq) return nix_lf_q_irq_get_and_clear(nix, sq, NIX_LF_SQ_OP_INT, ~0x1ff00); } -static inline void +static inline bool +nix_lf_is_sqb_null(struct dev *dev, int q) +{ + bool is_sqb_null = false; + volatile void *ctx; + int rc; + + rc = nix_q_ctx_get(dev, NIX_AQ_CTYPE_SQ, q, &ctx); + if (rc) { + plt_err("Failed to get sq context"); + } else { + is_sqb_null = + roc_model_is_cn9k() ? 
+ (((__io struct nix_sq_ctx_s *)ctx)->next_sqb == +0) : + (((__io struct nix_cn10k_sq_ctx_s *)ctx) +->next_sqb == 0); + } + + return is_sqb_null; +} + +static inline uint8_t nix_lf_sq_debug_reg(struct nix *nix, uint32_t off) { + uint8_t err = 0; uint64_t reg; reg = plt_read64(nix->base + off); - if (reg & BIT_ULL(44)) - plt_err("SQ=%d err_code=0x%x", (int)((reg >> 8) & 0xf), - (uint8_t)(reg & 0xff)); + if (reg & BIT_ULL(44)) { + err = reg & 0xff; + /* Clear valid bit */ + plt_write64(BIT_ULL(44), nix->base + off); + } + + return err; } static void @@ -226,6 +253,7 @@ nix_lf_q_irq(void *param) struct dev *dev = &nix->dev; int q, cq, rq, sq; uint64_t intr; + uint8_t rc; intr = plt_read64(nix->base + NIX_LF_QINTX_INT(qintx)); if (intr == 0) @@ -266,22 +294,25 @@ nix_lf_q_irq(void *param) sq = q % nix->qints; irq = nix_lf_sq_irq_get_and_clear(nix, sq); - if (irq & BIT_ULL(NIX_SQINT_LMT_ERR)) { - plt_err("SQ=%d NIX_SQINT_LMT_ERR", sq); - nix_lf_sq_debug_reg(nix, NIX_LF_SQ_OP_ERR_DBG); - } - if (irq & BIT_ULL(NIX_SQINT_MNQ_ERR)) { - plt_err("SQ=%d NIX_SQINT_MNQ_ERR", sq); - nix_lf_sq_debug_reg(nix, NIX_LF_MNQ_ERR_DBG); - } - if (irq & BIT_ULL(NIX_SQINT_SEND_ERR)) { - plt_err("SQ=%d NIX_SQINT_SEND_ERR", sq); - nix_lf_sq_debug_reg(nix, NIX_LF_SEND_ERR_DBG); - } - if (irq & BIT_ULL(NIX_SQINT_SQB_ALLOC_FAIL)) { + /* Detect LMT store error */ + rc = nix_lf_sq_debug_reg(nix, NIX_LF_SQ_OP_ERR_DBG); + if (rc) + plt_err("SQ=%d NIX_SQINT_LMT_ERR, errcode %x", sq, rc); + + /* Detect Meta-descriptor enqueue error */ + rc = nix_lf_sq_debug_reg(nix, NIX_LF_MNQ_ERR_DBG); + if (rc) + plt_err("SQ=%d NIX_SQINT_MNQ_ERR, errcode %x", sq, rc); + + /* Detect Send error */ + rc = nix_lf_sq_debug_reg(nix, NIX_LF_SEND_ERR_DBG); + if (rc) + plt_err("SQ=%d NIX_SQINT_SEND_ERR, errcode %x", sq, rc); + + /* Detect SQB fault, read SQ context to check SQB NULL case */ + if (irq & BIT_ULL(NIX_SQINT_SQB_ALLOC_FAIL) || + nix_lf_is_sqb_null(dev, q)) plt_err("SQ=%d NIX_SQINT_SQB_ALLOC_FAIL", sq); - 
nix_lf_sq_debug_reg(nix, NIX_LF_SEND_ERR_DBG); - } } /* Clear interrupt */ -- 2.18.0
[PATCH v2 1/2] net/cnxk: update meter bpf ID in Recevie Queue
Patch updates configured meter bpf is in recevie queue context during meter creation Signed-off-by: Rakesh Kudurumalla Acked-by: Sunil Kumar Kori --- v2: Fix commit message drivers/net/cnxk/cn10k_rte_flow.c | 7 +++ drivers/net/cnxk/cnxk_ethdev_mtr.c | 21 - 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/drivers/net/cnxk/cn10k_rte_flow.c b/drivers/net/cnxk/cn10k_rte_flow.c index b830abe63e..529fb0e4b7 100644 --- a/drivers/net/cnxk/cn10k_rte_flow.c +++ b/drivers/net/cnxk/cn10k_rte_flow.c @@ -36,20 +36,20 @@ cn10k_mtr_configure(struct rte_eth_dev *eth_dev, for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) { if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) { mtr_conf = (const struct rte_flow_action_meter - *)(actions->conf); + *)(actions[i].conf); mtr_id = mtr_conf->mtr_id; is_mtr_act = true; } if (actions[i].type == RTE_FLOW_ACTION_TYPE_QUEUE) { q_conf = (const struct rte_flow_action_queue - *)(actions->conf); + *)(actions[i].conf); if (is_mtr_act) nix_mtr_rq_update(eth_dev, mtr_id, 1, &q_conf->index); } if (actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) { rss_conf = (const struct rte_flow_action_rss - *)(actions->conf); + *)(actions[i].conf); if (is_mtr_act) nix_mtr_rq_update(eth_dev, mtr_id, rss_conf->queue_num, @@ -171,7 +171,6 @@ cn10k_flow_create(struct rte_eth_dev *eth_dev, const struct rte_flow_attr *attr, return NULL; } } - for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) { if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) { mtr = (const struct rte_flow_action_meter *)actions[i] diff --git a/drivers/net/cnxk/cnxk_ethdev_mtr.c b/drivers/net/cnxk/cnxk_ethdev_mtr.c index 39d8563826..cc783e5f86 100644 --- a/drivers/net/cnxk/cnxk_ethdev_mtr.c +++ b/drivers/net/cnxk/cnxk_ethdev_mtr.c @@ -35,7 +35,6 @@ static struct rte_mtr_capabilities mtr_capa = { .chaining_n_mtrs_per_flow_max = NIX_MTR_COUNT_PER_FLOW, .chaining_use_prev_mtr_color_supported = true, .chaining_use_prev_mtr_color_enforced = true, - .meter_rate_max = 
NIX_BPF_RATE_MAX / 8, /* Bytes per second */ .color_aware_srtcm_rfc2697_supported = true, .color_aware_trtcm_rfc2698_supported = true, .color_aware_trtcm_rfc4115_supported = true, @@ -180,13 +179,13 @@ cnxk_nix_mtr_capabilities_get(struct rte_eth_dev *dev, struct rte_mtr_capabilities *capa, struct rte_mtr_error *error) { - struct cnxk_eth_dev *eth_dev = cnxk_eth_pmd_priv(dev); - uint16_t count[ROC_NIX_BPF_LEVEL_MAX] = {0}; uint8_t lvl_mask = ROC_NIX_BPF_LEVEL_F_LEAF | ROC_NIX_BPF_LEVEL_F_MID | ROC_NIX_BPF_LEVEL_F_TOP; + struct cnxk_eth_dev *eth_dev = cnxk_eth_pmd_priv(dev); + uint16_t count[ROC_NIX_BPF_LEVEL_MAX] = {0}; struct roc_nix *nix = &eth_dev->nix; - int rc; - int i; + uint32_t time_unit; + int rc, i; RTE_SET_USED(dev); @@ -207,6 +206,15 @@ cnxk_nix_mtr_capabilities_get(struct rte_eth_dev *dev, mtr_capa.meter_trtcm_rfc4115_n_max = mtr_capa.n_max; mtr_capa.meter_policy_n_max = mtr_capa.n_max; + rc = roc_nix_bpf_timeunit_get(nix, &time_unit); + if (rc) + return rc; + + mtr_capa.meter_rate_max = + NIX_BPF_RATE(time_unit, NIX_BPF_MAX_RATE_EXPONENT, +NIX_BPF_MAX_RATE_MANTISSA, 0) / + 8; + *capa = mtr_capa; return 0; } @@ -304,6 +312,9 @@ cnxk_nix_mtr_policy_validate(struct rte_eth_dev *dev, if (action->type == RTE_FLOW_ACTION_TYPE_DROP) supported[i] = true; + if (action->type == RTE_FLOW_ACTION_TYPE_VOID) + supported[i] = true; + if (!supported[i]) { sprintf(message, "%s action is not valid", -- 2.25.1
[PATCH v2 2/2] common/cnxk: update meter algorithm in band profile
Patch updates meter algorithm in nix band profile structure Signed-off-by: Rakesh Kudurumalla Acked-by: Sunil Kumar Kori --- v2: series patch of 1/2 drivers/common/cnxk/hw/nix.h | 5 --- drivers/common/cnxk/roc_nix_bpf.c | 61 +-- 2 files changed, 17 insertions(+), 49 deletions(-) diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h index dd2ebecc6a..6931f1d1d2 100644 --- a/drivers/common/cnxk/hw/nix.h +++ b/drivers/common/cnxk/hw/nix.h @@ -2133,11 +2133,6 @@ struct nix_lso_format { ((NIX_BPF_RATE_CONST * ((256 + (mantissa)) << (exponent))) / \ (((1ull << (div_exp)) * 256 * policer_timeunit))) -/* Meter rate limits in Bits/Sec */ -#define NIX_BPF_RATE_MIN NIX_BPF_RATE(10, 0, 0, 0) -#define NIX_BPF_RATE_MAX \ - NIX_BPF_RATE(1, NIX_BPF_MAX_RATE_EXPONENT, NIX_BPF_MAX_RATE_MANTISSA, 0) - #define NIX_BPF_DEFAULT_ADJUST_MANTISSA 511 #define NIX_BPF_DEFAULT_ADJUST_EXPONENT 0 diff --git a/drivers/common/cnxk/roc_nix_bpf.c b/drivers/common/cnxk/roc_nix_bpf.c index 6996a54be0..46ed91e87b 100644 --- a/drivers/common/cnxk/roc_nix_bpf.c +++ b/drivers/common/cnxk/roc_nix_bpf.c @@ -38,48 +38,23 @@ meter_rate_to_nix(uint64_t value, uint64_t *exponent_p, uint64_t *mantissa_p, uint64_t *div_exp_p, uint32_t timeunit_p) { uint64_t div_exp, exponent, mantissa; - uint32_t time_us = timeunit_p; + uint32_t time_ns = timeunit_p; /* Boundary checks */ - if (value < NIX_BPF_RATE_MIN || value > NIX_BPF_RATE_MAX) + if (value < NIX_BPF_RATE(time_ns, 0, 0, 0) || + value > NIX_BPF_RATE(time_ns, NIX_BPF_MAX_RATE_EXPONENT, +NIX_BPF_MAX_RATE_MANTISSA, 0)) return 0; - if (value <= NIX_BPF_RATE(time_us, 0, 0, 0)) { - /* Calculate rate div_exp and mantissa using -* the following formula: -* -* value = (2E6 * (256 + mantissa) -* / ((1 << div_exp) * 256)) -*/ - div_exp = 0; - exponent = 0; - mantissa = NIX_BPF_MAX_RATE_MANTISSA; - - while (value < (NIX_BPF_RATE_CONST / (1 << div_exp))) - div_exp += 1; - - while (value < ((NIX_BPF_RATE_CONST * (256 + mantissa)) / - ((1 << div_exp) * 
256))) - mantissa -= 1; - } else { - /* Calculate rate exponent and mantissa using -* the following formula: -* -* value = (2E6 * ((256 + mantissa) << exponent)) / 256 -* -*/ - div_exp = 0; - exponent = NIX_BPF_MAX_RATE_EXPONENT; - mantissa = NIX_BPF_MAX_RATE_MANTISSA; - - while (value < (NIX_BPF_RATE_CONST * (1 << exponent))) - exponent -= 1; - - while (value < - ((NIX_BPF_RATE_CONST * ((256 + mantissa) << exponent)) / - 256)) - mantissa -= 1; - } + div_exp = 0; + exponent = NIX_BPF_MAX_RATE_EXPONENT; + mantissa = NIX_BPF_MAX_RATE_MANTISSA; + + while (value < (NIX_BPF_RATE(time_ns, exponent, 0, 0))) + exponent -= 1; + + while (value < (NIX_BPF_RATE(time_ns, exponent, mantissa, 0))) + mantissa -= 1; if (div_exp > NIX_BPF_MAX_RATE_DIV_EXP || exponent > NIX_BPF_MAX_RATE_EXPONENT || @@ -94,7 +69,7 @@ meter_rate_to_nix(uint64_t value, uint64_t *exponent_p, uint64_t *mantissa_p, *mantissa_p = mantissa; /* Calculate real rate value */ - return NIX_BPF_RATE(time_us, exponent, mantissa, div_exp); + return NIX_BPF_RATE(time_ns, exponent, mantissa, div_exp); } static inline uint64_t @@ -195,11 +170,7 @@ nix_precolor_conv_table_write(struct roc_nix *roc_nix, uint64_t val, int64_t *addr; addr = PLT_PTR_ADD(nix->base, off); - /* FIXME: Currently writing to this register throwing kernel dump. -* plt_write64(val, addr); -*/ - PLT_SET_USED(val); - PLT_SET_USED(addr); + plt_write64(val, addr); } static uint8_t @@ -665,6 +636,7 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id, aq->prof.lmode = cfg->lmode; aq->prof.icolor = cfg->icolor; + aq->prof.meter_algo = cfg->alg; aq->prof.pc_mode = cfg->pc_mode; aq->prof.tnl_ena = cfg->tnl_ena; aq->prof.gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN]; @@ -673,6 +645,7 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id, aq->prof_mask.lmode = ~(aq->prof_mask.lmode); aq->prof_mask.icolor = ~(aq->prof_mask.icolor); + aq->prof_mask.meter_algo = ~(aq->prof_mask.meter_algo); aq->prof_m
[PATCH v3 1/2] net/cnxk: update meter bpf ID in Receive Queue
Patch updates configured meter bpf ID in receive queue context during meter creation Signed-off-by: Rakesh Kudurumalla Acked-by: Sunil Kumar Kori --- v3: Fix commit message spelling drivers/net/cnxk/cn10k_rte_flow.c | 7 +++ drivers/net/cnxk/cnxk_ethdev_mtr.c | 21 - 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/drivers/net/cnxk/cn10k_rte_flow.c b/drivers/net/cnxk/cn10k_rte_flow.c index b830abe63e..529fb0e4b7 100644 --- a/drivers/net/cnxk/cn10k_rte_flow.c +++ b/drivers/net/cnxk/cn10k_rte_flow.c @@ -36,20 +36,20 @@ cn10k_mtr_configure(struct rte_eth_dev *eth_dev, for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) { if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) { mtr_conf = (const struct rte_flow_action_meter - *)(actions->conf); + *)(actions[i].conf); mtr_id = mtr_conf->mtr_id; is_mtr_act = true; } if (actions[i].type == RTE_FLOW_ACTION_TYPE_QUEUE) { q_conf = (const struct rte_flow_action_queue - *)(actions->conf); + *)(actions[i].conf); if (is_mtr_act) nix_mtr_rq_update(eth_dev, mtr_id, 1, &q_conf->index); } if (actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) { rss_conf = (const struct rte_flow_action_rss - *)(actions->conf); + *)(actions[i].conf); if (is_mtr_act) nix_mtr_rq_update(eth_dev, mtr_id, rss_conf->queue_num, @@ -171,7 +171,6 @@ cn10k_flow_create(struct rte_eth_dev *eth_dev, const struct rte_flow_attr *attr, return NULL; } } - for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) { if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) { mtr = (const struct rte_flow_action_meter *)actions[i] diff --git a/drivers/net/cnxk/cnxk_ethdev_mtr.c b/drivers/net/cnxk/cnxk_ethdev_mtr.c index 39d8563826..cc783e5f86 100644 --- a/drivers/net/cnxk/cnxk_ethdev_mtr.c +++ b/drivers/net/cnxk/cnxk_ethdev_mtr.c @@ -35,7 +35,6 @@ static struct rte_mtr_capabilities mtr_capa = { .chaining_n_mtrs_per_flow_max = NIX_MTR_COUNT_PER_FLOW, .chaining_use_prev_mtr_color_supported = true, .chaining_use_prev_mtr_color_enforced = true, - .meter_rate_max =
NIX_BPF_RATE_MAX / 8, /* Bytes per second */ .color_aware_srtcm_rfc2697_supported = true, .color_aware_trtcm_rfc2698_supported = true, .color_aware_trtcm_rfc4115_supported = true, @@ -180,13 +179,13 @@ cnxk_nix_mtr_capabilities_get(struct rte_eth_dev *dev, struct rte_mtr_capabilities *capa, struct rte_mtr_error *error) { - struct cnxk_eth_dev *eth_dev = cnxk_eth_pmd_priv(dev); - uint16_t count[ROC_NIX_BPF_LEVEL_MAX] = {0}; uint8_t lvl_mask = ROC_NIX_BPF_LEVEL_F_LEAF | ROC_NIX_BPF_LEVEL_F_MID | ROC_NIX_BPF_LEVEL_F_TOP; + struct cnxk_eth_dev *eth_dev = cnxk_eth_pmd_priv(dev); + uint16_t count[ROC_NIX_BPF_LEVEL_MAX] = {0}; struct roc_nix *nix = &eth_dev->nix; - int rc; - int i; + uint32_t time_unit; + int rc, i; RTE_SET_USED(dev); @@ -207,6 +206,15 @@ cnxk_nix_mtr_capabilities_get(struct rte_eth_dev *dev, mtr_capa.meter_trtcm_rfc4115_n_max = mtr_capa.n_max; mtr_capa.meter_policy_n_max = mtr_capa.n_max; + rc = roc_nix_bpf_timeunit_get(nix, &time_unit); + if (rc) + return rc; + + mtr_capa.meter_rate_max = + NIX_BPF_RATE(time_unit, NIX_BPF_MAX_RATE_EXPONENT, +NIX_BPF_MAX_RATE_MANTISSA, 0) / + 8; + *capa = mtr_capa; return 0; } @@ -304,6 +312,9 @@ cnxk_nix_mtr_policy_validate(struct rte_eth_dev *dev, if (action->type == RTE_FLOW_ACTION_TYPE_DROP) supported[i] = true; + if (action->type == RTE_FLOW_ACTION_TYPE_VOID) + supported[i] = true; + if (!supported[i]) { sprintf(message, "%s action is not valid", -- 2.25.1
[PATCH v3 2/2] common/cnxk: update meter algorithm in band profile
Patch updates meter algorithm in nix band profile structure Signed-off-by: Rakesh Kudurumalla Acked-by: Sunil Kumar Kori --- v3: series patch of 1/2 drivers/common/cnxk/hw/nix.h | 5 --- drivers/common/cnxk/roc_nix_bpf.c | 61 +-- 2 files changed, 17 insertions(+), 49 deletions(-) diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h index dd2ebecc6a..6931f1d1d2 100644 --- a/drivers/common/cnxk/hw/nix.h +++ b/drivers/common/cnxk/hw/nix.h @@ -2133,11 +2133,6 @@ struct nix_lso_format { ((NIX_BPF_RATE_CONST * ((256 + (mantissa)) << (exponent))) / \ (((1ull << (div_exp)) * 256 * policer_timeunit))) -/* Meter rate limits in Bits/Sec */ -#define NIX_BPF_RATE_MIN NIX_BPF_RATE(10, 0, 0, 0) -#define NIX_BPF_RATE_MAX \ - NIX_BPF_RATE(1, NIX_BPF_MAX_RATE_EXPONENT, NIX_BPF_MAX_RATE_MANTISSA, 0) - #define NIX_BPF_DEFAULT_ADJUST_MANTISSA 511 #define NIX_BPF_DEFAULT_ADJUST_EXPONENT 0 diff --git a/drivers/common/cnxk/roc_nix_bpf.c b/drivers/common/cnxk/roc_nix_bpf.c index 6996a54be0..46ed91e87b 100644 --- a/drivers/common/cnxk/roc_nix_bpf.c +++ b/drivers/common/cnxk/roc_nix_bpf.c @@ -38,48 +38,23 @@ meter_rate_to_nix(uint64_t value, uint64_t *exponent_p, uint64_t *mantissa_p, uint64_t *div_exp_p, uint32_t timeunit_p) { uint64_t div_exp, exponent, mantissa; - uint32_t time_us = timeunit_p; + uint32_t time_ns = timeunit_p; /* Boundary checks */ - if (value < NIX_BPF_RATE_MIN || value > NIX_BPF_RATE_MAX) + if (value < NIX_BPF_RATE(time_ns, 0, 0, 0) || + value > NIX_BPF_RATE(time_ns, NIX_BPF_MAX_RATE_EXPONENT, +NIX_BPF_MAX_RATE_MANTISSA, 0)) return 0; - if (value <= NIX_BPF_RATE(time_us, 0, 0, 0)) { - /* Calculate rate div_exp and mantissa using -* the following formula: -* -* value = (2E6 * (256 + mantissa) -* / ((1 << div_exp) * 256)) -*/ - div_exp = 0; - exponent = 0; - mantissa = NIX_BPF_MAX_RATE_MANTISSA; - - while (value < (NIX_BPF_RATE_CONST / (1 << div_exp))) - div_exp += 1; - - while (value < ((NIX_BPF_RATE_CONST * (256 + mantissa)) / - ((1 << div_exp) * 
256))) - mantissa -= 1; - } else { - /* Calculate rate exponent and mantissa using -* the following formula: -* -* value = (2E6 * ((256 + mantissa) << exponent)) / 256 -* -*/ - div_exp = 0; - exponent = NIX_BPF_MAX_RATE_EXPONENT; - mantissa = NIX_BPF_MAX_RATE_MANTISSA; - - while (value < (NIX_BPF_RATE_CONST * (1 << exponent))) - exponent -= 1; - - while (value < - ((NIX_BPF_RATE_CONST * ((256 + mantissa) << exponent)) / - 256)) - mantissa -= 1; - } + div_exp = 0; + exponent = NIX_BPF_MAX_RATE_EXPONENT; + mantissa = NIX_BPF_MAX_RATE_MANTISSA; + + while (value < (NIX_BPF_RATE(time_ns, exponent, 0, 0))) + exponent -= 1; + + while (value < (NIX_BPF_RATE(time_ns, exponent, mantissa, 0))) + mantissa -= 1; if (div_exp > NIX_BPF_MAX_RATE_DIV_EXP || exponent > NIX_BPF_MAX_RATE_EXPONENT || @@ -94,7 +69,7 @@ meter_rate_to_nix(uint64_t value, uint64_t *exponent_p, uint64_t *mantissa_p, *mantissa_p = mantissa; /* Calculate real rate value */ - return NIX_BPF_RATE(time_us, exponent, mantissa, div_exp); + return NIX_BPF_RATE(time_ns, exponent, mantissa, div_exp); } static inline uint64_t @@ -195,11 +170,7 @@ nix_precolor_conv_table_write(struct roc_nix *roc_nix, uint64_t val, int64_t *addr; addr = PLT_PTR_ADD(nix->base, off); - /* FIXME: Currently writing to this register throwing kernel dump. -* plt_write64(val, addr); -*/ - PLT_SET_USED(val); - PLT_SET_USED(addr); + plt_write64(val, addr); } static uint8_t @@ -665,6 +636,7 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id, aq->prof.lmode = cfg->lmode; aq->prof.icolor = cfg->icolor; + aq->prof.meter_algo = cfg->alg; aq->prof.pc_mode = cfg->pc_mode; aq->prof.tnl_ena = cfg->tnl_ena; aq->prof.gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN]; @@ -673,6 +645,7 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id, aq->prof_mask.lmode = ~(aq->prof_mask.lmode); aq->prof_mask.icolor = ~(aq->prof_mask.icolor); + aq->prof_mask.meter_algo = ~(aq->prof_mask.meter_algo); aq->prof_m
Re: [PATCH 1/1] ci: restrict concurrency
On Thu, Jan 13, 2022, 6:42 AM Thomas Monjalon wrote: > Hi, > > The explanation should be in the patch, not the cover letter. > Actually, you don't need a cover letter for a single patch. > Copying it here: > " > dpdk is fairly expensive to build in GitHub. > > It's helpful to abandon old builds as soon as there's a new > build waiting instead of wasting resources on the previous > round. > " > > 12/01/2022 07:50, Josh Soref: > > Signed-off-by: Josh Soref > > --- > > +concurrency: > > + group: build-${{ matrix.config.os }}-${{ matrix.config.compiler > }}-${{ matrix.config.library }}-${{ matrix.config.cross }}-${{ > matrix.config.mini }}-${{ github.event.pull_request.number || github.ref }} > > + cancel-in-progress: true > > The goal of the CI is to catch any issue in a submitted patch. > Is your change cancelling a test of a patch when another one is submitted? > If it's on the same branch or if it's in the same pull request yes, otherwise, no. >
RE: [PATCH 3/8] ethdev: add mbuf dynfield for incomplete IP reassembly
Hi Konstantin, > > > Hardware IP reassembly may be incomplete for multiple reasons like > > > reassembly timeout reached, duplicate fragments, etc. > > > To save application cycles to process these packets again, a new > > > mbuf ol_flag (RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE) is added to > > > show that the mbuf received is not reassembled properly. > > > > If we use dynfield for data, why not use dynflag for > > RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE? > > That way we can avoid introducing hardcoded (always defined) flags for that > > case. > > I have not looked into using dynflag. Will explore if it can be used. The intent of adding this feature is to reduce application cycles for IP reassembly. But if we use dynflag, it will take a lot of cycles to check if dyn flag is set or not. As I understand, it first needs to be looked up in a linked list and then checked. And this will be checked for each packet even if there is no reassembly involved.
RE: [PATCH 2/8] ethdev: add dev op for IP reassembly configuration
Hi Konstantin, > > > > > > Another question - if we have reassembly_conf_set() would it make > sense > > to > > > > > > have also reassembly_conf_get? > > > > > > So user can retrieve current ip_reassembly config values? > > > > > > > > > > > The set/supported values can be retrieved using rte_eth_dev_info :: > > > > reass_capa > > > > > > > > Hmm, I thought rte_eth_dev_info :: reass_capa reports > > > > max supported values, not currently set values. > > > > Did I misunderstand something? > > > > > > > Reassembly configuration is expected to be a one-time setting and is not > > expected > > > to change multiple times in the application. > > > You are correct that rte_eth_dev_info :: reass_capa reports max supported > > values > > > by the PMD. > > > But if somebody uses the _set API, dev_info values will be overwritten. > > > However, a get API can be added, if we have some use case. > > > IMO, we can add it later if it will be required. > > > > Basically you forbid user to reconfigure this feature > > during application life-time? > > That sounds like a really strange approach to me and > > Probably will affect its usability in a negative way. > > Wonder why it has to be that restrictive? > > Also with the model you suggest, what would happen after user will do: > > dev_stop(); dev_configure();? > > Would rte_eth_dev_info :: reass_capa be reset to initial values, > > or user values will be preserved, or ...? > > > I am not restricting the user to not reconfigure the feature. > When dev_configure() is called again after dev_stop(), it will reset the > previously > set values to max ones. > However, if you insist the get API can be added. No strong opinion on that. On another thought, setting dev_info :: reass_capa to a max value and not changing it in reassembly_conf_set() will make more sense. The most common case, would be to get the max values and if they are not good Enough for the application, set lesser values using the new API. 
I do not see a use case to get the current values set. However, it may be used for debugging some driver issue related to these values. But, I believe that can be managed internally in the PMD. Do you suspect any other use case for get API?
RE: [PATCH] bus/ifpga: remove useless check while browsing devices
Hi, Thanks. > -Original Message- > From: Maxime Gouin > Sent: Wednesday, January 05, 2022 18:27 > To: dev@dpdk.org > Cc: Maxime Gouin ; Xu, Rosen > ; Zhang, Qi Z ; Zhang, Tianfei > ; Olivier Matz > Subject: [PATCH] bus/ifpga: remove useless check while browsing devices > > reported by code analysis tool C++test (version 10.4): > > > /build/dpdk-20.11/drivers/bus/ifpga/ifpga_bus.c > > 67Condition "afu_dev" is always evaluated to true > > 81Condition "afu_dev" is always evaluated to true > > The "for" loop already checks that afu_dev is not NULL. > > Fixes: 05fa3d4a6539 ("bus/ifpga: add Intel FPGA bus library") > > Signed-off-by: Maxime Gouin > Reviewed-by: Olivier Matz > --- > drivers/bus/ifpga/ifpga_bus.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c > index cbc680928486..c5c8bbd57219 100644 > --- a/drivers/bus/ifpga/ifpga_bus.c > +++ b/drivers/bus/ifpga/ifpga_bus.c > @@ -64,8 +64,7 @@ ifpga_find_afu_dev(const struct rte_rawdev *rdev, > struct rte_afu_device *afu_dev = NULL; > > TAILQ_FOREACH(afu_dev, &ifpga_afu_dev_list, next) { > - if (afu_dev && > - afu_dev->rawdev == rdev && > + if (afu_dev->rawdev == rdev && > !ifpga_afu_id_cmp(&afu_dev->id, afu_id)) > return afu_dev; > } > @@ -78,8 +77,7 @@ rte_ifpga_find_afu_by_name(const char *name) > struct rte_afu_device *afu_dev = NULL; > > TAILQ_FOREACH(afu_dev, &ifpga_afu_dev_list, next) { > - if (afu_dev && > - !strcmp(afu_dev->device.name, name)) > + if (!strcmp(afu_dev->device.name, name)) > return afu_dev; > } > return NULL; > -- > 2.30.2 Acked-by: Rosen Xu
[PATCH] net/mlx5: fix maximum packet headers size for TSO
The maximum packet headers size for TSO is calculated as a sum of Ethernet, VLAN, IPv6 and TCP headers (plus inner headers). The rationale behind choosing IPv6 and TCP is that their headers are bigger than IPv4 and UDP headers respectively, giving us the maximum possible headers size. But it is not true for L3 headers. IPv4 header size (20 bytes) is smaller than IPv6 header size (40 bytes) only in the default case. There are up to 10 optional header fields called Options in case IHL > 5. This means that the maximum size of the IPv4 header is 60 bytes. Choosing the wrong maximum packet headers size causes inability to transmit multi-segment TSO packets with IPv4 Options present. The PMD checks that it is possible to inline all the packet headers, and this check fails because the packet headers size exceeds the expected maximum size. The maximum packet headers size was set to 192 bytes before, but its value has been reduced during Tx path refactor activity. Restore the proper maximum packet headers size for TSO. Fixes: 50724e1 ("net/mlx5: update Tx definitions") Cc: sta...@dpdk.org Signed-off-by: Alexander Kozyrev Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_defs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h index 36b384fa08..2d48fde010 100644 --- a/drivers/net/mlx5/mlx5_defs.h +++ b/drivers/net/mlx5/mlx5_defs.h @@ -50,7 +50,7 @@ #define MLX5_MAX_XSTATS 32 /* Maximum Packet headers size (L2+L3+L4) for TSO. */ -#define MLX5_MAX_TSO_HEADER (128u + 34u) +#define MLX5_MAX_TSO_HEADER 192U /* Inline data size required by NICs. */ #define MLX5_INLINE_HSIZE_NONE 0 -- 2.18.2
RE: [PATCH 3/8] ethdev: add mbuf dynfield for incomplete IP reassembly
Hi Akhil, > Hi Konstantin, > > > > Hardware IP reassembly may be incomplete for multiple reasons like > > > > reassembly timeout reached, duplicate fragments, etc. > > > > To save application cycles to process these packets again, a new > > > > mbuf ol_flag (RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE) is added to > > > > show that the mbuf received is not reassembled properly. > > > > > > If we use dynfield for data, why not use dynflag for > > > RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE? > > > That way we can avoid introducing hardcoded (always defined) flags for that > > > case. > > > > I have not looked into using dynflag. Will explore if it can be used. > The intent of adding this feature is to reduce application cycles for IP > reassembly. > But if we use dynflag, it will take a lot of cycles to check if dyn flag is > set or not. > As I understand, it first needs to be looked up in a linked list and then > checked. > And this will be checked for each packet even if there is no reassembly > involved. No, I don't think that is the correct understanding. For dyn-flag it is the same approach as for dyn-field. At init time it selects the bit which will be used and returns its value to the user. Then the user will set/check them at runtime. So no linked-list walks at runtime. All you are missing comparing to hard-coded values: compiler optimizations.
[PATCH] net/mlx5: fix wrong MPRQ WQE size assertion
Preparation of the stride size and the number of strides for Multi-Packet RQ was updated recently to accommodate the hardware limitation about minimum WQE size. The wrong assertion was introduced to ensure this limitation is met. Assert that the configured WQE size is not less than the minimum supported size. Fixes: 219f08a ("net/mlx5: fix missing adjustment MPRQ stride devargs") Signed-off-by: Alexander Kozyrev Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_rxq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index ed21fbba1e..105e094fdf 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -1619,7 +1619,7 @@ mlx5_mprq_prepare(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc, RTE_BIT32(log_def_stride_size)); log_stride_wqe_size = log_def_stride_num + log_def_stride_size; } - MLX5_ASSERT(log_stride_wqe_size < config->mprq.log_min_stride_wqe_size); + MLX5_ASSERT(log_stride_wqe_size >= config->mprq.log_min_stride_wqe_size); if (desc <= RTE_BIT32(*actual_log_stride_num)) goto unsupport; if (min_mbuf_size > RTE_BIT32(log_stride_wqe_size)) { -- 2.18.2
RE: [PATCH 2/8] ethdev: add dev op for IP reassembly configuration
> > > > > > > Another question - if we have reassembly_conf_set() would it make > > sense > > > to > > > > > > > have also reassembly_conf_get? > > > > > > > So user can retrieve current ip_reassembly config values? > > > > > > > > > > > > > The set/supported values can be retrieved using rte_eth_dev_info :: > > > > > reass_capa > > > > > > > > > > Hmm, I thought rte_eth_dev_info :: reass_capa reports > > > > > max supported values, not currently set values. > > > > > Did I misunderstand something? > > > > > > > > > Reassembly configuration is expected to be a one-time setting and is not > > > expected > > > > to change multiple times in the application. > > > > You are correct that rte_eth_dev_info :: reass_capa reports max > > > > supported > > > values > > > > by the PMD. > > > > But if somebody uses the _set API, dev_info values will be overwritten. > > > > However, a get API can be added, if we have some use case. > > > > IMO, we can add it later if it will be required. > > > > > > Basically you forbid user to reconfigure this feature > > > during application life-time? > > > That sounds like a really strange approach to me and > > > Probably will affect its usability in a negative way. > > > Wonder why it has to be that restrictive? > > > Also with the model you suggest, what would happen after user will do: > > > dev_stop(); dev_configure();? > > > Would rte_eth_dev_info :: reass_capa be reset to initial values, > > > or user values will be preserved, or ...? > > > > > I am not restricting the user to not reconfigure the feature. > > When dev_configure() is called again after dev_stop(), it will reset the > > previously > > set values to max ones. > > However, if you insist the get API can be added. No strong opinion on that. > > On another thought, setting dev_info :: reass_capa to a max value and not > changing it > in reassembly_conf_set() will make more sense. Yes, agree. 
> The most common case, would be to get the max values and if they are not good > Enough for the application, set lesser values using the new API. > I do not see a use case to get the current values set. However, it may be > used for debugging > some driver issue related to these values. But, I believe that can be managed > internally > in the PMD. Do you suspect any other use case for get API? I think it would be really plausible for both user and ethdev layer to have an ability to get values that are currently in place.
RE: [PATCH 3/8] ethdev: add mbuf dynfield for incomplete IP reassembly
> Hi Akhil, > > > Hi Konstantin, > > > > > Hardware IP reassembly may be incomplete for multiple reasons like > > > > > reassembly timeout reached, duplicate fragments, etc. > > > > > To save application cycles to process these packets again, a new > > > > > mbuf ol_flag (RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE) is added > to > > > > > show that the mbuf received is not reassembled properly. > > > > > > > > If we use dynfield for data, why not use dynflag for > > > > RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE? > > > > That way we can avoid introducing hardcoded (always defined) flags for > that > > > > case. > > > > > > I have not looked into using dynflag. Will explore if it can be used. > > The intent of adding this feature is to reduce application cycles for IP > reassembly. > > But if we use dynflag, it will take a lot of cycles to check if dyn flag is > > set or > not. > > As I understand, it first needs to be looked up in a linked list and then > > checked. > > And this will be checked for each packet even if there is no reassembly > involved. > > No, I don't think that is the correct understanding. > For dyn-flag it is the same approach as for dyn-field. > At init time it selects the bit which will be used and returns its value to > the user. > Then the user will set/check them at runtime. > So no linked-list walks at runtime. > All you are missing comparing to hard-coded values: compiler optimizations. > Ok, got it. rte_mbuf_dynflag_lookup() needs to happen only for the first mbuf. I was checking is_timestamp_enabled() in test-pmd. Didn't see that dynflag was a static variable. I thought it was happening for each packet.
Re: ETH_RSS_IP only does not seem to balance traffic
On Thu, Jan 13, 2022 at 12:52:04AM +0900, Yasuhiro Ohara wrote: > > Hi, > > My system developper friend recently ran into a problem where > l3fwd does not appear to receive balanced traffic on Intel XL710, > but it is resolved when the attached patch is applied. > > -.rss_hf = ETH_RSS_IP, > +.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP, > > IIRC I ran into a similar problem 3 or 4 years back, > but didn't report then because I believed I was doing something silly. > But since my friend is an experienced engineer, I feel like > it may be better (for the community) to ask this in the list. > > We are using dpdk-stable-18.11.6 and igb_uio. > > What are we doing wrong? > > If it is not a FAQ, I can test it again with more recent stable, > and will report the details. > For XL710 NICs, I believe that ETH_RSS_IP load balances only IP frames that do not have TCP or UDP headers also. Adding i40e driver maintainer on CC to comment further. /Bruce
[PATCH] net/bnxt: add back dependency to virt kmods
During a large refactoring sweep for 21.11, a previous commit removed the dependency the bnxt driver had on Linux virtual bus drivers, such as vfio-pci. This breaks port detection. This patch adds the kmod dependency back as it was. Fixes: 295968d17407 ("ethdev: add namespace") Signed-off-by: Geoffrey Le Gourriérec --- drivers/net/bnxt/bnxt_ethdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index f79f33ab4e17..3ca6b7be7b6a 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -6302,4 +6302,5 @@ bool is_bnxt_supported(struct rte_eth_dev *dev) RTE_LOG_REGISTER_SUFFIX(bnxt_logtype_driver, driver, NOTICE); RTE_PMD_REGISTER_PCI(net_bnxt, bnxt_rte_pmd); RTE_PMD_REGISTER_PCI_TABLE(net_bnxt, bnxt_pci_id_map); +RTE_PMD_REGISTER_KMOD_DEP(net_bnxt, "* igb_uio | uio_pci_generic | vfio-pci"); -- 2.30.2
Re: [PATCH] build: add missing arch define for Arm
17/12/2021 09:54, Ruifeng Wang: > As per design document, RTE_ARCH is the name of the architecture. > However, the definition was missing on Arm with meson build. > It impacts applications that refers to this string. > > Added for Arm builds. > > Fixes: b1d48c41189a ("build: support ARM with meson") > Cc: sta...@dpdk.org > > Signed-off-by: Ruifeng Wang > --- > ['RTE_ARCH_ARMv8_AARCH32', true], > +['RTE_ARCH', 'arm64_aarch32'], Why not armv8_aarch32? [...] > dpdk_conf.set('RTE_ARCH_ARMv7', true) > +dpdk_conf.set('RTE_ARCH', 'armv7') [...] > # armv8 build > +dpdk_conf.set('RTE_ARCH', 'arm64') Why not armv8? What I prefer the most in silicon industry is the naming craziness :)
[PATCH 0/6] allow more DPDK libraries to be disabled on build
A common request on-list has been to allow more of the DPDK build to be disabled by those who are doing their own builds and only use a subset of the libraries. To this end, this patchset makes some infrastructure changes [first two patches] to make it easier to have libraries disabled, and then adds six libraries to the "optional" list. Bruce Richardson (6): lib: allow recursive disabling of libs in build app/test: link unit test binary against all available libs build: add node library to optional list build: add flow classification library to optional list build: add "packet framework" libs to optional list build: add cfgfile library to optional list app/test/meson.build | 74 lib/meson.build | 30 -- 2 files changed, 40 insertions(+), 64 deletions(-) -- 2.32.0
[PATCH 1/6] lib: allow recursive disabling of libs in build
Align the code in lib/meson.build with that in drivers/meson.build to enable recursive disabling of libraries, i.e. if library b depends on library a, disable library b if a is disabled (either explicitly or implicitly). This allows libraries to be optional even if other DPDK libs depend on them, something that was not previously possible. Signed-off-by: Bruce Richardson --- lib/meson.build | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/lib/meson.build b/lib/meson.build index fbaa6ef7c2..af4662e942 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -134,23 +134,29 @@ foreach l:libraries warning('Library name, "@0@", and directory name, "@1@", do not match'.format(name, l)) endif -if not build -dpdk_libs_disabled += name -set_variable(name.underscorify() + '_disable_reason', reason) -continue -endif - shared_deps = ext_deps static_deps = ext_deps foreach d:deps +if not build +break +endif if not is_variable('shared_rte_' + d) -error('Missing internal dependency "@0@" for @1@ [@2@]' +build = false +reason = 'missing internal dependency, "@0@"'.format(d) +message('Disabling @1@ [@2@]: missing internal dependency "@0@"' .format(d, name, 'lib/' + l)) +else +shared_deps += [get_variable('shared_rte_' + d)] +static_deps += [get_variable('static_rte_' + d)] endif -shared_deps += [get_variable('shared_rte_' + d)] -static_deps += [get_variable('static_rte_' + d)] endforeach +if not build +dpdk_libs_disabled += name +set_variable(name.underscorify() + '_disable_reason', reason) +continue +endif + enabled_libs += name dpdk_conf.set('RTE_LIB_' + name.to_upper(), 1) install_headers(headers) -- 2.32.0
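The single-pass recursive disabling that this patch brings over from drivers/meson.build can be sketched outside of meson as follows. This is illustrative C only, with hypothetical names; the real logic is meson code operating on `dpdk_libs_disabled` and the `shared_rte_*` variables. The key property is that because the `libraries` list is kept in dependency order (dependencies first), one linear pass is enough to propagate a disable through an arbitrarily deep chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DPDK library entry in lib/meson.build. */
struct lib {
	const char *name;
	const char *const *deps; /* NULL-terminated dependency names */
	bool enabled;            /* starts from the explicit enable/disable state */
};

static bool lib_enabled(const struct lib *libs, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(libs[i].name, name) == 0)
			return libs[i].enabled;
	return false; /* unknown dependency: treat as disabled */
}

/*
 * Disable any library that has a disabled dependency. libs[] must be
 * topologically sorted (dependencies before dependents), so each
 * dependency's final state is already known when we reach a library.
 */
static void propagate_disables(struct lib *libs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!libs[i].enabled)
			continue;
		for (const char *const *d = libs[i].deps; *d != NULL; d++) {
			if (!lib_enabled(libs, n, *d)) {
				libs[i].enabled = false; /* implicit disable */
				break;
			}
		}
	}
}
```

With an explicitly disabled library "a", a library "b" depending on "a", and "c" depending on "b", one pass disables both "b" and "c" while leaving independent libraries untouched, which is exactly the behavior the commit message describes.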
[PATCH 2/6] app/test: link unit test binary against all available libs
Rather than maintaining a list of the libraries the unit tests need, and having to conditionally include/omit optional libs from the list, we can just link against all available libraries, simplifying the code considerably. Signed-off-by: Bruce Richardson --- app/test/meson.build | 47 +--- 1 file changed, 1 insertion(+), 46 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 344a609a4d..9919de4307 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -157,39 +157,7 @@ test_sources = files( 'virtual_pmd.c', ) -test_deps = [ -'acl', -'bus_pci', -'bus_vdev', -'bpf', -'cfgfile', -'cmdline', -'cryptodev', -'distributor', -'dmadev', -'efd', -'ethdev', -'eventdev', -'fib', -'flow_classify', -'graph', -'hash', -'ipsec', -'lpm', -'member', -'node', -'pipeline', -'port', -'rawdev', -'rcu', -'reorder', -'rib', -'ring', -'security', -'stack', -'telemetry', -'timer', -] +test_deps = enabled_libs # Each test is marked with flag true/false # to indicate whether it can run in no-huge mode. 
@@ -380,7 +348,6 @@ if dpdk_conf.has('RTE_EVENT_SKELETON') test_deps += 'event_skeleton' endif if dpdk_conf.has('RTE_LIB_METRICS') -test_deps += 'metrics' test_sources += ['test_metrics.c'] fast_tests += [['metrics_autotest', true]] endif @@ -410,17 +377,14 @@ if dpdk_conf.has('RTE_NET_RING') perf_test_names += 'ring_pmd_perf_autotest' fast_tests += [['event_eth_tx_adapter_autotest', false]] if dpdk_conf.has('RTE_LIB_BITRATESTATS') -test_deps += 'bitratestats' test_sources += 'test_bitratestats.c' fast_tests += [['bitratestats_autotest', true]] endif if dpdk_conf.has('RTE_LIB_LATENCYSTATS') -test_deps += 'latencystats' test_sources += 'test_latencystats.c' fast_tests += [['latencystats_autotest', true]] endif if dpdk_conf.has('RTE_LIB_PDUMP') -test_deps += 'pdump' test_sources += 'test_pdump.c' fast_tests += [['pdump_autotest', true]] endif @@ -434,18 +398,10 @@ endif if dpdk_conf.has('RTE_HAS_LIBPCAP') ext_deps += pcap_dep if dpdk_conf.has('RTE_LIB_PCAPNG') -test_deps += 'pcapng' test_sources += 'test_pcapng.c' endif endif -if dpdk_conf.has('RTE_LIB_POWER') -test_deps += 'power' -endif -if dpdk_conf.has('RTE_LIB_KNI') -test_deps += 'kni' -endif - if cc.has_argument('-Wno-format-truncation') cflags += '-Wno-format-truncation' endif @@ -462,7 +418,6 @@ if dpdk_conf.has('RTE_LIB_COMPRESSDEV') if compress_test_dep.found() test_dep_objs += compress_test_dep test_sources += 'test_compressdev.c' -test_deps += 'compressdev' fast_tests += [['compressdev_autotest', false]] endif endif -- 2.32.0
[PATCH 3/6] build: add node library to optional list
Allow the 'node' library to be disabled in builds Signed-off-by: Bruce Richardson --- lib/meson.build | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/meson.build b/lib/meson.build index af4662e942..dd20fe70a6 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -74,6 +74,7 @@ optional_libs = [ 'jobstats', 'latencystats', 'metrics', +'node', 'pdump', 'power', 'vhost', -- 2.32.0
[PATCH 4/6] build: add flow classification library to optional list
Add the flow_classify library to the list of optional libraries, and ensure tests can build with it disabled. Signed-off-by: Bruce Richardson --- app/test/meson.build | 7 +-- lib/meson.build | 1 + 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index 9919de4307..a92dd0c1f0 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -65,7 +65,6 @@ test_sources = files( 'test_fib6.c', 'test_fib6_perf.c', 'test_func_reentrancy.c', -'test_flow_classify.c', 'test_graph.c', 'test_graph_perf.c', 'test_hash.c', @@ -194,7 +193,6 @@ fast_tests = [ ['fib_autotest', true], ['fib6_autotest', true], ['func_reentrancy_autotest', false], -['flow_classify_autotest', false], ['hash_autotest', true], ['interrupt_autotest', true], ['ipfrag_autotest', false], @@ -347,6 +345,11 @@ endif if dpdk_conf.has('RTE_EVENT_SKELETON') test_deps += 'event_skeleton' endif + +if dpdk_conf.has('RTE_LIB_FLOW_CLASSIFY') +test_sources += 'test_flow_classify.c' +fast_tests += [['flow_classify_autotest', false]] +endif if dpdk_conf.has('RTE_LIB_METRICS') test_sources += ['test_metrics.c'] fast_tests += [['metrics_autotest', true]] diff --git a/lib/meson.build b/lib/meson.build index dd20fe70a6..ede5199374 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -67,6 +67,7 @@ libraries = [ optional_libs = [ 'bitratestats', +'flow_classify', 'gpudev', 'gro', 'gso', -- 2.32.0
[PATCH 5/6] build: add "packet framework" libs to optional list
Add port, table and pipeline libraries - collectively often known as the "packet framework" - to the list of optional libraries, and ensure tests can build with them disabled. Signed-off-by: Bruce Richardson --- app/test/meson.build | 20 +--- lib/meson.build | 3 +++ 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/app/test/meson.build b/app/test/meson.build index a92dd0c1f0..ba62e5e98c 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -135,12 +135,6 @@ test_sources = files( 'test_stack.c', 'test_stack_perf.c', 'test_string_fns.c', -'test_table.c', -'test_table_acl.c', -'test_table_combined.c', -'test_table_pipeline.c', -'test_table_ports.c', -'test_table_tables.c', 'test_tailq.c', 'test_thash.c', 'test_thash_perf.c', @@ -227,7 +221,6 @@ fast_tests = [ ['stack_autotest', false], ['stack_lf_autotest', false], ['string_autotest', true], -['table_autotest', true], ['tailq_autotest', true], ['ticketlock_autotest', true], ['timer_autotest', false], @@ -358,6 +351,19 @@ if dpdk_conf.has('RTE_LIB_TELEMETRY') test_sources += ['test_telemetry_json.c', 'test_telemetry_data.c'] fast_tests += [['telemetry_json_autotest', true], ['telemetry_data_autotest', true]] endif +if dpdk_conf.has('RTE_LIB_PIPELINE') +# pipeline lib depends on port and table libs, so those must be present +# if pipeline library is. +test_sources += [ +'test_table.c', +'test_table_acl.c', +'test_table_combined.c', +'test_table_pipeline.c', +'test_table_ports.c', +'test_table_tables.c', +] +fast_tests += [['table_autotest', true]] +endif # The following linkages of drivers are required because # they are used via a driver-specific API. diff --git a/lib/meson.build b/lib/meson.build index ede5199374..dcc1b4d835 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -77,7 +77,10 @@ optional_libs = [ 'metrics', 'node', 'pdump', +'pipeline', +'port', 'power', +'table', 'vhost', ] -- 2.32.0
[PATCH 6/6] build: add cfgfile library to optional list
Allow disabling of the cfgfile library in builds. Signed-off-by: Bruce Richardson --- lib/meson.build | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/meson.build b/lib/meson.build index dcc1b4d835..8e5acd7819 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -67,6 +67,7 @@ libraries = [ optional_libs = [ 'bitratestats', +'cfgfile', 'flow_classify', 'gpudev', 'gro', -- 2.32.0
Re: [PATCH 0/6] allow more DPDK libraries to be disabled on build
On Thu, 13 Jan 2022 17:39:12 + Bruce Richardson wrote: > A common request on-list has been to allow more of the DPDK build to be > disabled by those who are > doing their own builds and only use a subset of the libraries. To this end, > this patchset makes some > infrastructure changes [first two patches] to make it easier to have > libraries disabled, and then > adds a six libraries to the "optional" list. > > Bruce Richardson (6): > lib: allow recursive disabling of libs in build > app/test: link unit test binary against all available libs > build: add node library to optional list > build: add flow classification library to optional list > build: add "packet framework" libs to optional list > build: add cfgfile library to optional list > > app/test/meson.build | 74 > lib/meson.build | 30 -- > 2 files changed, 40 insertions(+), 64 deletions(-) > > -- > 2.32.0 > Acked-by: Stephen Hemminger
RE: [PATCH 3/8] ethdev: add mbuf dynfield for incomplete IP reassembly
> > > > > > > > > > +/** > > > > > + * In case of IP reassembly offload failure, ol_flags in mbuf will > > > > > be set > > > > > + * with RTE_MBUF_F_RX_IPREASSEMBLY_INCOMPLETE and packets will > > be > > > > returned > > > > > + * without alteration. The application can retrieve the attached > > > > > fragments > > > > > + * using mbuf dynamic field. > > > > > + */ > > > > > +typedef struct { > > > > > + /** > > > > > + * Next fragment packet. Application should fetch dynamic field > > > > > of > > > > > + * each fragment until a NULL is received and nb_frags is 0. > > > > > + */ > > > > > + struct rte_mbuf *next_frag; > > > > > + /** Time spent(in ms) by HW in waiting for further fragments. */ > > > > > + uint16_t time_spent; > > > > > + /** Number of more fragments attached in mbuf dynamic fields. */ > > > > > + uint16_t nb_frags; > > > > > +} rte_eth_ip_reass_dynfield_t; > > > > > > > > > > > > Looks like a bit of overkill to me: > > > > We do already have 'next' and 'nb_frags' fields inside mbuf, > > > > why can't they be used here? Why a separate ones are necessary? > > > > > > > The next and nb_frags in mbuf is for segmented buffers and not IP > > > fragments. > > > But here we will have separate mbufs in each dynfield denoting each of the > > > fragments which may have further segmented buffers. > > > > Makes sense, thanks for explanation. > > Though in that case just 'struct rte_mbuf *next_frag' might be enough > > (user will walk though the list till mbuf->next_frag != NULL)? > > The reason I am asking: current sizeof(rte_eth_ip_reass_dynfield_t) is 16B, > > which is quite a lot for mbuf, especially considering that it has to be > > continuous > > 16B. > > Making it smaller (8B) or even splitting into 2 fileds (8+4) will give it > > more > > chances > > to coexist with other dynfields. > > Even if we drop nb_frags, we will be left with uint16_t time_spent. > Are you suggesting to use separate dynfield altogether for 2 bytes of > time_spent? 
Yes, that was my thought: split it into two separate fields, if possible.
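The size argument in the thread above can be made concrete with a small layout sketch. The typedefs below are illustrative only, not the structures from the patch: on LP64 targets the combined dynfield needs 16 contiguous, pointer-aligned bytes in the mbuf's dynamic area, while the split variant needs only an 8-byte field plus an independent 2-byte field that can be placed wherever free space exists.

```c
#include <stdint.h>

struct rte_mbuf; /* opaque for this sketch */

/* Combined layout from the patch: pointer + two uint16_t, padded out
 * to two pointer widths (16 bytes on LP64) by alignment. */
typedef struct {
	struct rte_mbuf *next_frag;
	uint16_t time_spent; /* ms spent by HW waiting for fragments */
	uint16_t nb_frags;
} reass_dynfield_combined_t;

/* Split variant suggested in the review: drop nb_frags and walk
 * next_frag until NULL instead... */
typedef struct {
	struct rte_mbuf *next_frag;
} reass_dynfield_next_t;

/* ...leaving time_spent as its own tiny, independently placed field. */
typedef struct {
	uint16_t time_spent;
} reass_dynfield_time_t;
```

The split halves are what would be passed as two separate registrations to the dynfield API, each with its own (much easier to satisfy) size and alignment requirement.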
Re: ETH_RSS_IP only does not seem to balance traffic
That makes sense. Thank you. It would be great to have further comments from the maintainer. For the integrity of the RTE framework, would it be better for us to have a consistent meaning for the ETH_RSS_XXX flags? Do the other drivers act differently? Best regards, Yasu From: Bruce Richardson Subject: Re: ETH_RSS_IP only does not seem to balance traffic Date: Thu, 13 Jan 2022 15:05:53 + Message-ID: > On Thu, Jan 13, 2022 at 12:52:04AM +0900, Yasuhiro Ohara wrote: >> >> Hi, >> >> My system developper friend recently ran into a problem where >> l3fwd does not appear to receive balanced traffic on Intel XL710, >> but it is resolved when the attached patch is applied. >> >> -.rss_hf = ETH_RSS_IP, >> +.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP, >> >> IIRC I ran into a similar problem 3 or 4 years back, >> but didn't report then because I believed I was doing something silly. >> But since my friend is an experienced engineer, I feel like >> it may be better (for the community) to ask this in the list. >> >> We are using dpdk-stable-18.11.6 and igb_uio. >> >> What are we doing wrong? >> >> If it is not a FAQ, I can test it again with more recent stable, >> and will report the details. >> > For XL710 NICs, I believe that ETH_RSS_IP load balances only IP frames that > do not have TCP or UDP headers also. Adding i40e driver maintainer on CC > to comment further. > > /Bruce >
RE: ETH_RSS_IP only does not seem to balance traffic
> -Original Message- > From: Richardson, Bruce > Sent: Thursday, January 13, 2022 11:06 PM > To: Yasuhiro Ohara > Cc: dev@dpdk.org; Xing, Beilei > Subject: Re: ETH_RSS_IP only does not seem to balance traffic > > On Thu, Jan 13, 2022 at 12:52:04AM +0900, Yasuhiro Ohara wrote: > > > > Hi, > > > > My system developper friend recently ran into a problem where l3fwd > > does not appear to receive balanced traffic on Intel XL710, but it is > > resolved when the attached patch is applied. > > > > -.rss_hf = ETH_RSS_IP, > > +.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP, > > > > IIRC I ran into a similar problem 3 or 4 years back, but didn't report > > then because I believed I was doing something silly. > > But since my friend is an experienced engineer, I feel like it may be > > better (for the community) to ask this in the list. > > > > We are using dpdk-stable-18.11.6 and igb_uio. > > > > What are we doing wrong? > > > > If it is not a FAQ, I can test it again with more recent stable, and > > will report the details. > > > For XL710 NICs, I believe that ETH_RSS_IP load balances only IP frames that do > not have TCP or UDP headers also. Adding i40e driver maintainer on CC to > comment further. Yes, Bruce is right. For XL710, ETH_RSS_IP doesn't cover TCP and UDP packets. > > /Bruce
Re: ETH_RSS_IP only does not seem to balance traffic
Thank you for the confirmation! Best regards, Yasu From: "Xing, Beilei" Subject: RE: ETH_RSS_IP only does not seem to balance traffic Date: Fri, 14 Jan 2022 01:44:45 + Message-ID: > > >> -Original Message- >> From: Richardson, Bruce >> Sent: Thursday, January 13, 2022 11:06 PM >> To: Yasuhiro Ohara >> Cc: dev@dpdk.org; Xing, Beilei >> Subject: Re: ETH_RSS_IP only does not seem to balance traffic >> >> On Thu, Jan 13, 2022 at 12:52:04AM +0900, Yasuhiro Ohara wrote: >> > >> > Hi, >> > >> > My system developper friend recently ran into a problem where l3fwd >> > does not appear to receive balanced traffic on Intel XL710, but it is >> > resolved when the attached patch is applied. >> > >> > -.rss_hf = ETH_RSS_IP, >> > +.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP, >> > >> > IIRC I ran into a similar problem 3 or 4 years back, but didn't report >> > then because I believed I was doing something silly. >> > But since my friend is an experienced engineer, I feel like it may be >> > better (for the community) to ask this in the list. >> > >> > We are using dpdk-stable-18.11.6 and igb_uio. >> > >> > What are we doing wrong? >> > >> > If it is not a FAQ, I can test it again with more recent stable, and >> > will report the details. >> > >> For XL710 NICs, I believe that ETH_RSS_IP load balances only IP frames that >> do >> not have TCP or UDP headers also. Adding i40e driver maintainer on CC to >> comment further. > > Yes, Bruce is right. For XL710, ETH_RSS_IP doesn't cover TCP and UDP packets. > >> >> /Bruce >
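For readers following this thread, the fix discussed above amounts to widening the requested hash types in the l3fwd port configuration. The fragment below is a sketch against the 18.11-era API used in the thread (these `ETH_RSS_*`/`ETH_MQ_*` names were later renamed to the `RTE_ETH_*` namespace in DPDK 21.11); it is not the exact l3fwd source.

```c
#include <rte_ethdev.h>

/*
 * l3fwd-style port configuration with the widened hash-type set.
 * On XL710/i40e, ETH_RSS_IP alone selects only the non-TCP/UDP IP
 * hash types, so TCP and UDP flows all hash to one queue; OR-ing in
 * ETH_RSS_TCP and ETH_RSS_UDP restores balancing for those flows.
 */
static struct rte_eth_conf port_conf = {
	.rxmode = {
		.mq_mode = ETH_MQ_RX_RSS,
	},
	.rx_adv_conf = {
		.rss_conf = {
			.rss_key = NULL, /* use the driver's default key */
			.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
		},
	},
};
```

Note that the device may silently trim unsupported bits; checking `rss_hf` against `dev_info.flow_type_rss_offloads` before configuring makes per-NIC differences like this one visible at startup.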
dumpcap w/ pcapng produces out of order/negative times
While using dumpcap with our app, we have observed the captured file producing out-of-order timestamps, including negative times. We are still investigating the root cause but believe it is in lib/pcapng; while testing this issue, the behavior was not observed with pcap. In the attached capture there are 5 streams (the same curl run multiple times, a few seconds apart), with streams 1 and 3 showing oddities. Specifically, stream 1 is in the future relative to the packet order and stream 3 has a negative time. Not sure if the pcap file will actually post/attach; if it doesn't, we will try something else. System: CentOS7 + devtoolset 9 DPDK: v21.11 1060.pcapng Description: Binary data
[PATCH 0/3] enic PMD patches
Here are a few patches to the enic PMD that should apply on top of the patch: 'net/enic: support GENEVE flow item' by Hyong Youb Kim. John Daley (3): net/enic: add support for eCPRI matching net/enic: update VIC firmware API net/enic: support max descriptors allowed by adapter doc/guides/rel_notes/release_22_03.rst | 1 + drivers/net/enic/base/cq_enet_desc.h | 6 ++- drivers/net/enic/base/vnic_enet.h | 22 + drivers/net/enic/enic_fm_flow.c| 65 ++ drivers/net/enic/enic_res.c| 20 ++-- drivers/net/enic/enic_res.h| 6 ++- drivers/net/enic/enic_rxtx.c | 33 + 7 files changed, 136 insertions(+), 17 deletions(-) -- 2.33.1
[PATCH 1/3] net/enic: add support for eCPRI matching
eCPRI message can be over Ethernet layer (.1Q supported also) or over UDP layer. Message header formats are the same in these two variants. Only up though the first packet header in the PDU can be matched. RSS on the eCPRI header fields is not supported. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- doc/guides/rel_notes/release_22_03.rst | 1 + drivers/net/enic/enic_fm_flow.c| 65 ++ 2 files changed, 66 insertions(+) diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst index b38dc54e62..52d1e32cf6 100644 --- a/doc/guides/rel_notes/release_22_03.rst +++ b/doc/guides/rel_notes/release_22_03.rst @@ -58,6 +58,7 @@ New Features * **Updated Cisco enic driver.** * Added rte_flow support for matching GENEVE packets. + * Added rte_flow support for matching eCPRI packets. Removed Items - diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c index 752ffeb5c5..589c9253e1 100644 --- a/drivers/net/enic/enic_fm_flow.c +++ b/drivers/net/enic/enic_fm_flow.c @@ -237,6 +237,7 @@ static enic_copy_item_fn enic_fm_copy_item_vxlan; static enic_copy_item_fn enic_fm_copy_item_gtp; static enic_copy_item_fn enic_fm_copy_item_geneve; static enic_copy_item_fn enic_fm_copy_item_geneve_opt; +static enic_copy_item_fn enic_fm_copy_item_ecpri; /* Ingress actions */ static const enum rte_flow_action_type enic_fm_supported_ig_actions[] = { @@ -392,6 +393,15 @@ static const struct enic_fm_items enic_fm_items[] = { RTE_FLOW_ITEM_TYPE_END, }, }, + [RTE_FLOW_ITEM_TYPE_ECPRI] = { + .copy_item = enic_fm_copy_item_ecpri, + .valid_start_item = 1, + .prev_items = (const enum rte_flow_item_type[]) { + RTE_FLOW_ITEM_TYPE_ETH, + RTE_FLOW_ITEM_TYPE_UDP, + RTE_FLOW_ITEM_TYPE_END, + }, + }, }; static int @@ -877,6 +887,61 @@ enic_fm_copy_item_geneve_opt(struct copy_item_args *arg) return 0; } +/* Match eCPRI combined message header */ +static int +enic_fm_copy_item_ecpri(struct copy_item_args *arg) +{ + const struct rte_flow_item *item = 
arg->item; + const struct rte_flow_item_ecpri *spec = item->spec; + const struct rte_flow_item_ecpri *mask = item->mask; + struct fm_tcam_match_entry *entry = arg->fm_tcam_entry; + struct fm_header_set *fm_data, *fm_mask; + uint8_t *fm_data_to, *fm_mask_to; + + ENICPMD_FUNC_TRACE(); + + /* Tunneling not supported- only matching on inner eCPRI fields. */ + if (arg->header_level > 0) + return -EINVAL; + + /* Need both spec and mask */ + if (!spec || !mask) + return -EINVAL; + + fm_data = &entry->ftm_data.fk_hdrset[0]; + fm_mask = &entry->ftm_mask.fk_hdrset[0]; + + /* eCPRI can only follow L2/VLAN layer if ethernet type is 0xAEFE. */ + if (!(fm_data->fk_metadata & FKM_UDP) && + (fm_mask->l2.eth.fk_ethtype != UINT16_MAX || + rte_cpu_to_be_16(fm_data->l2.eth.fk_ethtype) != + RTE_ETHER_TYPE_ECPRI)) + return -EINVAL; + + if (fm_data->fk_metadata & FKM_UDP) { + /* eCPRI on UDP */ + fm_data->fk_header_select |= FKH_L4RAW; + fm_mask->fk_header_select |= FKH_L4RAW; + fm_data_to = &fm_data->l4.rawdata[sizeof(fm_data->l4.udp)]; + fm_mask_to = &fm_mask->l4.rawdata[sizeof(fm_data->l4.udp)]; + } else { + /* eCPRI directly after Ethernet header */ + fm_data->fk_header_select |= FKH_L3RAW; + fm_mask->fk_header_select |= FKH_L3RAW; + fm_data_to = &fm_data->l3.rawdata[0]; + fm_mask_to = &fm_mask->l3.rawdata[0]; + } + + /* +* Use the raw L3 or L4 buffer to match eCPRI since fm_header_set does +* not have eCPRI header. Only 1st message header of PDU can be matched. +* "C" bit is ignored. +*/ + memcpy(fm_data_to, spec, sizeof(*spec)); + memcpy(fm_mask_to, mask, sizeof(*mask)); + return 0; +} + /* * Currently, raw pattern match is very limited. It is intended for matching * UDP tunnel header (e.g. vxlan or geneve). -- 2.33.1
[PATCH 2/3] net/enic: update VIC firmware API
Update the configuration structure used between the adapter and driver. The structure is compatible with all Cisco VIC adapters. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- drivers/net/enic/base/vnic_enet.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/net/enic/base/vnic_enet.h b/drivers/net/enic/base/vnic_enet.h index 2a97a33044..66261d9127 100644 --- a/drivers/net/enic/base/vnic_enet.h +++ b/drivers/net/enic/base/vnic_enet.h @@ -31,6 +31,28 @@ struct vnic_enet_config { uint32_t rdma_mr_id; uint32_t rdma_mr_count; uint32_t max_pkt_size; + uint16_t vf_subvnic_count; + uint16_t mq_subvnic_count; + uint32_t mq_flags; + + /* the following 3 fields are per-MQ-vnic counts */ + uint32_t mq_rdma_mr_count; + uint16_t mq_rdma_qp_count; + uint16_t mq_rdma_resgrp; + + uint16_t rdma_max_sq_ring_sz; + uint16_t rdma_max_rq_ring_sz; + uint32_t rdma_max_cq_ring_sz; + uint16_t rdma_max_wr_sge; + uint16_t rdma_max_mr_sge; + uint8_t rdma_max_rd_per_qp; + uint8_t unused; /* available */ + uint16_t mq_rdma_engine_count; + uint32_t intr_coal_tick_ns; /* coalescing timer tick in nsec */ + uint32_t max_rq_ring; /* MAX RQ ring size */ + uint32_t max_wq_ring; /* MAX WQ ring size */ + uint32_t max_cq_ring; /* MAX CQ ring size */ + uint32_t rdma_rsvd_lkey;/* Reserved (privileged) LKey */ }; #define VENETF_TSO 0x1 /* TSO enabled */ -- 2.33.1
[PATCH 3/3] net/enic: support max descriptors allowed by adapter
Newer VIC adapters have the max number of supported RX and TX descriptors in their configuration. Use these values as the maximums. Signed-off-by: John Daley Reviewed-by: Hyong Youb Kim --- drivers/net/enic/base/cq_enet_desc.h | 6 - drivers/net/enic/enic_res.c | 20 + drivers/net/enic/enic_res.h | 6 +++-- drivers/net/enic/enic_rxtx.c | 33 +++- 4 files changed, 48 insertions(+), 17 deletions(-) diff --git a/drivers/net/enic/base/cq_enet_desc.h b/drivers/net/enic/base/cq_enet_desc.h index a34a4f5400..02db85b9a0 100644 --- a/drivers/net/enic/base/cq_enet_desc.h +++ b/drivers/net/enic/base/cq_enet_desc.h @@ -67,7 +67,8 @@ struct cq_enet_rq_desc_64 { uint16_t vlan; uint16_t checksum_fcoe; uint8_t flags; - uint8_t unused[48]; + uint8_t fetch_idx_flags; + uint8_t unused[47]; uint8_t type_color; }; @@ -92,6 +93,9 @@ struct cq_enet_rq_desc_64 { #define CQ_ENET_RQ_DESC_BYTES_WRITTEN_BITS 14 #define CQ_ENET_RQ_DESC_BYTES_WRITTEN_MASK \ ((1 << CQ_ENET_RQ_DESC_BYTES_WRITTEN_BITS) - 1) +#define CQ_ENET_RQ_DESC_FETCH_IDX_BITS 2 +#define CQ_ENET_RQ_DESC_FETCH_IDX_MASK \ + ((1 << CQ_ENET_RQ_DESC_FETCH_IDX_BITS) - 1) #define CQ_ENET_RQ_DESC_FLAGS_TRUNCATED (0x1 << 14) #define CQ_ENET_RQ_DESC_FLAGS_VLAN_STRIPPED (0x1 << 15) diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c index 9cfb857939..caf773bab2 100644 --- a/drivers/net/enic/enic_res.c +++ b/drivers/net/enic/enic_res.c @@ -26,6 +26,7 @@ int enic_get_vnic_config(struct enic *enic) struct vnic_enet_config *c = &enic->config; int err; uint64_t sizes; + uint32_t max_rq_descs, max_wq_descs; err = vnic_dev_get_mac_addr(enic->vdev, enic->mac_addr); if (err) { @@ -57,6 +58,8 @@ int enic_get_vnic_config(struct enic *enic) GET_CONFIG(loop_tag); GET_CONFIG(num_arfs); GET_CONFIG(max_pkt_size); + GET_CONFIG(max_rq_ring); + GET_CONFIG(max_wq_ring); /* max packet size is only defined in newer VIC firmware * and will be 0 for legacy firmware and VICs @@ -101,20 +104,29 @@ int enic_get_vnic_config(struct enic *enic) 
((enic->filter_actions & FILTER_ACTION_COUNTER_FLAG) ? "count " : "")); - c->wq_desc_count = RTE_MIN((uint32_t)ENIC_MAX_WQ_DESCS, + /* The max size of RQ and WQ rings are specified in 1500 series VICs and +* beyond. If they are not specified by the VIC or if 64B CQ descriptors +* are not being used, the max number of descriptors is 4096. +*/ + max_wq_descs = (enic->cq64_request && c->max_wq_ring) ? c->max_wq_ring : + ENIC_LEGACY_MAX_WQ_DESCS; + c->wq_desc_count = RTE_MIN(max_wq_descs, RTE_MAX((uint32_t)ENIC_MIN_WQ_DESCS, c->wq_desc_count)); c->wq_desc_count &= 0xffe0; /* must be aligned to groups of 32 */ - - c->rq_desc_count = RTE_MIN((uint32_t)ENIC_MAX_RQ_DESCS, + max_rq_descs = (enic->cq64_request && c->max_rq_ring) ? c->max_rq_ring + : ENIC_LEGACY_MAX_WQ_DESCS; + c->rq_desc_count = RTE_MIN(max_rq_descs, RTE_MAX((uint32_t)ENIC_MIN_RQ_DESCS, c->rq_desc_count)); c->rq_desc_count &= 0xffe0; /* must be aligned to groups of 32 */ + dev_debug(NULL, "Max supported VIC descriptors: WQ:%u, RQ:%u\n", + max_wq_descs, max_rq_descs); c->intr_timer_usec = RTE_MIN(c->intr_timer_usec, vnic_dev_get_intr_coal_timer_max(enic->vdev)); dev_info(enic_get_dev(enic), "vNIC MAC addr " RTE_ETHER_ADDR_PRT_FMT - "wq/rq %d/%d mtu %d, max mtu:%d\n", + " wq/rq %d/%d mtu %d, max mtu:%d\n", enic->mac_addr[0], enic->mac_addr[1], enic->mac_addr[2], enic->mac_addr[3], enic->mac_addr[4], enic->mac_addr[5], c->wq_desc_count, c->rq_desc_count, diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h index 34f15d5a42..ae979d52be 100644 --- a/drivers/net/enic/enic_res.h +++ b/drivers/net/enic/enic_res.h @@ -12,9 +12,11 @@ #include "vnic_rq.h" #define ENIC_MIN_WQ_DESCS 64 -#define ENIC_MAX_WQ_DESCS 4096 #define ENIC_MIN_RQ_DESCS 64 -#define ENIC_MAX_RQ_DESCS 4096 + +/* 1400 series VICs and prior all have 4K max, after that it's in the config */ +#define ENIC_LEGACY_MAX_WQ_DESCS4096 +#define ENIC_LEGACY_MAX_RQ_DESCS4096 /* A descriptor ring has a multiple of 32 descriptors */ #define 
ENIC_ALIGN_DESCS 32 diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c index c44715bfd0..4681ef6eca 100644 --- a/drivers/net/enic/enic_rxtx.c +++ b/drivers/net/enic/enic_rxtx.c @@ -84,6 +84,7 @@ enic_recv_pkts_common(vo
RE: [PATCH v1 1/1] vhost: integrate dmadev in asynchronous datapath
Hi Jiayu, This is first round of review, I'll spend time on OVS patches later and look back. > -Original Message- > From: Hu, Jiayu > Sent: Friday, December 31, 2021 5:55 AM > To: dev@dpdk.org > Cc: maxime.coque...@redhat.com; i.maxim...@ovn.org; Xia, Chenbo > ; Richardson, Bruce ; Van > Haaren, Harry ; Pai G, Sunil > ; Mcnamara, John ; Ding, Xuan > ; Jiang, Cheng1 ; > lian...@liangbit.com; Hu, Jiayu > Subject: [PATCH v1 1/1] vhost: integrate dmadev in asynchronous datapath > > Since dmadev is introduced in 21.11, to avoid the overhead of vhost DMA > abstraction layer and simplify application logics, this patch integrates > dmadev in asynchronous data path. > > Signed-off-by: Jiayu Hu > Signed-off-by: Sunil Pai G > --- > doc/guides/prog_guide/vhost_lib.rst | 70 - > examples/vhost/Makefile | 2 +- > examples/vhost/ioat.c | 218 -- > examples/vhost/ioat.h | 63 > examples/vhost/main.c | 230 +++- > examples/vhost/main.h | 11 ++ > examples/vhost/meson.build | 6 +- > lib/vhost/meson.build | 3 +- > lib/vhost/rte_vhost_async.h | 121 +-- > lib/vhost/version.map | 3 + > lib/vhost/vhost.c | 130 +++- > lib/vhost/vhost.h | 53 ++- > lib/vhost/virtio_net.c | 206 +++-- > 13 files changed, 587 insertions(+), 529 deletions(-) > delete mode 100644 examples/vhost/ioat.c > delete mode 100644 examples/vhost/ioat.h > > diff --git a/doc/guides/prog_guide/vhost_lib.rst > b/doc/guides/prog_guide/vhost_lib.rst > index 76f5d303c9..bdce7cbf02 100644 > --- a/doc/guides/prog_guide/vhost_lib.rst > +++ b/doc/guides/prog_guide/vhost_lib.rst > @@ -218,38 +218,12 @@ The following is an overview of some key Vhost API > functions: > >Enable or disable zero copy feature of the vhost crypto backend. > > -* ``rte_vhost_async_channel_register(vid, queue_id, config, ops)`` > +* ``rte_vhost_async_channel_register(vid, queue_id)`` > >Register an async copy device channel for a vhost queue after vring Since dmadev is here, let's just use 'DMA device' instead of 'copy device' > - is enabled. 
Following device ``config`` must be specified together > - with the registration: > + is enabled. > > - * ``features`` > - > -This field is used to specify async copy device features. > - > -``RTE_VHOST_ASYNC_INORDER`` represents the async copy device can > -guarantee the order of copy completion is the same as the order > -of copy submission. > - > -Currently, only ``RTE_VHOST_ASYNC_INORDER`` capable device is > -supported by vhost. > - > - Applications must provide following ``ops`` callbacks for vhost lib to > - work with the async copy devices: > - > - * ``transfer_data(vid, queue_id, descs, opaque_data, count)`` > - > -vhost invokes this function to submit copy data to the async devices. > -For non-async_inorder capable devices, ``opaque_data`` could be used > -for identifying the completed packets. > - > - * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)`` > - > -vhost invokes this function to get the copy data completed by async > -devices. > - > -* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id, config, > ops)`` > +* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id)`` > >Register an async copy device channel for a vhost queue without >performing any locking. > @@ -277,18 +251,13 @@ The following is an overview of some key Vhost API > functions: >This function is only safe to call in vhost callback functions >(i.e., struct rte_vhost_device_ops). > > -* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, comp_pkts, > comp_count)`` > +* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, dma_id, > dma_vchan)`` > >Submit an enqueue request to transmit ``count`` packets from host to guest > - by async data path. Successfully enqueued packets can be transfer completed > - or being occupied by DMA engines; transfer completed packets are returned > in > - ``comp_pkts``, but others are not guaranteed to finish, when this API > - call returns. > + by async data path. 
Applications must not free the packets submitted for > + enqueue until the packets are completed. > > - Applications must not free the packets submitted for enqueue until the > - packets are completed. > - > -* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)`` > +* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count, dma_id, > dma_vchan)`` > >Poll enqueue completion status from async data path. Completed packets >are returned to applications through ``pkts``. > @@ -298,7 +267,7 @@ The following is an overview of some key Vhost API > functions: >This function returns the amount of in-flight pa
RE: [PATCH v3] ethdev: mark old macros as deprecated
> -Original Message- > From: Yigit, Ferruh > Sent: Wednesday, January 12, 2022 10:36 PM > To: Thomas Monjalon ; Andrew Rybchenko > ; Hemant Agrawal ; > Tyler Retzlaff ; Xia, Chenbo > ; Jerin Jacob > Cc: dev@dpdk.org; Yigit, Ferruh ; Stephen Hemminger > > Subject: [PATCH v3] ethdev: mark old macros as deprecated > > Old macros were kept for backward compatibility, but this causes old macro > usage to sneak in silently. > > Marking old macros as deprecated. Downside is this will cause some noise > for applications that are using old macros. > > Fixes: 295968d17407 ("ethdev: add namespace") > > Signed-off-by: Ferruh Yigit > Acked-by: Stephen Hemminger > --- > v2: > * Release notes updated > > v3: > * Update 22.03 release note > --- > doc/guides/rel_notes/release_22_03.rst | 3 + > lib/ethdev/rte_ethdev.h| 474 + > 2 files changed, 247 insertions(+), 230 deletions(-) Acked-by: Chenbo Xia
RE: [PATCH] vdpa/sfc: make MCDI memzone name unique
Hi Abhimanyu, > -Original Message- > From: abhimanyu.sa...@xilinx.com > Sent: Tuesday, January 11, 2022 1:33 PM > To: dev@dpdk.org > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; > andrew.rybche...@oktetlabs.ru; Abhimanyu Saini > Subject: [PATCH] vdpa/sfc: make MCDI memzone name unique > > From: Abhimanyu Saini > > Buffer for MCDI channel is allocated using rte_memzone_reserve_aligned > with zone name 'mcdi'. Since multiple MCDI channels are needed to > support multiple VF(s) and rte_memzone_reserve_aligned expects unique > zone names, append PCI address to zone name to make it unique. > > Signed-off-by: Abhimanyu Saini > --- Could you help with the apply issue and checkpatch issue, then send a new version? Thanks, Chenbo
RE: [dpdk-dev] [PATCH] net/virtio: fix uninitialized old_rss_key variable
Hi Yunjian, > -Original Message- > From: Yunjian Wang > Sent: Saturday, January 8, 2022 4:14 PM > To: dev@dpdk.org > Cc: maxime.coque...@redhat.com; Xia, Chenbo ; > dingxiaoxi...@huawei.com; xudin...@huawei.com; Yunjian Wang > ; sta...@dpdk.org > Subject: [dpdk-dev] [PATCH] net/virtio: fix uninitialized old_rss_key variable > > This patch fixes an issue where the uninitialized old_rss_key > is used for restoring the rss_key. > > Coverity issue: 373866 > Fixes: 0c9d66207054 ("net/virtio: support RSS") > Cc: sta...@dpdk.org > > Signed-off-by: Yunjian Wang > --- > drivers/net/virtio/virtio_ethdev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio/virtio_ethdev.c > b/drivers/net/virtio/virtio_ethdev.c > index c2588369b2..7c445bfc48 100644 > --- a/drivers/net/virtio/virtio_ethdev.c > +++ b/drivers/net/virtio/virtio_ethdev.c > @@ -2028,7 +2028,8 @@ virtio_dev_rss_hash_update(struct rte_eth_dev *dev, > > return 0; > restore_key: > - memcpy(hw->rss_key, old_rss_key, VIRTIO_NET_RSS_KEY_SIZE); > + if (rss_conf->rss_key && rss_conf->rss_key_len) > + memcpy(hw->rss_key, old_rss_key, VIRTIO_NET_RSS_KEY_SIZE); > restore_types: > hw->rss_hash_types = old_hash_types; > > -- > 2.27.0 Appreciate your effort with the Coverity issue, thanks! Reviewed-by: Chenbo Xia
RE: [PATCH v3] vdpa/ifc: fix log info mismatch
> -Original Message- > From: Pei, Andy > Sent: Monday, December 13, 2021 3:01 PM > To: dev@dpdk.org > Cc: Xia, Chenbo ; sta...@dpdk.org > Subject: [PATCH v3] vdpa/ifc: fix log info mismatch > > Fix log info mismatch. > > Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") > Cc: sta...@dpdk.org > > Signed-off-by: Andy Pei > --- > drivers/vdpa/ifc/base/ifcvf.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/vdpa/ifc/base/ifcvf.c b/drivers/vdpa/ifc/base/ifcvf.c > index 721cb1d..d10c1fd 100644 > --- a/drivers/vdpa/ifc/base/ifcvf.c > +++ b/drivers/vdpa/ifc/base/ifcvf.c > @@ -94,12 +94,14 @@ > return -1; > } > > - DEBUGOUT("capability mapping:\ncommon cfg: %p\n" > - "notify base: %p\nisr cfg: %p\ndevice cfg: %p\n" > - "multiplier: %u\n", > - hw->common_cfg, hw->dev_cfg, > - hw->isr, hw->notify_base, > - hw->notify_off_multiplier); > + DEBUGOUT("capability mapping:\n" > + "common cfg: %p\n" > + "notify base: %p\n" > + "isr cfg: %p\n" > + "device cfg: %p\n" > + "multiplier: %u\n", > + hw->common_cfg, hw->notify_base, hw->isr, hw->dev_cfg, > + hw->notify_off_multiplier); > > return 0; > } > -- > 1.8.3.1 Reviewed-by: Chenbo Xia
[PATCH] vdpa/ifc: fix log info mismatch
Fix log info mismatch. Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") Cc: sta...@dpdk.org Signed-off-by: Andy Pei --- drivers/vdpa/ifc/base/ifcvf.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/vdpa/ifc/base/ifcvf.c b/drivers/vdpa/ifc/base/ifcvf.c index 721cb1d..d10c1fd 100644 --- a/drivers/vdpa/ifc/base/ifcvf.c +++ b/drivers/vdpa/ifc/base/ifcvf.c @@ -94,12 +94,14 @@ return -1; } - DEBUGOUT("capability mapping:\ncommon cfg: %p\n" - "notify base: %p\nisr cfg: %p\ndevice cfg: %p\n" - "multiplier: %u\n", - hw->common_cfg, hw->dev_cfg, - hw->isr, hw->notify_base, - hw->notify_off_multiplier); + DEBUGOUT("capability mapping:\n" +"common cfg: %p\n" +"notify base: %p\n" +"isr cfg: %p\n" +"device cfg: %p\n" +"multiplier: %u\n", +hw->common_cfg, hw->notify_base, hw->isr, hw->dev_cfg, +hw->notify_off_multiplier); return 0; } -- 1.8.3.1
[PATCH] vhost: add some log for vhost message VHOST_USER_SET_VRING_BASE
Usually the last avail index and last used index is 0, but for target device of live migration, the last avail index and last used index is not 0. So I think some log is helpful. Signed-off-by: Andy Pei --- lib/vhost/vhost_user.c | 5 + 1 file changed, 5 insertions(+) diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index a781346..3cb13fb 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -973,6 +973,11 @@ vq->last_avail_idx = msg->payload.state.num; } + VHOST_LOG_CONFIG(INFO, + "vring base idx:%d last_used_idx:%u last_avail_idx:%u.\n", + msg->payload.state.index, vq->last_used_idx, + vq->last_avail_idx); + return RTE_VHOST_MSG_RESULT_OK; } -- 1.8.3.1
[PATCH 1/2] vhost: fix physical address mapping
From: Xuan Ding When choosing IOVA as PA mode, IOVA is likely to be discontinuous, which requires page by page mapping for DMA devices. To be consistent, this patch implements page by page mapping instead of mapping at the region granularity for both IOVA as VA and PA mode. Fixes: 7c61fa08b716 ("vhost: enable IOMMU for async vhost") Signed-off-by: Xuan Ding Signed-off-by: Yuan Wang --- lib/vhost/vhost.h | 1 + lib/vhost/vhost_user.c | 116 - 2 files changed, 57 insertions(+), 60 deletions(-) diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index 7085e0885c..d246538ca5 100644 --- a/lib/vhost/vhost.h +++ b/lib/vhost/vhost.h @@ -355,6 +355,7 @@ struct vring_packed_desc_event { struct guest_page { uint64_t guest_phys_addr; uint64_t host_phys_addr; + uint64_t host_user_addr; uint64_t size; }; diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index a781346c4d..6d888766b0 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -143,57 +143,56 @@ get_blk_size(int fd) return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize; } -static int -async_dma_map(struct rte_vhost_mem_region *region, bool do_map) +static void +async_dma_map(struct virtio_net *dev, bool do_map) { - uint64_t host_iova; int ret = 0; - - host_iova = rte_mem_virt2iova((void *)(uintptr_t)region->host_user_addr); + uint32_t i; + struct guest_page *page; if (do_map) { - /* Add mapped region into the default container of DPDK. */ - ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, -region->host_user_addr, -host_iova, -region->size); - if (ret) { - /* -* DMA device may bind with kernel driver, in this case, -* we don't need to program IOMMU manually. However, if no -* device is bound with vfio/uio in DPDK, and vfio kernel -* module is loaded, the API will still be called and return -* with ENODEV/ENOSUP. -* -* DPDK vfio only returns ENODEV/ENOSUP in very similar -* situations(vfio either unsupported, or supported -* but no devices found). 
Either way, no mappings could be -* performed. We treat it as normal case in async path. -*/ - if (rte_errno == ENODEV || rte_errno == ENOTSUP) - return 0; - - VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n"); - /* DMA mapping errors won't stop VHST_USER_SET_MEM_TABLE. */ - return 0; + for (i = 0; i < dev->nr_guest_pages; i++) { + page = &dev->guest_pages[i]; + ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, +page->host_user_addr, +page->host_phys_addr, +page->size); + if (ret) { + /* +* DMA device may bind with kernel driver, in this case, +* we don't need to program IOMMU manually. However, if no +* device is bound with vfio/uio in DPDK, and vfio kernel +* module is loaded, the API will still be called and return +* with ENODEV. +* +* DPDK vfio only returns ENODEV in very similar situations +* (vfio either unsupported, or supported but no devices found). +* Either way, no mappings could be performed. We treat it as +* normal case in async path. This is a workaround. +*/ + if (rte_errno == ENODEV) + return; + + /* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */ + VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n"); + } } } else { - /* Remove mapped region from the default container of DPDK. */ - ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, - region->host_user_addr, - host_iova, -
[PATCH 0/2] vhost: fix physical address mapping
From: Xuan Ding This patchset fixes the issue of incorrect DMA mapping in PA mode. Due to the ambiguity of host_phys_addr naming in the guest page struct, rename it to host_iova. Xuan Ding (2): vhost: fix physical address mapping vhost: rename field in guest page struct lib/vhost/vhost.h | 11 ++-- lib/vhost/vhost_user.c | 130 - lib/vhost/virtio_net.c | 11 ++-- 3 files changed, 75 insertions(+), 77 deletions(-) -- 2.17.1
[PATCH 2/2] vhost: rename field in guest page struct
From: Xuan Ding This patch renames the host_phys_addr to host_iova in guest_page struct. The host_phys_addr is iova, it depends on the DPDK IOVA mode. Signed-off-by: Xuan Ding --- lib/vhost/vhost.h | 10 +- lib/vhost/vhost_user.c | 24 lib/vhost/virtio_net.c | 11 ++- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index d246538ca5..9521ae56da 100644 --- a/lib/vhost/vhost.h +++ b/lib/vhost/vhost.h @@ -354,7 +354,7 @@ struct vring_packed_desc_event { struct guest_page { uint64_t guest_phys_addr; - uint64_t host_phys_addr; + uint64_t host_iova; uint64_t host_user_addr; uint64_t size; }; @@ -605,13 +605,13 @@ gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa, if (gpa + gpa_size <= page->guest_phys_addr + page->size) { return gpa - page->guest_phys_addr + - page->host_phys_addr; + page->host_iova; } else if (gpa < page->guest_phys_addr + page->size) { *hpa_size = page->guest_phys_addr + page->size - gpa; return gpa - page->guest_phys_addr + - page->host_phys_addr; + page->host_iova; } } } else { @@ -622,13 +622,13 @@ gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa, if (gpa + gpa_size <= page->guest_phys_addr + page->size) { return gpa - page->guest_phys_addr + - page->host_phys_addr; + page->host_iova; } else if (gpa < page->guest_phys_addr + page->size) { *hpa_size = page->guest_phys_addr + page->size - gpa; return gpa - page->guest_phys_addr + - page->host_phys_addr; + page->host_iova; } } } diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index 6d888766b0..e2e56308b9 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -154,7 +154,7 @@ async_dma_map(struct virtio_net *dev, bool do_map) page = &dev->guest_pages[i]; ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, page->host_user_addr, -page->host_phys_addr, +page->host_iova, page->size); if (ret) { /* @@ -182,7 +182,7 @@ async_dma_map(struct virtio_net *dev, bool do_map) page = &dev->guest_pages[i]; ret = 
rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, page->host_user_addr, - page->host_phys_addr, + page->host_iova, page->size); if (ret) { /* like DMA map, ignore the kernel driver case when unmap. */ @@ -977,7 +977,7 @@ vhost_user_set_vring_base(struct virtio_net **pdev, static int add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr, - uint64_t host_phys_addr, uint64_t host_user_addr, uint64_t size) + uint64_t host_iova, uint64_t host_user_addr, uint64_t size) { struct guest_page *page, *last_page; struct guest_page *old_pages; @@ -998,7 +998,7 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr, if (dev->nr_guest_pages > 0) { last_page = &dev->guest_pages[dev->nr_guest_pages - 1]; /* merge if the two pages are continuous */ - if (host_phys_addr == last_page->host_phys_addr + last_page->size + if (host_iova == last_page->host_iova + last_page->size && guest_phys_addr == last_page->guest_phys_addr + last_page->size && host_user_addr == last_page->host_user_addr +
RE: [PATCH] vhost: add some log for vhost message VHOST_USER_SET_VRING_BASE
> -Original Message- > From: Pei, Andy > Sent: Friday, January 14, 2022 3:19 PM > To: dev@dpdk.org > Cc: Xia, Chenbo > Subject: [PATCH] vhost: add some log for vhost message > VHOST_USER_SET_VRING_BASE I suggest the title be: vhost: add log for VHOST_USER_SET_VRING_BASE > > Usually the last avail index and last used index is 0, but for target > device of live migration, the last avail index and last used index is > not 0. So I think some log is helpful. Can simplify to: This patch adds log for vring related info in handling of vhost message VHOST_USER_SET_VRING_BASE, which will be useful in live migration case. > > Signed-off-by: Andy Pei > --- > lib/vhost/vhost_user.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c > index a781346..3cb13fb 100644 > --- a/lib/vhost/vhost_user.c > +++ b/lib/vhost/vhost_user.c > @@ -973,6 +973,11 @@ > vq->last_avail_idx = msg->payload.state.num; > } > > + VHOST_LOG_CONFIG(INFO, > + "vring base idx:%d last_used_idx:%u last_avail_idx:%u.\n", vring idx:%u Thanks, Chenbo > + msg->payload.state.index, vq->last_used_idx, > + vq->last_avail_idx); > + > return RTE_VHOST_MSG_RESULT_OK; > } > > -- > 1.8.3.1
RE: [PATCH] net/virtio: fix unreleased resource when creating virtio user dev is failed
> -Original Message- > From: Harold Huang > Sent: Thursday, December 23, 2021 12:43 PM > To: dev@dpdk.org > Cc: Maxime Coquelin ; Xia, Chenbo > > Subject: [PATCH] net/virtio: fix unreleased resource when creating virtio user > dev is failed > > When eth_virtio_dev_init fails, the registered virtio user memory event > cb is not released and the backend created tap device is not destroyed. > This leaves residual tap devices in the host, and creating > a new vdev could fail because the new virtio_user_dev could use the > same address pointer, and registering a memory event cb for the same address is > not allowed. > > Signed-off-by: Harold Huang > --- > Compared to patch v3, commit log is changed because this bug could > cause residual tap devices in the host. > drivers/net/virtio/virtio_user_ethdev.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/virtio/virtio_user_ethdev.c > b/drivers/net/virtio/virtio_user_ethdev.c > index 0271098f0d..16eca2f940 100644 > --- a/drivers/net/virtio/virtio_user_ethdev.c > +++ b/drivers/net/virtio/virtio_user_ethdev.c > @@ -666,6 +666,7 @@ virtio_user_pmd_probe(struct rte_vdev_device *vdev) > /* previously called by pci probing for physical dev */ > if (eth_virtio_dev_init(eth_dev) < 0) { > PMD_INIT_LOG(ERR, "eth_virtio_dev_init fails"); > + virtio_user_dev_uninit(dev); > virtio_user_eth_dev_free(eth_dev); > goto end; > } > -- > 2.27.0 Reviewed-by: Chenbo Xia
RE: [PATCH] vhost: add some log for vhost message VHOST_USER_SET_VRING_BASE
Hi Chenbo, Thanks for your reply. I will send a V2 patch to address it. -Original Message- From: Xia, Chenbo Sent: Friday, January 14, 2022 3:40 PM To: Pei, Andy ; dev@dpdk.org Subject: RE: [PATCH] vhost: add some log for vhost message VHOST_USER_SET_VRING_BASE > -Original Message- > From: Pei, Andy > Sent: Friday, January 14, 2022 3:19 PM > To: dev@dpdk.org > Cc: Xia, Chenbo > Subject: [PATCH] vhost: add some log for vhost message > VHOST_USER_SET_VRING_BASE I suggest the title be: vhost: add log for VHOST_USER_SET_VRING_BASE > > Usually the last avail index and last used index is 0, but for target > device of live migration, the last avail index and last used index is > not 0. So I think some log is helpful. Can simplify to: This patch adds log for vring related info in handling of vhost message VHOST_USER_SET_VRING_BASE, which will be useful in live migration case. > > Signed-off-by: Andy Pei > --- > lib/vhost/vhost_user.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index > a781346..3cb13fb 100644 > --- a/lib/vhost/vhost_user.c > +++ b/lib/vhost/vhost_user.c > @@ -973,6 +973,11 @@ > vq->last_avail_idx = msg->payload.state.num; > } > > + VHOST_LOG_CONFIG(INFO, > + "vring base idx:%d last_used_idx:%u last_avail_idx:%u.\n", vring idx:%u Thanks, Chenbo > + msg->payload.state.index, vq->last_used_idx, > + vq->last_avail_idx); > + > return RTE_VHOST_MSG_RESULT_OK; > } > > -- > 1.8.3.1