[PATCH v1 2/2] app/testpmd: add queue based pfc CLI options
From: Sunil Kumar Kori

Patch adds command line options to configure queue-based priority flow control.

- Syntax of the command is given below:

    set pfc_queue_ctrl <port_id> rx <on|off> <tx_qid> <tx_tc> tx <on|off> <rx_qid> <rx_tc> <pause_time>

- Example command to configure queue-based priority flow control on the Rx and Tx side for port 0, Rx queue 0, Tx queue 0 with pause time 2047:

    testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx on 0 0 2047

Signed-off-by: Sunil Kumar Kori
---
 app/test-pmd/cmdline.c                      | 122
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  22
 2 files changed, 144 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6e10afeedd..0240fcf9e7 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -544,6 +544,11 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"Set the priority flow control parameter on a"
 			" port.\n\n"

+			"set pfc_queue_ctrl (port_id) rx (on|off) (tx_qid)"
+			" (tx_tc) tx (on|off) (rx_qid) (rx_tc) (pause_time)\n"
+			"Set the queue priority flow control parameter on a"
+			" given Rx and Tx queues of a port.\n\n"
+
 			"set stat_qmap (tx|rx) (port_id) (queue_id) (qmapping)\n"
 			"Set statistics mapping (qmapping 0..15) for RX/TX"
 			" queue on port.\n"
@@ -7690,6 +7695,122 @@ cmdline_parse_inst_t cmd_priority_flow_control_set = {
 	},
 };

+struct cmd_queue_priority_flow_ctrl_set_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t pfc_queue_ctrl;
+	portid_t port_id;
+	cmdline_fixed_string_t rx;
+	cmdline_fixed_string_t rx_pfc_mode;
+	uint16_t tx_qid;
+	uint8_t tx_tc;
+	cmdline_fixed_string_t tx;
+	cmdline_fixed_string_t tx_pfc_mode;
+	uint16_t rx_qid;
+	uint8_t rx_tc;
+	uint16_t pause_time;
+};
+
+static void
+cmd_queue_priority_flow_ctrl_set_parsed(void *parsed_result,
+					__rte_unused struct cmdline *cl,
+					__rte_unused void *data)
+{
+	struct cmd_queue_priority_flow_ctrl_set_result *res = parsed_result;
+	struct rte_eth_pfc_queue_conf pfc_queue_conf;
+	int rx_fc_enable, tx_fc_enable;
+	int ret;
+
+	/*
+	 * Rx on/off, flow control is enabled/disabled on RX side. This can
+	 * indicate the RTE_ETH_FC_TX_PAUSE, Transmit pause frame at the Rx
+	 * side. Tx on/off, flow control is enabled/disabled on TX side. This
+	 * can indicate the RTE_ETH_FC_RX_PAUSE, Respond to the pause frame at
+	 * the Tx side.
+	 */
+	static enum rte_eth_fc_mode rx_tx_onoff_2_mode[2][2] = {
+		{RTE_ETH_FC_NONE, RTE_ETH_FC_TX_PAUSE},
+		{RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL}
+	};
+
+	memset(&pfc_queue_conf, 0, sizeof(struct rte_eth_pfc_queue_conf));
+	rx_fc_enable = (!strncmp(res->rx_pfc_mode, "on", 2)) ? 1 : 0;
+	tx_fc_enable = (!strncmp(res->tx_pfc_mode, "on", 2)) ? 1 : 0;
+	pfc_queue_conf.mode = rx_tx_onoff_2_mode[rx_fc_enable][tx_fc_enable];
+	pfc_queue_conf.rx_pause.tc = res->tx_tc;
+	pfc_queue_conf.rx_pause.tx_qid = res->tx_qid;
+	pfc_queue_conf.tx_pause.tc = res->rx_tc;
+	pfc_queue_conf.tx_pause.rx_qid = res->rx_qid;
+	pfc_queue_conf.tx_pause.pause_time = res->pause_time;
+
+	ret = rte_eth_dev_priority_flow_ctrl_queue_set(res->port_id,
+						       &pfc_queue_conf);
+	if (ret != 0) {
+		fprintf(stderr,
+			"bad queue priority flow control parameter, rc = %d\n",
+			ret);
+	}
+}
+
+cmdline_parse_token_string_t cmd_q_pfc_set_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result,
+				 set, "set");
+cmdline_parse_token_string_t cmd_q_pfc_set_flow_ctrl =
+	TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result,
+				 pfc_queue_ctrl, "pfc_queue_ctrl");
+cmdline_parse_token_num_t cmd_q_pfc_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result,
+			      port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_q_pfc_set_rx =
+	TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result,
+				 rx, "rx");
+cmdline_parse_token_string_t cmd_q_pfc_set_rx_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result,
+				 rx_pfc_mode, "on#off");
+cmdline_parse_token_num_t cmd_q_pfc_set_tx_qid =
+	TOKEN_NUM_INITIALIZER(struct cmd_queue_priority_flow_ctrl_set_result,
+			      tx_qid, RTE_UINT16);
+cmdline_parse_token_num_t cmd_q_pfc_set_tx_tc =
+	TOKEN_
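For reference, the rx/tx on/off pair in this command selects the resulting rte_eth_fc_mode through the rx_tx_onoff_2_mode matrix shown above. A hypothetical session covering all four combinations (port, queue, TC and pause-time values are placeholders; the annotation after each line is the mode the matrix selects, not testpmd output):

testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx on 0 0 2047     -> RTE_ETH_FC_FULL
testpmd> set pfc_queue_ctrl 0 rx on 0 0 tx off 0 0 2047    -> RTE_ETH_FC_RX_PAUSE
testpmd> set pfc_queue_ctrl 0 rx off 0 0 tx on 0 0 2047    -> RTE_ETH_FC_TX_PAUSE
testpmd> set pfc_queue_ctrl 0 rx off 0 0 tx off 0 0 2047   -> RTE_ETH_FC_NONE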
[PATCH v1 1/2] ethdev: support queue-based priority flow control
From: Sunil Kumar Kori

Based on device support and use-case need, there are two different ways to enable PFC. The first case is the port level PFC configuration; in this case, the rte_eth_dev_priority_flow_ctrl_set() API shall be used to configure the PFC, and PFC frames will be generated based on the VLAN TC value.

The second case is the queue level PFC configuration; in this case, any packet field content can be used to steer the packet to the specific queue using rte_flow or RSS, and then rte_eth_dev_priority_flow_ctrl_queue_set() is used to set the TC mapping on each queue. Based on congestion selected on the specific queue, the configured TC shall be used to generate PFC frames.

Operation of these modes is mutually exclusive: when the driver sets a non-zero value for rte_eth_dev_info::pfc_queue_tc_max, the application must use queue level PFC configuration via the rte_eth_dev_priority_flow_ctrl_queue_set() API instead of port level PFC configuration via the rte_eth_dev_priority_flow_ctrl_set() API to realize PFC configuration.

This patch enables the configuration for the second case, a.k.a. queue-based PFC, and also updates the rte_eth_dev_priority_flow_ctrl_set() implementation to adhere to rte_eth_dev_info::pfc_queue_tc_max handling.

Also updated libabigail.abignore to ignore the update to reserved fields in rte_eth_dev_info.

Signed-off-by: Jerin Jacob
Signed-off-by: Sunil Kumar Kori
---
v1:
 - Added queue based PFC config API instead port based

 devtools/libabigail.abignore           |   5 ++
 doc/guides/nics/features.rst           |   5 +-
 doc/guides/rel_notes/release_22_03.rst |   3 +
 lib/ethdev/ethdev_driver.h             |   6 +-
 lib/ethdev/rte_ethdev.c                | 109 +
 lib/ethdev/rte_ethdev.h                |  83 ++-
 lib/ethdev/version.map                 |   3 +
 7 files changed, 208 insertions(+), 6 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 4b676f317d..1d29e60c06 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,8 @@
 ; Ignore generated PMD information strings
 [suppress_variable]
         name_regexp = _pmd_info$
+
+; Ignore fields inserted in place of reserved fields of rte_eth_dev_info
+[suppress_type]
+        name = rte_eth_dev_info
+        has_data_member_inserted_between = {offset_after(switch_info), end}

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 27be2d2576..277a784f4e 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -379,9 +379,10 @@ Flow control
 Supports configuring link flow control.

 * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
-  ``priority_flow_ctrl_set``.
+  ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_set``.
 * **[related]    API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
-  ``rte_eth_dev_priority_flow_ctrl_set()``.
+  ``rte_eth_dev_priority_flow_ctrl_set()``, ``rte_eth_dev_priority_flow_ctrl_queue_set()``.
+* **[provides]   rte_eth_dev_info**: ``pfc_queue_tc_max``.

 .. _nic_features_rate_limitation:

diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 6d99d1eaa9..b75c0356e6 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -55,6 +55,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================

+* **Added an API to enable queue based priority flow ctrl(PFC).**
+
+  A new API, ``rte_eth_dev_priority_flow_ctrl_queue_set()``, was added.

 Removed Items
 -------------

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index d95605a355..e0bbfe89d7 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -532,6 +532,9 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
 /** @internal Setup priority flow control parameter on an Ethernet device. */
 typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
 				struct rte_eth_pfc_conf *pfc_conf);
+/** @internal Queue setup for priority flow control parameter on an Ethernet device. */
+typedef int (*priority_flow_ctrl_queue_set_t)(struct rte_eth_dev *dev,
+				struct rte_eth_pfc_queue_conf *pfc_queue_conf);

 /** @internal Update RSS redirection table on an Ethernet device. */
 typedef int (*reta_update_t)(struct rte_eth_dev *dev,
@@ -1080,7 +1083,8 @@ struct eth_dev_ops {
 	flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control */
 	/** Setup priority flow control */
 	priority_flow_ctrl_set_t   priority_flow_ctrl_set;
-
+	/** Priority flow control queue setup */
+	priority_flow_ctrl_queue_set_t priority_flow_ctrl_queue_set;
 	/** Set Unicast Table Array */
 	eth_uc_hash_table_set_t    uc_hash_table_set;
 	/** Set Unicast hash bitmap */
diff --git a/lib
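To make the intended usage concrete, here is a minimal sketch of how an application might drive the new API, mirroring what the testpmd command above does. This is illustrative only: the port/queue/TC numbers are placeholders, and the field semantics follow the rte_eth_pfc_queue_conf usage visible in the testpmd patch.

#include <string.h>
#include <errno.h>
#include <rte_ethdev.h>

static int
setup_queue_pfc(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_pfc_queue_conf conf;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	/* Zero means the device supports only port-level PFC via
	 * rte_eth_dev_priority_flow_ctrl_set().
	 */
	if (dev_info.pfc_queue_tc_max == 0)
		return -ENOTSUP;

	memset(&conf, 0, sizeof(conf));
	conf.mode = RTE_ETH_FC_FULL;
	conf.rx_pause.tc = 3;		/* TC honoured in received PFC frames */
	conf.rx_pause.tx_qid = 0;	/* Tx queue paused for that TC */
	conf.tx_pause.tc = 3;		/* TC carried in generated PFC frames */
	conf.tx_pause.rx_qid = 0;	/* Rx queue whose congestion triggers them */
	conf.tx_pause.pause_time = 2047;

	return rte_eth_dev_priority_flow_ctrl_queue_set(port_id, &conf);
}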
[PATCH v1 1/2] common/cnxk: support priority flow ctrl config API
From: Sunil Kumar Kori

CNXK platforms support priority flow control (802.1Qbb) to pause the respective traffic per class on that link.

Patch adds RoC interface to configure priority flow control on the MAC block, i.e. CGX on cn9k and RPM on cn10k.

Signed-off-by: Sunil Kumar Kori
---
 drivers/common/cnxk/roc_mbox.h       |  17 +++
 drivers/common/cnxk/roc_nix.h        |  21
 drivers/common/cnxk/roc_nix_fc.c     |  95 ++--
 drivers/common/cnxk/roc_nix_priv.h   |   4 +-
 drivers/common/cnxk/roc_nix_tm.c     | 162 ++-
 drivers/common/cnxk/roc_nix_tm_ops.c |  14 ++-
 drivers/common/cnxk/version.map      |   4 +
 7 files changed, 298 insertions(+), 19 deletions(-)

diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index b63fe108c9..12a599 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -95,6 +95,8 @@ struct mbox_msghdr {
 				msg_rsp)                                      \
 	M(CGX_STATS_RST, 0x21A, cgx_stats_rst, msg_req, msg_rsp)             \
 	M(RPM_STATS, 0x21C, rpm_stats, msg_req, rpm_stats_rsp)               \
+	M(CGX_PRIO_FLOW_CTRL_CFG, 0x21F, cgx_prio_flow_ctrl_cfg, cgx_pfc_cfg, \
+	  cgx_pfc_rsp)                                                        \
 	/* NPA mbox IDs (range 0x400 - 0x5FF) */                              \
 	M(NPA_LF_ALLOC, 0x400, npa_lf_alloc, npa_lf_alloc_req,               \
 	  npa_lf_alloc_rsp)                                                   \
@@ -540,6 +542,19 @@ struct cgx_pause_frm_cfg {
 	uint8_t __io rx_pause;
 	uint8_t __io tx_pause;
 };

+struct cgx_pfc_cfg {
+	struct mbox_msghdr hdr;
+	uint8_t __io rx_pause;
+	uint8_t __io tx_pause;
+	uint16_t __io pfc_en; /* bitmap indicating enabled traffic classes */
+};
+
+struct cgx_pfc_rsp {
+	struct mbox_msghdr hdr;
+	uint8_t __io rx_pause;
+	uint8_t __io tx_pause;
+};
+
 struct sfp_eeprom_s {
 #define SFP_EEPROM_SIZE 256
 	uint16_t __io sff_id;
@@ -1115,6 +1130,8 @@ struct nix_bp_cfg_req {
  * so maximum 64 channels are possible.
  */
 #define NIX_MAX_CHAN	 64
+#define NIX_CGX_MAX_CHAN 16
+#define NIX_LBK_MAX_CHAN NIX_MAX_CHAN
 struct nix_bp_cfg_rsp {
 	struct mbox_msghdr hdr;
 	/* Channel and bpid mapping */

diff --git a/drivers/common/cnxk/roc_nix.h b/drivers/common/cnxk/roc_nix.h
index 69a5e8e7b4..506b05b43e 100644
--- a/drivers/common/cnxk/roc_nix.h
+++ b/drivers/common/cnxk/roc_nix.h
@@ -165,16 +165,27 @@ struct roc_nix_fc_cfg {
 		struct {
 			uint32_t rq;
+			uint16_t tc;
 			uint16_t cq_drop;
 			bool enable;
 		} cq_cfg;

 		struct {
+			uint32_t sq;
+			uint16_t tc;
 			bool enable;
 		} tm_cfg;
 	};
 };

+struct roc_nix_pfc_cfg {
+	enum roc_nix_fc_mode mode;
+	/* For SET, tc must be [0, 15].
+	 * For GET, tc will represent a bitmap.
+	 */
+	uint16_t tc;
+};
+
 struct roc_nix_eeprom_info {
 #define ROC_NIX_EEPROM_SIZE 256
 	uint16_t sff_id;
@@ -478,6 +489,7 @@ void __roc_api roc_nix_unregister_cq_irqs(struct roc_nix *roc_nix);
 enum roc_nix_tm_tree {
 	ROC_NIX_TM_DEFAULT = 0,
 	ROC_NIX_TM_RLIMIT,
+	ROC_NIX_TM_PFC,
 	ROC_NIX_TM_USER,
 	ROC_NIX_TM_TREE_MAX,
 };
@@ -624,6 +636,7 @@ roc_nix_tm_shaper_default_red_algo(struct roc_nix_tm_node *node,
 int __roc_api roc_nix_tm_lvl_cnt_get(struct roc_nix *roc_nix);
 int __roc_api roc_nix_tm_lvl_have_link_access(struct roc_nix *roc_nix, int lvl);
 int __roc_api roc_nix_tm_prepare_rate_limited_tree(struct roc_nix *roc_nix);
+int __roc_api roc_nix_tm_prepare_pfc_tree(struct roc_nix *roc_nix);
 bool __roc_api roc_nix_tm_is_user_hierarchy_enabled(struct roc_nix *nix);
 int __roc_api roc_nix_tm_tree_type_get(struct roc_nix *nix);
@@ -736,6 +749,14 @@ int __roc_api roc_nix_fc_config_get(struct roc_nix *roc_nix,
 int __roc_api roc_nix_fc_mode_set(struct roc_nix *roc_nix,
 				  enum roc_nix_fc_mode mode);
+int __roc_api roc_nix_pfc_mode_set(struct roc_nix *roc_nix,
+				   struct roc_nix_pfc_cfg *pfc_cfg);
+
+int __roc_api roc_nix_pfc_mode_get(struct roc_nix *roc_nix,
+				   struct roc_nix_pfc_cfg *pfc_cfg);
+
+uint16_t __roc_api roc_nix_chan_count_get(struct roc_nix *roc_nix);
+
 enum roc_nix_fc_mode __roc_api roc_nix_fc_mode_get(struct roc_nix *roc_nix);

 void __roc_api rox_nix_fc_npa_bp_cfg(struct roc_nix *roc_nix, uint64_t pool_id,

diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index ca29cd2bf9..95f466242a 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -36,7 +36,7 @@ nix_fc_rxchan_bpid_set(struct roc_nix *roc_nix, bool enable)
 	str
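A rough sketch of how the new RoC handles might be driven, based only on the declarations above. ROC_NIX_FC_FULL is assumed to be a member of enum roc_nix_fc_mode and plt_info() to be the usual cnxk logging helper; the tc semantics for SET versus GET follow the comment in struct roc_nix_pfc_cfg.

/* Hypothetical helper: enable PFC for one traffic class at the RoC level. */
static int
nix_enable_pfc_tc(struct roc_nix *roc_nix, uint16_t tc)
{
	struct roc_nix_pfc_cfg pfc_cfg;
	int rc;

	memset(&pfc_cfg, 0, sizeof(pfc_cfg));
	pfc_cfg.mode = ROC_NIX_FC_FULL;	/* assumed enum value */
	pfc_cfg.tc = tc;		/* for SET: a TC index in [0, 15] */

	rc = roc_nix_pfc_mode_set(roc_nix, &pfc_cfg);
	if (rc)
		return rc;

	/* For GET, tc is reported back as a bitmap of enabled TCs. */
	rc = roc_nix_pfc_mode_get(roc_nix, &pfc_cfg);
	if (rc == 0)
		plt_info("PFC TC bitmap: 0x%x", pfc_cfg.tc);

	return rc;
}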
[PATCH v1 2/2] net/cnxk: support priority flow control
From: Sunil Kumar Kori

Patch implements priority flow control support for CNXK platforms.

Signed-off-by: Sunil Kumar Kori
---
 drivers/net/cnxk/cnxk_ethdev.c     |  19
 drivers/net/cnxk/cnxk_ethdev.h     |  16 +++
 drivers/net/cnxk/cnxk_ethdev_ops.c | 177 +++--
 3 files changed, 203 insertions(+), 9 deletions(-)

diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 74f625553d..382d88bbf3 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -1260,6 +1260,8 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev)
 		goto cq_fini;
 	}

+	/* Initialize TC to SQ mapping as invalid */
+	memset(dev->pfc_tc_sq_map, 0xFF, sizeof(dev->pfc_tc_sq_map));
 	/*
 	 * Restore queue config when reconfigure followed by
 	 * reconfigure and no queue configure invoked from application case.
@@ -1548,6 +1550,7 @@ struct eth_dev_ops cnxk_eth_dev_ops = {
 	.tx_burst_mode_get = cnxk_nix_tx_burst_mode_get,
 	.flow_ctrl_get = cnxk_nix_flow_ctrl_get,
 	.flow_ctrl_set = cnxk_nix_flow_ctrl_set,
+	.priority_flow_ctrl_queue_set = cnxk_nix_priority_flow_ctrl_queue_set,
 	.dev_set_link_up = cnxk_nix_set_link_up,
 	.dev_set_link_down = cnxk_nix_set_link_down,
 	.get_module_info = cnxk_nix_get_module_info,
@@ -1721,6 +1724,8 @@ cnxk_eth_dev_uninit(struct rte_eth_dev *eth_dev, bool reset)
 {
 	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
 	const struct eth_dev_ops *dev_ops = eth_dev->dev_ops;
+	struct rte_eth_pfc_queue_conf pfc_conf = {0};
+	struct rte_eth_fc_conf fc_conf = {0};
 	struct roc_nix *nix = &dev->nix;
 	int rc, i;
@@ -1736,6 +1741,20 @@ cnxk_eth_dev_uninit(struct rte_eth_dev *eth_dev, bool reset)

 	roc_nix_npc_rx_ena_dis(nix, false);

+	/* Restore 802.3 Flow control configuration */
+	fc_conf.mode = RTE_ETH_FC_NONE;
+	rc = cnxk_nix_flow_ctrl_set(eth_dev, &fc_conf);
+
+	pfc_conf.mode = RTE_ETH_FC_NONE;
+	pfc_conf.rx_pause.tc = roc_nix_chan_count_get(nix) - 1;
+	pfc_conf.tx_pause.tc = roc_nix_chan_count_get(nix) - 1;
+	rc = cnxk_nix_priority_flow_ctrl_queue_set(eth_dev, &pfc_conf);
+	if (rc)
+		plt_err("Failed to reset PFC. error code(%d)", rc);
+
+	fc_conf.mode = RTE_ETH_FC_FULL;
+	rc = cnxk_nix_flow_ctrl_set(eth_dev, &fc_conf);
+
 	/* Disable and free rte_meter entries */
 	nix_meter_fini(dev);

diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 5bfda3d815..28fb19307a 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -143,6 +143,16 @@ struct cnxk_fc_cfg {
 	uint8_t tx_pause;
 };

+struct cnxk_pfc_cfg {
+	struct cnxk_fc_cfg fc_cfg;
+	uint16_t class_en;
+	uint16_t pause_time;
+	uint8_t rx_tc;
+	uint8_t rx_qid;
+	uint8_t tx_tc;
+	uint8_t tx_qid;
+};
+
 struct cnxk_eth_qconf {
 	union {
 		struct rte_eth_txconf tx;
@@ -366,6 +376,8 @@ struct cnxk_eth_dev {
 	struct cnxk_eth_qconf *rx_qconf;

 	/* Flow control configuration */
+	uint16_t pfc_tc_sq_map[16];
+	struct cnxk_pfc_cfg pfc_cfg;
 	struct cnxk_fc_cfg fc_cfg;

 	/* PTP Counters */
@@ -467,6 +479,8 @@ int cnxk_nix_flow_ctrl_set(struct rte_eth_dev *eth_dev,
 			   struct rte_eth_fc_conf *fc_conf);
 int cnxk_nix_flow_ctrl_get(struct rte_eth_dev *eth_dev,
 			   struct rte_eth_fc_conf *fc_conf);
+int cnxk_nix_priority_flow_ctrl_queue_set(struct rte_eth_dev *eth_dev,
+					  struct rte_eth_pfc_queue_conf *pfc_conf);
 int cnxk_nix_set_link_up(struct rte_eth_dev *eth_dev);
 int cnxk_nix_set_link_down(struct rte_eth_dev *eth_dev);
 int cnxk_nix_get_module_info(struct rte_eth_dev *eth_dev,
@@ -606,6 +620,8 @@ int nix_mtr_color_action_validate(struct rte_eth_dev *eth_dev, uint32_t id,
 				  uint32_t *prev_id, uint32_t *next_id,
 				  struct cnxk_mtr_policy_node *policy,
 				  int *tree_level);
+int nix_priority_flow_ctrl_configure(struct rte_eth_dev *eth_dev,
+				     struct cnxk_pfc_cfg *conf);

 /* Inlines */
 static __rte_always_inline uint64_t

diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c b/drivers/net/cnxk/cnxk_ethdev_ops.c
index ce5f1f7240..27fa2da36d 100644
--- a/drivers/net/cnxk/cnxk_ethdev_ops.c
+++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
@@ -69,6 +69,8 @@ cnxk_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			    RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
 	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
+	devinfo->pfc_queue_tc_max = roc_nix_chan_count_get(&dev->nix);
RE: [PATCH v1 1/2] ethdev: support queue-based priority flow control
As per discussion with Jerin against the RFC http://patches.dpdk.org/project/dpdk/patch/20211204172458.1904300-1-jer...@marvell.com/, the following change set adds queue-based PFC configuration. Also, the corresponding implementation of the API for Marvell platforms is available at http://patches.dpdk.org/project/dpdk/patch/2022010930.751933-2-sk...@marvell.com/.

Regards
Sunil Kumar Kori

> -----Original Message-----
> From: sk...@marvell.com
> Sent: Sunday, January 9, 2022 4:29 PM
> To: Ray Kinsella; Thomas Monjalon; Ferruh Yigit; Andrew Rybchenko
> Cc: dev@dpdk.org; Sunil Kumar Kori; Jerin Jacob Kollanukkaran
> Subject: [PATCH v1 1/2] ethdev: support queue-based priority flow control
>
> [full patch quoted; see the patch above]
RE: [PATCH v1 2/2] net/cnxk: support priority flow control
The following patch set depends on http://patches.dpdk.org/project/dpdk/patch/20220109105851.734687-1-sk...@marvell.com/.

Regards
Sunil Kumar Kori

> -----Original Message-----
> From: sk...@marvell.com
> Sent: Sunday, January 9, 2022 4:42 PM
> To: Nithin Kumar Dabilpuram; Kiran Kumar Kokkilagadda; Sunil Kumar Kori; Satha Koteswara Rao Kottidi
> Cc: dev@dpdk.org
> Subject: [PATCH v1 2/2] net/cnxk: support priority flow control
>
> [full patch quoted; see the patch above]
RE: [PATCH v1] net/mlx5: fix assertion on the flags set in pkt mbuf
Hi,

> -----Original Message-----
> From: Lior Margalit
> Sent: Thursday, December 23, 2021 10:16 AM
> To: Slava Ovsiienko
> Cc: Lior Margalit; dev@dpdk.org; Alexander Kozyrev; sta...@dpdk.org
> Subject: [PATCH v1] net/mlx5: fix assertion on the flags set in pkt mbuf
>
> Fixed the assertion on the flags set in pkt->ol_flags for vectorized MPRQ.
> With vectorized MPRQ the CQs are processed before copying the MPRQ bufs,
> so the valid assertion is that the expected flag is set, and not that
> pkt->ol_flags equals this flag alone.
>
> Fixes: 0f20acbf5eda ("net/mlx5: implement vectorized MPRQ burst")
> Cc: akozy...@nvidia.com
> Cc: sta...@dpdk.org
>
> Signed-off-by: Lior Margalit
> Acked-by: Slava Ovsiienko

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
RE: [PATCH] net/mlx5: fix RSS expansion with explicit next protocol
Hi,

> -----Original Message-----
> From: Gregory Etelson
> Sent: Thursday, December 23, 2021 3:08 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson; Raslan Darawsheh; sta...@dpdk.org; Matan Azrad; Slava Ovsiienko; Ferruh Yigit; Dekel Peled
> Subject: [PATCH] net/mlx5: fix RSS expansion with explicit next protocol
>
> The PMD RSS expansion scheme by default compiles flow rules for all
> flow item types that may branch out from a stub supplied by application.
> For example,
> ETH can lead to VLAN, IPv4 or IPv6.
> IPv4 can lead to UDP, TCP, IPv4 or IPv6.
>
> If application explicitly specified next protocol type, expansion must
> use that option only and not create flows with other protocol types.
>
> The PMD ignored explicit next protocol values in GRE and VXLAN-GPE.
>
> The patch updates RSS expansion for GRE and VXLAN-GPE with explicit
> next protocol settings.
>
> Fixes: c7870bfe09dc ("ethdev: move RSS expansion code to mlx5 driver")
>
> Cc: sta...@dpdk.org
>
> Signed-off-by: Gregory Etelson
> Acked-by: Matan Azrad

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
RE: [PATCH] net/mlx5: fix GRE protocol type translation for VERB API
Hi,

> -----Original Message-----
> From: Gregory Etelson
> Sent: Thursday, December 23, 2021 3:17 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson; Raslan Darawsheh; sta...@dpdk.org; Matan Azrad; Slava Ovsiienko; Yongseok Koh; Ori Kam
> Subject: [PATCH] net/mlx5: fix GRE protocol type translation for VERB API
>
> When application creates several flows to match on GRE tunnel without
> explicitly specifying GRE protocol type value in flow rules, PMD will
> translate that to zero mask.
> RDMA-CORE cannot distinguish between different inner flow types and
> produces identical matchers for each zero mask.
>
> The patch extracts inner header type from flow rule and forces it in
> GRE protocol type, if application did not specify any.
>
> Cc: sta...@dpdk.org
>
> Fixes: 84c406e74524 ("net/mlx5: add flow translate function")
> Signed-off-by: Gregory Etelson
> Acked-by: Matan Azrad

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
RE: [PATCH] net/mlx5: support the imissed counter on Windows
Hi,

> -----Original Message-----
> From: Tal Shnaiderman
> Sent: Friday, December 24, 2021 8:47 AM
> To: dev@dpdk.org
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL); Matan Azrad; Slava Ovsiienko; Asaf Penso; Khoa To
> Subject: [PATCH] net/mlx5: support the imissed counter on Windows
>
> Add support for the imissed counter using the DevX API on Windows.
>
> imissed is queried by creating a queue counter for the port, attaching
> it to all created RQs and querying the "out_of_buffer" field.
>
> If the counter cannot be created, imissed will always report 0.
>
> Signed-off-by: Tal Shnaiderman
> Acked-by: Matan Azrad

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
RE: [PATCH] net/mlx5: fix GCC uninitialized variable warning
Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk
> Sent: Tuesday, December 28, 2021 11:17 AM
> To: dev@dpdk.org
> Cc: Li Zhang; sta...@dpdk.org; Stephen Hemminger; Matan Azrad; Slava Ovsiienko
> Subject: [PATCH] net/mlx5: fix GCC uninitialized variable warning
>
> When building with -Db_sanitize=thread, GCC gives a warning:
>
> drivers/net/mlx5/mlx5_flow_meter.c: In function ‘mlx5_flow_meter_create’:
> drivers/net/mlx5/mlx5_flow_meter.c:1170:33: warning: ‘legacy_fm’ may be
> used uninitialized in this function [-Wmaybe-uninitialized]
>
> This is a false-positive: legacy_fm is initialized and used if and only if
> priv->sh->meter_aso_en is false.
> Work around this by initializing legacy_fm to NULL.
> Add an assertion before legacy_fm use in case the logic changes.
>
> Fixes: 444320186393 ("net/mlx5: support meter creation with policy")
> Cc: l...@nvidia.com
> Cc: sta...@dpdk.org
>
> Reported-by: Stephen Hemminger
> Signed-off-by: Dmitry Kozlyuk
> Acked-by: Matan Azrad

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
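The pattern being worked around, reduced to a standalone sketch (the names are illustrative, not the actual mlx5 code): the write and the read of the pointer are guarded by the same condition, but the compiler cannot always prove that, so the variable is initialized to NULL and an assertion documents the invariant.

#include <assert.h>
#include <stddef.h>

struct meter { int id; };

static int
create_meter(int aso_en)
{
	struct meter m = { 42 };
	struct meter *legacy = NULL;	/* the workaround: explicit init */

	if (!aso_en)
		legacy = &m;		/* the only path that sets it */

	/* ... intervening code the compiler cannot see through ... */

	if (!aso_en) {
		assert(legacy != NULL);	/* guards future logic changes */
		return legacy->id;	/* the only path that reads it */
	}
	return 0;
}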
Rss hash on mellanox 100G card
I am using a Mellanox Technologies MT27800 Family [ConnectX-5] NIC with DPDK 19, with multiple Rx queues and RSS:

port_conf.rx_adv_conf.rss_conf.rss_hf = (ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP);

I analyze traffic and need all packets of the same session (a session, for now, being IP + port) to arrive at the same process, i.e. packets with the same IP + port must arrive on the same queue. But if some packets are IP-fragmented, they arrive at a different process, which is a problem.

How can I calculate the hash value in the C++ code, the same way it is done in the card, so I can reassemble fragments and send them to the same process as the non-fragmented packets?
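One possible approach (a sketch, not verified against ConnectX-5 specifics): DPDK ships a software Toeplitz implementation in rte_thash.h, so the host can reproduce the NIC's hash as long as it uses the same tuple layout and the same 40-byte RSS key the port was configured with (query it with rte_eth_dev_rss_hash_conf_get() if unsure). Note also that IP fragments other than the first carry no L4 header, so the NIC typically hashes fragments on the IP addresses only, which is why they land on a different queue in the first place.

#include <rte_thash.h>

/* Compute the Toeplitz hash for an IPv4/L4 tuple in software.
 * All tuple fields must be in host byte order for rte_softrss();
 * rss_key must be the 40-byte key the port actually uses.
 */
static uint32_t
softrss_v4(uint32_t src_ip, uint32_t dst_ip,
	   uint16_t src_port, uint16_t dst_port, const uint8_t *rss_key)
{
	struct rte_ipv4_tuple tuple;

	tuple.src_addr = src_ip;
	tuple.dst_addr = dst_ip;
	tuple.sport = src_port;
	tuple.dport = dst_port;

	/* The target queue is then reta[hash % reta_size]. */
	return rte_softrss((uint32_t *)&tuple, RTE_THASH_V4_L4_LEN, rss_key);
}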
RE: Understanding Flow API action RSS
Hi Stephen and Ivan

> -----Original Message-----
> From: Stephen Hemminger
> Sent: Tuesday, January 4, 2022 11:56 PM
> Subject: Re: Understanding Flow API action RSS
>
> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> Ivan Malov wrote:
>
> > Hi Stephen,
> >
> > On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >
> > > On Tue, 04 Jan 2022 13:41:55 +0100
> > > Thomas Monjalon wrote:
> > >
> > >> +Cc Ori Kam, rte_flow maintainer
> > >>
> > >> 29/12/2021 15:34, Ivan Malov:
> > >>> Hi all,
> > >>>
> > >>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> > >>> to provide "Queue indices to use". But it is unclear whether the order of
> > >>> elements is meaningful or not. Does that matter? Can queue indices repeat?
> > >
> > > The order probably doesn't matter, it is like the RSS indirection table.
> >
> > Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> > queue indices can repeat, and the order is meaningful. In DPDK, RETA
> > may comprise multiple "groups", each one comprising 64 entries.
> >
> > This 'queue' array in flow action RSS does not stick with the same
> > terminology, it does not reuse the definition of RETA "group", etc.
> > Just "queue indices to use". No definition of order, no structure.
> >
> > The API contract is not clear. Neither to users, nor to PMDs.

From the API standpoint, in RSS the queues are simply the queue IDs; order doesn't matter. Duplicating a queue may affect the spread, depending on the HW/PMD. In the common case each queue should appear only once, and the PMD may duplicate entries to get the best performance.

> > > rx queue = RSS_indirection_table[RSS_hash_value % RSS_indirection_table_size]
> > >
> > > So you could play with multiple queues matching same hash value, but that
> > > would be uncommon.
> > >
> > >>> An ethdev may have "global" RSS setting with an indirection table of some
> > >>> fixed size (say, 512). In what comes to flow rules, does that size matter?
> > >
> > > Global RSS is only used if the incoming packet does not match any rte_flow
> > > action. If there is a RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS,
> > > these take precedence.
> >
> > Yes, I know all of that. The question is how does the PMD select RETA size
> > for this action? Can it select an arbitrary value? Or should it stick with
> > the "global" one (eg. 512)? How does the user know the table size?
> >
> > If the user simply wants to spread traffic across the given queues,
> > the effective table size is a don't care to them, and the existing
> > API contract is fine. But if the user expects that certain packets
> > hit some precise queues, they need to know the table size for that.

Just like you said, RSS simply spreads the traffic to the given queues. If an application wants to send traffic to some specific queue, it should use the queue action.

> > So, the question is whether the users should or should not build
> > any expectations of the effective table size and, if they should,
> > are they supposed to use the "global" table size for that?
>
> You are right, this area is completely undocumented. Personally, I would
> really like it if rte_flow had a reference software implementation and all
> the HW vendors had to make sure their HW matched the SW reference version.
> But this is a case where the funding is all on the HW side, and no one has
> time or resources to do a complete SW version.
>
> A sane implementation would configure RSS indirection as across all
> rx queues that were available when the device was started; ie all queues
> that did not have deferred start set. Then the application would start/stop
> queues and use rte_flow to reach them.
>
> But it doesn't appear the HW follows that model.
>
> > >>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> > >>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
> > >
> > > No, the default is always Toeplitz. This goes back to the original
> > > definition of RSS which is in Microsoft NDIS and uses Toeplitz.
> >
> > Then why have a dedicated enum named TOEPLITZ? Also, once again, the
> > documentation should be more specific to say which algorithm exactly
> > this DEFAULT choice provides. Otherwise, it is very vague.
> >
> > > DPDK should have more examples of using rte_flow, I have some samples
> > > but they aren't that useful.
> >
> > I could not agree more.

Feel free to add/suggest what examples are missing.

> > Thanks,
> > Ivan M.

Best,
Ori
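To make the contract under discussion concrete, here is a minimal hedged sketch of an RSS action spreading matched traffic over an explicit queue list. The struct fields are the public ones from rte_flow.h; whether order and duplicates in the queue array are significant is exactly the open question above.

#include <rte_common.h>
#include <rte_flow.h>

static const uint16_t rss_queues[] = { 0, 1, 2, 3 };

static const struct rte_flow_action_rss rss_conf = {
	.func = RTE_ETH_HASH_FUNCTION_DEFAULT,	/* Toeplitz, per the discussion */
	.level = 0,				/* outermost headers */
	.types = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP,
	.key_len = 0,				/* let the PMD use its default key */
	.key = NULL,
	.queue_num = RTE_DIM(rss_queues),
	.queue = rss_queues,
};

static const struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss_conf },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};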
RE: [RFC 1/3] ethdev: support GRE optional fields
Hi Sean,

> -----Original Message-----
> From: Sean Zhang
> Subject: [RFC 1/3] ethdev: support GRE optional fields
>
> Add flow pattern items and header format for matching optional fields
> (checksum/key/sequence) in GRE header. And the flags in gre item should
> be correspondingly set with the new added items.
>
> Signed-off-by: Sean Zhang
> ---
>  doc/guides/prog_guide/rte_flow.rst | 16
>  lib/ethdev/rte_flow.c              |  1 +
>  lib/ethdev/rte_flow.h              | 18 ++
>  3 files changed, 35 insertions(+)
>
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index c51ed88..48d5685 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1113,6 +1113,22 @@ This should be preceded by item ``GRE``.
>  - Value to be matched is a big-endian 32 bit integer.
>  - When this item present it implicitly match K bit in default mask as "1"
>
> +Item: ``GRE_OPTION``
> +^^^^^^^^^^^^^^^^^^^^
> +
> +Matches a GRE optional fields (checksum/key/sequence).
> +This should be preceded by item ``GRE``.
> +
> +- ``checksum``: checksum.
> +- ``key``: key.
> +- ``sequence``: sequence.
> +- The items in GRE_OPTION do not change bit flags(c_bit/k_bit/s_bit) in GRE
> +  item. The bit flags need be set with GRE item by application. When the items
> +  present, the corresponding bits in GRE spec and mask should be set "1" by
> +  application, it means to match specified value of the fields. When the items
> +  no present, but the corresponding bits in GRE spec and mask is "1", it means
> +  to match any value of the fields.
> +
>  Item: ``FUZZY``
>  ^^^^^^^^^^^^^^^
>
> diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
> index a93f68a..03bd1df 100644
> --- a/lib/ethdev/rte_flow.c
> +++ b/lib/ethdev/rte_flow.c
> @@ -139,6 +139,7 @@ struct rte_flow_desc_data {
>  	MK_FLOW_ITEM(META, sizeof(struct rte_flow_item_meta)),
>  	MK_FLOW_ITEM(TAG, sizeof(struct rte_flow_item_tag)),
>  	MK_FLOW_ITEM(GRE_KEY, sizeof(rte_be32_t)),
> +	MK_FLOW_ITEM(GRE_OPTION, sizeof(struct rte_gre_hdr_option)),

I think that this new item is making the gre_key redundant, why not deprecate it?

>  	MK_FLOW_ITEM(GTP_PSC, sizeof(struct rte_flow_item_gtp_psc)),
>  	MK_FLOW_ITEM(PPPOES, sizeof(struct rte_flow_item_pppoe)),
>  	MK_FLOW_ITEM(PPPOED, sizeof(struct rte_flow_item_pppoe)),
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index 1031fb2..27b4140 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -660,6 +660,13 @@ enum rte_flow_item_type {
>  	 * See struct rte_flow_item_ppp.
>  	 */
>  	RTE_FLOW_ITEM_TYPE_PPP,
> +
> +	/**
> +	 * Matches GRE optional fields.
> +	 *
> +	 * See struct rte_gre_hdr_option.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_GRE_OPTION,
>  };
>
>  /**
> @@ -1196,6 +1203,17 @@ struct rte_flow_item_gre {
>  #endif
>
>  /**
> + * RTE_FLOW_ITEM_TYPE_GRE_OPTION.
> + *
> + * Matches GRE optional fields in header.
> + */
> +struct rte_gre_hdr_option {
> +	rte_be16_t checksum;
> +	rte_be32_t key;
> +	rte_be32_t sequence;
> +};
> +
> +/**
>   * RTE_FLOW_ITEM_TYPE_FUZZY
>   *
>   * Fuzzy pattern match, expect faster than default.
> --
> 1.8.3.1

Best,
Ori
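A hedged sketch of how an application might use the proposed item. Note that RTE_FLOW_ITEM_TYPE_GRE_OPTION and struct rte_gre_hdr_option are the RFC's proposal, not yet upstream API, and per the RFC the K bit must be set to "1" in both the GRE spec and mask when matching a specific key (0x2000 is the K bit in the first 16 bits of the GRE header).

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Match GRE with the K bit set and key == 0x12345678. */
static const struct rte_flow_item_gre gre_spec = {
	.c_rsvd0_ver = RTE_BE16(0x2000),	/* K bit set */
};
static const struct rte_flow_item_gre gre_mask = {
	.c_rsvd0_ver = RTE_BE16(0x2000),
};
static const struct rte_gre_hdr_option opt_spec = {	/* proposed struct */
	.key = RTE_BE32(0x12345678),
};
static const struct rte_gre_hdr_option opt_mask = {
	.key = RTE_BE32(0xffffffff),
};

static const struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
	{ .type = RTE_FLOW_ITEM_TYPE_GRE,
	  .spec = &gre_spec, .mask = &gre_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_GRE_OPTION,	/* proposed in this RFC */
	  .spec = &opt_spec, .mask = &opt_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};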
RE: Understanding Flow API action RSS
Hi Ori,

On Sun, 9 Jan 2022, Ori Kam wrote:

> [earlier part of the thread snipped; see the message above]
>
> From the API standpoint, in RSS the queues are simply the queue IDs; order
> doesn't matter. Duplicating a queue may affect the spread, depending on the
> HW/PMD. In the common case each queue should appear only once, and the PMD
> may duplicate entries to get the best performance.

Look. In a DPDK PMD, one has "global" RSS table. Consider the following example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue indices may repeat. They may have different order: 1, 1, 0, 0, ... . The order is of great importance. If you send a packet to a DPDK-powered server, you can know in advance its hash value. Hence, you may strictly predict which RSS table entry this hash will point at. That predicts the target Rx queue.

So the questions which one should attempt to clarify are as follows:
1) Is the 'queue' array ordered? (Does the order of elements matter?)
2) Can its elements repeat? (*allowed* or *not allowed*?)

> Just like you said, RSS simply spreads the traffic to the given queues.

Yes, to the given queues. The question is whether the 'queue' array has RETA properties (order matters; elements can repeat) or not.

> If an application wants to send traffic to some specific queue, it should
> use the queue action.

Yes, but that's not what I mean. Consider the following example. The user generates packets with random IP addresses at machine A. These packets hit DPDK at machine B.

For a given *packet*, the sender (A) can compute its RSS hash in software. This will point out the RETA entry index. But, in order to predict the exact *queue* index, the sender has to know the table (its contents, its size). For a "global" DPDK RSS setting, the table can be easily obtained with an ethdev callback / API. Very simple. Fixed-size table, and it can be queried. But how does one obtain similar knowledge for RSS action?

> [remainder of the quoted thread snipped; see the message above]
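The predictability argument in a tiny sketch (for the "global" RSS case the table is queryable, e.g. via rte_eth_dev_rss_reta_query(); no equivalent query exists for the action's implicit table, which is the gap being pointed out):

#include <stdint.h>

/* Sender-side prediction: queue = reta[hash % reta_size], per the
 * formula quoted earlier in the thread.
 *
 * Example: reta = {0, 0, 1, 1, 2, 2, 3, 3}, hash = 0x2A:
 * 0x2A % 8 = 2, so reta[2] = queue 1. Order and repeats clearly matter.
 */
static uint16_t
predict_queue(uint32_t rss_hash, const uint16_t *reta, uint32_t reta_size)
{
	return reta[rss_hash % reta_size];
}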
[PATCH v5 1/2] eal: add API for bus close
From: Rohit Raj As per the current code we have API for bus probe, but the bus close API is missing. This breaks the multi process scenarios as objects are not cleaned while terminating the secondary processes. This patch adds a new API rte_bus_close() for cleanup of bus objects which were acquired during probe. Signed-off-by: Rohit Raj --- Rebased on this patch series: https://patches.dpdk.org/project/dpdk/list/?series=21049 v5: * Updated release notes for new feature and API change. * Added support for error checking while closing bus. * Added experimental banner for new API. * Squashed changes related to freebsd and windows into single patch. * Discarded patch to fix a bug which is already fixed on latest release. v4: * Added comments to clarify responsibility of rte_bus_close. * Added support for rte_bus_close on freebsd. * Added support for rte_bus_close on windows. v3: * nit: combined nested if statements. v2: * Moved rte_bus_close call to rte_eal_cleanup path. doc/guides/rel_notes/release_22_03.rst | 8 +++ lib/eal/common/eal_common_bus.c| 33 +- lib/eal/freebsd/eal.c | 1 + lib/eal/include/rte_bus.h | 30 ++- lib/eal/linux/eal.c| 8 +++ lib/eal/version.map| 3 +++ lib/eal/windows/eal.c | 1 + 7 files changed, 82 insertions(+), 2 deletions(-) diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst index 6d99d1eaa9..7417606a2a 100644 --- a/doc/guides/rel_notes/release_22_03.rst +++ b/doc/guides/rel_notes/release_22_03.rst @@ -55,6 +55,11 @@ New Features Also, make sure to start the actual text at the margin. === +* **Added support to close bus.** + + Added capability to allow a user to do cleanup of bus objects which + were acquired during bus probe. + Removed Items - @@ -84,6 +89,9 @@ API Changes Also, make sure to start the actual text at the margin. === + * eal: Added new API ``rte_bus_close`` to perform cleanup bus objects which + were acquired during bus probe. 
+ ABI Changes --- diff --git a/lib/eal/common/eal_common_bus.c b/lib/eal/common/eal_common_bus.c index baa5b532af..2c3c0a90d2 100644 --- a/lib/eal/common/eal_common_bus.c +++ b/lib/eal/common/eal_common_bus.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright 2016 NXP + * Copyright 2016,2022 NXP */ #include @@ -85,6 +85,37 @@ rte_bus_probe(void) return 0; } +/* Close all devices of all buses */ +int +rte_bus_close(void) +{ + int ret; + struct rte_bus *bus, *vbus = NULL; + + TAILQ_FOREACH(bus, &rte_bus_list, next) { + if (!strcmp(bus->name, "vdev")) { + vbus = bus; + continue; + } + + if (bus->close) { + ret = bus->close(); + if (ret) + RTE_LOG(ERR, EAL, "Bus (%s) close failed.\n", + bus->name); + } + } + + if (vbus && vbus->close) { + ret = vbus->close(); + if (ret) + RTE_LOG(ERR, EAL, "Bus (%s) close failed.\n", + vbus->name); + } + + return 0; +} + /* Dump information of a single bus */ static int bus_dump_one(FILE *f, struct rte_bus *bus) diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c index a1cd2462db..87d70c6898 100644 --- a/lib/eal/freebsd/eal.c +++ b/lib/eal/freebsd/eal.c @@ -984,6 +984,7 @@ rte_eal_cleanup(void) { struct internal_config *internal_conf = eal_get_internal_configuration(); + rte_bus_close(); rte_service_finalize(); rte_mp_channel_cleanup(); /* after this point, any DPDK pointers will become dangling */ diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h index 6efd28..c6211bbd95 100644 --- a/lib/eal/include/rte_bus.h +++ b/lib/eal/include/rte_bus.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright 2016 NXP + * Copyright 2016,2022 NXP */ #ifndef _RTE_BUS_H_ @@ -66,6 +66,23 @@ typedef int (*rte_bus_scan_t)(void); */ typedef int (*rte_bus_probe_t)(void); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Implementation specific close function which is responsible for resetting all + * detected devices on the bus to a default state, closing UIO nodes or VFIO + * groups and also freeing any memory allocated during rte_bus_probe like + * private resources for device list. + * + * This is called while iterating over each registered bus. + * + * @return + * 0 for successful close + * !0 for any error while closing + */ +type
[PATCH v5 2/2] bus/fslmc: support bus close API
From: Rohit Raj This patch adds support for closing the bus objects which were acquired in the bus probe. Some devices need to be cleaned up in both the primary and secondary processes, while some need to be cleaned up only in the primary process. The devices are closed as per the allow list used while creating the objects in a particular process. Signed-off-by: Rohit Raj --- drivers/bus/fslmc/fslmc_bus.c | 15 +++- drivers/bus/fslmc/fslmc_vfio.c | 89 +++- drivers/bus/fslmc/fslmc_vfio.h | 3 +- drivers/bus/fslmc/mc/fsl_dpcon.h | 3 +- drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c | 30 +++- drivers/bus/fslmc/portal/dpaa2_hw_dpci.c | 30 +++- drivers/bus/fslmc/portal/dpaa2_hw_dpio.c | 36 +- drivers/bus/fslmc/portal/dpaa2_hw_dprc.c | 31 - drivers/bus/fslmc/rte_fslmc.h | 5 +- drivers/bus/fslmc/version.map | 1 + drivers/event/dpaa2/dpaa2_hw_dpcon.c | 31 - drivers/net/dpaa2/dpaa2_mux.c | 20 +- 12 files changed, 282 insertions(+), 12 deletions(-) diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c index a3c0d838c4..79053233a1 100644 --- a/drivers/bus/fslmc/fslmc_bus.c +++ b/drivers/bus/fslmc/fslmc_bus.c @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * - * Copyright 2016,2018-2021 NXP + * Copyright 2016,2018-2022 NXP * */ @@ -493,6 +493,18 @@ rte_fslmc_probe(void) return 0; } +static int +rte_fslmc_close(void) +{ + int ret = 0; + + ret = fslmc_vfio_close_group(); + if (ret) + DPAA2_BUS_ERR("Unable to close devices %d", ret); + + return 0; +} + static struct rte_device * rte_fslmc_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, const void *data) @@ -668,6 +680,7 @@ struct rte_fslmc_bus rte_fslmc_bus = { .bus = { .scan = rte_fslmc_scan, .probe = rte_fslmc_probe, + .close = rte_fslmc_close, .parse = rte_fslmc_parse, .find_device = rte_fslmc_find_device, .get_iommu_class = rte_dpaa2_get_iommu_class, diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c index 1b89a56bbc..59bb0f0171 100644 --- a/drivers/bus/fslmc/fslmc_vfio.c +++ b/drivers/bus/fslmc/fslmc_vfio.c @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 2015-2016 Freescale Semiconductor, Inc. All rights reserved.
- * Copyright 2016-2021 NXP + * Copyright 2016-2022 NXP * */ @@ -1066,3 +1066,90 @@ fslmc_vfio_setup_group(void) return 0; } + +static void +fslmc_close_iodevices(struct rte_dpaa2_device *dev) +{ + struct rte_dpaa2_object *object = NULL; + struct rte_dpaa2_driver *drv; + int ret, probe_all; + + switch (dev->dev_type) { + case DPAA2_IO: + case DPAA2_CON: + case DPAA2_CI: + case DPAA2_BPOOL: + case DPAA2_MUX: + TAILQ_FOREACH(object, &dpaa2_obj_list, next) { + if (dev->dev_type == object->dev_type) + object->close(dev->object_id); + else + continue; + } + break; + case DPAA2_ETH: + case DPAA2_CRYPTO: + case DPAA2_QDMA: + probe_all = rte_fslmc_bus.bus.conf.scan_mode != + RTE_BUS_SCAN_ALLOWLIST; + TAILQ_FOREACH(drv, &rte_fslmc_bus.driver_list, next) { + if (drv->drv_type != dev->dev_type) + continue; + if (rte_dev_is_probed(&dev->device)) + continue; + if (probe_all || + (dev->device.devargs && +dev->device.devargs->policy == +RTE_DEV_ALLOWED)) { + ret = drv->remove(dev); + if (ret) + DPAA2_BUS_ERR("Unable to remove"); + } + } + break; + default: + break; + } + + DPAA2_BUS_LOG(DEBUG, "Device (%s) Closed", + dev->device.name); +} + +int +fslmc_vfio_close_group(void) +{ + struct rte_dpaa2_device *dev, *dev_temp; + + RTE_TAILQ_FOREACH_SAFE(dev, &rte_fslmc_bus.device_list, next, dev_temp) { + if (dev->device.devargs && + dev->device.devargs->policy == RTE_DEV_BLOCKED) { + DPAA2_BUS_LOG(DEBUG, "%s Blacklisted, skipping", + dev->device.name); + TAILQ_REMOVE(&rte_fslmc_bus.device_list, dev, next); + continue; + } + switch (dev->dev_type) { + case DPAA2_ET
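The primary/secondary split described in the commit message reduces to checking the process role on the close path. A generic sketch of that pattern (the device type and helpers are hypothetical, not the fslmc code):

#include <rte_eal.h>

struct my_dev { int object_id; };

/* Hypothetical helpers: per-process portal unmapping vs. a global
 * hardware reset that only one process may perform. */
static void release_local_portal(struct my_dev *dev) { (void)dev; }
static void reset_hw_object(struct my_dev *dev) { (void)dev; }

static void
close_device(struct my_dev *dev)
{
	/* resources mapped into this process are always released */
	release_local_portal(dev);

	/* global hardware state is reset only by the primary process */
	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
		reset_hw_object(dev);
}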
RE: [PATCH v2] net/axgbe: use PCI root complex device to distinguish AMD hardware
[Public] Hi, Gentle reminder. This patch is required to submit changes for new AMD products. Please review and let us know if any modifications are required. Regards, Chandu -Original Message- From: Namburu, Chandu-babu Sent: Wednesday, December 22, 2021 2:58 PM To: dev@dpdk.org; david.march...@redhat.com; ferruh.yi...@intel.com Cc: Sebastian, Selwin ; arsalan_a...@mentor.com; sta...@dpdk.org Subject: RE: [PATCH v2] net/axgbe: use PCI root complex device to distinguish AMD hardware [Public] Hi David Marchand, I have submitted the v2 patch with your suggestion in the git log. Please review the changes. Regards, Chandu -Original Message- From: Namburu, Chandu-babu Sent: Thursday, December 2, 2021 9:41 PM To: dev@dpdk.org; david.march...@redhat.com Cc: Sebastian, Selwin ; arsalan_a...@mentor.com; ferruh.yi...@intel.com; Namburu, Chandu-babu ; sta...@dpdk.org Subject: [PATCH v2] net/axgbe: use PCI root complex device to distinguish AMD hardware "bus/pci: optimize bus scan" broke axgbe on V1000/R1000. The RV root complex pci device does not have any kernel driver assigned, so it is removed from the pci scan list which is used in "net/axgbe: add a HW quirk for register definitions". Get the root complex device id directly from pci sysfs instead of the pci scan list. Fixes: 991e0b1dbc4a ("net/axgbe: add a HW quirk for register definitions") Cc: sta...@dpdk.org Signed-off-by: Chandubabu Namburu --- drivers/net/axgbe/axgbe_ethdev.c | 39 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c index 7d40c18a86..7b8d94ca3c 100644 --- a/drivers/net/axgbe/axgbe_ethdev.c +++ b/drivers/net/axgbe/axgbe_ethdev.c @@ -10,6 +10,8 @@ #include "axgbe_regs.h" #include "rte_time.h" +#include "eal_filesystem.h" + static int eth_axgbe_dev_init(struct rte_eth_dev *eth_dev); static int axgbe_dev_configure(struct rte_eth_dev *dev); static int axgbe_dev_start(struct rte_eth_dev *dev); @@ -2117,28 +2119,27 @@ static void axgbe_default_config(struct axgbe_port *pdata) pdata->power_down = 0; } -static int -pci_device_cmp(const struct rte_device *dev, const void *_pci_id) +/* + * Return PCI root complex device id on success else 0 + */ +static uint16_t +get_pci_rc_devid(void) { - const struct rte_pci_device *pdev = RTE_DEV_TO_PCI_CONST(dev); - const struct rte_pci_id *pcid = _pci_id; + char pci_sysfs[PATH_MAX]; + const struct rte_pci_addr pci_rc_addr = {0, 0, 0, 0}; + unsigned long device_id; - if (pdev->id.vendor_id == AMD_PCI_VENDOR_ID && - pdev->id.device_id == pcid->device_id) - return 0; - return 1; -} + snprintf(pci_sysfs, sizeof(pci_sysfs), "%s/" PCI_PRI_FMT "/device", + rte_pci_get_sysfs_path(), pci_rc_addr.domain, + pci_rc_addr.bus, pci_rc_addr.devid, pci_rc_addr.function); -static bool -pci_search_device(int device_id) -{ - struct rte_bus *pci_bus; - struct rte_pci_id dev_id; + /* get device id */ + if (eal_parse_sysfs_value(pci_sysfs, &device_id) < 0) { + PMD_INIT_LOG(ERR, "Error in reading PCI sysfs\n"); + return 0; + } - dev_id.device_id = device_id; - pci_bus = rte_bus_find_by_name("pci"); - return (pci_bus != NULL) && - (pci_bus->find_device(NULL, pci_device_cmp, &dev_id) != NULL); + return (uint16_t)device_id; } /* @@ -2180,7 +2181,7 @@ eth_axgbe_dev_init(struct rte_eth_dev *eth_dev) /* * Use root complex device ID to differentiate RV AXGBE vs SNOWY AXGBE */ - if (pci_search_device(AMD_PCI_RV_ROOT_COMPLEX_ID)) { + if ((get_pci_rc_devid()) == AMD_PCI_RV_ROOT_COMPLEX_ID) { pdata->xpcs_window_def_reg = PCS_V2_RV_WINDOW_DEF; pdata->xpcs_window_sel_reg =
PCS_V2_RV_WINDOW_SELECT; } else { -- 2.25.1
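The sysfs path the patch builds expands to /sys/bus/pci/devices/0000:00:00.0/device. A standalone sketch of the same lookup, without the DPDK EAL helpers (behaviour as in the patch; error handling simplified):

#include <stdio.h>

/* Read the PCI device id of the root complex at 0000:00:00.0 from
 * sysfs, as the patch does via eal_parse_sysfs_value(). The file
 * contains a hex value such as "0x15d0". Returns 0 on any error. */
static unsigned int
read_rc_device_id(void)
{
	const char *path = "/sys/bus/pci/devices/0000:00:00.0/device";
	FILE *f = fopen(path, "r");
	unsigned int id = 0;

	if (f == NULL)
		return 0;
	if (fscanf(f, "%x", &id) != 1)
		id = 0;
	fclose(f);
	return id;
}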
Re: [PATCH 0/1] mempool: implement index-based per core cache
On Sat, Jan 8, 2022 at 3:07 PM Morten Brørup wrote: > > > From: Bruce Richardson [mailto:bruce.richard...@intel.com] > > Sent: Friday, 7 January 2022 14.51 > > > > On Fri, Jan 07, 2022 at 12:29:23PM +0100, Morten Brørup wrote: > > > > From: Bruce Richardson [mailto:bruce.richard...@intel.com] > > > > Sent: Friday, 7 January 2022 12.16 > > > > > > > > On Sat, Dec 25, 2021 at 01:16:03AM +0100, Morten Brørup wrote: > > > > > > From: Dharmik Thakkar [mailto:dharmik.thak...@arm.com] Sent: > > > > Friday, 24 > > > > > > December 2021 23.59 > > > > > > > > > > > > Current mempool per core cache implementation stores pointers > to > > > > mbufs > > > > > > On 64b architectures, each pointer consumes 8B This patch > replaces > > > > it > > > > > > with index-based implementation, wherein each buffer is > addressed > > > > by > > > > > > (pool base address + index) It reduces the amount of > memory/cache > > > > > > required for per core cache > > > > > > > > > > > > L3Fwd performance testing reveals minor improvements in the > cache > > > > > > performance (L1 and L2 misses reduced by 0.60%) with no change > in > > > > > > throughput > > > > > > > > > > > > Micro-benchmarking the patch using mempool_perf_test shows > > > > significant > > > > > > improvement with majority of the test cases > > > > > > > > > > > > > > > > I still think this is very interesting. And your performance > numbers > > > > are > > > > > looking good. > > > > > > > > > > However, it limits the size of a mempool to 4 GB. As previously > > > > > discussed, the max mempool size can be increased by multiplying > the > > > > index > > > > > with a constant. > > > > > > > > > > I would suggest using sizeof(uintptr_t) as the constant > multiplier, > > > > so > > > > > the mempool can hold objects of any size divisible by > > > > sizeof(uintptr_t). > > > > > And it would be silly to use a mempool to hold objects smaller > than > > > > > sizeof(uintptr_t). > > > > > > > > > > How does the performance look if you multiply the index by > > > > > sizeof(uintptr_t)? > > > > > > > > > > > > > Each mempool entry is cache aligned, so we can use that if we want > a > > > > bigger > > > > multiplier. > > > > > > Thanks for chiming in, Bruce. > > > > > > Please also read this discussion about the multiplier: > > > http://inbox.dpdk.org/dev/calbae1prqyyog96f6ecew1vpf3toh1h7mzzuliy95z9xjbr...@mail.gmail.com/ > > > > > > > I actually wondered after I had sent the email whether we had indeed an > > option to disable the cache alignment or not! Thanks for pointing out > > that > > we do. This brings a couple additional thoughts: > > > > * Using indexes for the cache should probably be a runtime flag rather > > than > > a build-time one. > > * It would seem reasonable to me to disallow use of the indexed-cache > > flag > > and the non-cache aligned flag simultaneously. > > * On the offchance that that restriction is unacceptable, then we can > > make things a little more complicated by doing a runtime computation > > of > > the "index-shiftwidth" to use. > > > > Overall, I think defaulting to cacheline shiftwidth and disallowing > > index-based addressing when using unaligned buffers is simplest and > > easiest > > unless we can come up with a valid usecase for needing more than that. > > > > /Bruce > > This feature is a performance optimization. > > With that in mind, it should not introduce function pointers or similar > run-time checks in the fast path to determine what kind of cache to use > per mempool.
And if an index multiplier is implemented, it should be a > compile time constant, probably something between sizeof(uintptr_t) and > RTE_MEMPOOL_ALIGN (=RTE_CACHE_LINE_SIZE). > > The patch comes with a tradeoff between better performance and limited > mempool size, and possibly some limitations regarding very small objects that > are not cache line aligned to avoid wasting memory > (RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ). > > With no multiplier, the only tradeoff is that the mempool size is limited to > 4 GB. > > If the multiplier is small (e.g. 8 bytes) the only tradeoff is that the > mempool size is limited to 32 GB. (And a waste of memory for objects smaller > than 8 bytes; but I don't think anyone would use a mempool to hold objects > smaller than 8 bytes.) > > If the multiplier is larger (e.g. the 64 byte cache line size), the mempool size > is instead limited to 256 GB, but RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ has no > effect. > > Note: 32 bit platforms have no benefit from this patch: The pointer already > only uses 4 bytes, so replacing the pointer with a 4 byte index makes no > difference. > > > Since this feature is a performance optimization only, and doesn't provide > any new features, I don't mind it being a compile time option. > > If this feature is a compile time option, and the mempool library is compiled > with the large multiplier, then RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ could be > made undefined in the p
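For illustration, the index-plus-multiplier addressing discussed above amounts to a shift on each access; a minimal sketch (names are hypothetical, not the proposed patch):

#include <stdint.h>

/* With a 4-byte index and a multiplier of RTE_CACHE_LINE_SIZE (64),
 * i.e. a shift of 6, the addressable pool size is 2^32 * 64 = 256 GB. */
#define OBJ_SHIFT 6

static inline void *
index_to_obj(uintptr_t pool_base, uint32_t idx)
{
	return (void *)(pool_base + ((uintptr_t)idx << OBJ_SHIFT));
}

static inline uint32_t
obj_to_index(uintptr_t pool_base, void *obj)
{
	return (uint32_t)(((uintptr_t)obj - pool_base) >> OBJ_SHIFT);
}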
Re: [PATCH 0/3] Wait for NPA pools to get filled
On Tue, Nov 30, 2021 at 11:39 AM Ashwin Sekhar T K wrote: > > NPA could take some time to reflect the pointers which > have been freed into pools. So, after populating a pool > with pointers, wait until the populated pointers are > reflected in the pool. Series Acked-by: Jerin Jacob Series applied to dpdk-next-net-mrvl/for-next-net. Thanks. > > Ashwin Sekhar T K (3): > common/cnxk: add support to wait for pool filling > common/cnxk: wait for sqb pool to fill > common/cnxk: wait for xaq pool to fill > > drivers/common/cnxk/roc_nix_queue.c | 10 ++ > drivers/common/cnxk/roc_npa.h | 26 ++ > drivers/common/cnxk/roc_sso.c | 9 + > 3 files changed, 45 insertions(+) > > -- > 2.32.0 >
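The pattern being added is essentially a bounded poll on the pool's available-pointer count; a generic sketch of the idea (hypothetical names, not the cnxk implementation):

#include <stdint.h>
#include <stdbool.h>

/* Poll a pool's available-pointer count until it reaches the expected
 * level or the retry budget runs out; returns false on timeout. */
static bool
pool_wait_for_fill(uint64_t (*read_avail)(void *ctx), void *ctx,
		   uint64_t expected, unsigned int max_retries)
{
	while (max_retries-- > 0) {
		if (read_avail(ctx) >= expected)
			return true;
		/* a real implementation would pause/yield here */
	}
	return false;
}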
Re: [PATCH v1 1/1] net/qede: fix redundant condition in debug code
On Tue, Nov 30, 2021 at 10:29 PM Anatoly Burakov wrote: > > Expression "a && 1" is equivalent to just "a", so fix the accidental > inclusion of a literal in code. > > Cc: sta...@dpdk.org > > Fixes: ec55c118792b ("net/qede: add infrastructure for debug data collection") > Cc: rm...@marvell.com Hi @Rasesh Mody @Devendra Singh Rawat @Igor Russkikh Please review this patch so it can be merged. > > Signed-off-by: Anatoly Burakov > --- > > Notes: > This isn't a bug, this is just a syntactic anomaly, likely a remnant of > some > kind of debugging code. > > This issue was found with Control Flag [1], which I ran on the DPDK codebase > just > out of curiosity. This was the only issue worth addressing that the tool > produced output for. > > [1] https://github.com/IntelLabs/control-flag > > drivers/net/qede/qede_debug.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/qede/qede_debug.c b/drivers/net/qede/qede_debug.c > index 2297d245c4..ba807ea680 100644 > --- a/drivers/net/qede/qede_debug.c > +++ b/drivers/net/qede/qede_debug.c > @@ -3522,7 +3522,7 @@ static enum dbg_status qed_grc_dump(struct ecore_hwfn > *p_hwfn, > > /* Dump MCP HW Dump */ > if (qed_grc_is_included(p_hwfn, DBG_GRC_PARAM_DUMP_MCP_HW_DUMP) && > - !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_MCP) && 1) > + !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_MCP)) > offset += qed_grc_dump_mcp_hw_dump(p_hwfn, > p_ptt, > dump_buf + offset, dump); > -- > 2.25.1 >
Re: [PATCH] mempool: optimize incomplete cache handling
On Fri, Jan 7, 2022 at 2:16 PM Morten Brørup wrote: > > > From: Jerin Jacob [mailto:jerinjac...@gmail.com] > > Sent: Thursday, 6 January 2022 17.55 > > > > On Thu, Jan 6, 2022 at 5:54 PM Morten Brørup > > wrote: > > > > > > A flush threshold for the mempool cache was introduced in DPDK > > version > > > 1.3, but rte_mempool_do_generic_get() was not completely updated back > > > then. > > > > > > The incompleteness did not cause any functional bugs, so this patch > > > could be considered refactoring for the purpose of cleaning up. > > > > > > This patch completes the update of rte_mempool_do_generic_get() as > > > follows: > > > > > > 1. A few comments were misplaced or no longer correct. > > > Some comments have been updated/added/corrected. > > > > > > 2. The code that initially screens the cache request was not updated. > > > The initial screening compared the request length to the cache size, > > > which was correct before, but became irrelevant with the introduction > > of > > > the flush threshold. E.g. the cache can hold up to flushthresh > > objects, > > > which is more than its size, so some requests were not served from > > the > > > cache, even though they could be. > > > The initial screening has now been corrected to match the initial > > > screening in rte_mempool_do_generic_put(), which verifies that a > > cache > > > is present, and that the length of the request does not overflow the > > > memory allocated for the cache. > > > > > > 3. The code flow for satisfying the request from the cache was weird. > > > The likely code path where the objects are simply served from the > > cache > > > was treated as unlikely; now it is treated as likely. > > > And in the code path where the cache was backfilled first, numbers > > were > > > added and subtracted from the cache length; now this code path simply > > > sets the cache length to its final value. > > > > > > 4. The objects were returned in reverse order. > > > Returning the objects in reverse order is not necessary, so > > rte_memcpy() > > > is now used instead. > > > > Have you checked the performance with a network workload? > > IMO, reverse order makes sense (LIFO vs FIFO). > > The LIFO makes the cache warm as the same buffers are reused > > frequently. > > I have not done any performance testing. We probably agree that the only > major difference lies in how the objects are returned. And we probably also > agree that rte_memcpy() is faster than the copy loop it replaced, especially > when n is constant at compile time. So the performance difference mainly > depends on the application, which I will discuss below. > > Let's first consider LIFO vs. FIFO. > > The key argument for the rte_memcpy() optimization is that we are still > getting the burst of objects from the top of the stack (LIFO); only the order > of the objects inside the burst is not reverse anymore. > > Here is an example: > > The cache initially contains 8 objects: 01234567. > > 8 more objects are put into the cache: 89ABCDEF. > > The cache now holds: 0123456789ABCDEF. Agree. However, I think it may matter on machines with smaller L1 caches and larger burst sizes, where the scheme affects what can stay in L1. I would suggest splitting each performance improvement into a separate patch for better tracking and to quantify each improvement. I think the mempool performance test and tx-only stream mode in testpmd can quantify the patches. > > Getting 4 objects from the cache gives us CDEF instead of FEDC, i.e.
we are > still getting the 4 objects most recently put into the cache. > > Furthermore, if the application is working with fixed size bursts, it will > usually put and get the same size burst, i.e. put the burst 89ABCDEF into the > cache, and then get the burst 89ABCDEF from the cache again. > > > Here is an example unfavorable scenario: > > The cache initially contains 4 objects, which have gone cold: 0123. > > 4 more objects, which happen to be hot, are put into the cache: 4567. > > Getting 8 objects from the cache gives us 01234567 instead of 76543210. > > Now, if the application only processes the first 4 of the 8 objects in the > burst, it would have benefitted from those objects being the hot 7654 objects > instead of the cold 0123 objects. > > However, I think that most applications process the complete burst, so I do > consider this scenario unlikely. > > Similarly, a pipelined application doesn't process objects in reverse order > at every other step in the pipeline, even though the previous step in the > pipeline most recently touched the last object of the burst. > > > My overall conclusion was that the benefit of using rte_memcpy() outweighs > the disadvantage of the unfavorable scenario, because I consider the > probability of the unfavorable scenario occurring very low. But again: it > mainly depends on the application. > > If anyone disagrees with the risk analysis described above, I will happily > provide a version 2 of the patch.
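A minimal sketch of the two copy strategies being compared, using plain memcpy() in place of rte_memcpy() (illustrative only, not the actual mempool code):

#include <string.h>

/* Hand out the top n objects of a cache holding cache_len pointers.
 * With cache 0123456789ABCDEF and n = 4, the reverse loop yields FEDC
 * while the forward copy yields CDEF - the same four objects. */

/* old style: copy from the top of the stack in reverse order */
static void
get_reverse(void **cache_objs, unsigned int cache_len,
	    void **obj_table, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		obj_table[i] = cache_objs[cache_len - 1 - i];
}

/* proposed style: copy the same burst forward with one memcpy() */
static void
get_forward(void **cache_objs, unsigned int cache_len,
	    void **obj_table, unsigned int n)
{
	memcpy(obj_table, &cache_objs[cache_len - n], n * sizeof(void *));
}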
RE: [EXT] Re: [PATCH v1 1/1] net/qede: fix redundant condition in debug code
> -Original Message- > From: Jerin Jacob > Sent: Monday, January 10, 2022 12:50 PM > To: Anatoly Burakov > Cc: dpdk-dev ; Rasesh Mody ; > Devendra Singh Rawat ; Igor Russkikh > ; dpdk stable > Subject: [EXT] Re: [PATCH v1 1/1] net/qede: fix redundant condition in debug > code > > External Email > > -- > On Tue, Nov 30, 2021 at 10:29 PM Anatoly Burakov > wrote: > > > > Expression "a && 1" is equivalent to just "a", so fix the accidental > > inclusion of a literal in code. > > > > Cc: sta...@dpdk.org > > > > Fixes: ec55c118792b ("net/qede: add infrastructure for debug data > collection") > > Cc: rm...@marvell.com > > Hi @Rasesh Mody @Devendra Singh Rawat @Igor Russkikh > > Please review this patch so it can be merged. > > > > > > Signed-off-by: Anatoly Burakov > > --- > > > > Notes: > > This isn't a bug, this is just a syntactic anomaly, likely a remnant of > > some > > kind of debugging code. > > > > This issue was found with Control Flag [1], which I ran on the DPDK codebase > just > > out of curiosity. This was the only issue worth addressing that the tool > > produced output for. > > > > [1] https://github.com/IntelLabs/control-flag > > > > drivers/net/qede/qede_debug.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/qede/qede_debug.c > b/drivers/net/qede/qede_debug.c > > index 2297d245c4..ba807ea680 100644 > > --- a/drivers/net/qede/qede_debug.c > > +++ b/drivers/net/qede/qede_debug.c > > @@ -3522,7 +3522,7 @@ static enum dbg_status qed_grc_dump(struct > ecore_hwfn *p_hwfn, > > > > /* Dump MCP HW Dump */ > > if (qed_grc_is_included(p_hwfn, > DBG_GRC_PARAM_DUMP_MCP_HW_DUMP) && > > - !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_MCP) && 1) > > + !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_MCP)) > > offset += qed_grc_dump_mcp_hw_dump(p_hwfn, > > p_ptt, > > dump_buf + offset, dump); > > -- > > 2.25.1 > > Thanks. Acked-by: Devendra Singh Rawat