On 2/14/2023 3:48 PM, Jiawei Wang wrote:
> When multiple ports are aggregated into a single DPDK port,
> (example: Linux bonding, DPDK bonding, failsafe, etc.),
> we want to know which port use for Tx via a queue.
> 
> This patch introduces the new ethdev API
> rte_eth_dev_map_aggr_tx_affinity(), it's used to map a Tx queue
> with an aggregated port of the DPDK port (specified with port_id),
> The affinity is the number of the aggregated port.
> Value 0 means no affinity and traffic could be routed to any
> aggregated port, this is the default current behavior.
> 
> The maximum number of affinity is given by rte_eth_dev_count_aggr_ports().
> 
> Add the trace point for ethdev rte_eth_dev_count_aggr_ports()
> and rte_eth_dev_map_aggr_tx_affinity() functions.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) affinity (value)
> 
> For example, there're two physical ports connected to
> a single DPDK port (port id 0), and affinity 1 stood for
> the first physical port and affinity 2 stood for the second
> physical port.
> Use the below commands to config tx phy affinity for per Tx Queue:
>         port config 0 txq 0 affinity 1
>         port config 0 txq 1 affinity 1
>         port config 0 txq 2 affinity 2
>         port config 0 txq 3 affinity 2
> 
> These commands config the Tx Queue index 0 and Tx Queue index 1 with
> phy affinity 1, uses Tx Queue 0 or Tx Queue 1 send packets,
> these packets will be sent from the first physical port, and similar
> with the second physical port if sending packets with Tx Queue 2
> or Tx Queue 3.
> 
> Signed-off-by: Jiawei Wang <jiaw...@nvidia.com>

<...>

> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index dc0a4eb12c..1d5b3a16b2 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -6915,6 +6915,55 @@ rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t 
> port_id, uint32_t *ptypes
>       return j;
>  }
>  
> +int rte_eth_dev_count_aggr_ports(uint16_t port_id)
> +{
> +     struct rte_eth_dev *dev;
> +     int ret;
> +
> +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +     dev = &rte_eth_devices[port_id];
> +
> +     if (*dev->dev_ops->count_aggr_ports == NULL)
> +             return -ENOTSUP;

What do you think to return a default value when dev_ops is not defined,
assuming device is not a bounded device.
Not sure which one is better for application, return a default value or
error.


> +     ret = eth_err(port_id, (*dev->dev_ops->count_aggr_ports)(port_id));
> +
> +     rte_eth_trace_count_aggr_ports(port_id, ret);
> +
> +     return ret;
> +}
> +
> +int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
> +                                  uint8_t affinity)
> +{
> +     struct rte_eth_dev *dev;
> +     int ret;
> +
> +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +     dev = &rte_eth_devices[port_id];
> +
> +     if (tx_queue_id >= dev->data->nb_tx_queues) {
> +             RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n", tx_queue_id);
> +             return -EINVAL;
> +     }
> +

Although documentation says this API should be called before configure,
if user misses it I guess above can crash, is there a way to add runtime
check, like checking 'dev->data->dev_configured'?


> +     if (*dev->dev_ops->map_aggr_tx_affinity == NULL)
> +             return -ENOTSUP;
> +
> +     if (dev->data->dev_started) {
> +             RTE_ETHDEV_LOG(ERR,
> +                     "Port %u must be stopped to allow configuration\n",
> +                     port_id);
> +             return -EBUSY;
> +     }
> +
> +     ret = eth_err(port_id, (*dev->dev_ops->map_aggr_tx_affinity)(port_id,
> +                             tx_queue_id, affinity));
> +

Should API check if port_id is a bonding port before it continue with
mapping?

> +     rte_eth_trace_map_aggr_tx_affinity(port_id, tx_queue_id, affinity, ret);
> +
> +     return ret;
> +}
> +
>  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>  
>  RTE_INIT(ethdev_init_telemetry)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..07b8250eb8 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -2589,6 +2589,52 @@ int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t 
> rx_port);
>  __rte_experimental
>  int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port);
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the number of aggregated ports.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @return
> + *   - (>=0) the number of aggregated port if success.
> + *   - (-ENOTSUP) if not supported.
> + */
> +__rte_experimental
> +int rte_eth_dev_count_aggr_ports(uint16_t port_id);


Can you please give more details in the function description, in the
context of this patch it is clear, but someone sees it first time can be
confused what is "aggregated ports" is.

What is expected value for regular pysical port, that doesn't have any
sub-devices, 0 or 1? Can you please document?


> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + *  Map a Tx queue with an aggregated port of the DPDK port (specified with 
> port_id).
> + *  When multiple ports are aggregated into a single one,
> + *  it allows to choose which port to use for Tx via a queue.
> + *
> + *  The application should use rte_eth_dev_map_aggr_tx_affinity()
> + *  after rte_eth_dev_configure(), rte_eth_tx_queue_setup(), and
> + *  before rte_eth_dev_start().
> + *
> + * @param port_id
> + *   The identifier of the port used in rte_eth_tx_burst().
> + * @param tx_queue_id
> + *   The index of the transmit queue used in rte_eth_tx_burst().
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param affinity
> + *   The number of the aggregated port.
> + *   Value 0 means no affinity and traffic could be routed to any aggregated 
> port.
> + *   The first aggregated port is number 1 and so on.
> + *   The maximum number is given by rte_eth_dev_count_aggr_ports().
> + *
> + * @return
> + *   Zero if successful. Non-zero otherwise.
> + */
> +__rte_experimental
> +int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
> +                                  uint8_t affinity);
> +
>  /**
>   * Return the NUMA socket to which an Ethernet device is connected
>   *
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index dbc2bffe64..685aa71e51 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -300,6 +300,8 @@ EXPERIMENTAL {
>       rte_mtr_meter_profile_get;
>  
>       # added in 23.03
> +     rte_eth_dev_count_aggr_ports;
> +     rte_eth_dev_map_aggr_tx_affinity;
>       rte_flow_async_create_by_index;
>  };
>  

Reply via email to