RE: [PATCH v2] net/i40e: remove redundant judgment

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Feifei Wang 
> Sent: Tuesday, March 28, 2023 3:28 PM
> To: Richardson, Bruce ; Konstantin Ananyev
> ; Zhang, Yuying
> ; Xing, Beilei ; David
> Christensen ; Ruifeng Wang
> 
> Cc: dev@dpdk.org; n...@arm.com; Feifei Wang ;
> Honnappa Nagarahalli 
> Subject: [PATCH v2] net/i40e: remove redundant judgment
> 
> Merged variable updates under the same condition. This reduces branching.
> 
> In ampere-altra, there is no performance improvement with this patch.
> In x86 sse and avx2 path, there is also no performance improvement.

Thanks for sharing the results. The code applies some best practices, such as
reducing branching and adding compiler hints, which should generally improve
performance, so there is no need to highlight that it showed no benefit on
certain specific platforms.

Would it be OK to remove the last two lines when merging the patch?

Otherwise
Acked-by: Qi Zhang 


> 
> 
> v2:
> 1. add change for avx and altivec path.
> 
> Suggested-by: Honnappa Nagarahalli 
> Signed-off-by: Feifei Wang 
> Reviewed-by: Ruifeng Wang 
> ---
>  drivers/net/i40e/i40e_rxtx_common_avx.h  | 9 +
> drivers/net/i40e/i40e_rxtx_vec_altivec.c | 9 +
>  drivers/net/i40e/i40e_rxtx_vec_neon.c| 9 +
>  drivers/net/i40e/i40e_rxtx_vec_sse.c | 9 +
>  4 files changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h
> b/drivers/net/i40e/i40e_rxtx_common_avx.h
> index cfc1e63173..85958d6c81 100644
> --- a/drivers/net/i40e/i40e_rxtx_common_avx.h
> +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h
> @@ -198,14 +198,15 @@ i40e_rxq_rearm_common(struct i40e_rx_queue
> *rxq, __rte_unused bool avx512)  #endif
> 
>   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> + rx_id = rxq->rxrearm_start - 1;
> +
> + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
>   rxq->rxrearm_start = 0;
> + rx_id = rxq->nb_rx_desc - 1;
> + }
> 
>   rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH;
> 
> - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
> -  (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
> -
>   /* Update the tail pointer on the NIC */
>   I40E_PCI_REG_WC_WRITE(rxq->qrx_tail, rx_id);  } diff --git
> a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> index 2dfa04599c..8672ad1c41 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
> @@ -89,14 +89,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
>   }
> 
>   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> + rx_id = rxq->rxrearm_start - 1;
> +
> + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
>   rxq->rxrearm_start = 0;
> + rx_id = rxq->nb_rx_desc - 1;
> + }
> 
>   rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH;
> 
> - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
> -  (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
> -
>   /* Update the tail pointer on the NIC */
>   I40E_PCI_REG_WRITE(rxq->qrx_tail, rx_id);  } diff --git
> a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> index 12e6f1cbcb..49391fe4c7 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> @@ -64,14 +64,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
>   }
> 
>   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> + rx_id = rxq->rxrearm_start - 1;
> +
> + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
>   rxq->rxrearm_start = 0;
> + rx_id = rxq->nb_rx_desc - 1;
> + }
> 
>   rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH;
> 
> - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
> -  (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
> -
>   rte_io_wmb();
>   /* Update the tail pointer on the NIC */
>   I40E_PCI_REG_WRITE_RELAXED(rxq->qrx_tail, rx_id); diff --git
> a/drivers/net/i40e/i40e_rxtx_vec_sse.c
> b/drivers/net/i40e/i40e_rxtx_vec_sse.c
> index bdc979a839..baf83cb3df 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
> @@ -77,14 +77,15 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
>   }
> 
>   rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
> - if (rxq->rxrearm_start >= rxq->nb_rx_desc)
> + rx_id = rxq->rxrearm_start - 1;
> +
> + if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
>   rxq->rxrearm_start = 0;
> + rx_id = rxq->nb_rx_desc - 1;
> + }
> 
>   rxq->rxrearm_nb -= RTE_I40E_RXQ_REARM_THRESH;
> 
> - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
> -  (rxq->nb_rx_desc - 1) : (rxq->rxrea
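For readers skimming the archive, the acked change boils down to the following
standalone sketch (the struct and function names are stand-ins, not the actual
i40e driver code): the wrap-around check and the tail-index computation are
folded under a single, rarely taken branch, so the common case executes
straight-line code.

```c
#include <stdint.h>

/* Stand-in for the relevant fields of struct i40e_rx_queue. */
struct rxq_sketch {
	uint16_t rxrearm_start; /* already advanced by the rearm threshold */
	uint16_t nb_rx_desc;
};

/* Before: an unconditional wrap check plus a separate ternary branch. */
static uint16_t
tail_before(struct rxq_sketch *rxq)
{
	if (rxq->rxrearm_start >= rxq->nb_rx_desc)
		rxq->rxrearm_start = 0;
	return (uint16_t)((rxq->rxrearm_start == 0) ?
			(rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
}

/* After: both updates live under one branch (marked unlikely() in DPDK). */
static uint16_t
tail_after(struct rxq_sketch *rxq)
{
	uint16_t rx_id = rxq->rxrearm_start - 1;

	if (rxq->rxrearm_start >= rxq->nb_rx_desc) {
		rxq->rxrearm_start = 0;
		rx_id = rxq->nb_rx_desc - 1;
	}
	return rx_id;
}
```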

RE: 22.11.2 patches review and test

2023-04-27 Thread Xu, HailinX


> -Original Message-
> From: Xueming Li 
> Sent: Sunday, April 23, 2023 5:34 PM
> To: sta...@dpdk.org
> Cc: xuemi...@nvidia.com; dev@dpdk.org; Abhishek Marathe
> ; Ali Alnubani ;
> Walker, Benjamin ; David Christensen
> ; Hemant Agrawal ;
> Stokes, Ian ; Jerin Jacob ;
> Mcnamara, John ; Ju-Hyoung Lee
> ; Kevin Traynor ; Luca
> Boccassi ; Pei Zhang ; Xu, Qian Q
> ; Raslan Darawsheh ; Thomas
> Monjalon ; Yanghang Liu ;
> Peng, Yuan ; Chen, Zhaoyan
> 
> Subject: 22.11.2 patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for stable release 22.11.2.
> 
> The planned date for the final release is 5th MAY.
> 
> Please help with testing and validation of your use cases and report any
> issues/results with reply-all to this mail. For the final release the fixes 
> and
> reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v22.11.2-rc1
> 
> These patches are located at branch 22.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Xueming Li 

Here is an update on the test status for the Intel part. So far, the DPDK
22.11.2-rc1 validation test rate is 85%, and no critical issue has been found.
# Basic Intel(R) NIC testing
* Build & CFLAG compile: covers the build test combinations with the latest
GCC/Clang versions and popular OS revisions such as Ubuntu 20.04,
Ubuntu 22.04, Fedora 35, Fedora 37, RHEL 8.6, RHEL 8.4, FreeBSD 13.1,
SUSE 15, CentOS 7.9, openEuler 22.03-SP1, etc.
- All tests done. No new DPDK issues found.
* PF (i40e, ixgbe): test scenarios include RTE_FLOW/TSO/jumbo frame/checksum
offload/VLAN/VXLAN, etc.
- All tests done. No new DPDK issues found.
* VF (i40e, ixgbe): test scenarios include VF-RTE_FLOW/TSO/jumbo frame/checksum
offload/VLAN/VXLAN, etc.
- All tests done. No new DPDK issues found.
* PF/VF (ice): test scenarios include Switch features/Package Management/Flow
Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
- All tests done. No new DPDK issues found.
* Intel NIC single core/NIC performance: test scenarios include PF/VF single
core performance tests, etc.
- All tests done. No new DPDK issues found.
* IPsec: test scenarios include ipsec/ipsec-gw/ipsec library basic tests -
QAT & SW/FIB library, etc.
- Execution rate is 90%. No new DPDK issues found.

# Basic cryptodev and virtio testing
* Virtio: both function and performance tests are covered, such as
PVP/virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMware
ESXi 8.0, etc.
- All tests done. No new DPDK issues found.
* Cryptodev:
  * Function tests: test scenarios include Cryptodev API testing/CompressDev
ISA-L/QAT/ZLIB PMD testing/FIPS, etc.
- Execution rate is 75%. No new DPDK issues found.
  * Performance tests: test scenarios include Throughput Performance/Cryptodev
Latency, etc.
- All tests done. No new DPDK issues found.

Regards,
Xu, Hailin
 



Re: [RFC] app/testpmd: use RSS conf from software when configuring DCB

2023-04-27 Thread zhoumin

Kindly ping.

Any comments or suggestions will be appreciated.

Best regards
Min


On 2023/4/12 5:52 PM, Min Zhou wrote:

In the testpmd command, we have to stop the port before configuring DCB.
However, some PMDs may execute a hardware reset during the port stop, such
as ixgbe. Such reset operations can clear the RSS configuration in the
hardware registers. This causes the loss of the RSS configuration that was
set during testpmd initialization. As a result, I find that I cannot enable
RSS and DCB at the same time in the testpmd command when using an Intel
82599 NIC.

Although this patch solves the problem I encountered, is there any risk in
using the RSS conf kept in software, instead of reading it from the hardware
register, when configuring DCB?

Signed-off-by: Min Zhou 
---
  app/test-pmd/testpmd.c | 11 +--
  1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5cb6f92523..3c382267b8 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -4247,14 +4247,12 @@ const uint16_t vlan_tags[] = {
  };
  
  static  int

-get_eth_dcb_conf(portid_t pid, struct rte_eth_conf *eth_conf,
+get_eth_dcb_conf(portid_t pid __rte_unused, struct rte_eth_conf *eth_conf,
 enum dcb_mode_enable dcb_mode,
 enum rte_eth_nb_tcs num_tcs,
 uint8_t pfc_en)
  {
uint8_t i;
-   int32_t rc;
-   struct rte_eth_rss_conf rss_conf;
  
  	/*

 * Builds up the correct configuration for dcb+vt based on the vlan 
tags array
@@ -4296,12 +4294,6 @@ get_eth_dcb_conf(portid_t pid, struct rte_eth_conf 
*eth_conf,
struct rte_eth_dcb_tx_conf *tx_conf =
ð_conf->tx_adv_conf.dcb_tx_conf;
  
-		memset(&rss_conf, 0, sizeof(struct rte_eth_rss_conf));

-
-   rc = rte_eth_dev_rss_hash_conf_get(pid, &rss_conf);
-   if (rc != 0)
-   return rc;
-
rx_conf->nb_tcs = num_tcs;
tx_conf->nb_tcs = num_tcs;
  
@@ -4313,7 +4305,6 @@ get_eth_dcb_conf(portid_t pid, struct rte_eth_conf *eth_conf,

eth_conf->rxmode.mq_mode =
(enum rte_eth_rx_mq_mode)
(rx_mq_mode & RTE_ETH_MQ_RX_DCB_RSS);
-   eth_conf->rx_adv_conf.rss_conf = rss_conf;
eth_conf->txmode.mq_mode = RTE_ETH_MQ_TX_DCB;
}
  




Re: [PATCH] net/ixgbe: consider DCB/VMDq conf when getting RSS conf

2023-04-27 Thread zhoumin

Kindly ping.

Any comments or suggestions will be appreciated.

Best regards
Min


On Wed, Apr 12, 2023 at 6:01 PM, Min Zhou wrote:

The mrqe field of MRQC register is an enum. From the Intel 82599 datasheet,
we know that these values below for the mrqe field are all related to RSS
configuration:
0000b = RSS disabled.
0001b = RSS only -- Single set of RSS 16 queues.
0010b = DCB enabled and RSS disabled -- 8 TCs, each allocated 1 queue.
0011b = DCB enabled and RSS disabled -- 4 TCs, each allocated 1 queue.
0100b = DCB and RSS -- 8 TCs, each allocated 16 RSS queues.
0101b = DCB and RSS -- 4 TCs, each allocated 16 RSS queues.
1000b = Virtualization only -- 64 pools, no RSS, each pool allocated
2 queues.
1010b = Virtualization and RSS -- 32 pools, each allocated 4 RSS queues.
1011b = Virtualization and RSS -- 64 pools, each allocated 2 RSS queues.

The ixgbe PMD checks whether RSS is enabled when getting the RSS conf. So,
besides comparing the value of the mrqe field with xxx0b and xxx1b, we also
need to consider the other configurations, such as DCB + RSS or VMDQ + RSS.
Otherwise, we may not get the correct RSS conf in some cases, such as when we
use DCB and RSS with 8 TCs, which corresponds to 0100b for the mrqe field.

Signed-off-by: Min Zhou 
---
  drivers/net/ixgbe/ixgbe_rxtx.c | 91 ++
  1 file changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index c9d6ca9efe..1eff0053ed 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -3461,18 +3461,89 @@ static uint8_t rss_intel_key[40] = {
0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
  };
  
+/*
+ * This function removes the rss configuration in the mrqe field of MRQC
+ * register and tries to maintain other configurations in the field, such as
+ * DCB and Virtualization.
+ *
+ * The MRQC register is described in section 7.1.2.8.3 of the Intel 82599
+ * datasheet. From the datasheet, we know that the mrqe field is an enum. So,
+ * masking the mrqe field with '~IXGBE_MRQC_RSSEN' may not completely disable
+ * rss configuration. For example, the value of mrqe is equal to 0101b when
+ * DCB and RSS with 4 TCs are configured; however, 'mrqe &= ~0x01' is equal to
+ * 0100b, which corresponds to DCB and RSS with 8 TCs.
+ */
+static void
+ixgbe_mrqc_rss_remove(struct ixgbe_hw *hw)
+{
+   uint32_t mrqc;
+   uint32_t mrqc_reg;
+   uint32_t mrqe_val;
+
+   mrqc_reg = ixgbe_mrqc_reg_get(hw->mac.type);
+   mrqc = IXGBE_READ_REG(hw, mrqc_reg);
+   mrqe_val = mrqc & IXGBE_MRQC_MRQE_MASK;
+
+   switch (mrqe_val) {
+   case IXGBE_MRQC_RSSEN:
+   /* Completely disable rss */
+   mrqe_val = 0;
+   break;
+   case IXGBE_MRQC_RTRSS8TCEN:
+   mrqe_val = IXGBE_MRQC_RT8TCEN;
+   break;
+   case IXGBE_MRQC_RTRSS4TCEN:
+   mrqe_val = IXGBE_MRQC_RT4TCEN;
+   break;
+   case IXGBE_MRQC_VMDQRSS64EN:
+   /* FIXME. Can 32 pools with rss convert to 64 pools without rss? */
+   case IXGBE_MRQC_VMDQRSS32EN:
+   mrqe_val = IXGBE_MRQC_VMDQEN;
+   break;
+   default:
+   /* No rss configured, leave it as it is */
+   break;
+   }
+   mrqc = (mrqc & ~IXGBE_MRQC_MRQE_MASK) | mrqe_val;
+   IXGBE_WRITE_REG(hw, mrqc_reg, mrqc);
+}
+
  static void
  ixgbe_rss_disable(struct rte_eth_dev *dev)
  {
struct ixgbe_hw *hw;
+
+   hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   /* Remove the rss configuration and maintain the other configurations */
+   ixgbe_mrqc_rss_remove(hw);
+}
+
+/*
+ * This function checks whether the rss is enabled or not by comparing the mrqe
+ * field with some RSS related enums and also considers the configurations for
+ * DCB + RSS and Virtualization + RSS. It is necessary for getting the correct
+ * rss hash configurations from the RSS Field Enable field of MRQC register
+ * when both RSS and DCB/VMDQ are used.
+ */
+static bool
+ixgbe_rss_enabled(struct ixgbe_hw *hw)
+{
uint32_t mrqc;
uint32_t mrqc_reg;
+   uint32_t mrqe_val;
  
-	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

mrqc_reg = ixgbe_mrqc_reg_get(hw->mac.type);
mrqc = IXGBE_READ_REG(hw, mrqc_reg);
-   mrqc &= ~IXGBE_MRQC_RSSEN;
-   IXGBE_WRITE_REG(hw, mrqc_reg, mrqc);
+   mrqe_val = mrqc & IXGBE_MRQC_MRQE_MASK;
+
+   if (mrqe_val == IXGBE_MRQC_RSSEN ||
+   mrqe_val == IXGBE_MRQC_RTRSS8TCEN ||
+   mrqe_val == IXGBE_MRQC_RTRSS4TCEN ||
+   mrqe_val == IXGBE_MRQC_VMDQRSS64EN ||
+   mrqe_val == IXGBE_MRQC_VMDQRSS32EN)
+   return true;
+
+   return false;
  }
  
  static void

@@ -3530,9 +3601,7 @@ ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
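To make the masking pitfall described in the commit message concrete, here is
a small self-contained demonstration. The constants mirror the 82599 mrqe
encodings quoted above; the macro names are illustrative stand-ins for the
ixgbe defines.

```c
#include <stdint.h>
#include <stdio.h>

#define MRQE_RSSEN      0x1 /* 0001b: RSS only */
#define MRQE_RTRSS8TCEN 0x4 /* 0100b: DCB + RSS, 8 TCs */
#define MRQE_RTRSS4TCEN 0x5 /* 0101b: DCB + RSS, 4 TCs */

int
main(void)
{
	uint32_t mrqe = MRQE_RTRSS4TCEN;

	/* Old approach: clearing bit 0 of an enum-valued field does not
	 * disable RSS; it lands on another valid encoding. */
	mrqe &= ~MRQE_RSSEN;

	/* 0101b & ~0001b == 0100b: still DCB + RSS, but now with 8 TCs. */
	printf("mrqe after mask: 0x%x (%s)\n", mrqe,
	       mrqe == MRQE_RTRSS8TCEN ? "DCB + RSS, 8 TCs" : "other");
	return 0;
}
```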

Re: [PATCH next 4/7] vmxnet3: add command to set ring buffer sizes

2023-04-27 Thread Ferruh Yigit
On 4/26/2023 6:27 PM, Ronak Doshi wrote:
> 
> 
> On 4/26/23, 9:58 AM, "Ferruh Yigit" wrote:
> 
>> As far as I can see these "vmware_pack_begin.h" & "vmware_pack_end.h"
>> has only file license comment, and I can see this is used in a few other
>> type declaration.
>>
>> What is the reasoning behind using these headers?
> 
> This has been the case since the driver was added to DPDK. The information
> is present in the README.
> I did not want to change the format being used for the declarations in the
> driver.
> 

The README doesn't say much.

The usage is not standard, and the intention is not clear.
Can you please dig into this issue more to learn the intention? Maybe we
can find a better way or get rid of them completely.




Just to record, usage is:
 typedef
 #include "vmware_pack_begin.h"
 struct Vmxnet3_RingBufferSize {
__le16  ring1BufSizeType0;
__le16  ring1BufSizeType1;
__le16  ring2BufSizeType1;
__le16  pad;
 }
 #include "vmware_pack_end.h"
 Vmxnet3_RingBufferSize;


Content of "vmware_pack_begin.h":
```
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2010-2014 Intel Corporation
 */
```

"vmware_pack_end.h":
```
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2010-2014 Intel Corporation
 */
```
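For context, the include-pair presumably exists to inject compiler packing
pragmas around the struct. A sketch of the more conventional DPDK spelling of
the same declaration, using the __rte_packed attribute instead of the wrapper
headers (illustrative only, not a proposed patch):

```c
#include <rte_byteorder.h>
#include <rte_common.h>

/* Same layout as the quoted declaration, packed via attribute. */
typedef struct Vmxnet3_RingBufferSize {
	rte_le16_t ring1BufSizeType0;
	rte_le16_t ring1BufSizeType1;
	rte_le16_t ring2BufSizeType1;
	rte_le16_t pad;
} __rte_packed Vmxnet3_RingBufferSize;
```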


Re: Event device early back-pressure indication

2023-04-27 Thread Mattias Rönnblom
On 2023-04-19 13:06, Jerin Jacob wrote:
> On Mon, Apr 17, 2023 at 9:06 PM Mattias Rönnblom
>  wrote:
>>
>> On 2023-04-17 14:52, Jerin Jacob wrote:
>>> On Thu, Apr 13, 2023 at 12:24 PM Mattias Rönnblom
>>>  wrote:


 void
 rte_event_return_new_credits(...);

 Thoughts?
>>>
>>> I see the following cons on this approach.
>>>
>>
>> Does the use case in my original e-mail seem like a reasonable one to
>> you? If yes, is there some way one could solve this problem with a
>> clever use of the current Eventdev API? That would obviously be preferable.
> 
> I think the use case is reasonable. For me, the easiest path to achieve
> the functionality is setting rte_event_dev_config::nb_events_limit, as a
> given application is always targeted to work at X packets per second.
> Giving that upfront kind of makes life easy for application writers and
> drivers, at the cost of allocating the required memory.
> 

Could you unpack that a little? How would you derive the nb_events_limit 
from the targeted pps throughput? In my world, they are pretty much 
orthogonal. nb_events_limit just specifies the maximum number of 
buffered events (i.e., events/packets in-flight in the pipeline).

Are you thinking about a system where you do input rate shaping (e.g., on
the aggregate flow from NIC + timer wheel) to some fixed rate? A rate you
know with some reasonable certainty can be sustained.

Most non-trivial applications will vary in capacity depending on packet 
size, number of flows, types of flows, flow life time, non-packet 
processing cache or DDR pressure, etc.

In any system where you never risk accepting new items of work at a 
higher pace than the system is able to finish them, any mechanism 
designed to help you deal with work scheduler back pressure (at the 
point of new event enqueue) is pointless of course.

Or maybe you are you thinking of a system where the EAL threads almost 
never enqueues new events (only forward-type events)? In other words, a 
system where NIC and/or timer hardware is the source of almost-all new 
work, and both of those are tightly integrated with the event device?

>>
>>> # Adding multiple APIs in the fast path to the driver layer may not be a
>>> performance-effective solution.
>>
>> For event devices with a software-managed credit system, pre-allocation
>> would be very cheap. And, if an application prefer to handle back
>> pressure after-the-fact, that option is still available.
> 
> I am worried about exposing PMD calls that applications start calling per
> packet, especially with burst size = 1 for latency-critical applications.
> 
> 
>>
>>> # At least for cnxk HW, credits are for device, not per port. So cnxk
>>> HW implementation can not use this scheme.
>>>
>>
>> DSW's credit pool is also per device, but credits are cached on a per-port
>> basis. Does the cnxk driver rely on the hardware to signal "new event" back
>> pressure? (From the driver code it looks like that is the case.)
> 
> Yes. But we can not really cache it per port without introducing
> complex atomic logic.
> 

You could defer back pressure management to software altogether. If you 
trade some accuracy (in terms of exactly how many in-flight events are 
allowed), the mechanism is both pretty straight-forward to implement and 
cycle-efficient.

>>
>>> An alternative solution could be adding a new flag for
>>> rte_event_enqueue_new_burst(), where the driver waits until credit is
>>> available, to reduce the application overhead and support different HW
>>> implementations, if this use case is critical.
>>>
>>>#define RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVILABLE (UINT32_C(1) << 0)
>>>
>>>
>>
>> This solution only works if the event device is the only source of work
>> for the EAL thread. That is a really nice model, but I wouldn't trust on
>> that to always be the case.
> 
> For non EAL thread, I am assuming it is HW event adapter kind of case,

What case is this? I think we can leave out non-EAL threads (registered
threads; unregistered ones can't even call into the Eventdev API), since to
the extent they use an event device, it will be very limited.

> In such a case, they don't need to wait. I think only in the SW EAL threads
> case do we need to wait, as the application expects to wait until credit is
> available, to avoid error handling in the application.
> 

That sounds potentially very wasteful, if the time it has to wait is 
long. In the worst case, if all lcores hit this limit at the same time, 
the result is a deadlock, where every thread waits for some other 
threads to finish off enough work from the pipeline's backlog, to make 
the in-flight events go under the new_event_threshold.

>>
>> Also, there may be work that should only be performed, if the system is
>> not under very high load. Credits being available, especially combined
>> with a flexible new even threshold would be an indicator.
>>
>> Another way would be to just provide an API call that gave an indication
>> of a particular threshold has been reached (
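To illustrate the "defer back pressure management to software" remark above,
here is a rough standalone sketch of a device-global credit pool with
per-port caching, in the spirit of what DSW does. All names are illustrative;
none of this is an existing Eventdev API, and some accuracy in the in-flight
limit is traded for fewer shared-atomic operations.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CREDIT_BATCH 64 /* refill granularity; illustrative */

struct credit_pool {
	uint32_t total;        /* cf. nb_events_limit */
	atomic_uint in_flight; /* shared, device-global counter */
};

struct port_credits {
	uint32_t cached;       /* credits reserved for this port */
};

/* Try to reserve n credits for a port. The local cache is refilled from
 * the shared pool in batches, so the shared atomic is touched rarely. */
static bool
port_acquire_credits(struct credit_pool *pool, struct port_credits *p,
		     uint32_t n)
{
	if (p->cached >= n) {
		p->cached -= n;
		return true;
	}

	uint32_t want = n + CREDIT_BATCH;
	uint32_t prev = atomic_fetch_add_explicit(&pool->in_flight, want,
						  memory_order_relaxed);
	if (prev + want > pool->total) {
		/* Over the limit: back out and signal back pressure. */
		atomic_fetch_sub_explicit(&pool->in_flight, want,
					  memory_order_relaxed);
		return false;
	}
	p->cached += want - n;
	return true;
}

/* Returning credits (cf. the proposed rte_event_return_new_credits())
 * would simply hand cached credits back to pool->in_flight. */
```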

Re: [PATCH next 0/7] vmxnet3: upgrade to version 7

2023-04-27 Thread Ferruh Yigit
On 4/26/2023 7:33 PM, Ronak Doshi wrote:
> 
> > On 4/26/23, 11:16 AM, "Ferruh Yigit" wrote:
>>
>> btw, while checking the driver documentation, it seems it is not updated
>> for a while, the paper it refers is old.
>> Do you have any newer version of documentation to reference?
> 
> I don’t think we have any other document for perf, but let me update the 
> document with some
> details regarding features supported. Also, I will update release notes for 
> these patches.
> 

That document focuses on 1G and 10G network speeds, and uses "AMD
Opteron 2384 Processors" (which seem discontinued at this point).
I would expect there to be an up-to-date version of the document, but if
there isn't, is the existing one still relevant to keep?



RE: [PATCH v2] common/idpf: remove unnecessary compile option

2023-04-27 Thread Xing, Beilei



> -Original Message-
> From: Zhang, Qi Z 
> Sent: Wednesday, April 26, 2023 11:39 PM
> To: Xing, Beilei 
> Cc: dev@dpdk.org; Zhang, Qi Z 
> Subject: [PATCH v2] common/idpf: remove unnecessary compile option
> 
> Remove compile option "__KERNEL" which should not be considered in DPDK.
> Also only #include  in idpf_osdep.h.
> 
> Signed-off-by: Qi Zhang 

Acked-by: Beilei Xing 


RE: [PATCH v2] common/idpf: remove unnecessary compile option

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Xing, Beilei 
> Sent: Thursday, April 27, 2023 5:36 PM
> To: Zhang, Qi Z 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v2] common/idpf: remove unnecessary compile option
> 
> 
> 
> > -Original Message-
> > From: Zhang, Qi Z 
> > Sent: Wednesday, April 26, 2023 11:39 PM
> > To: Xing, Beilei 
> > Cc: dev@dpdk.org; Zhang, Qi Z 
> > Subject: [PATCH v2] common/idpf: remove unnecessary compile option
> >
> > Remove compile option "__KERNEL" which should not be considered in
> DPDK.
> > Also only #include  in idpf_osdep.h.
> >
> > Signed-off-by: Qi Zhang 
> 
> Acked-by: Beilei Xing 

Applied to dpdk-next-net-intel.

Thanks
Qi


RE: [PATCH] net/ixgbe: consider DCB/VMDq conf when getting RSS conf

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Min Zhou 
> Sent: Wednesday, April 12, 2023 6:02 PM
> To: Yang, Qiming ; Wu, Wenjun1
> ; zhou...@loongson.cn
> Cc: dev@dpdk.org; maob...@loongson.cn
> Subject: [PATCH] net/ixgbe: consider DCB/VMDq conf when getting RSS conf
> 
> The mrqe field of MRQC register is an enum. From the Intel 82599 datasheet,
> we know that these values below for the mrqe field are all related to RSS
> configuration:
>   0000b = RSS disabled.
>   0001b = RSS only -- Single set of RSS 16 queues.
>   0010b = DCB enabled and RSS disabled -- 8 TCs, each allocated 1
> queue.
>   0011b = DCB enabled and RSS disabled -- 4 TCs, each allocated 1
> queue.
>   0100b = DCB and RSS -- 8 TCs, each allocated 16 RSS queues.
>   0101b = DCB and RSS -- 4 TCs, each allocated 16 RSS queues.
>   1000b = Virtualization only -- 64 pools, no RSS, each pool allocated
>   2 queues.
>   1010b = Virtualization and RSS -- 32 pools, each allocated 4 RSS
> queues.
>   1011b = Virtualization and RSS -- 64 pools, each allocated 2 RSS
> queues.
> 
> The ixgbe PMD checks whether RSS is enabled when getting the RSS conf. So,
> besides comparing the value of the mrqe field with xxx0b and xxx1b, we also
> need to consider the other configurations, such as DCB + RSS or VMDQ + RSS.
> Otherwise, we may not get the correct RSS conf in some cases, such as when
> we use DCB and RSS with 8 TCs, which corresponds to 0100b for the mrqe
> field.
> 
> Signed-off-by: Min Zhou 
> ---
>  drivers/net/ixgbe/ixgbe_rxtx.c | 91 ++
>  1 file changed, 80 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index c9d6ca9efe..1eff0053ed 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -3461,18 +3461,89 @@ static uint8_t rss_intel_key[40] = {
>   0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,  };
> 
> +/*
> + * This function removes the rss configuration in the mrqe field of MRQC
> + * register and tries to maintain other configurations in the field, such as
> + * DCB and Virtualization.
> + *
> + * The MRQC register is described in section 7.1.2.8.3 of the Intel 82599
> + * datasheet. From the datasheet, we know that the mrqe field is an enum. So,
> + * masking the mrqe field with '~IXGBE_MRQC_RSSEN' may not completely disable
> + * rss configuration. For example, the value of mrqe is equal to 0101b when
> + * DCB and RSS with 4 TCs are configured; however, 'mrqe &= ~0x01' is equal to
> + * 0100b, which corresponds to DCB and RSS with 8 TCs.
> + */
> +static void
> +ixgbe_mrqc_rss_remove(struct ixgbe_hw *hw) {
> + uint32_t mrqc;
> + uint32_t mrqc_reg;
> + uint32_t mrqe_val;
> +
> + mrqc_reg = ixgbe_mrqc_reg_get(hw->mac.type);
> + mrqc = IXGBE_READ_REG(hw, mrqc_reg);
> + mrqe_val = mrqc & IXGBE_MRQC_MRQE_MASK;
> +
> + switch (mrqe_val) {
> + case IXGBE_MRQC_RSSEN:
> + /* Completely disable rss */
> + mrqe_val = 0;
> + break;
> + case IXGBE_MRQC_RTRSS8TCEN:
> + mrqe_val = IXGBE_MRQC_RT8TCEN;
> + break;
> + case IXGBE_MRQC_RTRSS4TCEN:
> + mrqe_val = IXGBE_MRQC_RT4TCEN;
> + break;
> + case IXGBE_MRQC_VMDQRSS64EN:
> + /* FIXME. Can 32 pools with rss convert to 64 pools without rss? */
> + case IXGBE_MRQC_VMDQRSS32EN:

Better not to change the pool number; can we just print a warning and break?

Otherwise
Acked-by: Qi Zhang 




Re: [PATCH] net/ixgbe: consider DCB/VMDq conf when getting RSS conf

2023-04-27 Thread zhoumin

Hi Qi,

Thanks for your kind review.

On Thu, Apr 27, 2023 at 7:01 PM, Zhang, Qi Z wrote:


-Original Message-
From: Min Zhou 
Sent: Wednesday, April 12, 2023 6:02 PM
To: Yang, Qiming ; Wu, Wenjun1
; zhou...@loongson.cn
Cc: dev@dpdk.org; maob...@loongson.cn
Subject: [PATCH] net/ixgbe: consider DCB/VMDq conf when getting RSS conf

The mrqe field of MRQC register is an enum. From the Intel 82599 datasheet,
we know that these values below for the mrqe field are all related to RSS
configuration:
0000b = RSS disabled.
0001b = RSS only -- Single set of RSS 16 queues.
0010b = DCB enabled and RSS disabled -- 8 TCs, each allocated 1
queue.
0011b = DCB enabled and RSS disabled -- 4 TCs, each allocated 1
queue.
0100b = DCB and RSS -- 8 TCs, each allocated 16 RSS queues.
0101b = DCB and RSS -- 4 TCs, each allocated 16 RSS queues.
1000b = Virtualization only -- 64 pools, no RSS, each pool allocated
2 queues.
1010b = Virtualization and RSS -- 32 pools, each allocated 4 RSS
queues.
1011b = Virtualization and RSS -- 64 pools, each allocated 2 RSS
queues.

The ixgbe PMD checks whether RSS is enabled when getting the RSS conf. So,
besides comparing the value of the mrqe field with xxx0b and xxx1b, we also
need to consider the other configurations, such as DCB + RSS or VMDQ + RSS.
Otherwise, we may not get the correct RSS conf in some cases, such as when
we use DCB and RSS with 8 TCs, which corresponds to 0100b for the mrqe field.

Signed-off-by: Min Zhou 
---
  drivers/net/ixgbe/ixgbe_rxtx.c | 91 ++
  1 file changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index c9d6ca9efe..1eff0053ed 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -3461,18 +3461,89 @@ static uint8_t rss_intel_key[40] = {
0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,  };

+/*
+ * This function removes the rss configuration in the mrqe field of MRQC
+ * register and tries to maintain other configurations in the field, such as
+ * DCB and Virtualization.
+ *
+ * The MRQC register is described in section 7.1.2.8.3 of the Intel 82599
+ * datasheet. From the datasheet, we know that the mrqe field is an enum. So,
+ * masking the mrqe field with '~IXGBE_MRQC_RSSEN' may not completely disable
+ * rss configuration. For example, the value of mrqe is equal to 0101b when
+ * DCB and RSS with 4 TCs are configured; however, 'mrqe &= ~0x01' is equal to
+ * 0100b, which corresponds to DCB and RSS with 8 TCs.
+ */
+static void
+ixgbe_mrqc_rss_remove(struct ixgbe_hw *hw) {
+   uint32_t mrqc;
+   uint32_t mrqc_reg;
+   uint32_t mrqe_val;
+
+   mrqc_reg = ixgbe_mrqc_reg_get(hw->mac.type);
+   mrqc = IXGBE_READ_REG(hw, mrqc_reg);
+   mrqe_val = mrqc & IXGBE_MRQC_MRQE_MASK;
+
+   switch (mrqe_val) {
+   case IXGBE_MRQC_RSSEN:
+   /* Completely disable rss */
+   mrqe_val = 0;
+   break;
+   case IXGBE_MRQC_RTRSS8TCEN:
+   mrqe_val = IXGBE_MRQC_RT8TCEN;
+   break;
+   case IXGBE_MRQC_RTRSS4TCEN:
+   mrqe_val = IXGBE_MRQC_RT4TCEN;
+   break;
+   case IXGBE_MRQC_VMDQRSS64EN:
+   /* FIXME. Can 32 pools with rss convert to 64 pools without rss? */
+   case IXGBE_MRQC_VMDQRSS32EN:

Better not to change the pool number; can we just print a warning and break?



Yes. Implicitly changing the pool number is not a good approach. I think
just printing a warning and keeping the 'mrqe' field unchanged may be
better.


I will do that in the v2 patch.



Otherwise
Acked-by: Qi Zhang 


Best regards,

Min




[PATCH] net/mlx5: enhance error log for tunnel offloading

2023-04-27 Thread David Marchand
Tunnel offloading is linked to running a port with the non-obvious
dv_xmeta_en=3 devargs. Hint at it for "normal" users.

Fixes: 4ec6360de37d ("net/mlx5: implement tunnel offload")

Signed-off-by: David Marchand 
---
 drivers/net/mlx5/mlx5_flow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d0275fdd00..51f6ecd25d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -11295,7 +11295,7 @@ mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
if (!is_tunnel_offload_active(dev))
return rte_flow_error_set(error, ENOTSUP,
  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
- "tunnel offload was not activated");
+ "tunnel offload was not activated, 
consider setting dv_xmeta_en=3");
if (!tunnel)
return rte_flow_error_set(error, EINVAL,
  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
-- 
2.40.0



Re: [PATCH] net/mlx5: enhance error log for tunnel offloading

2023-04-27 Thread David Marchand
Hello guys,

On Thu, Apr 27, 2023 at 1:56 PM David Marchand
 wrote:
>
> Tunnel offloading is linked to running a port with the non-obvious
> dv_xmeta_en=3 devargs. Hint at it for "normal" users.
>
> Fixes: 4ec6360de37d ("net/mlx5: implement tunnel offload")
>
> Signed-off-by: David Marchand 

Regardless of what happens to this patch, would it be possible to move
away from those dv_xxx_en devargs things and go with a more
straightforward API, like requesting tunnel offloading (or other) in
rte_flow_configure?


-- 
David Marchand



RE: [PATCH] net/mlx5: enhance error log for tunnel offloading

2023-04-27 Thread Gregory Etelson
Hello David,

> Tunnel offloading is linked to running a port with the non-obvious
> dv_xmeta_en=3 devargs. Hint at it for "normal" users.
> 
> Fixes: 4ec6360de37d ("net/mlx5: implement tunnel offload")
> 
> Signed-off-by: David Marchand 
> ---
>  drivers/net/mlx5/mlx5_flow.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index d0275fdd00..51f6ecd25d 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -11295,7 +11295,7 @@ mlx5_flow_tunnel_validate(struct rte_eth_dev
> *dev,
> if (!is_tunnel_offload_active(dev))
> return rte_flow_error_set(error, ENOTSUP,
>   RTE_FLOW_ERROR_TYPE_ACTION_CONF, 
> NULL,
> - "tunnel offload was not activated");
> + "tunnel offload was not activated, 
> consider
> setting dv_xmeta_en=3");
> if (!tunnel)
> return rte_flow_error_set(error, EINVAL,
>   RTE_FLOW_ERROR_TYPE_ACTION_CONF, 
> NULL,
> --
> 2.40.0

Reviewed-by: Gregory Etelson 



RE: [EXT] [PATCH v5 02/15] graph: split graph worker into common and default model

2023-04-27 Thread Pavan Nikhilesh Bhagavatula



> -Original Message-
> From: Zhirun Yan 
> Sent: Friday, March 31, 2023 9:33 AM
> To: dev@dpdk.org; Jerin Jacob Kollanukkaran ; Kiran
> Kumar Kokkilagadda ; Nithin Kumar Dabilpuram
> ; step...@networkplumber.org
> Cc: cunming.li...@intel.com; haiyue.w...@intel.com; Zhirun Yan
> 
> Subject: [EXT] [PATCH v5 02/15] graph: split graph worker into common and
> default model
> 
> External Email
> 
> --
> To support multiple graph worker models, split the graph worker into common
> and default parts. The current walk function is named rte_graph_model_rtc
> because the default model is RTC (run-to-completion).
> 
> Signed-off-by: Haiyue Wang 
> Signed-off-by: Cunming Liang 
> Signed-off-by: Zhirun Yan 
> ---
>  lib/graph/graph_pcap.c  |  2 +-
>  lib/graph/graph_private.h   |  2 +-
>  lib/graph/meson.build   |  2 +-
>  lib/graph/rte_graph_model_rtc.h | 61
> +
>  lib/graph/rte_graph_worker.h| 34 
>  lib/graph/rte_graph_worker_common.h | 57 ---
>  6 files changed, 98 insertions(+), 60 deletions(-)
>  create mode 100644 lib/graph/rte_graph_model_rtc.h
>  create mode 100644 lib/graph/rte_graph_worker.h
> 
> diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
> index 8a220370fa..6c43330029 100644
> --- a/lib/graph/graph_pcap.c
> +++ b/lib/graph/graph_pcap.c
> @@ -10,7 +10,7 @@
>  #include 
>  #include 
> 
> -#include "rte_graph_worker_common.h"
> +#include "rte_graph_worker.h"
> 
>  #include "graph_pcap_private.h"
> 
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index f08dbc7e9d..7d1b30b8ac 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -12,7 +12,7 @@
>  #include 
> 
>  #include "rte_graph.h"
> -#include "rte_graph_worker_common.h"
> +#include "rte_graph_worker.h"
> 
>  extern int rte_graph_logtype;
> 
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index 4e2b612ad3..3526d1b5d4 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -16,6 +16,6 @@ sources = files(
>  'graph_populate.c',
>  'graph_pcap.c',
>  )
> -headers = files('rte_graph.h', 'rte_graph_worker_common.h')
> +headers = files('rte_graph.h', 'rte_graph_worker.h')
> 
>  deps += ['eal', 'pcapng']
> diff --git a/lib/graph/rte_graph_model_rtc.h
> b/lib/graph/rte_graph_model_rtc.h
> new file mode 100644
> index 00..665560f831
> --- /dev/null
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Intel Corporation
> + */
> +

Please retain Marvell copyright too.

> +#include "rte_graph_worker_common.h"
> +
> +/**
> + * Perform graph walk on the circular buffer and invoke the process
> function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +static inline void
> +rte_graph_walk_rtc(struct rte_graph *graph)
> +{
> + const rte_graph_off_t *cir_start = graph->cir_start;
> + const rte_node_t mask = graph->cir_mask;
> + uint32_t head = graph->head;
> + struct rte_node *node;
> + uint64_t start;
> + uint16_t rc;
> + void **objs;
> +
> + /*
> +  * Walk on the source node(s) ((cir_start - head) -> cir_start) and
> then
> +  * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> +  * in a circular buffer fashion.
> +  *
> +  *  +-+ <= cir_start - head [number of source nodes]
> +  *  | |
> +  *  | ... | <= source nodes
> +  *  | |
> +  *  +-+ <= cir_start [head = 0] [tail = 0]
> +  *  | |
> +  *  | ... | <= pending streams
> +  *  | |
> +  *  +-+ <= cir_start + mask
> +  */
> + while (likely(head != graph->tail)) {
> + node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> + RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> + objs = node->objs;
> + rte_prefetch0(objs);
> +
> + if (rte_graph_has_stats_feature()) {
> + start = rte_rdtsc();
> + rc = node->process(graph, node, objs, node->idx);
> + node->total_cycles += rte_rdtsc() - start;
> + node->total_calls++;
> + node->total_objs += rc;
> + } else {
> + node->process(graph, node, objs, node->idx);
> + }
> + node->idx = 0;
> + head = likely((int32_t)head > 0) ? head & mask :
> head;
> + }
> + graph->tail = 0;
> +}
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> new file mode 100644
> index 00..7ea18ba80a
> --- /dev/null
> +++ b/lib/graph/rte_graph_worker.h
> @@ -0

RE: [EXT] [PATCH v5 09/15] graph: introduce stream moving cross cores

2023-04-27 Thread Pavan Nikhilesh Bhagavatula
> This patch introduces key functions that allow a worker thread to
> enqueue and move streams of objects to the next nodes across
> different cores.
> 
> Signed-off-by: Haiyue Wang 
> Signed-off-by: Cunming Liang 
> Signed-off-by: Zhirun Yan 
> ---
>  lib/graph/graph_private.h|  27 +
>  lib/graph/meson.build|   2 +-
>  lib/graph/rte_graph_model_dispatch.c | 145
> +++
>  lib/graph/rte_graph_model_dispatch.h |  37 +++
>  lib/graph/version.map|   2 +
>  5 files changed, 212 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index b66b18ebbc..e1a2a4bfd8 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g);
>   */
>  void node_dump(FILE *f, struct node *n);
> 
> +/**
> + * @internal
> + *
> + * Create the graph schedule work queue. And all cloned graphs attached to
> the
> + * parent graph MUST be destroyed together for fast schedule design
> limitation.
> + *
> + * @param _graph
> + *   The graph object
> + * @param _parent_graph
> + *   The parent graph object which holds the run-queue head.
> + *
> + * @return
> + *   - 0: Success.
> + *   - <0: Graph schedule work queue related error.
> + */
> +int graph_sched_wq_create(struct graph *_graph, struct graph
> *_parent_graph);
> +
> +/**
> + * @internal
> + *
> + * Destroy the graph schedule work queue.
> + *
> + * @param _graph
> + *   The graph object
> + */
> +void graph_sched_wq_destroy(struct graph *_graph);
> +
>  #endif /* _RTE_GRAPH_PRIVATE_H_ */
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index c729d984b6..e21affa280 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -20,4 +20,4 @@ sources = files(
>  )
>  headers = files('rte_graph.h', 'rte_graph_worker.h')
> 
> -deps += ['eal', 'pcapng']
> +deps += ['eal', 'pcapng', 'mempool', 'ring']
> diff --git a/lib/graph/rte_graph_model_dispatch.c
> b/lib/graph/rte_graph_model_dispatch.c
> index 4a2f99496d..a300fefb85 100644
> --- a/lib/graph/rte_graph_model_dispatch.c
> +++ b/lib/graph/rte_graph_model_dispatch.c
> @@ -5,6 +5,151 @@
>  #include "graph_private.h"
>  #include "rte_graph_model_dispatch.h"
> 
> +int
> +graph_sched_wq_create(struct graph *_graph, struct graph
> *_parent_graph)
> +{
> + struct rte_graph *parent_graph = _parent_graph->graph;
> + struct rte_graph *graph = _graph->graph;
> + unsigned int wq_size;
> +
> + wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes);
> + wq_size = rte_align32pow2(wq_size + 1);

Hi Zhirun,

We should introduce a new function `rte_graph_configure` which can help the
application control the ring size and mempool size of the work queue.
We could fall back to default values if nothing is configured (a fleshed-out
sketch follows at the end of this message).

rte_graph_configure should take a 
struct rte_graph_config {
struct {
u64 rsvd[8];
} rtc;
struct {
u16 wq_size;
...
} dispatch;
};

This will help future graph models to have their own configuration.

We can have a rte_graph_config_init() function to initialize the 
rte_graph_config structure.


> +
> + graph->wq = rte_ring_create(graph->name, wq_size, graph->socket,
> + RING_F_SC_DEQ);
> + if (graph->wq == NULL)
> + SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ");
> +
> + graph->mp = rte_mempool_create(graph->name, wq_size,
> +sizeof(struct graph_sched_wq_node),
> +0, 0, NULL, NULL, NULL, NULL,
> +graph->socket, MEMPOOL_F_SP_PUT);
> + if (graph->mp == NULL)
> + SET_ERR_JMP(EIO, fail_mp,
> + "Failed to allocate graph WQ schedule entry");
> +
> + graph->lcore_id = _graph->lcore_id;
> +
> + if (parent_graph->rq == NULL) {
> + parent_graph->rq = &parent_graph->rq_head;
> + SLIST_INIT(parent_graph->rq);
> + }
> +
> + graph->rq = parent_graph->rq;
> + SLIST_INSERT_HEAD(graph->rq, graph, rq_next);
> +
> + return 0;
> +
> +fail_mp:
> + rte_ring_free(graph->wq);
> + graph->wq = NULL;
> +fail:
> + return -rte_errno;
> +}
> +
> +void
> +graph_sched_wq_destroy(struct graph *_graph)
> +{
> + struct rte_graph *graph = _graph->graph;
> +
> + if (graph == NULL)
> + return;
> +
> + rte_ring_free(graph->wq);
> + graph->wq = NULL;
> +
> + rte_mempool_free(graph->mp);
> + graph->mp = NULL;
> +}
> +
> +static __rte_always_inline bool
> +__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph
> *graph)
> +{
> + struct graph_sched_wq_node *wq_node;
> + uint16_t off = 0;
> + uint16_t size;
> +
> +submit_again:
> + if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0)
> + goto fallback;
> +
> + size = RTE_MI
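A fleshed-out version of the configuration struct sketched earlier in this
message might look like the following. Field names and default values are
illustrative; nothing below exists in the rte_graph API today.

```c
#include <stdint.h>
#include <string.h>

struct rte_graph_config {
	struct {
		uint64_t rsvd[8];     /* room for future RTC model knobs */
	} rtc;
	struct {
		uint16_t wq_size;     /* dispatch work-queue ring size */
		uint16_t mp_capacity; /* mempool capacity for wq entries */
	} dispatch;
};

/* Fill in defaults so applications only override what they care about. */
static inline void
rte_graph_config_init(struct rte_graph_config *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->dispatch.wq_size = 1024;
	cfg->dispatch.mp_capacity = 1024;
}
```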

RE: [EXT] [PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch

2023-04-27 Thread Pavan Nikhilesh Bhagavatula
> This patch introduces the task scheduler mechanism to enable dispatching
> tasks to other worker cores. Currently, there is only a local work
> queue for one graph to walk. We introduce a scheduler work queue in
> each worker core for dispatching tasks. The walk is performed on the
> scheduler work queue first, then on the local work queue.
> 
> Signed-off-by: Haiyue Wang 
> Signed-off-by: Cunming Liang 
> Signed-off-by: Zhirun Yan 
> ---
>  lib/graph/rte_graph_model_dispatch.h | 42
> 
>  1 file changed, 42 insertions(+)
> 
> diff --git a/lib/graph/rte_graph_model_dispatch.h
> b/lib/graph/rte_graph_model_dispatch.h
> index 18fa7ce0ab..65b2cc6d87 100644
> --- a/lib/graph/rte_graph_model_dispatch.h
> +++ b/lib/graph/rte_graph_model_dispatch.h
> @@ -73,6 +73,48 @@ __rte_experimental
>  int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
>   unsigned int lcore_id);
> 
> +/**
> + * Perform graph walk on the circular buffer and invoke the process
> function
> + * of the nodes and collect the stats.
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup function.
> + *
> + * @see rte_graph_lookup()
> + */
> +__rte_experimental
> +static inline void
> +rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
> +{
> + const rte_graph_off_t *cir_start = graph->cir_start;
> + const rte_node_t mask = graph->cir_mask;
> + uint32_t head = graph->head;
> + struct rte_node *node;

I think we should add a RTE_ASSERT here to make sure that the graph object is a 
cloned graph.

> +
> + if (graph->wq != NULL)
> + __rte_graph_sched_wq_process(graph);
> +
> + while (likely(head != graph->tail)) {
> + node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> +
> + /* skip the src nodes which not bind with current worker */
> + if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
> + continue;
> +
> + /* Schedule the node until all task/objs are done */
> + if (node->lcore_id != RTE_MAX_LCORE &&
> + graph->lcore_id != node->lcore_id && graph->rq != NULL
> &&
> + __rte_graph_sched_node_enqueue(node, graph->rq))
> + continue;
> +
> + __rte_node_process(graph, node);
> +
> + head = likely((int32_t)head > 0) ? head & mask : head;
> + }
> +
> + graph->tail = 0;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.37.2
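One shape the suggested assertion could take, assuming a hypothetical
predicate. The parent_id field below does not exist in struct rte_graph
as-is and would have to be added alongside the dispatch model; the helper
name is illustrative.

```c
#include <stdbool.h>
#include <rte_debug.h>
#include <rte_graph_worker.h>

/* Hypothetical helper: assumes the dispatch-model series records the
 * parent graph id in each clone (parent_id is an assumed field). */
static inline bool
graph_is_cloned(const struct rte_graph *graph)
{
	return graph->parent_id != RTE_GRAPH_ID_INVALID;
}

/* Then, at the top of rte_graph_walk_mcore_dispatch():
 *	RTE_ASSERT(graph_is_cloned(graph));
 */
```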



RE: [EXT] [PATCH v5 03/15] graph: move node process into inline function

2023-04-27 Thread Pavan Nikhilesh Bhagavatula
> Node processing is a single, reusable block; move the code into an inline
> function.
> 
> Signed-off-by: Haiyue Wang 
> Signed-off-by: Cunming Liang 
> Signed-off-by: Zhirun Yan 
> ---
>  lib/graph/rte_graph_model_rtc.h | 20 ++---
>  lib/graph/rte_graph_worker_common.h | 33
> +
>  2 files changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/graph/rte_graph_model_rtc.h
> b/lib/graph/rte_graph_model_rtc.h
> index 665560f831..0dcb7151e9 100644
> --- a/lib/graph/rte_graph_model_rtc.h
> +++ b/lib/graph/rte_graph_model_rtc.h
> @@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>   const rte_node_t mask = graph->cir_mask;
>   uint32_t head = graph->head;
>   struct rte_node *node;
> - uint64_t start;
> - uint16_t rc;
> - void **objs;
> 
>   /*
>* Walk on the source node(s) ((cir_start - head) -> cir_start) and
> then
> @@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph)
>*/
>   while (likely(head != graph->tail)) {
>   node = (struct rte_node *)RTE_PTR_ADD(graph,
> cir_start[(int32_t)head++]);
> - RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> - objs = node->objs;
> - rte_prefetch0(objs);
> -
> - if (rte_graph_has_stats_feature()) {
> - start = rte_rdtsc();

Since we are refactoring this function, could you change rte_rdtsc() to
rte_rdtsc_precise()?

> - rc = node->process(graph, node, objs, node->idx);
> - node->total_cycles += rte_rdtsc() - start;
> - node->total_calls++;
> - node->total_objs += rc;
> - } else {
> - node->process(graph, node, objs, node->idx);
> - }
> - node->idx = 0;
> - head = likely((int32_t)head > 0) ? head & mask :
> head;
> + __rte_node_process(graph, node);
> + head = likely((int32_t)head > 0) ? head & mask : head;
>   }
>   graph->tail = 0;
>  }
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index b58f8f6947..41428974db 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct
> rte_graph *graph,
> 
>  /* Fast path helper functions */
> 
> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> + uint64_t start;
> + uint16_t rc;
> + void **objs;
> +
> + RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> + objs = node->objs;
> + rte_prefetch0(objs);
> +
> + if (rte_graph_has_stats_feature()) {
> + start = rte_rdtsc();
> + rc = node->process(graph, node, objs, node->idx);
> + node->total_cycles += rte_rdtsc() - start;
> + node->total_calls++;
> + node->total_objs += rc;
> + } else {
> + node->process(graph, node, objs, node->idx);
> + }
> + node->idx = 0;
> +}
> +
>  /**
>   * @internal
>   *
> --
> 2.37.2
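For context on the rte_rdtsc_precise() suggestion above: the precise variant
issues a memory barrier before reading the TSC, so the measured per-node
interval is not skewed by instruction reordering around the process() call.
A minimal sketch of a serialized timing wrapper (names are illustrative):

```c
#include <stdint.h>
#include <rte_cycles.h>

/* Time an arbitrary callback with serialized TSC reads. */
static inline uint16_t
timed_call(uint16_t (*process)(void *), void *arg, uint64_t *total_cycles)
{
	uint64_t start = rte_rdtsc_precise();
	uint16_t rc = process(arg);

	*total_cycles += rte_rdtsc_precise() - start;
	return rc;
}
```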



DPDK 20.11.8 released

2023-04-27 Thread luca.boccassi
Hi all,

Here is a new stable release:
https://fast.dpdk.org/rel/dpdk-20.11.8.tar.xz

The git tree is at:
https://dpdk.org/browse/dpdk-stable/?h=20.11

Luca Boccassi

---
 .github/workflows/build.yml|   14 +-
 VERSION|2 +-
 app/test-bbdev/test_bbdev_perf.c   |   38 +-
 app/test-compress-perf/comp_perf_options.h |6 +-
 app/test-compress-perf/comp_perf_options_parse.c   |8 +-
 app/test-compress-perf/comp_perf_test_common.c |  127 ++-
 app/test-compress-perf/comp_perf_test_cyclecount.c |   71 +-
 app/test-compress-perf/comp_perf_test_throughput.c |   66 +-
 app/test-compress-perf/comp_perf_test_verify.c |   62 +-
 app/test-compress-perf/main.c  |8 +
 app/test-crypto-perf/cperf_options_parsing.c   |1 +
 app/test-crypto-perf/cperf_test_common.c   |8 +-
 app/test-crypto-perf/cperf_test_vector_parsing.c   |1 +
 app/test-flow-perf/main.c  |8 +-
 app/test-pmd/cmdline.c |   32 +-
 app/test-pmd/cmdline_flow.c|   48 +-
 app/test-pmd/csumonly.c|   17 +-
 app/test-pmd/ieee1588fwd.c |2 +-
 app/test-pmd/noisy_vnf.c   |5 +-
 app/test-pmd/testpmd.c |  112 +-
 app/test-pmd/testpmd.h |1 +
 app/test/packet_burst_generator.c  |   26 +-
 app/test/test_cryptodev.c  |   12 +-
 .../test_cryptodev_security_pdcp_test_vectors.h|8 +-
 app/test/test_mbuf.c   |2 +
 app/test/test_reorder.c|2 +
 config/meson.build |6 +-
 devtools/check-git-log.sh  |2 +-
 doc/guides/linux_gsg/enable_func.rst   |4 +-
 doc/guides/prog_guide/cryptodev_lib.rst|  123 +--
 doc/guides/prog_guide/event_timer_adapter.rst  |2 +-
 doc/guides/rel_notes/release_20_11.rst |  269 +
 doc/guides/sample_app_ug/l2_forward_cat.rst|9 +-
 doc/guides/sample_app_ug/pipeline.rst  |2 +-
 drivers/baseband/acc100/rte_acc100_pmd.c   |9 +-
 drivers/baseband/turbo_sw/meson.build  |   26 +-
 drivers/bus/ifpga/ifpga_bus.c  |3 +-
 drivers/common/mlx5/mlx5_common.h  |7 +-
 drivers/compress/qat/qat_comp_pmd.c|2 +-
 drivers/crypto/ccp/ccp_dev.c   |4 +-
 drivers/crypto/ccp/ccp_pci.c   |3 +-
 drivers/crypto/ccp/rte_ccp_pmd.c   |2 +-
 drivers/crypto/qat/qat_sym_session.c   |5 +-
 drivers/crypto/snow3g/rte_snow3g_pmd.c |   13 +-
 drivers/net/bnxt/bnxt_ethdev.c |2 +-
 drivers/net/bnxt/bnxt_rxq.c|6 +
 drivers/net/bnxt/bnxt_rxr.c|1 -
 drivers/net/bnxt/bnxt_txr.c|6 +
 drivers/net/e1000/em_rxtx.c|3 +-
 drivers/net/hns3/hns3_cmd.h|1 +
 drivers/net/hns3/hns3_ethdev.c |   36 +-
 drivers/net/hns3/hns3_ethdev.h |9 -
 drivers/net/hns3/hns3_ethdev_vf.c  |   25 +-
 drivers/net/hns3/hns3_fdir.h   |   10 +-
 drivers/net/hns3/hns3_flow.c   | 1152 +---
 drivers/net/hns3/hns3_mp.c |4 +-
 drivers/net/hns3/hns3_rss.c|  695 
 drivers/net/hns3/hns3_rss.h|  135 ++-
 drivers/net/hns3/hns3_rxtx.c   |   82 +-
 drivers/net/hns3/hns3_rxtx.h   |2 +
 drivers/net/i40e/i40e_ethdev.c |   67 +-
 drivers/net/i40e/i40e_ethdev.h |5 +-
 drivers/net/i40e/i40e_flow.c   |8 +
 drivers/net/iavf/iavf.h|1 +
 drivers/net/iavf/iavf_ethdev.c |3 +
 drivers/net/iavf/iavf_generic_flow.c   |3 +-
 drivers/net/iavf/iavf_vchnl.c  |   72 +-
 drivers/net/ice/ice_generic_flow.c |8 +
 drivers/net/ipn3ke/ipn3ke_ethdev.c |2 +-
 drivers/net/ipn3ke/ipn3ke_representor.c|2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   |   31 +-
 drivers/net/ixgbe/ixgbe_flow.c |   12 +-
 drivers/net/mlx5/linux/mlx5_ethdev_os.c|   19 +-
 drivers/net/mlx5/mlx5_flow.c   |5 +-
 drivers/net/mlx5/mlx5_rxtx.c   |  135 ++-
 drivers/net/mlx5/mlx5_rxtx.h   |4 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c   |   13 +-
 drivers/net/mlx5/m

RE: [PATCH] net/mana: use RTE_LOG_DP for logs on datapath

2023-04-27 Thread Long Li


> -Original Message-
> From: Ferruh Yigit 
> Sent: Friday, March 3, 2023 5:15 PM
> To: Long Li ; Stephen Hemminger
> 
> Cc: lon...@linuxonhyperv.com; Thomas Monjalon ;
> Andrew Rybchenko ; Jerin Jacob
> Kollanukkaran ; David Marchand
> ; dev@dpdk.org; Ajay Sharma
> ; sta...@dpdk.org; Luca Boccassi
> ; Qi Z Zhang ; Ajit Khaparde
> ; Bruce Richardson
> ; Konstantin Ananyev
> ; Olivier Matz ;
> Honnappa Nagarahalli ; techbo...@dpdk.org
> Subject: Re: [PATCH] net/mana: use RTE_LOG_DP for logs on datapath
> 
> On 3/3/2023 7:04 PM, Long Li wrote:
> >> Subject: Re: [PATCH] net/mana: use RTE_LOG_DP for logs on datapath
> >>
> >> On 3/3/2023 2:16 AM, Long Li wrote:
>  Subject: Re: [PATCH] net/mana: use RTE_LOG_DP for logs on datapath
> 
>  On Thu, 23 Feb 2023 10:09:17 -0800
>  Stephen Hemminger  wrote:
> 
> > On Thu, 23 Feb 2023 14:07:25 + Ferruh Yigit
> >  wrote:
> >
> >> Overall I am not sure if anyone is interested in driver datapath
> >> logs other than driver developers themselves.
> >>
> >> For datapath logging I think there are two concerns,
> >> 1) It should not eat *any* cycles unless explicitly enabled
> >> 2) Capability of enable/disable them because of massive amount of
> >> log it can generate
> >>
> >>
> >> Currently there are two existing approaches for driver datapath 
> >> logging:
> >> i)  Controlled by 'RTE_ETHDEV_DEBUG_RX/TX' compile time flag,
> >> when enabled 'rte_log()' is used with Rx/Tx specific log type.
> >> ii) 'RTE_LOG_DP' ', compile time control per logtype via
> >> 'RTE_LOG_DP_LEVEL',
> >>  when enabled 'rte_log()' is used with PMD logtype.
> >>
> >>
> >> In (ii), need to re-compile code when you need to increase the
> >> log verbosity, and it leaks to production code as mentioned above.
> >>
> >> For (i), developer compiles once enabling debug, later can fine
> >> grain log level dynamically. This is more DPDK developer focused
> >> approach.
> >>
> >>
> >> [1]
> >> According above, what do you think to retire 'RTE_LOG_DP', (at
> >> least within ethdev datapath), and chose (i) as preferred
> >> datapath
> >> logging?
> >
> > I agree, the current tx/rx logging is a mess.
> > Each driver is different, each driver has to have something to
> > enable it; and it really isn't useful beyond the driver developer.
> >
> > Using tracing seems like a much better option. Could we agree on a
> > common set of trace points for drivers and fix all drivers to use
> > the same
>  thing.
> > Probably will cause some upset among driver developers:
> > "where did my nice printf's go, now I have to learn tracing"
> > but DPDK has a good facility here, lets use it.
> >
> > My proposal would be:
> > - agree on common set of trace points
> > - apply to all drivers
> > - remove RTE_LOG_DP()
> > - remove per driver RX/TX options
> > - side effect, more uses of RTE_LOGTYPE_PMD go away.
> 
>  Here is an example of using tracepoints instead.
>  Compile tested for example only.
> 
>  Note: using tracepoints it is possible to keep some of the
>  tracepoints even if fastpath is not enabled.  Things like running
>  out of Tx or Mbuf is not something that is perf critical; but would
>  be good for
> >> application to see.
> >>>
> >>> Thank you for the example.
> >>>
> >>> I sent another patch converting data path logs (mana) to trace points.
> >>>
> >>
> >> Hi Long,
> >>
> >> Thanks for the effort, you were quick on this while discussion is going on.
> >>
> >> Although tracepoint is a good feature, I am not sure if it can fully
> >> replace the logging.
> >> I think usage is slightly different, trace is missing custom human
> >> readable message, which can be very helpful for end user.
> >>
> >> And overall, it is a high level decision to switch logging to trace,
> >> it is inconsistent to switch only single driver, perhaps techboard
> >> (cc'ed) can discuss this.
> >>
> >> Until such consensus reached, I think driver should continue with logging.
> >>
> >>
> >>
> >> And for the logging, I suggest option (i) above, I was hoping more
> >> comments but since it is missing I hope this can be discussed in
> >> techboard for a conclusion.
> >
> > Hi Ferruh,
> >
> > Are you suggesting that MANA should use 'RTE_ETHDEV_DEBUG_RX/TX'?
> >
> > I'm happy to implement the logging in this way.
> >
> 
> Yes, that looks to me like a better balance of compile-time / runtime config
> for driver developers.
> 
> But it prevents production code / end users from getting datapath logs.
> Although I believe this is OK, I am not sure how useful datapath logs are
> in production code; that is why I am looking for more comments before a
> decision.
> 
> Let's wait for next techboard meeting, in case this is discussed there, before
> making new

RE: [PATCH v2 2/7] net/idpf: save main time by alarm

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Qiao, Wenjing 
> Sent: Friday, April 21, 2023 3:16 PM
> To: Wu, Jingjing ; Xing, Beilei
> ; Zhang, Qi Z 
> Cc: dev@dpdk.org; Qiao, Wenjing ;
> sta...@dpdk.org
> Subject: [PATCH v2 2/7] net/idpf: save main time by alarm
> 
> Use an alarm to save the main time from the registers every second.
> 
> Fixes: 8c6098afa075 ("common/idpf: add Rx/Tx data path")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Wenjing Qiao 
> ---
>  drivers/net/idpf/idpf_ethdev.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/idpf/idpf_ethdev.c b/drivers/net/idpf/idpf_ethdev.c
> index e02ec2ec5a..3f33ffbc78 100644
> --- a/drivers/net/idpf/idpf_ethdev.c
> +++ b/drivers/net/idpf/idpf_ethdev.c
> @@ -761,6 +761,12 @@ idpf_dev_start(struct rte_eth_dev *dev)
>   goto err_vec;
>   }
> 
> + if (dev->data->dev_conf.rxmode.offloads &
> RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
> + rte_eal_alarm_set(1000 * 1000,

Please use a macro for readability.

> +   &idpf_dev_read_time_hw,
> +   (void *)base);

It seems that the alarm logic in drivers/idpf spills over into common/idpf,
which can make the code messy. It would be better to wrap this as internal
logic and expose an API like "idpf_rx_timestamp_start/stop" in common/idpf
for better organization and maintainability.

> + }
> +
>   ret = idpf_vc_vectors_alloc(vport, req_vecs_num);
>   if (ret != 0) {
>   PMD_DRV_LOG(ERR, "Failed to allocate interrupt vectors");
> @@ -810,6 +816,7 @@ static int
>  idpf_dev_stop(struct rte_eth_dev *dev)
>  {
>   struct idpf_vport *vport = dev->data->dev_private;
> + struct idpf_adapter *base = vport->adapter;
> 
>   if (vport->stopped == 1)
>   return 0;
> @@ -822,6 +829,11 @@ idpf_dev_stop(struct rte_eth_dev *dev)
> 
>   idpf_vc_vectors_dealloc(vport);
> 
> + if (dev->data->dev_conf.rxmode.offloads &
> RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
> + rte_eal_alarm_cancel(idpf_dev_read_time_hw,
> +  base);
> + }
> +
>   vport->stopped = 1;
> 
>   return 0;
> --
> 2.25.1



RE: [PATCH v3 6/7] net/cpfl: register timestamp mbuf when starting dev

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Qiao, Wenjing 
> Sent: Monday, April 24, 2023 5:17 PM
> To: Wu, Jingjing ; Xing, Beilei
> ; Zhang, Qi Z 
> Cc: dev@dpdk.org; Liu, Mingxia ; Qiao, Wenjing
> ; sta...@dpdk.org
> Subject: [PATCH v3 6/7] net/cpfl: register timestamp mbuf when starting dev
> 
> Since timestamp is only supported at the port level, the timestamp mbuf
> registration should be done at the dev start stage.
> 
> Fixes: 8c6098afa075 ("common/idpf: add Rx/Tx data path")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Wenjing Qiao 
> Suggested-by: Jingjing Wu 
> ---
>  drivers/net/cpfl/cpfl_ethdev.c | 7 +++
>  drivers/net/cpfl/cpfl_ethdev.h | 3 +++
>  drivers/net/cpfl/cpfl_rxtx.c   | 2 ++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/drivers/net/cpfl/cpfl_ethdev.c b/drivers/net/cpfl/cpfl_ethdev.c
> index 82d8147494..416273f567 100644
> --- a/drivers/net/cpfl/cpfl_ethdev.c
> +++ b/drivers/net/cpfl/cpfl_ethdev.c
> @@ -771,6 +771,13 @@ cpfl_dev_start(struct rte_eth_dev *dev)
>   rte_eal_alarm_set(1000 * 1000,
> &idpf_dev_read_time_hw,
> (void *)base);
> + /* Register mbuf field and flag for Rx timestamp */
> + ret =
> rte_mbuf_dyn_rx_timestamp_register(&idpf_timestamp_dynfield_offset,
> +
> &idpf_timestamp_dynflag);

Can we also wrap this into the common module, so we don't need to expose 
idpf_timestamp_dynfield_offset and idpf_timestamp_dynflag, which are not used 
directly by the PMD? A possible shape is sketched below.
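A rough sketch of such a helper in common/idpf -- the function name is
hypothetical, and it assumes the two globals move behind this API:

#include <rte_mbuf_dyn.h>

/* common/idpf: register the Rx timestamp mbuf field/flag without exposing
 * idpf_timestamp_dynfield_offset or idpf_timestamp_dynflag to the PMDs.
 */
int
idpf_rx_timestamp_init(void)
{
	int ret;

	ret = rte_mbuf_dyn_rx_timestamp_register(
			&idpf_timestamp_dynfield_offset,
			&idpf_timestamp_dynflag);

	return ret != 0 ? -EINVAL : 0;
}

cpfl_dev_start() could then call idpf_rx_timestamp_init() and never touch
the dynfield variables directly.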

> + if (ret != 0) {
> + PMD_DRV_LOG(ERR, "Cannot register mbuf field/flag
> for timestamp");
> + return -EINVAL;
> + }
>   }
> 
>   ret = idpf_vc_vectors_alloc(vport, req_vecs_num);
> 
> diff --git a/drivers/net/cpfl/cpfl_ethdev.h b/drivers/net/cpfl/cpfl_ethdev.h
> index 200dfcac02..eec253bc77 100644
> --- a/drivers/net/cpfl/cpfl_ethdev.h
> +++ b/drivers/net/cpfl/cpfl_ethdev.h
> @@ -57,6 +57,9 @@
>  /* Device IDs */
>  #define IDPF_DEV_ID_CPF  0x1453
> 
> +extern int idpf_timestamp_dynfield_offset;
> +extern uint64_t idpf_timestamp_dynflag;
> +
>  struct cpfl_vport_param {
>   struct cpfl_adapter_ext *adapter;
>   uint16_t devarg_id; /* arg id from user */
> 
> diff --git a/drivers/net/cpfl/cpfl_rxtx.c b/drivers/net/cpfl/cpfl_rxtx.c
> index de59b31b3d..cdb5b37da0 100644
> --- a/drivers/net/cpfl/cpfl_rxtx.c
> +++ b/drivers/net/cpfl/cpfl_rxtx.c
> @@ -529,6 +529,8 @@ cpfl_rx_queue_init(struct rte_eth_dev *dev,
> uint16_t rx_queue_id)
>   frame_size > rxq->rx_buf_len)
>   dev->data->scattered_rx = 1;
> 
> + if (dev->data->dev_conf.rxmode.offloads &
> RTE_ETH_RX_OFFLOAD_TIMESTAMP)
> + rxq->ts_enable = TRUE;
>   err = idpf_qc_ts_mbuf_register(rxq);
>   if (err != 0) {
>   PMD_DRV_LOG(ERR, "fail to register timestamp mbuf %u",
> --
> 2.25.1



[PATCH] net/mana: suppress TX CQE generation whenever possible

2023-04-27 Thread longli
From: Long Li 

When sending TX packets, we don't need a completion for every packet sent.
If packets are sent in a series, the completion of the last packet can be
used to indicate completion of all prior packets.

Signed-off-by: Long Li 
Cc: sta...@dpdk.org
---
 drivers/net/mana/mana.h |  3 ++-
 drivers/net/mana/tx.c   | 33 ++---
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index c79d39daa2..f280d66f6e 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -353,6 +353,7 @@ struct mana_priv {
 struct mana_txq_desc {
struct rte_mbuf *pkt;
uint32_t wqe_size_in_bu;
+   bool suppress_tx_cqe;
 };
 
 struct mana_rxq_desc {
@@ -401,7 +402,7 @@ struct mana_txq {
/* desc_ring_head is where we put pending requests to ring,
 * completion pull off desc_ring_tail
 */
-   uint32_t desc_ring_head, desc_ring_tail;
+   uint32_t desc_ring_head, desc_ring_tail, desc_ring_len;
 
struct mana_mr_btree mr_btree;
struct mana_stats stats;
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
index ee0319c71d..c8d3911f85 100644
--- a/drivers/net/mana/tx.c
+++ b/drivers/net/mana/tx.c
@@ -43,9 +43,11 @@ mana_stop_tx_queues(struct rte_eth_dev *dev)
 
txq->desc_ring_tail =
(txq->desc_ring_tail + 1) % txq->num_desc;
+   txq->desc_ring_len--;
}
txq->desc_ring_head = 0;
txq->desc_ring_tail = 0;
+   txq->desc_ring_len = 0;
 
memset(&txq->gdma_sq, 0, sizeof(txq->gdma_sq));
memset(&txq->gdma_cq, 0, sizeof(txq->gdma_cq));
@@ -173,13 +175,14 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
int ret;
void *db_page;
uint16_t pkt_sent = 0;
-   uint32_t num_comp;
+   uint32_t num_comp, i;
 
/* Process send completions from GDMA */
num_comp = gdma_poll_completion_queue(&txq->gdma_cq,
txq->gdma_comp_buf, txq->num_desc);
 
-   for (uint32_t i = 0; i < num_comp; i++) {
+   i = 0;
+   while (i < num_comp) {
struct mana_txq_desc *desc =
&txq->desc_ring[txq->desc_ring_tail];
struct mana_tx_comp_oob *oob = (struct mana_tx_comp_oob *)
@@ -204,7 +207,16 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
desc->pkt = NULL;
txq->desc_ring_tail = (txq->desc_ring_tail + 1) % txq->num_desc;
+   txq->desc_ring_len--;
txq->gdma_sq.tail += desc->wqe_size_in_bu;
+
+   /* If TX CQE suppression is used, don't read more CQE but move
+* on to the next packet
+*/
+   if (desc->suppress_tx_cqe)
+   continue;
+
+   i++;
}
 
/* Post send requests to GDMA */
@@ -215,6 +227,9 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
struct one_sgl sgl;
uint16_t seg_idx;
 
+   if (txq->desc_ring_len >= txq->num_desc)
+   break;
+
/* Drop the packet if it exceeds max segments */
if (m_pkt->nb_segs > priv->max_send_sge) {
DRV_LOG(ERR, "send packet segments %d exceeding max",
@@ -310,7 +325,6 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
tx_oob.short_oob.tx_compute_UDP_checksum = 0;
}
 
-   tx_oob.short_oob.suppress_tx_CQE_generation = 0;
tx_oob.short_oob.VCQ_number = txq->gdma_cq.id;
 
tx_oob.short_oob.VSQ_frame_num =
@@ -362,6 +376,16 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
if (seg_idx != m_pkt->nb_segs)
continue;
 
+   /* If we can queue at least two more WQEs and there are at
+* least two packets to send, use TX CQE suppression for the
+* current WQE
+*/
+   if (txq->desc_ring_len + 1 < txq->num_desc &&
+   pkt_idx + 1 < nb_pkts)
+   tx_oob.short_oob.suppress_tx_CQE_generation = 1;
+   else
+   tx_oob.short_oob.suppress_tx_CQE_generation = 0;
+
struct gdma_work_request work_req;
uint32_t wqe_size_in_bu;
 
@@ -384,8 +408,11 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
/* Update queue for tracking pending requests */
desc->pkt = m_pkt;
desc->wqe_size_in_bu = wqe_size_in_bu;
+   desc->suppress_tx_cqe =
			tx_oob.short_oob.suppress_tx_CQE_generation;

RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Min Zhou 
> Sent: Monday, April 24, 2023 5:06 PM
> To: Yang, Qiming ; Wu, Wenjun1
> ; zhou...@loongson.cn
> Cc: dev@dpdk.org; maob...@loongson.cn
> Subject: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx
> functions
> 
> Segmentation fault has been observed while running the
> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000
> processor which has 64 cores and 4 NUMA nodes.
> 
> From the ixgbe_recv_pkts_lro() function, we found that as long as the first
> packet has the EOP bit set, and the length of this packet is less than or 
> equal
> to rxq->crc_len, the segmentation fault will definitely happen even though
> on the other platforms, such as X86.
> 
> Because when processd the first packet the first_seg->next will be NULL, if at
> the same time this packet has the EOP bit set and its length is less than or
> equal to rxq->crc_len, the following loop will be excecuted:
> 
> for (lp = first_seg; lp->next != rxm; lp = lp->next)
> ;
> 
> We know that the first_seg->next will be NULL under this condition. So the
> expression of lp->next->next will cause the segmentation fault.
> 
> Normally, the length of the first packet with EOP bit set will be greater than
> rxq->crc_len. However, the out-of-order execution of CPU may make the
> read ordering of the status and the rest of the descriptor fields in this
> function not be correct. The related codes are as following:
> 
> rxdp = &rx_ring[rx_id];
>  #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> 
> if (!(staterr & IXGBE_RXDADV_STAT_DD))
> break;
> 
>  #2 rxd = *rxdp;
> 
> Statement #2 may be executed before statement #1. This is likely to make
> the ready packet appear to have zero length. If the packet is the first
> packet and has the EOP bit set, the above segmentation fault will happen.
> 
> So, we should add rte_rmb() to ensure the read ordering is correct. We also
> did the same thing in the ixgbe_recv_pkts() function to make the rxd data
> valid, even though we did not find a segmentation fault in this function.
> 
> Signed-off-by: Min Zhou 
> ---
> v2:
> - Make the calling of rte_rmb() for all platforms
> ---
>  drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index c9d6ca9efe..302a5ab7ff 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts,
>   staterr = rxdp->wb.upper.status_error;
>   if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
>   break;
> +
> + rte_rmb();

So "volatile" does not prevent re-order with Loongson compiler?


>   rxd = *rxdp;
> 
>   /*
> @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct
> rte_mbuf **rx_pkts, uint16_t nb_pkts,
>   if (!(staterr & IXGBE_RXDADV_STAT_DD))
>   break;
> 
> + rte_rmb();
>   rxd = *rxdp;
> 
>   PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> --
> 2.31.1



RE: [PATCH] net/ice: updated 23.03 recommended matching list

2023-04-27 Thread Zhang, Qi Z



> -Original Message-
> From: Yang, Qiming 
> Sent: Wednesday, April 19, 2023 10:35 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z ; Yang, Qiming
> 
> Subject: [PATCH] net/ice: updated 23.03 recommended matching list
> 
> Signed-off-by: Qiming Yang 
> ---
>  doc/guides/nics/ice.rst | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst index
> bcc1cab491..c351c6bd74 100644
> --- a/doc/guides/nics/ice.rst
> +++ b/doc/guides/nics/ice.rst
> @@ -68,6 +68,8 @@ The detailed information can refer to chapter Tested
> Platforms/Tested NICs in re
> 
>    +---+---+-+---+--+---+
>    |22.11  | 1.10.1|  1.3.30 |  1.3.37   |1.3.10|4.1|
>    +---+---+-+---+--+---+
> +   |23.03  | 1.11.1|  1.3.30 |  1.3.40   |1.3.10|4.2|
> +   +---+---+-+---+--+---+
> 
> 
>  Configuration
> --
> 2.25.1

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi


RE: [PATCH v3 3/3] net/iavf: support Rx timestamp offload on SSE

2023-04-27 Thread Tang, Yaqi


> -Original Message-
> From: Zeng, ZhichaoX 
> Sent: Thursday, April 27, 2023 11:14 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z ; Tang, Yaqi ;
> Zeng, ZhichaoX ; Richardson, Bruce
> ; Konstantin Ananyev
> ; Wu, Jingjing ;
> Xing, Beilei 
> Subject: [PATCH v3 3/3] net/iavf: support Rx timestamp offload on SSE
> 
> This patch enables Rx timestamp offload on SSE data path.
> 
> Enable timestamp offload with the command '--enable-rx-timestamp'; note
> that enabling Rx timestamp offload will reduce performance.
> 
> Signed-off-by: Zhichao Zeng 
> 
> ---
> v3: logging with driver dedicated macro
> ---
> v2: fix compile warning and timestamp error
> ---

Functional tests passed, covering the SSE, AVX2 and AVX512 paths.

Tested-by: Yaqi Tang 


RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions

2023-04-27 Thread Morten Brørup
> From: Zhang, Qi Z [mailto:qi.z.zh...@intel.com]
> Sent: Friday, 28 April 2023 05.44
> 
> > From: Min Zhou 
> > Sent: Monday, April 24, 2023 5:06 PM
> >
> > Segmentation fault has been observed while running the
> > ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000
> > processor which has 64 cores and 4 NUMA nodes.
> >
> > From the ixgbe_recv_pkts_lro() function, we found that as long as the first
> > packet has the EOP bit set, and the length of this packet is less than or
> > equal to rxq->crc_len, the segmentation fault will definitely happen, even
> > on other platforms such as x86.
> >
> > Because when processing the first packet, first_seg->next will be NULL; if
> > at the same time this packet has the EOP bit set and its length is less
> > than or equal to rxq->crc_len, the following loop will be executed:
> >
> > for (lp = first_seg; lp->next != rxm; lp = lp->next)
> > ;
> >
> > We know that first_seg->next will be NULL under this condition, so the
> > expression lp->next->next will cause the segmentation fault.
> >
> > Normally, the length of the first packet with the EOP bit set will be
> > greater than rxq->crc_len. However, the out-of-order execution of the CPU
> > may make the read ordering of the status and the rest of the descriptor
> > fields in this function incorrect. The related code is as follows:
> >
> > rxdp = &rx_ring[rx_id];
> >  #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> >
> > if (!(staterr & IXGBE_RXDADV_STAT_DD))
> > break;
> >
> >  #2 rxd = *rxdp;
> >
> > Statement #2 may be executed before statement #1. This is likely to make
> > the ready packet appear to have zero length. If the packet is the first
> > packet and has the EOP bit set, the above segmentation fault will happen.
> >
> > So, we should add rte_rmb() to ensure the read ordering is correct. We also
> > did the same thing in the ixgbe_recv_pkts() function to make the rxd data
> > valid, even though we did not find a segmentation fault in this function.
> >
> > Signed-off-by: Min Zhou 
> > ---
> > v2:
> > - Make the calling of rte_rmb() for all platforms
> > ---
> >  drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> > index c9d6ca9efe..302a5ab7ff 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
> > **rx_pkts,
> > staterr = rxdp->wb.upper.status_error;
> > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
> > break;
> > +
> > +   rte_rmb();
> 
> So "volatile" does not prevent re-order with Loongson compiler?

"Volatile" does not prevent re-ordering on any compiler. "Volatile" only 
prevents caching of the variable marked volatile.

https://wiki.sei.cmu.edu/confluence/display/c/CON02-C.+Do+not+use+volatile+as+a+synchronization+primitive

Thinking out loud: I don't know the performance cost of rte_rmb(); perhaps 
using atomic accesses with the optimal memory ordering would be a better 
solution in the long term.
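A minimal sketch of that alternative, mirroring the quoted ixgbe_recv_pkts()
snippet and assuming the descriptor layout stays as-is (illustrative only,
not a tested patch):

/* Acquire-ordered load of the status word: later reads of *rxdp cannot be
 * reordered before this load, which replaces the explicit rte_rmb().
 */
staterr = __atomic_load_n(&rxdp->wb.upper.status_error, __ATOMIC_ACQUIRE);
if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
	break;
rxd = *rxdp;	/* cannot be hoisted above the acquire load */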

> 
> 
> > rxd = *rxdp;
> >
> > /*
> > @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct
> > rte_mbuf **rx_pkts, uint16_t nb_pkts,
> > if (!(staterr & IXGBE_RXDADV_STAT_DD))
> > break;
> >
> > +   rte_rmb();
> > rxd = *rxdp;
> >
> > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> > --
> > 2.31.1



Re: [PATCH 5/7] app/testpmd: add setting and querying of LLRS FEC mode

2023-04-27 Thread Singh, Aman Deep



On 4/8/2023 7:57 AM, Dongdong Liu wrote:

From: Jie Hai 

This patch supports setting and querying of LLRS FEC mode.

Signed-off-by: Jie Hai 
Signed-off-by: Dongdong Liu 


Acked-by: Aman Singh 


---
  app/test-pmd/cmdline.c  | 5 -
  app/test-pmd/config.c   | 4 
  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 2 +-
  3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7b20bef4e9..38fa0f507c 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -11973,6 +11973,9 @@ cmd_show_fec_mode_parsed(void *parsed_result,
case RTE_ETH_FEC_MODE_CAPA_MASK(RS):
strlcpy(buf, "rs", sizeof(buf));
break;
+   case RTE_ETH_FEC_MODE_CAPA_MASK(LLRS):
+   strlcpy(buf, "llrs", sizeof(buf));
+   break;
default:
return;
}
@@ -12068,7 +12071,7 @@ cmd_set_port_fec_mode_parsed(
  static cmdline_parse_inst_t cmd_set_fec_mode = {
.f = cmd_set_port_fec_mode_parsed,
.data = NULL,
-   .help_str = "set port  fec_mode auto|off|rs|baser",
+   .help_str = "set port  fec_mode auto|off|rs|baser|llrs",
.tokens = {
(void *)&cmd_set_port_fec_mode_set,
(void *)&cmd_set_port_fec_mode_port,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 096c218c12..f306d678f9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -170,6 +170,10 @@ static const struct {
.mode = RTE_ETH_FEC_RS,
.name = "rs",
},
+   {
+   .mode = RTE_ETH_FEC_LLRS,
+   .name = "llrs",
+   },
  };
  
  static const struct {

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 8f23847859..fa1cea3ed6 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1800,7 +1800,7 @@ Set fec mode
  
  Set fec mode for a specific port::
  
-  testpmd> set port (port_id) fec_mode auto|off|rs|baser
+  testpmd> set port (port_id) fec_mode auto|off|rs|baser|llrs
  
  Config Sample actions list
  ~~~~~~~~~~~~~~~~~~~~~~~~~~