Re: [PATCH v2] lib/hash: feature reclaim defer queue

2024-03-04 Thread Abdullah Ömer Yamaç
Just one more question.

On Sun, Mar 3, 2024 at 10:14 PM Honnappa Nagarahalli <
honnappa.nagaraha...@arm.com> wrote:

> Hello Abdullah,
> Thank you for the patch; a few comments inline.
>
> The short commit log could be changed as follows:
>
> "lib/hash: add defer queue reclaim API"
>
> > On Mar 2, 2024, at 3:27 PM, Abdullah Ömer Yamaç wrote:
> >
> > This patch adds a new feature to the hash library to allow the user to
> > reclaim the defer queue. This is useful when the user wants to force
> > reclaim resources that are not being used. This API is only available
> > if the RCU is enabled.
> >
> > Signed-off-by: Abdullah Ömer Yamaç 
> > Acked-by: Honnappa Nagarahalli 
> Please add this only after you get an explicit Ack on the patch.
>
> > ---
> > lib/hash/rte_cuckoo_hash.c | 23 +++
> > lib/hash/rte_hash.h| 14 ++
> > lib/hash/version.map   |  7 +++
> > 3 files changed, 44 insertions(+)
> >
> > diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> > index 9cf94645f6..254fa80cc5 100644
> > --- a/lib/hash/rte_cuckoo_hash.c
> > +++ b/lib/hash/rte_cuckoo_hash.c
> > @@ -1588,6 +1588,27 @@ rte_hash_rcu_qsbr_add(struct rte_hash *h, struct rte_hash_rcu_config *cfg)
> > return 0;
> > }
> >
> > +int
> > +rte_hash_rcu_qsbr_dq_reclaim(struct rte_hash *h)
> We need to add freed, pending and available parameters to this API. I
> think this information will be helpful for users. For example, in your use
> case, you could use the pending value to calculate the available hash
> entries.
>
Should the second parameter of the RCU reclaim call, "Maximum number of
resources to free", also be exposed? I currently pass
h->hash_rcu_cfg->max_reclaim_size, but it could be added as a parameter
alongside the ones above.

> > +{
> > + int ret;
> > +
> > + if (h->hash_rcu_cfg == NULL || h->dq == NULL) {
> We can skip NULL check for h->dq as the RCU reclaim API makes the same
> check.
>
> > + rte_errno = EINVAL;
> > + return -1;
> > + }
> > +
> > + ret = rte_rcu_qsbr_dq_reclaim(h->dq, h->hash_rcu_cfg->max_reclaim_size, NULL, NULL, NULL);
> > + if (ret != 0) {
> > + HASH_LOG(ERR,
> > + "%s: could not reclaim the defer queue in hash table",
> > + __func__);
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static inline void
> > remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt,
> > unsigned int i)
> > diff --git a/lib/hash/rte_hash.h b/lib/hash/rte_hash.h
> > index 7ecc02..c119477d50 100644
> > --- a/lib/hash/rte_hash.h
> > +++ b/lib/hash/rte_hash.h
> > @@ -674,6 +674,21 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
> >  */
> > int rte_hash_rcu_qsbr_add(struct rte_hash *h, struct rte_hash_rcu_config *cfg);
> >
> > +/**
> > + * Reclaim resources from the defer queue.
> > + * This API reclaim the resources from the defer queue if rcu is enabled.
> > + *
> > + * @param h
> > + *   the hash object to reclaim resources
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with error code set in rte_errno.
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - invalid pointer or invalid rcu mode
> We can remove the ‘invalid rcu mode’.
>
> > + */
> > +__rte_experimental
> > +int rte_hash_rcu_qsbr_dq_reclaim(struct rte_hash *h);
> > +
> > #ifdef __cplusplus
> > }
> > #endif
> > diff --git a/lib/hash/version.map b/lib/hash/version.map
> > index 6b2afebf6b..cec0e8fc67 100644
> > --- a/lib/hash/version.map
> > +++ b/lib/hash/version.map
> > @@ -48,3 +48,9 @@ DPDK_24 {
> >
> > local: *;
> > };
> > +
> > +EXPERIMENTAL {
> > + global:
> > +
> > + rte_hash_rcu_qsbr_dq_reclaim;
> > +}
> > \ No newline at end of file
> > --
> > 2.34.1
> >
>
>
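
A rough sketch of how the reviewed API might look once the freed, pending and available counters are passed through, as suggested in the review comments above. This is not the patch's code: mock_dq, mock_hash and both functions are hypothetical stand-ins so that the example compiles without DPDK, and errno stands in for rte_errno. The only real call referenced is rte_rcu_qsbr_dq_reclaim(), whose argument order (queue, maximum count, three optional output pointers) is mirrored here; the mock's return values are an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the RCU defer queue (not struct rte_rcu_qsbr_dq). */
struct mock_dq {
	unsigned int pending;  /* entries still waiting to be reclaimed */
	unsigned int capacity; /* total number of slots in the queue */
};

/* Hypothetical stand-in for struct rte_hash with RCU configured. */
struct mock_hash {
	struct mock_dq *dq;
	unsigned int max_reclaim_size;
	int rcu_enabled;
};

/* Mirrors the argument shape of rte_rcu_qsbr_dq_reclaim(): free up to n entries. */
static int
mock_dq_reclaim(struct mock_dq *dq, unsigned int n, unsigned int *freed,
		unsigned int *pending, unsigned int *available)
{
	unsigned int cnt;

	if (dq == NULL)
		return -1;
	assert(dq->pending <= dq->capacity);

	cnt = dq->pending < n ? dq->pending : n;
	dq->pending -= cnt;

	/* Output pointers are optional, as in the call made by the patch. */
	if (freed != NULL)
		*freed = cnt;
	if (pending != NULL)
		*pending = dq->pending;
	if (available != NULL)
		*available = dq->capacity - dq->pending;
	return 0;
}

/* Hash-level wrapper that forwards the three counters to the caller. */
static int
mock_hash_dq_reclaim(struct mock_hash *h, unsigned int *freed,
		unsigned int *pending, unsigned int *available)
{
	if (h == NULL || !h->rcu_enabled) {
		errno = EINVAL;
		return -1;
	}

	/* No h->dq check here: the queue-level call rejects a NULL queue itself. */
	if (mock_dq_reclaim(h->dq, h->max_reclaim_size,
			freed, pending, available) != 0) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}
```

With the pending value returned, a caller could estimate how many hash entries are still held by the defer queue. Whether the maximum reclaim size should also become a parameter is left open, as in the thread.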


Re: [PATCH 1/3] net/nfp: add the elf module

2024-03-04 Thread Ferruh Yigit
On 3/4/2024 1:13 AM, Chaoyong He wrote:
>> On 2/28/2024 10:18 PM, Stephen Hemminger wrote:
>>> On Tue, 27 Feb 2024 19:15:49 +0800
>>> Chaoyong He  wrote:
>>>
>>>> From: Peng Zhang
>>>>
>>>> Add the elf module, which can get mip information from the firmware
>>>> ELF file.
>>>>
>>>> Signed-off-by: Peng Zhang
>>>> Reviewed-by: Chaoyong He
>>>> Reviewed-by: Long Wu
>>>> ---
>>>
>>> Why are you rolling your own ELF parser?
>>> There are libraries to do this such as libelf.
>>> Libelf is already used in the BPF part of DPDK.
>>>
>>
>> There are pros and cons to depending on an external library; as this is
>> within the limited scope of the driver, I am less concerned about local code.
>>
>> Chaoyong, what is your take on the issue? Did you consider the libelf
>> library option?
> 
> Firstly, the nffw firmware file is a customized ELF file, and we are not sure
> the libelf library can fully meet our needs.
> Secondly, we share the same logic with our BSP code, and we don't want to
> maintain two different implementations of the same requirement.
>

Looks reasonable to me, thanks for the clarification.



[PATCH 04/33] net/ena: sub-optimal configuration notifications support

2024-03-04 Thread shaibran
From: Shai Brandes 

ENA device will send asynchronous notifications to the
driver in order to notify users about sub-optimal configurations
and refer them to public AWS documentation for further action.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst|  1 +
 .../net/ena/base/ena_defs/ena_admin_defs.h| 11 +++-
 drivers/net/ena/ena_ethdev.c  | 26 +--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index fb66d67d32..f47073c7dc 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -104,6 +104,7 @@ New Features
 * **Updated Amazon ena (Elastic Network Adapter) net driver.**
 
   * Removed the reporting of `rx_overruns` errors from xstats and instead updated `imissed` stat with its value.
+  * Added support for sub-optimal configuration notifications from the device.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/base/ena_defs/ena_admin_defs.h b/drivers/net/ena/base/ena_defs/ena_admin_defs.h
index fa43e22918..4172916551 100644
--- a/drivers/net/ena/base/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/base/ena_defs/ena_admin_defs.h
@@ -1214,7 +1214,8 @@ enum ena_admin_aenq_group {
ENA_ADMIN_NOTIFICATION  = 3,
ENA_ADMIN_KEEP_ALIVE= 4,
ENA_ADMIN_REFRESH_CAPABILITIES  = 5,
-   ENA_ADMIN_AENQ_GROUPS_NUM   = 6,
+   ENA_ADMIN_CONF_NOTIFICATIONS= 6,
+   ENA_ADMIN_AENQ_GROUPS_NUM   = 7,
 };
 
 enum ena_admin_aenq_notification_syndrome {
@@ -1251,6 +1252,14 @@ struct ena_admin_aenq_keep_alive_desc {
uint32_t rx_overruns_high;
 };
 
+struct ena_admin_aenq_conf_notifications_desc {
+   struct ena_admin_aenq_common_desc aenq_common_desc;
+
+   uint64_t notifications_bitmap;
+
+   uint64_t reserved;
+};
+
 struct ena_admin_ena_mmio_req_read_less_resp {
uint16_t req_id;
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index d3f395a832..3157237c0d 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -36,6 +36,10 @@
 
 #define ENA_MIN_RING_DESC  128
 
+#define BITS_PER_BYTE 8
+
+#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
+
 /*
  * We should try to keep ENA_CLEANUP_BUF_SIZE lower than
  * RTE_MEMPOOL_CACHE_MAX_SIZE, so we can fit this in mempool local cache.
@@ -1842,7 +1846,8 @@ static int ena_device_init(struct ena_adapter *adapter,
  BIT(ENA_ADMIN_NOTIFICATION) |
  BIT(ENA_ADMIN_KEEP_ALIVE) |
  BIT(ENA_ADMIN_FATAL_ERROR) |
- BIT(ENA_ADMIN_WARNING);
+ BIT(ENA_ADMIN_WARNING) |
+ BIT(ENA_ADMIN_CONF_NOTIFICATIONS);
 
aenq_groups &= get_feat_ctx->aenq.supported_groups;
 
@@ -4021,6 +4026,22 @@ static void ena_keep_alive(void *adapter_data,
adapter->dev_stats.tx_drops = tx_drops;
 }
 
+static void ena_suboptimal_configuration(__rte_unused void *adapter_data,
+struct ena_admin_aenq_entry *aenq_e)
+{
+   struct ena_admin_aenq_conf_notifications_desc *desc;
+   int bit, num_bits;
+
+   desc = (struct ena_admin_aenq_conf_notifications_desc *)aenq_e;
+   num_bits = BITS_PER_TYPE(desc->notifications_bitmap);
+   for (bit = 0; bit < num_bits; bit++) {
+   if (desc->notifications_bitmap & RTE_BIT64(bit)) {
+   PMD_DRV_LOG(WARNING,
+   "Sub-optimal configuration notification code: %d\n", bit + 1);
+   }
+   }
+}
+
 /**
  * This handler will called for unknown event group or unimplemented handlers
  **/
@@ -4035,7 +4056,8 @@ static struct ena_aenq_handlers aenq_handlers = {
.handlers = {
[ENA_ADMIN_LINK_CHANGE] = ena_update_on_link_change,
[ENA_ADMIN_NOTIFICATION] = ena_notification,
-   [ENA_ADMIN_KEEP_ALIVE] = ena_keep_alive
+   [ENA_ADMIN_KEEP_ALIVE] = ena_keep_alive,
+   [ENA_ADMIN_CONF_NOTIFICATIONS] = ena_suboptimal_configuration
},
.unimplemented_handler = unimplemented_aenq_handler
 };
-- 
2.17.1



[PATCH 01/33] net/ena: rework the metrics multi-process functions

2024-03-04 Thread shaibran
From: Shai Brandes 

1. Changed the rte_memcpy call to use the precomputed buf_size.
2. Removed redundant address operators (ampersand symbol)
   when providing memcpy source address parameter.
3. Code style related change.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index beb17c4125..6d500bfa78 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -531,8 +531,8 @@ __extension__ ({
 __extension__ ({
ENA_TOUCH(rsp);
ENA_TOUCH(ena_dev);
-   if (stats != (struct ena_admin_eni_stats *)&adapter->metrics_stats)
-   rte_memcpy(stats, &adapter->metrics_stats, sizeof(*stats));
+   if (stats != (struct ena_admin_eni_stats *)adapter->metrics_stats)
+   rte_memcpy(stats, adapter->metrics_stats, sizeof(*stats));
 }),
struct ena_com_dev *ena_dev, struct ena_admin_eni_stats *stats);
 
@@ -590,9 +590,8 @@ __extension__ ({
 __extension__ ({
ENA_TOUCH(rsp);
ENA_TOUCH(ena_dev);
-   ENA_TOUCH(buf_size);
-   if (buf != (char *)&adapter->metrics_stats)
-   rte_memcpy(buf, &adapter->metrics_stats, adapter->metrics_num * sizeof(uint64_t));
+   if (buf != (char *)adapter->metrics_stats)
+   rte_memcpy(buf, adapter->metrics_stats, buf_size);
 }),
struct ena_com_dev *ena_dev, char *buf, size_t buf_size);
 
@@ -4088,7 +4087,7 @@ ena_mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
case ENA_MP_CUSTOMER_METRICS_GET:
res = ena_com_get_customer_metrics(ena_dev,
(char *)adapter->metrics_stats,
-   sizeof(uint64_t) * adapter->metrics_num);
+   adapter->metrics_num * sizeof(uint64_t));
break;
case ENA_MP_SRD_STATS_GET:
res = ena_com_get_ena_srd_info(ena_dev,
-- 
2.17.1
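
The following sketch illustrates why the address operator dropped by this patch can be considered redundant, assuming metrics_stats is an array member of the adapter (an assumption; the struct is not shown in the patch). For an array, `&arr` and `arr` evaluate to the same address and differ only in type. The struct mock_adapter, MOCK_METRICS_NUM and both functions are hypothetical stand-ins, and plain memcpy stands in for rte_memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_METRICS_NUM 6 /* hypothetical array size */

/* Hypothetical stand-in for the adapter fields touched by the patch. */
struct mock_adapter {
	uint64_t metrics_stats[MOCK_METRICS_NUM];
	uint16_t metrics_num;
};

/* For an array member, &a->metrics_stats and a->metrics_stats coincide. */
static int
mock_same_address(const struct mock_adapter *a)
{
	return (const void *)&a->metrics_stats == (const void *)a->metrics_stats;
}

/*
 * Copy the cached metrics into buf unless buf already is the cache.
 * Returns 1 when a copy was made, 0 when it was skipped.
 */
static int
mock_copy_metrics(const struct mock_adapter *adapter, char *buf,
		size_t buf_size)
{
	assert(buf_size <= sizeof(adapter->metrics_stats));

	if (buf == (const char *)adapter->metrics_stats)
		return 0;

	/* The caller's precomputed size is used, as in the reworked macro. */
	memcpy(buf, adapter->metrics_stats, buf_size);
	return 1;
}
```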



[PATCH 02/33] net/ena: report new supported link speed capabilities

2024-03-04 Thread shaibran
From: Shai Brandes 

Updated the rte_eth_dev_info device supported speed
bitmap to include 200Gbps and 400Gbps capabilities.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 6d500bfa78..b1e7de0541 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -2542,7 +2542,9 @@ static int ena_infos_get(struct rte_eth_dev *dev,
RTE_ETH_LINK_SPEED_25G  |
RTE_ETH_LINK_SPEED_40G  |
RTE_ETH_LINK_SPEED_50G  |
-   RTE_ETH_LINK_SPEED_100G;
+   RTE_ETH_LINK_SPEED_100G |
+   RTE_ETH_LINK_SPEED_200G |
+   RTE_ETH_LINK_SPEED_400G;
 
/* Inform framework about available features */
dev_info->rx_offload_capa = ena_get_rx_port_offloads(adapter);
-- 
2.17.1



[PATCH 00/33] net/ena: v2.9.0 driver release

2024-03-04 Thread shaibran
From: Shai Brandes 

Hi all, the ena v2.9.0 release introduces:
1. HAL upgrade:
   - renamed the 'base' folder to be 'hal'
   - separated the HAL patches instead of a bulk update.
2. Restructured ena stats and metrics.
3. Restructured the LLQ configuration:
   - configurable via devarg.
   - support device recommendation.
   - restructure the logic in driver.
4. Added support for the admin queue to work only in poll-mode
   - configurable via devarg.
   - allows binding ports to the uio_pci_generic kernel driver.
5. Reworked the device close to exhaust interrupt callbacks and alarms.
6. Fixed a bug in fast mbuf free.

Best regards,
Shai

Shai Brandes (33):
  net/ena: rework the metrics multi-process functions
  net/ena: report new supported link speed capabilities
  net/ena: update imissed stat with Rx overruns
  net/ena: sub-optimal configuration notifications support
  net/ena: fix fast mbuf free
  net/ena: rename base folder to hal
  net/ena: restructure the llq policy setting process
  net/ena/hal: exponential backoff exp limit
  net/ena/hal: add a new csum offload bit
  net/ena/hal: added a bus parameter to ena memcpy macro
  net/ena/hal: optimize Rx ring submission queue
  net/ena/hal: rename fields in completion descriptors
  net/ena/hal: use correct read once on u8 field
  net/ena/hal: add completion descriptor corruption check
  net/ena/hal: malformed Tx descriptor error reason
  net/ena/hal: phc feature modifications
  net/ena/hal: restructure interrupt handling
  net/ena/hal: add unlikely to error checks
  net/ena/hal: missing admin interrupt reset reason
  net/ena/hal: check for existing keep alive notification
  net/ena/hal: modify memory barrier comment
  net/ena/hal: rework Rx ring submission queue
  net/ena/hal: remove operating system type enum
  net/ena/hal: handle command abort
  net/ena/hal: add support for device reset request
  net/ena: cosmetic changes
  net/ena/hal: modify customer metrics memory management
  net/ena/hal: cosmetic changes
  net/ena: update device-preferred size of rings
  net/ena: exhaust interrupt callbacks in device close
  net/ena: support max large llq depth from the device
  net/ena: control path pure polling mode
  net/ena: upgrade driver version to 2.9.0

 doc/guides/nics/ena.rst   |  58 ++--
 doc/guides/rel_notes/release_24_03.rst|  11 +
 drivers/net/ena/ena_ethdev.c  | 316 --
 drivers/net/ena/ena_ethdev.h  |  17 +-
 drivers/net/ena/{base => hal}/ena_com.c   | 240 +
 drivers/net/ena/{base => hal}/ena_com.h   |  53 ++-
 .../{base => hal}/ena_defs/ena_admin_defs.h   |  92 +++--
 .../{base => hal}/ena_defs/ena_common_defs.h  |   0
 .../{base => hal}/ena_defs/ena_eth_io_defs.h  |  49 ++-
 .../ena/{base => hal}/ena_defs/ena_gen_info.h |   0
 .../ena/{base => hal}/ena_defs/ena_includes.h |   0
 .../{base => hal}/ena_defs/ena_regs_defs.h|   3 +
 drivers/net/ena/{base => hal}/ena_eth_com.c   |  56 ++--
 drivers/net/ena/{base => hal}/ena_eth_com.h   |  14 +-
 drivers/net/ena/{base => hal}/ena_plat.h  |   0
 drivers/net/ena/{base => hal}/ena_plat_dpdk.h |   9 +-
 drivers/net/ena/meson.build   |   6 +-
 17 files changed, 666 insertions(+), 258 deletions(-)
 rename drivers/net/ena/{base => hal}/ena_com.c (94%)
 rename drivers/net/ena/{base => hal}/ena_com.h (96%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_admin_defs.h (96%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_common_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_eth_io_defs.h (95%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_gen_info.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_includes.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_regs_defs.h (97%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.c (93%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.h (94%)
 rename drivers/net/ena/{base => hal}/ena_plat.h (100%)
 rename drivers/net/ena/{base => hal}/ena_plat_dpdk.h (97%)

-- 
2.17.1



RE: [RFC 2/7] eal: add generic bit manipulation macros

2024-03-04 Thread Heng Wang
Hi Mattias,
  I have a comment about the _Generic selection. What if the user passes a
uint8_t * or uint16_t * as the address? One improvement would be to add a
default branch to _Generic that raises a compile error or asserts false.
  Another question: what if nr is greater than or equal to the bit width of
the type? What if you do, for example, (uint32_t)1 << 35? Maybe we could add
an assert in the implementation?

Regards,
Heng

-----Original Message-----
From: Mattias Rönnblom  
Sent: Saturday, March 2, 2024 2:53 PM
To: dev@dpdk.org
Cc: hof...@lysator.liu.se; Heng Wang; Mattias Rönnblom
Subject: [RFC 2/7] eal: add generic bit manipulation macros

Add bit-level test/set/clear/assign macros operating on both 32-bit and
64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom 
---
 lib/eal/include/rte_bitops.h | 81 
 1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr) \
+   _Generic((addr),\
+uint32_t *: rte_bit_test32,\
+uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)  \
+   _Generic((addr),\
+uint32_t *: rte_bit_set32, \
+uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)\
+   _Generic((addr),\
+uint32_t *: rte_bit_clear32,   \
+uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)\
+   _Generic((addr),\
+uint32_t *: rte_bit_assign32,  \
+uint64_t *: rte_bit_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
--
2.34.1
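
A minimal sketch touching both review questions raised at the top of this message. The sketch_ names are hypothetical and are not the RFC's rte_bit_test32/rte_bit_test64 implementations. On the first question: in C11, a generic selection with no default association and no matching type is a constraint violation, so passing a uint8_t * or uint16_t * to a macro of this shape should already be rejected at compile time, although the diagnostic may be hard to read. On the second: an assert on nr, as suggested, catches out-of-range shifts in debug builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit variant with the range check proposed in the review. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32); /* shifting a 32-bit value by >= 32 bits is undefined */
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

/* Hypothetical 64-bit variant. */
static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr & (UINT64_C(1) << nr)) != 0;
}

/*
 * No default association: a pointer type other than the two listed fails
 * to compile instead of silently selecting a wrong-width helper.
 */
#define sketch_bit_test(addr, nr) \
	_Generic((addr), \
		uint32_t *: sketch_bit_test32, \
		uint64_t *: sketch_bit_test64)(addr, nr)
```

Note that the selection is on the exact pointer type, so a const uint32_t * argument would not match either association in this sketch.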



[PATCH 08/33] net/ena/hal: exponential backoff exp limit

2024-03-04 Thread shaibran
From: Shai Brandes 

Limit the exponent in the exponential backoff
mechanism in order to avoid the value overflowing.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 6953a1fa33..31c37b0ab3 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -34,6 +34,8 @@
 
 #define ENA_REGS_ADMIN_INTR_MASK 1
 
+#define ENA_MAX_BACKOFF_DELAY_EXP 16U
+
 #define ENA_MIN_ADMIN_POLL_US 100
 
 #define ENA_MAX_ADMIN_POLL_US 5000
@@ -545,8 +547,9 @@ static int ena_com_comp_status_to_errno(struct ena_com_admin_queue *admin_queue,
 
 static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
 {
+   exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
-   delay_us = ENA_MIN32(delay_us * (1U << exp), ENA_MAX_ADMIN_POLL_US);
+   delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us * (1U << exp));
ENA_USLEEP(delay_us);
 }
 
-- 
2.17.1
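
The clamping above can be sketched as a pure function that returns the delay instead of sleeping, which makes the effect of the exponent limit easy to check. The SKETCH_ constants copy the values visible in the patch (16, 100 and 5000); the function and helper names are hypothetical. The 64-bit widening before the shift is an addition of this sketch, not part of the patch, and guards against a very large caller-supplied delay_us. Without the exponent clamp, an expression such as 1U << 40 would be undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_BACKOFF_DELAY_EXP 16U
#define SKETCH_MIN_ADMIN_POLL_US 100U
#define SKETCH_MAX_ADMIN_POLL_US 5000U

static uint32_t
sketch_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

static uint32_t
sketch_max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Returns the delay, in microseconds, that the backoff step would sleep for. */
static uint32_t
sketch_backoff_delay_us(uint32_t exp, uint32_t delay_us)
{
	uint64_t scaled;

	/* Clamp the exponent first so the shift below stays in range. */
	exp = sketch_min_u32(SKETCH_MAX_BACKOFF_DELAY_EXP, exp);
	assert(exp <= SKETCH_MAX_BACKOFF_DELAY_EXP);

	delay_us = sketch_max_u32(SKETCH_MIN_ADMIN_POLL_US, delay_us);

	/* Sketch-only: widen to 64 bits so the scaled delay cannot wrap. */
	scaled = (uint64_t)delay_us << exp;

	return scaled < SKETCH_MAX_ADMIN_POLL_US ?
		(uint32_t)scaled : SKETCH_MAX_ADMIN_POLL_US;
}
```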



[PATCH 03/33] net/ena: update imissed stat with Rx overruns

2024-03-04 Thread shaibran
From: Shai Brandes 

Depending on its acceleration support, the device updates
a different statistic when an ingress packet is dropped
because no buffers are available to hold it.
- In AWS instance types from later generations
'rx_overruns' is updated.
- Otherwise, in legacy instance types,
'rx_dropped_cnt' is updated.

That is, there is no need to report rx_overruns separately
as an xstat and the driver can simply sum up the two
self-contained counters as the 'imissed' statistic.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst | 4 
 drivers/net/ena/ena_ethdev.c   | 8 +---
 drivers/net/ena/ena_ethdev.h   | 1 -
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 879bb4944c..fb66d67d32 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -101,6 +101,10 @@ New Features
   * ``rte_flow_template_table_resize_complete()``.
 Complete table resize.
 
+* **Updated Amazon ena (Elastic Network Adapter) net driver.**
+
+  * Removed the reporting of `rx_overruns` errors from xstats and instead updated `imissed` stat with its value.
+
 * **Updated Atomic Rules' Arkville driver.**
 
   * Added support for Atomic Rules' TK242 packet-capture family of devices
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index b1e7de0541..d3f395a832 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -93,7 +93,6 @@ static const struct ena_stats ena_stats_global_strings[] = {
ENA_STAT_GLOBAL_ENTRY(dev_start),
ENA_STAT_GLOBAL_ENTRY(dev_stop),
ENA_STAT_GLOBAL_ENTRY(tx_drops),
-   ENA_STAT_GLOBAL_ENTRY(rx_overruns),
 };
 
 /*
@@ -4014,9 +4013,12 @@ static void ena_keep_alive(void *adapter_data,
tx_drops = ((uint64_t)desc->tx_drops_high << 32) | desc->tx_drops_low;
 rx_overruns = ((uint64_t)desc->rx_overruns_high << 32) | desc->rx_overruns_low;
 
-   adapter->drv_stats->rx_drops = rx_drops;
+   /*
+* Depending on its acceleration support, the device updates a different statistic when
+* Rx packet is dropped because there are no available buffers to accommodate it.
+*/
+   adapter->drv_stats->rx_drops = rx_drops + rx_overruns;
adapter->dev_stats.tx_drops = tx_drops;
-   adapter->dev_stats.rx_overruns = rx_overruns;
 }
 
 /**
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 4988fbffb5..20b8307836 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -219,7 +219,6 @@ struct ena_stats_dev {
 * As a workaround it is being published as an extended statistic.
 */
u64 tx_drops;
-   u64 rx_overruns;
 };
 
 struct ena_stats_metrics {
-- 
2.17.1
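
A small sketch of the arithmetic behind the keep-alive change above: each counter arrives as two 32-bit halves, is reassembled into a 64-bit value, and the two drop counters are summed into one figure. The struct mock_keep_alive_desc and the function names are hypothetical stand-ins for the descriptor and handler in the driver; the sum relies on the commit message's statement that a given device updates only one of the two counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Rx fields of the keep-alive descriptor. */
struct mock_keep_alive_desc {
	uint32_t rx_drops_low;
	uint32_t rx_drops_high;
	uint32_t rx_overruns_low;
	uint32_t rx_overruns_high;
};

/* Reassemble a 64-bit counter from its high and low 32-bit halves. */
static uint64_t
sketch_join_u32(uint32_t high, uint32_t low)
{
	/* Cast before shifting so the high half is not shifted out of 32 bits. */
	return ((uint64_t)high << 32) | low;
}

/* Combined Rx drop count, which the driver publishes as 'imissed'. */
static uint64_t
sketch_rx_missed(const struct mock_keep_alive_desc *desc)
{
	assert(desc != NULL);

	return sketch_join_u32(desc->rx_drops_high, desc->rx_drops_low) +
		sketch_join_u32(desc->rx_overruns_high, desc->rx_overruns_low);
}
```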



[PATCH 09/33] net/ena/hal: add a new csum offload bit

2024-03-04 Thread shaibran
From: Shai Brandes 

Add a new driver supported feature bit for TX IPv6 checksum offload.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index 4172916551..670e794c98 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -985,7 +985,8 @@ struct ena_admin_host_info {
 * 4 : rss_configurable_function_key
 * 5 : reserved
 * 6 : rx_page_reuse
-* 31:7 : reserved
+* 7 : tx_ipv6_csum_offload
+* 31:8 : reserved
 */
uint32_t driver_supported_features;
 };
@@ -1377,6 +1378,8 @@ struct ena_admin_phc_resp {
 #define ENA_ADMIN_HOST_INFO_RSS_CONFIGURABLE_FUNCTION_KEY_MASK BIT(4)
 #define ENA_ADMIN_HOST_INFO_RX_PAGE_REUSE_SHIFT 6
 #define ENA_ADMIN_HOST_INFO_RX_PAGE_REUSE_MASK  BIT(6)
+#define ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_SHIFT  7
+#define ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_MASK   BIT(7)
 
 /* feature_rss_ind_table */
 #define ENA_ADMIN_FEATURE_RSS_IND_TABLE_ONE_ENTRY_UPDATE_MASK BIT(0)
@@ -1851,6 +1854,20 @@ static inline void set_ena_admin_host_info_rx_page_reuse(struct ena_admin_host_i
ENA_ADMIN_HOST_INFO_RX_PAGE_REUSE_MASK;
 }
 
+static inline
+uint32_t get_ena_admin_host_info_tx_ipv6_csum_offload(const struct ena_admin_host_info *p)
+{
+   return (p->driver_supported_features & ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_MASK) >>
+   ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_SHIFT;
+}
+
+static inline void set_ena_admin_host_info_tx_ipv6_csum_offload(struct ena_admin_host_info *p,
+uint32_t val)
+{
+   p->driver_supported_features |= (val << ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_SHIFT) &
+   ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_MASK;
+}
+
 static inline uint8_t get_ena_admin_feature_rss_ind_table_one_entry_update(const struct ena_admin_feature_rss_ind_table *p)
 {
return p->flags & ENA_ADMIN_FEATURE_RSS_IND_TABLE_ONE_ENTRY_UPDATE_MASK;
-- 
2.17.1
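
The shift/mask accessor pattern used for the new feature bit can be sketched in isolation as follows. The SKETCH_ macros and struct mock_host_info are hypothetical stand-ins that copy the bit position (7) from the patch. One property worth noting, which follows directly from the `|=` in the setter: it can only set the bit, so calling it with 0 leaves a previously set bit in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TX_IPV6_CSUM_OFFLOAD_SHIFT 7
#define SKETCH_TX_IPV6_CSUM_OFFLOAD_MASK (UINT32_C(1) << 7)

/* Hypothetical stand-in for the feature word of struct ena_admin_host_info. */
struct mock_host_info {
	uint32_t driver_supported_features;
};

/* Extract the single-bit field: mask first, then shift down to bit 0. */
static inline uint32_t
sketch_get_tx_ipv6_csum_offload(const struct mock_host_info *p)
{
	assert(p != NULL);
	return (p->driver_supported_features &
		SKETCH_TX_IPV6_CSUM_OFFLOAD_MASK) >>
		SKETCH_TX_IPV6_CSUM_OFFLOAD_SHIFT;
}

/* OR the field in: shift up, mask off stray bits, leave other bits as-is. */
static inline void
sketch_set_tx_ipv6_csum_offload(struct mock_host_info *p, uint32_t val)
{
	assert(p != NULL);
	p->driver_supported_features |=
		(val << SKETCH_TX_IPV6_CSUM_OFFLOAD_SHIFT) &
		SKETCH_TX_IPV6_CSUM_OFFLOAD_MASK;
}
```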



[PATCH 05/33] net/ena: fix fast mbuf free

2024-03-04 Thread shaibran
From: Shai Brandes 

When the application enables the fast mbuf release optimization,
the driver releases 256 TX mbufs in bulk upon reaching the
TX free threshold.
The existing implementation uses rte_mempool_put_bulk to bulk-free
TX mbufs, which supports direct mbufs only.
If the application transmits indirect mbufs, the driver must
also decrement the mbuf reference count and unlink the mbuf segment.
For that case, the driver should use rte_pktmbuf_free_bulk.

Fixes: c339f53823f3 ("net/ena: support fast mbuf free")
Cc: sta...@dpdk.org

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst | 1 +
 drivers/net/ena/ena_ethdev.c   | 6 ++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index f47073c7dc..6b73d4fedf 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -105,6 +105,7 @@ New Features
 
   * Removed the reporting of `rx_overruns` errors from xstats and instead updated `imissed` stat with its value.
   * Added support for sub-optimal configuration notifications from the device.
+  * Restructured fast release of mbufs when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE optimization is enabled.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 3157237c0d..537ee9f8c3 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3122,8 +3122,7 @@ ena_tx_cleanup_mbuf_fast(struct rte_mbuf **mbufs_to_clean,
m_next = mbuf->next;
mbufs_to_clean[mbuf_cnt++] = mbuf;
if (mbuf_cnt == buf_size) {
-   rte_mempool_put_bulk(mbufs_to_clean[0]->pool, (void **)mbufs_to_clean,
-   (unsigned int)mbuf_cnt);
+   rte_pktmbuf_free_bulk(mbufs_to_clean, mbuf_cnt);
mbuf_cnt = 0;
}
mbuf = m_next;
@@ -3191,8 +3190,7 @@ static int ena_tx_cleanup(void *txp, uint32_t free_pkt_cnt)
}
 
if (mbuf_cnt != 0)
-   rte_mempool_put_bulk(mbufs_to_clean[0]->pool,
-   (void **)mbufs_to_clean, mbuf_cnt);
+   rte_pktmbuf_free_bulk(mbufs_to_clean, mbuf_cnt);
 
/* Notify completion handler that full cleanup was performed */
if (free_pkt_cnt == 0 || total_tx_pkts < cleanup_budget)
-- 
2.17.1



[PATCH 06/33] net/ena: rename base folder to hal

2024-03-04 Thread shaibran
From: Shai Brandes 

Changed the base HAL folder to hal.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/{base => hal}/ena_com.c  | 0
 drivers/net/ena/{base => hal}/ena_com.h  | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_admin_defs.h  | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_common_defs.h | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_eth_io_defs.h | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_gen_info.h| 0
 drivers/net/ena/{base => hal}/ena_defs/ena_includes.h| 0
 drivers/net/ena/{base => hal}/ena_defs/ena_regs_defs.h   | 0
 drivers/net/ena/{base => hal}/ena_eth_com.c  | 0
 drivers/net/ena/{base => hal}/ena_eth_com.h  | 0
 drivers/net/ena/{base => hal}/ena_plat.h | 0
 drivers/net/ena/{base => hal}/ena_plat_dpdk.h| 0
 drivers/net/ena/meson.build  | 6 +++---
 13 files changed, 3 insertions(+), 3 deletions(-)
 rename drivers/net/ena/{base => hal}/ena_com.c (100%)
 rename drivers/net/ena/{base => hal}/ena_com.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_admin_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_common_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_eth_io_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_gen_info.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_includes.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_regs_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.c (100%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.h (100%)
 rename drivers/net/ena/{base => hal}/ena_plat.h (100%)
 rename drivers/net/ena/{base => hal}/ena_plat_dpdk.h (100%)

diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/hal/ena_com.c
similarity index 100%
rename from drivers/net/ena/base/ena_com.c
rename to drivers/net/ena/hal/ena_com.c
diff --git a/drivers/net/ena/base/ena_com.h b/drivers/net/ena/hal/ena_com.h
similarity index 100%
rename from drivers/net/ena/base/ena_com.h
rename to drivers/net/ena/hal/ena_com.h
diff --git a/drivers/net/ena/base/ena_defs/ena_admin_defs.h b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_admin_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_admin_defs.h
diff --git a/drivers/net/ena/base/ena_defs/ena_common_defs.h b/drivers/net/ena/hal/ena_defs/ena_common_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_common_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_common_defs.h
diff --git a/drivers/net/ena/base/ena_defs/ena_eth_io_defs.h b/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_eth_io_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
diff --git a/drivers/net/ena/base/ena_defs/ena_gen_info.h b/drivers/net/ena/hal/ena_defs/ena_gen_info.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_gen_info.h
rename to drivers/net/ena/hal/ena_defs/ena_gen_info.h
diff --git a/drivers/net/ena/base/ena_defs/ena_includes.h b/drivers/net/ena/hal/ena_defs/ena_includes.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_includes.h
rename to drivers/net/ena/hal/ena_defs/ena_includes.h
diff --git a/drivers/net/ena/base/ena_defs/ena_regs_defs.h b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_regs_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_regs_defs.h
diff --git a/drivers/net/ena/base/ena_eth_com.c b/drivers/net/ena/hal/ena_eth_com.c
similarity index 100%
rename from drivers/net/ena/base/ena_eth_com.c
rename to drivers/net/ena/hal/ena_eth_com.c
diff --git a/drivers/net/ena/base/ena_eth_com.h b/drivers/net/ena/hal/ena_eth_com.h
similarity index 100%
rename from drivers/net/ena/base/ena_eth_com.h
rename to drivers/net/ena/hal/ena_eth_com.h
diff --git a/drivers/net/ena/base/ena_plat.h b/drivers/net/ena/hal/ena_plat.h
similarity index 100%
rename from drivers/net/ena/base/ena_plat.h
rename to drivers/net/ena/hal/ena_plat.h
diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/hal/ena_plat_dpdk.h
similarity index 100%
rename from drivers/net/ena/base/ena_plat_dpdk.h
rename to drivers/net/ena/hal/ena_plat_dpdk.h
diff --git a/drivers/net/ena/meson.build b/drivers/net/ena/meson.build
index d02ed3f64f..c41f1b04a0 100644
--- a/drivers/net/ena/meson.build
+++ b/drivers/net/ena/meson.build
@@ -10,10 +10,10 @@ endif
 sources = files(
 'ena_ethdev.c',
 'ena_rss.c',
-'base/ena_com.c',
-'base/ena_eth_com.c',
+'hal/ena_com.c',
+'hal/ena_eth_com.c',
 )
 
 deps += ['timer']
 
-includes += include_directories('base', 'base/ena_defs')
+includes += include_directories('hal', 'hal/ena_defs')
-- 
2.17.1



[PATCH 11/33] net/ena/hal: optimize Rx ring submission queue

2024-03-04 Thread shaibran
From: Shai Brandes 

Rx ring submission queue descriptors are always located in host memory.
This optimization replaces the generic descriptor retrieval method
with a method tailored to host-memory descriptors, which avoids an
unnecessary if statement.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index d6811c7b48..dc2935a53e 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -631,9 +631,8 @@ int ena_com_add_single_rx_desc(struct ena_com_io_sq *io_sq,
if (unlikely(!ena_com_sq_have_enough_space(io_sq, 1)))
return ENA_COM_NO_SPACE;
 
-   desc = get_sq_desc(io_sq);
-   if (unlikely(!desc))
-   return ENA_COM_FAULT;
+   /* virt_addr allocation success is checked before calling this function */
+   desc = get_sq_desc_regular_queue(io_sq);
 
memset(desc, 0x0, sizeof(struct ena_eth_io_rx_desc));
 
-- 
2.17.1
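
The idea in the patch above can be sketched in isolation as follows. This is a minimal illustration, not the driver's code: `struct sketch_sq` and `sketch_get_desc_host()` are hypothetical names, and the sketch assumes the ring depth is a power of two and that the ring pointer was validated when the queue was created.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's submission queue. */
struct sketch_sq {
	uint8_t *virt_addr;  /* host-memory descriptor ring, validated at setup */
	uint16_t tail;       /* free-running producer index */
	uint16_t q_depth;    /* ring size, assumed to be a power of two */
	uint16_t desc_size;  /* bytes per descriptor */
};

/*
 * Direct lookup for a ring that is known to live in host memory.
 * No placement-policy branch and no NULL result to check.
 */
static inline void *sketch_get_desc_host(const struct sketch_sq *sq)
{
	uint16_t idx = sq->tail & (uint16_t)(sq->q_depth - 1);

	return sq->virt_addr + (size_t)idx * sq->desc_size;
}
```

Because the lookup cannot fail once the ring exists, the caller no longer needs an error path after fetching the descriptor.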



[PATCH 07/33] net/ena: restructure the llq policy setting process

2024-03-04 Thread shaibran
From: Shai Brandes 

The driver now sets the LLQ header size according to the
recommendation from the device.
Replaced the `enable_llq` and `large_llq_hdr` devargs with
a new devarg, `llq_policy`, that accepts the following values:
0 - Disable LLQ.
    Use with extreme caution as it leads to a huge performance
    degradation on AWS instances from 6th generation onwards.
1 - Accept device recommended LLQ policy (Default).
    Device can recommend normal or large LLQ policy.
2 - Enforce normal LLQ policy.
3 - Enforce large LLQ policy.
    Required for packets with headers that exceed 96 bytes on
    AWS instances prior to 5th generation.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/nics/ena.rst|  21 ++---
 doc/guides/rel_notes/release_24_03.rst |   1 +
 drivers/net/ena/ena_ethdev.c   | 110 +
 drivers/net/ena/ena_ethdev.h   |  11 ++-
 4 files changed, 77 insertions(+), 66 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index b039e75ead..53c9341859 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -107,11 +107,15 @@ Configuration
 Runtime Configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-   * **large_llq_hdr** (default 0)
+   * **llq_policy** (default 1)
 
- Enables or disables usage of large LLQ headers. This option will have
- effect only if the device also supports large LLQ headers. Otherwise, the
- default value will be used.
+ Controls whether to use the device recommended header policy or override it.
+ 0 - Disable LLQ.
+ **Use with extreme caution as it leads to a huge performance
+ degradation on AWS instances from 6th generation onwards.**
+ 1 - Accept device recommended LLQ policy (Default).
+ 2 - Enforce normal LLQ policy.
+ 3 - Enforce large LLQ policy.
 
* **miss_txc_to** (default 5)
 
@@ -122,15 +126,6 @@ Runtime Configuration
  timer service. Setting this parameter to 0 disables this feature. Maximum
  allowed value is 60 seconds.
 
-   * **enable_llq** (default 1)
-
- Determines whenever the driver should use the LLQ (if it's available) or
- not.
-
- **NOTE: On the 6th generation AWS instances disabling LLQ may lead to a
- huge performance degradation. In general disabling LLQ is highly not
- recommended!**
-
 ENA Configuration Parameters
 
 
diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 6b73d4fedf..2a22bb07ed 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -106,6 +106,7 @@ New Features
   * Removed the reporting of `rx_overruns` errors from xstats and instead 
updated `imissed` stat with its value.
   * Added support for sub-optimal configuration notifications from the device.
   * Restructured fast release of mbufs when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE 
optimization is enabled.
+  * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg 
`llq_policy`.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 537ee9f8c3..2414f631c8 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -40,6 +40,8 @@
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
 
+#define DECIMAL_BASE 10
+
 /*
  * We should try to keep ENA_CLEANUP_BUF_SIZE lower than
  * RTE_MEMPOOL_CACHE_MAX_SIZE, so we can fit this in mempool local cache.
@@ -74,17 +76,23 @@ struct ena_stats {
ENA_STAT_ENTRY(stat, srd)
 
 /* Device arguments */
-#define ENA_DEVARG_LARGE_LLQ_HDR "large_llq_hdr"
+/* Controls whether to disable LLQ, use device recommended header policy
+ * or overriding the device recommendation.
+ * 0 - Disable LLQ.
+ * Use with extreme caution as it leads to a huge performance
+ * degradation on AWS instances from 6th generation onwards.
+ * 1 - Accept device recommended LLQ policy (Default).
+ * Device can recommend normal or large LLQ policy.
+ * 2 - Enforce normal LLQ policy.
+ * 3 - Enforce large LLQ policy.
+ * Required for packets with header that exceed 96 bytes on
+ * AWS instances prior to 5th generation.
+ */
+#define ENA_DEVARG_LLQ_POLICY "llq_policy"
 /* Timeout in seconds after which a single uncompleted Tx packet should be
  * considered as a missing.
  */
 #define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
-/*
- * Controls whether LLQ should be used (if available). Enabled by default.
- * NOTE: It's highly not recommended to disable the LLQ, as it may lead to a
- * huge performance degradation on 6th generation AWS instances.
- */
-#define ENA_DEVARG_ENABLE_LLQ "enable_llq"
 
 /*
  * Each rte_memzone should have unique name.
@@ -279,9 +287,9 @@ static int ena_xstats_get_by_id(struct rte_eth_dev *dev,
const uint64_t *ids,
uint64_t *values,
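
Since the rest of this patch is cut off here, the following is only a sketch of how a `llq_policy` value string could be validated against the four values listed in the commit message. The enum and function names are hypothetical and are not taken from the driver; only `DECIMAL_BASE` and the 0 to 3 range come from the patch.

```c
#include <errno.h>
#include <stdlib.h>

#define DECIMAL_BASE 10

/* Hypothetical names; the numeric values mirror the commit message. */
enum sketch_llq_policy {
	SKETCH_LLQ_DISABLED = 0,     /* disable LLQ */
	SKETCH_LLQ_RECOMMENDED = 1,  /* accept the device recommendation (default) */
	SKETCH_LLQ_NORMAL = 2,       /* enforce normal LLQ policy */
	SKETCH_LLQ_LARGE = 3,        /* enforce large LLQ policy */
	SKETCH_LLQ_LAST = SKETCH_LLQ_LARGE,
};

/* Parse a devarg value string; return 0 on success, -EINVAL otherwise. */
static int sketch_parse_llq_policy(const char *value,
				   enum sketch_llq_policy *out)
{
	char *end = NULL;
	unsigned long v;

	if (value == NULL || *value == '\0')
		return -EINVAL;

	errno = 0;
	v = strtoul(value, &end, DECIMAL_BASE);
	/* Reject overflow, trailing characters and out-of-range values. */
	if (errno != 0 || *end != '\0' || v > SKETCH_LLQ_LAST)
		return -EINVAL;

	*out = (enum sketch_llq_policy)v;
	return 0;
}
```

On the command line the devarg would be appended to the device argument, for example `-a <PCI address>,llq_policy=3`.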
   

[PATCH 10/33] net/ena/hal: added a bus parameter to ena memcpy macro

2024-03-04 Thread shaibran
From: Shai Brandes 

The ENA_MEMCPY_TO_DEVICE_64 macro needs the PCI bus ID in order
to write to device memory when using LLQ.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c   | 3 ++-
 drivers/net/ena/hal/ena_plat_dpdk.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index 32090259cd..d6811c7b48 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -74,7 +74,8 @@ static int ena_com_write_bounce_buffer_to_dev(struct 
ena_com_io_sq *io_sq,
wmb();
 
/* The line is completed. Copy it to dev */
-   ENA_MEMCPY_TO_DEVICE_64(io_sq->desc_addr.pbuf_dev_addr + dst_offset,
+   ENA_MEMCPY_TO_DEVICE_64(io_sq->bus,
+   io_sq->desc_addr.pbuf_dev_addr + dst_offset,
bounce_buffer,
llq_info->desc_list_entry_size);
 
diff --git a/drivers/net/ena/hal/ena_plat_dpdk.h 
b/drivers/net/ena/hal/ena_plat_dpdk.h
index 14bf582a45..5f7cbd1ee7 100644
--- a/drivers/net/ena/hal/ena_plat_dpdk.h
+++ b/drivers/net/ena/hal/ena_plat_dpdk.h
@@ -301,11 +301,12 @@ ena_mem_alloc_coherent(struct rte_eth_dev_data *data, 
size_t size,
 #define ENA_WAIT_EVENTS_DESTROY(admin_queue) ((void)(admin_queue))
 
 /* The size must be 8 byte align */
-#define ENA_MEMCPY_TO_DEVICE_64(dst, src, size)   \
+#define ENA_MEMCPY_TO_DEVICE_64(bus, dst, src, size)  \
do {   \
int count, i;  \
uint64_t *to = (uint64_t *)(dst);  \
const uint64_t *from = (const uint64_t *)(src);\
+   (void)(bus);   \
count = (size) / 8;\
for (i = 0; i < count; i++, from++, to++)  \
rte_write64_relaxed(*from, to);\
-- 
2.17.1
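
For readers who prefer a function to a macro, the copy loop in the patch above behaves roughly like the sketch below. `sketch_write64()` is a plain volatile store standing in for `rte_write64_relaxed()`, and all `sketch_` names are hypothetical. As in the macro, the bus argument is accepted only to keep the interface uniform across platforms and is otherwise unused.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a relaxed 64-bit MMIO write. */
static inline void sketch_write64(uint64_t value, volatile uint64_t *addr)
{
	*addr = value;
}

/*
 * Copy size bytes to device memory in 64-bit words.
 * size must be a multiple of 8; any remainder is not copied.
 */
static inline void sketch_memcpy_to_device_64(void *bus, volatile void *dst,
					      const void *src, size_t size)
{
	volatile uint64_t *to = dst;
	const uint64_t *from = src;
	size_t count = size / 8;
	size_t i;

	(void)bus; /* unused on this platform */
	for (i = 0; i < count; i++)
		sketch_write64(from[i], &to[i]);
}
```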



[PATCH 12/33] net/ena/hal: rename fields in completion descriptors

2024-03-04 Thread shaibran
From: Shai Brandes 

Several reserved bits in ena_eth_io_tx_cdesc and
ena_eth_io_rx_cdesc_base have been renamed explicitly to
MBZ (Must Be Zero).
These bits are set by the device to zero before being sent
to the driver. The fields are used as an integrity check in
order to ensure that the received descriptor is not corrupted.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h |  1 +
 .../net/ena/hal/ena_defs/ena_eth_io_defs.h| 49 +--
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index 670e794c98..438e4a1085 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -84,6 +84,7 @@ enum ena_admin_aq_caps_id {
ENA_ADMIN_ENA_SRD_INFO  = 1,
ENA_ADMIN_CUSTOMER_METRICS  = 2,
ENA_ADMIN_EXTENDED_RESET_REASONS= 3,
+   ENA_ADMIN_CDESC_MBZ = 4,
 };
 
 enum ena_admin_placement_policy_type {
diff --git a/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
index 2107d17fdf..f811dd261e 100644
--- a/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
@@ -152,7 +152,8 @@ struct ena_eth_io_tx_cdesc {
 
/* flags
 * 0 : phase
-* 7:1 : reserved1
+* 5:1 : reserved1
+* 7:6 : mbz6 - MBZ
 */
uint8_t flags;
 
@@ -198,7 +199,7 @@ struct ena_eth_io_rx_desc {
 struct ena_eth_io_rx_cdesc_base {
/* 4:0 : l3_proto_idx
 * 6:5 : src_vlan_cnt
-* 7 : reserved7 - MBZ
+* 7 : mbz7 - MBZ
 * 12:8 : l4_proto_idx
 * 13 : l3_csum_err - when set, either the L3
 *checksum error detected, or, the controller didn't
@@ -214,7 +215,8 @@ struct ena_eth_io_rx_cdesc_base {
 * 16 : l4_csum_checked - L4 checksum was verified
 *(could be OK or error), when cleared the status of
 *checksum is unknown
-* 23:17 : reserved17 - MBZ
+* 17 : mbz17 - MBZ
+* 23:18 : reserved18
 * 24 : phase
 * 25 : l3_csum2 - second checksum engine result
 * 26 : first - Indicates first descriptor in
@@ -341,6 +343,8 @@ struct ena_eth_io_numa_node_cfg_reg {
 
 /* tx_cdesc */
 #define ENA_ETH_IO_TX_CDESC_PHASE_MASK  BIT(0)
+#define ENA_ETH_IO_TX_CDESC_MBZ6_SHIFT  6
+#define ENA_ETH_IO_TX_CDESC_MBZ6_MASK   GENMASK(7, 6)
 
 /* rx_desc */
 #define ENA_ETH_IO_RX_DESC_PHASE_MASK   BIT(0)
@@ -355,6 +359,8 @@ struct ena_eth_io_numa_node_cfg_reg {
 #define ENA_ETH_IO_RX_CDESC_BASE_L3_PROTO_IDX_MASK  GENMASK(4, 0)
 #define ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_SHIFT 5
 #define ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_MASK  GENMASK(6, 5)
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ7_SHIFT 7
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ7_MASK  BIT(7)
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_PROTO_IDX_SHIFT 8
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_PROTO_IDX_MASK  GENMASK(12, 8)
 #define ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM_ERR_SHIFT  13
@@ -365,6 +371,8 @@ struct ena_eth_io_numa_node_cfg_reg {
 #define ENA_ETH_IO_RX_CDESC_BASE_IPV4_FRAG_MASK BIT(15)
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_CHECKED_SHIFT  16
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_CHECKED_MASK   BIT(16)
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ17_SHIFT17
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ17_MASK BIT(17)
 #define ENA_ETH_IO_RX_CDESC_BASE_PHASE_SHIFT24
 #define ENA_ETH_IO_RX_CDESC_BASE_PHASE_MASK BIT(24)
 #define ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM2_SHIFT 25
@@ -731,6 +739,15 @@ static inline void set_ena_eth_io_tx_cdesc_phase(struct 
ena_eth_io_tx_cdesc *p,
p->flags |= val & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
 }
 
+static inline uint8_t get_ena_eth_io_tx_cdesc_mbz6(const struct 
ena_eth_io_tx_cdesc *p)
+{
+   return (p->flags & ENA_ETH_IO_TX_CDESC_MBZ6_MASK) >> 
ENA_ETH_IO_TX_CDESC_MBZ6_SHIFT;
+}
+static inline void set_ena_eth_io_tx_cdesc_mbz6(struct ena_eth_io_tx_cdesc *p, 
uint8_t val)
+{
+   p->flags |= (val << ENA_ETH_IO_TX_CDESC_MBZ6_SHIFT) & 
ENA_ETH_IO_TX_CDESC_MBZ6_MASK;
+}
+
 static inline uint8_t get_ena_eth_io_rx_desc_phase(const struct 
ena_eth_io_rx_desc *p)
 {
return p->ctrl & ENA_ETH_IO_RX_DESC_PHASE_MASK;
@@ -791,6 +808,19 @@ static inline void 
set_ena_eth_io_rx_cdesc_base_src_vlan_cnt(struct ena_eth_io_r
p->status |= (val << ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_SHIFT) & 
ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_MASK;
 }
 
+static inline uint32_t get_ena_eth_io_rx_cdesc_base_mbz7(const struct 
ena_eth_io_rx_cdesc_base *p)
+{
+   return (p-
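
The accessors in this patch all follow the same mask-and-shift pattern. The sketch below reproduces it for the two MBZ bits of the Tx completion flags byte, using locally defined `SKETCH_` macros in place of the driver's `BIT()` and `GENMASK()` so that it compiles on its own. The setter mirrors the OR-only form used in the patch, so it can set bits but does not clear them.

```c
#include <stdint.h>

/* Local equivalents of BIT() and GENMASK() for 32-bit values. */
#define SKETCH_BIT(n)        (1U << (n))
#define SKETCH_GENMASK(h, l) (((~0U) << (l)) & (~0U >> (31 - (h))))

#define SKETCH_TX_CDESC_PHASE_MASK SKETCH_BIT(0)
#define SKETCH_TX_CDESC_MBZ6_SHIFT 6
#define SKETCH_TX_CDESC_MBZ6_MASK  SKETCH_GENMASK(7, 6)

/* Extract bits 7:6 of the flags byte as a value in the range 0..3. */
static inline uint8_t sketch_get_mbz6(uint8_t flags)
{
	return (uint8_t)((flags & SKETCH_TX_CDESC_MBZ6_MASK) >>
			 SKETCH_TX_CDESC_MBZ6_SHIFT);
}

/* OR val into bits 7:6; bits already set are left untouched. */
static inline uint8_t sketch_set_mbz6(uint8_t flags, uint8_t val)
{
	return (uint8_t)(flags | (((unsigned int)val << SKETCH_TX_CDESC_MBZ6_SHIFT) &
				  SKETCH_TX_CDESC_MBZ6_MASK));
}
```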

[PATCH 14/33] net/ena/hal: add completion descriptor corruption check

2024-03-04 Thread shaibran
From: Shai Brandes 

Add a check of the MBZ (Must Be Zero) fields in the
incoming Tx and Rx completion descriptors in order to
identify corrupted descriptors.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c | 13 +++--
 drivers/net/ena/hal/ena_eth_com.h | 14 +-
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index dc2935a53e..988fa013a7 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -237,6 +237,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
u16 *first_cdesc_idx,
u16 *num_descs)
 {
+   struct ena_com_dev *dev = ena_com_io_cq_to_ena_dev(io_cq);
u16 count = io_cq->cur_rx_pkt_cdesc_count, head_masked;
struct ena_eth_io_rx_cdesc_base *cdesc;
u32 last = 0;
@@ -252,13 +253,21 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
ena_com_cq_inc_head(io_cq);
if (unlikely((status & ENA_ETH_IO_RX_CDESC_BASE_FIRST_MASK) >>
ENA_ETH_IO_RX_CDESC_BASE_FIRST_SHIFT && count != 0)) {
-   struct ena_com_dev *dev = 
ena_com_io_cq_to_ena_dev(io_cq);
-
ena_trc_err(dev,
"First bit is on in descriptor #%d on q_id: 
%d, req_id: %u\n",
count, io_cq->qid, cdesc->req_id);
return ENA_COM_FAULT;
}
+
+   if (unlikely((status & (ENA_ETH_IO_RX_CDESC_BASE_MBZ7_MASK |
+   ENA_ETH_IO_RX_CDESC_BASE_MBZ17_MASK)) &&
+ ena_com_get_cap(dev, ENA_ADMIN_CDESC_MBZ))) {
+   ena_trc_err(dev,
+   "Corrupted RX descriptor #%d on q_id: %d, 
req_id: %u\n",
+   count, io_cq->qid, cdesc->req_id);
+   return ENA_COM_FAULT;
+   }
+
count++;
last = (status & ENA_ETH_IO_RX_CDESC_BASE_LAST_MASK) >>
ENA_ETH_IO_RX_CDESC_BASE_LAST_SHIFT;
diff --git a/drivers/net/ena/hal/ena_eth_com.h 
b/drivers/net/ena/hal/ena_eth_com.h
index 6a7c17f84f..2fac10e678 100644
--- a/drivers/net/ena/hal/ena_eth_com.h
+++ b/drivers/net/ena/hal/ena_eth_com.h
@@ -204,9 +204,11 @@ static inline void ena_com_cq_inc_head(struct 
ena_com_io_cq *io_cq)
 static inline int ena_com_tx_comp_req_id_get(struct ena_com_io_cq *io_cq,
 u16 *req_id)
 {
+   struct ena_com_dev *dev = ena_com_io_cq_to_ena_dev(io_cq);
u8 expected_phase, cdesc_phase;
struct ena_eth_io_tx_cdesc *cdesc;
u16 masked_head;
+   u8 flags;
 
masked_head = io_cq->head & (io_cq->q_depth - 1);
expected_phase = io_cq->phase;
@@ -215,14 +217,24 @@ static inline int ena_com_tx_comp_req_id_get(struct 
ena_com_io_cq *io_cq,
((uintptr_t)io_cq->cdesc_addr.virt_addr +
(masked_head * io_cq->cdesc_entry_size_in_bytes));
 
+   flags = READ_ONCE8(cdesc->flags);
+
/* When the current completion descriptor phase isn't the same as the
 * expected, it mean that the device still didn't update
 * this completion.
 */
-   cdesc_phase = READ_ONCE8(cdesc->flags) & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
+   cdesc_phase = flags & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
if (cdesc_phase != expected_phase)
return ENA_COM_TRY_AGAIN;
 
+   if (unlikely((flags & ENA_ETH_IO_TX_CDESC_MBZ6_MASK) &&
+ ena_com_get_cap(dev, ENA_ADMIN_CDESC_MBZ))) {
+   ena_trc_err(dev,
+   "Corrupted TX descriptor on q_id: %d, req_id: %u\n",
+   io_cq->qid, cdesc->req_id);
+   return ENA_COM_FAULT;
+   }
+
dma_rmb();
 
*req_id = READ_ONCE16(cdesc->req_id);
-- 
2.17.1
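
Reduced to its core, the Rx check in the patch above is a mask test that only applies when the device advertises the MBZ capability. A minimal sketch follows, with hypothetical `sketch_` names and the capability passed in as a plain boolean rather than queried through `ena_com_get_cap()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RX_MBZ7_MASK  (1U << 7)
#define SKETCH_RX_MBZ17_MASK (1U << 17)

/*
 * Return true when the Rx completion status word should be treated as
 * corrupted: the device guarantees zeroed MBZ bits, yet one is set.
 */
static inline bool sketch_rx_cdesc_corrupted(uint32_t status,
					     bool dev_has_mbz_cap)
{
	uint32_t mbz = SKETCH_RX_MBZ7_MASK | SKETCH_RX_MBZ17_MASK;

	return dev_has_mbz_cap && (status & mbz) != 0;
}
```

Gating on the capability keeps the check from firing on devices that do not promise to zero these bits.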



[PATCH 15/33] net/ena/hal: malformed Tx descriptor error reason

2024-03-04 Thread shaibran
From: Shai Brandes 

Adding ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED to identify
cases where the returned TX completion descriptors are
corrupted.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_regs_defs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
index 6a33f74812..a94025dc77 100644
--- a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
@@ -23,6 +23,7 @@ enum ena_regs_reset_reason_types {
ENA_REGS_RESET_MISS_INTERRUPT   = 14,
ENA_REGS_RESET_SUSPECTED_POLL_STARVATION= 15,
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED  = 16,
+   ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED  = 17,
ENA_REGS_RESET_LAST,
 };
 
-- 
2.17.1



[PATCH 17/33] net/ena/hal: restructure interrupt handling

2024-03-04 Thread shaibran
From: Shai Brandes 

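When an admin command is invoked in interrupt mode, the interrupt may
arrive after the timeout and after the calling function has returned.
In that case the response would be written into memory that is no
longer valid.
To prevent this, the completion context clears its user_cqe pointer on
release, and the completion handler ignores contexts that are no longer
occupied.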

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 651373a52f..be08f79431 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -181,6 +181,7 @@ static int ena_com_admin_init_aenq(struct ena_com_dev 
*ena_dev,
 static void comp_ctxt_release(struct ena_com_admin_queue *queue,
 struct ena_comp_ctx *comp_ctx)
 {
+   comp_ctx->user_cqe = NULL;
comp_ctx->occupied = false;
ATOMIC32_DEC(&queue->outstanding_cmds);
 }
@@ -474,6 +475,9 @@ static void ena_com_handle_single_admin_completion(struct 
ena_com_admin_queue *a
return;
}
 
+   if (!comp_ctx->occupied)
+   return;
+
comp_ctx->status = ENA_CMD_COMPLETED;
comp_ctx->comp_status = cqe->acq_common_descriptor.status;
 
-- 
2.17.1
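
The guard added by the patch above can be illustrated with a single-threaded sketch. All types and names here are hypothetical, and the queue lock under which the driver performs these steps is left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_cqe {
	int status;
};

struct sketch_comp_ctx {
	struct sketch_cqe *user_cqe; /* caller-owned response buffer */
	bool occupied;               /* true while a command is in flight */
	bool completed;
};

/* Release the context; the caller's buffer must not be touched afterwards. */
static void sketch_ctx_release(struct sketch_comp_ctx *ctx)
{
	ctx->user_cqe = NULL;
	ctx->occupied = false;
}

/* Handle one completion; return false if it arrived too late to be used. */
static bool sketch_handle_completion(struct sketch_comp_ctx *ctx,
				     const struct sketch_cqe *cqe)
{
	if (!ctx->occupied)
		return false; /* late completion: the caller already gave up */

	ctx->completed = true;
	if (ctx->user_cqe != NULL)
		*ctx->user_cqe = *cqe;
	return true;
}
```

A completion that arrives after release is dropped instead of being copied into a buffer the caller may already have freed.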



[PATCH 13/33] net/ena/hal: use correct read once on u8 field

2024-03-04 Thread shaibran
From: Shai Brandes 

The flags field in ena_eth_io_tx_cdesc is 8 bits long, but the
macro currently used to read it is READ_ONCE16.
Switch to READ_ONCE8 to avoid reading extra data.
Because the assignment implicitly casts to u8, the value read was
already correct; this change makes the access width match the field.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.h 
b/drivers/net/ena/hal/ena_eth_com.h
index cee4f35124..6a7c17f84f 100644
--- a/drivers/net/ena/hal/ena_eth_com.h
+++ b/drivers/net/ena/hal/ena_eth_com.h
@@ -219,7 +219,7 @@ static inline int ena_com_tx_comp_req_id_get(struct 
ena_com_io_cq *io_cq,
 * expected, it mean that the device still didn't update
 * this completion.
 */
-   cdesc_phase = READ_ONCE16(cdesc->flags) & 
ENA_ETH_IO_TX_CDESC_PHASE_MASK;
+   cdesc_phase = READ_ONCE8(cdesc->flags) & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
if (cdesc_phase != expected_phase)
return ENA_COM_TRY_AGAIN;
 
-- 
2.17.1
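
A width-matched read-once access can be sketched as below. The macro definition is an assumption made for illustration (a volatile access of exactly one byte) and is not copied from ena_plat_dpdk.h. The struct layout is likewise hypothetical and exists only to give the flags byte a neighbour.

```c
#include <stdint.h>

/* Assumed definition: force a single one-byte load of var. */
#define SKETCH_READ_ONCE8(var) (*(const volatile uint8_t *)&(var))

#define SKETCH_PHASE_MASK 0x1u

/* Hypothetical layout: a flags byte followed by an unrelated byte. */
struct sketch_cdesc {
	uint8_t flags;
	uint8_t neighbour;
};

/* Read only the flags byte and extract the phase bit. */
static inline uint8_t sketch_cdesc_phase(const struct sketch_cdesc *cdesc)
{
	return (uint8_t)(SKETCH_READ_ONCE8(cdesc->flags) & SKETCH_PHASE_MASK);
}
```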



[PATCH 18/33] net/ena/hal: add unlikely to error checks

2024-03-04 Thread shaibran
From: Shai Brandes 

The unlikely() hint reduces pipeline flushes caused by branch
mispredictions.
It also improves readability by marking unexpected error paths.
This commit adds unlikely() to error checks that are not expected to
trigger.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 56 +++
 drivers/net/ena/hal/ena_eth_com.c |  2 +-
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index be08f79431..f20879613b 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -79,7 +79,7 @@ static int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
   struct ena_common_mem_addr *ena_addr,
   dma_addr_t addr)
 {
-   if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
+   if (unlikely((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != 
addr)) {
ena_trc_err(ena_dev, "DMA address has more bits than the device 
supports\n");
return ENA_COM_INVAL;
}
@@ -99,7 +99,7 @@ static int ena_com_admin_init_sq(struct ena_com_admin_queue 
*admin_queue)
ENA_MEM_ALLOC_COHERENT(admin_queue->q_dmadev, size, sq->entries, 
sq->dma_addr,
   sq->mem_handle);
 
-   if (!sq->entries) {
+   if (unlikely(!sq->entries)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -122,7 +122,7 @@ static int ena_com_admin_init_cq(struct ena_com_admin_queue 
*admin_queue)
ENA_MEM_ALLOC_COHERENT(admin_queue->q_dmadev, size, cq->entries, 
cq->dma_addr,
   cq->mem_handle);
 
-   if (!cq->entries)  {
+   if (unlikely(!cq->entries))  {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -147,7 +147,7 @@ static int ena_com_admin_init_aenq(struct ena_com_dev 
*ena_dev,
aenq->dma_addr,
aenq->mem_handle);
 
-   if (!aenq->entries) {
+   if (unlikely(!aenq->entries)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -233,7 +233,7 @@ static struct ena_comp_ctx 
*__ena_com_submit_admin_cmd(struct ena_com_admin_queu
 
/* In case of queue FULL */
cnt = (u16)ATOMIC32_READ(&admin_queue->outstanding_cmds);
-   if (cnt >= admin_queue->q_depth) {
+   if (unlikely(cnt >= admin_queue->q_depth)) {
ena_trc_dbg(admin_queue->ena_dev, "Admin queue is full.\n");
admin_queue->stats.out_of_space++;
return ERR_PTR(ENA_COM_NO_SPACE);
@@ -357,7 +357,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
   io_sq->desc_addr.mem_handle);
}
 
-   if (!io_sq->desc_addr.virt_addr) {
+   if (unlikely(!io_sq->desc_addr.virt_addr)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -382,7 +382,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
if (!io_sq->bounce_buf_ctrl.base_buffer)
io_sq->bounce_buf_ctrl.base_buffer = 
ENA_MEM_ALLOC(ena_dev->dmadev, size);
 
-   if (!io_sq->bounce_buf_ctrl.base_buffer) {
+   if (unlikely(!io_sq->bounce_buf_ctrl.base_buffer)) {
ena_trc_err(ena_dev, "Bounce buffer memory allocation 
failed\n");
return ENA_COM_NO_MEM;
}
@@ -447,7 +447,7 @@ static int ena_com_init_io_cq(struct ena_com_dev *ena_dev,
   ENA_CDESC_RING_SIZE_ALIGNMENT);
}
 
-   if (!io_cq->cdesc_addr.virt_addr) {
+   if (unlikely(!io_cq->cdesc_addr.virt_addr)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -577,7 +577,7 @@ static int ena_com_wait_and_process_admin_cq_polling(struct 
ena_comp_ctx *comp_c
if (comp_ctx->status != ENA_CMD_SUBMITTED)
break;
 
-   if (ENA_TIME_EXPIRE(timeout)) {
+   if (unlikely(ENA_TIME_EXPIRE(timeout))) {
ena_trc_err(admin_queue->ena_dev,
"Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
@@ -776,7 +776,7 @@ static int ena_com_config_llq_info(struct ena_com_dev 
*ena_dev,
llq_default_cfg->llq_ring_entry_size_value;
 
rc = ena_com_set_llq(ena_dev);
-   if (rc)
+   if (unlikely(rc))
ena_trc_err(ena_dev, "Cannot set LLQ configuration: %d\n", rc)
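
For reference, a branch hint of this kind is commonly built on the GCC/Clang builtin `__builtin_expect`. The sketch below uses a hypothetical `sketch_unlikely()` wrapper with a fallback for other compilers; it does not reproduce the macro the driver actually uses.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
/* Tell the compiler the condition is expected to be false. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define sketch_unlikely(x) (x)
#endif

/* Allocation-style check: the failure path is marked as the cold one. */
static int sketch_check_alloc(const void *entries)
{
	if (sketch_unlikely(entries == NULL))
		return -1;

	return 0;
}
```

The hint changes only code layout and static prediction; the result of the condition is the same with or without it.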

[PATCH 19/33] net/ena/hal: missing admin interrupt reset reason

2024-03-04 Thread shaibran
From: Shai Brandes 

There can be cases when we trigger reset if an admin interrupt
is missing.
In order to identify this use-case specifically,
this commit adds a new reset reason.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c|  2 ++
 drivers/net/ena/hal/ena_com.h| 12 
 drivers/net/ena/hal/ena_defs/ena_regs_defs.h |  1 +
 3 files changed, 15 insertions(+)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index f20879613b..0d1f4dd715 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -803,6 +803,7 @@ static int 
ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
 
if (comp_ctx->status == ENA_CMD_COMPLETED) {
+   admin_queue->is_missing_admin_interrupt = true;
ena_trc_err(admin_queue->ena_dev,
"The ena device sent a completion but the 
driver didn't receive a MSI-X interrupt (cmd %d), autopolling mode is %s\n",
comp_ctx->cmd_opcode, 
admin_queue->auto_polling ? "ON" : "OFF");
@@ -2138,6 +2139,7 @@ int ena_com_admin_init(struct ena_com_dev *ena_dev,
 
admin_queue->ena_dev = ena_dev;
admin_queue->running_state = true;
+   admin_queue->is_missing_admin_interrupt = false;
 
return 0;
 error:
diff --git a/drivers/net/ena/hal/ena_com.h b/drivers/net/ena/hal/ena_com.h
index c62016cc06..c999cd2381 100644
--- a/drivers/net/ena/hal/ena_com.h
+++ b/drivers/net/ena/hal/ena_com.h
@@ -237,6 +237,8 @@ struct ena_com_admin_queue {
 */
bool running_state;
 
+   bool is_missing_admin_interrupt;
+
/* Count the number of outstanding admin commands */
ena_atomic32_t outstanding_cmds;
 
@@ -1089,6 +1091,16 @@ int ena_com_config_dev_mode(struct ena_com_dev *ena_dev,
struct ena_admin_feature_llq_desc *llq_features,
struct ena_llq_configurations *llq_default_config);
 
+/* ena_com_get_missing_admin_interrupt - Return if there is a missing admin 
interrupt
+ * @ena_dev: ENA communication layer struct
+ *
+ * @return - true if there is a missing admin interrupt or false otherwise
+ */
+static inline bool ena_com_get_missing_admin_interrupt(struct ena_com_dev 
*ena_dev)
+{
+   return ena_dev->admin_queue.is_missing_admin_interrupt;
+}
+
 /* ena_com_io_sq_to_ena_dev - Extract ena_com_dev using contained field io_sq.
  * @io_sq: IO submit queue struct
  *
diff --git a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
index a94025dc77..db6a97d675 100644
--- a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
@@ -24,6 +24,7 @@ enum ena_regs_reset_reason_types {
ENA_REGS_RESET_SUSPECTED_POLL_STARVATION= 15,
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED  = 16,
ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED  = 17,
+   ENA_REGS_RESET_MISSING_ADMIN_INTERRUPT  = 18,
ENA_REGS_RESET_LAST,
 };
 
-- 
2.17.1



[PATCH 20/33] net/ena/hal: check for existing keep alive notification

2024-03-04 Thread shaibran
From: Shai Brandes 

This commit adds an API to query the aenq on whether
there is a pending keep alive notification.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 39 +++
 drivers/net/ena/hal/ena_com.h | 10 +
 2 files changed, 49 insertions(+)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 0d1f4dd715..7a23881667 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -2456,6 +2456,45 @@ void ena_com_aenq_intr_handler(struct ena_com_dev 
*ena_dev, void *data)
mmiowb();
 }
 
+bool ena_com_aenq_has_keep_alive(struct ena_com_dev *ena_dev)
+{
+   struct ena_admin_aenq_common_desc *aenq_common;
+   struct ena_com_aenq *aenq = &ena_dev->aenq;
+   struct ena_admin_aenq_entry *aenq_e;
+   u8 phase = aenq->phase;
+   u16 masked_head;
+
+   masked_head = aenq->head & (aenq->q_depth - 1);
+   aenq_e = &aenq->entries[masked_head]; /* Get first entry */
+   aenq_common = &aenq_e->aenq_common_desc;
+
+   /* Go over all the events */
+   while ((READ_ONCE8(aenq_common->flags) &
+   ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) == phase) {
+   /* Make sure the device finished writing the rest of the 
descriptor
+* before reading it.
+*/
+   dma_rmb();
+
+   if (aenq_common->group == ENA_ADMIN_KEEP_ALIVE)
+   return true;
+
+   /* Get next event entry */
+   masked_head++;
+
+   if (unlikely(masked_head == aenq->q_depth)) {
+   masked_head = 0;
+   phase = !phase;
+   }
+
+   aenq_e = &aenq->entries[masked_head];
+   aenq_common = &aenq_e->aenq_common_desc;
+   }
+
+   return false;
+}
+
+
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
  enum ena_regs_reset_reason_types reset_reason)
 {
diff --git a/drivers/net/ena/hal/ena_com.h b/drivers/net/ena/hal/ena_com.h
index c999cd2381..737747f64b 100644
--- a/drivers/net/ena/hal/ena_com.h
+++ b/drivers/net/ena/hal/ena_com.h
@@ -639,6 +639,16 @@ void ena_com_admin_q_comp_intr_handler(struct ena_com_dev 
*ena_dev);
  */
 void ena_com_aenq_intr_handler(struct ena_com_dev *ena_dev, void *data);
 
+/* ena_com_aenq_has_keep_alive - Retrieve if there is a keep alive 
notification in the aenq
+ * @ena_dev: ENA communication layer struct
+ *
+ * This method goes over the async event notification queue and returns if 
there
+ * is a keep alive notification.
+ *
+ * @return - true if there is a keep alive notification in the aenq or false 
otherwise
+ */
+bool ena_com_aenq_has_keep_alive(struct ena_com_dev *ena_dev);
+
 /* ena_com_abort_admin_commands - Abort all the outstanding admin commands.
  * @ena_dev: ENA communication layer struct
  *
-- 
2.17.1
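
The queue walk in the patch above relies on the phase bit: an entry belongs to the driver only while its phase matches the expected one, and the expected phase flips each time the index wraps. A stand-alone sketch follows, with hypothetical types and a made-up group id. It omits the `dma_rmb()` barrier, and it adds an upper bound of one full ring per call, which the patch does not have.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PHASE_MASK       0x1u
#define SKETCH_GROUP_KEEP_ALIVE 4u /* hypothetical group id */

struct sketch_aenq_entry {
	uint16_t group;
	uint8_t flags;
};

struct sketch_aenq {
	struct sketch_aenq_entry *entries;
	uint16_t head;    /* free-running consumer index */
	uint16_t q_depth; /* ring size, assumed to be a power of two */
	uint8_t phase;    /* phase expected at head */
};

/* Peek at pending events without consuming them. */
static bool sketch_has_keep_alive(const struct sketch_aenq *q)
{
	uint16_t idx = q->head & (uint16_t)(q->q_depth - 1);
	uint8_t phase = q->phase;
	uint16_t seen;

	for (seen = 0; seen < q->q_depth; seen++) {
		const struct sketch_aenq_entry *e = &q->entries[idx];

		if ((e->flags & SKETCH_PHASE_MASK) != phase)
			break; /* not yet written by the device */
		if (e->group == SKETCH_GROUP_KEEP_ALIVE)
			return true;
		if (++idx == q->q_depth) {
			idx = 0;
			phase = !phase; /* expected phase flips on wrap */
		}
	}
	return false;
}
```

Because the head and phase are copied into locals, the scan leaves the queue state untouched for the regular interrupt handler.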



[PATCH 21/33] net/ena/hal: modify memory barrier comment

2024-03-04 Thread shaibran
From: Shai Brandes 

The dma_rmb() memory barrier guarantees that the device set the
phase bit before continuing to read the rest of the descriptor.
Because the phase bit and the rest of the descriptor are in the same
cache line this ensures coherency of the data from the descriptor.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 7a23881667..7f0e0b2449 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -2412,8 +2412,8 @@ void ena_com_aenq_intr_handler(struct ena_com_dev 
*ena_dev, void *data)
/* Go over all the events */
while ((READ_ONCE8(aenq_common->flags) &
ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) == phase) {
-   /* Make sure the phase bit (ownership) is as expected before
-* reading the rest of the descriptor.
+   /* Make sure the device finished writing the rest of the 
descriptor
+* before reading it.
 */
dma_rmb();
 
-- 
2.17.1



[PATCH 16/33] net/ena/hal: phc feature modifications

2024-03-04 Thread shaibran
From: Shai Brandes 

1. PHC algorithm is updated to support reading new PHC values.
2. Update default PHC expiration timeout.
3. Fix a theoretical PHC destroy race.
4. Adjust PHC for multiple devices.
5. PHC activation version check point.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 111 --
 drivers/net/ena/hal/ena_com.h |  31 +++--
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h |  45 +--
 3 files changed, 135 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 31c37b0ab3..651373a52f 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -41,10 +41,12 @@
 #define ENA_MAX_ADMIN_POLL_US 5000
 
 /* PHC definitions */
-#define ENA_PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 20
+#define ENA_PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 10
 #define ENA_PHC_DEFAULT_BLOCK_TIMEOUT_USEC 1000
-#define ENA_PHC_TIMESTAMP_ERROR 0x
+#define ENA_PHC_MAX_ERROR_BOUND 0x
 #define ENA_PHC_REQ_ID_OFFSET 0xDEAD
+#define ENA_PHC_ERROR_FLAGS (ENA_ADMIN_PHC_ERROR_FLAG_TIMESTAMP | \
+ENA_ADMIN_PHC_ERROR_FLAG_ERROR_BOUND)
 
 /*/
 /*/
@@ -1778,16 +1780,21 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
struct ena_admin_set_feat_cmd set_feat_cmd;
int ret = 0;
 
-   /* Get device PHC default configuration */
-   ret = ena_com_get_feature(ena_dev, &get_feat_resp, 
ENA_ADMIN_PHC_CONFIG, 0);
+   /* Get default device PHC configuration */
+   ret = ena_com_get_feature(ena_dev,
+ &get_feat_resp,
+ ENA_ADMIN_PHC_CONFIG,
+ ENA_ADMIN_PHC_FEATURE_VERSION_0);
if (unlikely(ret)) {
ena_trc_err(ena_dev, "Failed to get PHC feature configuration, 
error: %d\n", ret);
return ret;
}
 
-   /* Supporting only readless PHC retrieval */
-   if (get_feat_resp.u.phc.type != ENA_ADMIN_PHC_TYPE_READLESS) {
-   ena_trc_err(ena_dev, "Unsupported PHC type, error: %d\n", ENA_COM_UNSUPPORTED);
+   /* Supporting only PHC V0 (readless mode with error bound) */
+   if (get_feat_resp.u.phc.version != ENA_ADMIN_PHC_FEATURE_VERSION_0) {
+   ena_trc_err(ena_dev, "Unsupprted PHC version (0x%X), error: %d\n",
+   get_feat_resp.u.phc.version,
+   ENA_COM_UNSUPPORTED);
return ENA_COM_UNSUPPORTED;
}
 
@@ -1804,11 +1811,11 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
   get_feat_resp.u.phc.block_timeout_usec :
   ENA_PHC_DEFAULT_BLOCK_TIMEOUT_USEC;
 
-   /* Sanity check - expire timeout must not be above skip timeout */
+   /* Sanity check - expire timeout must not exceed block timeout */
if (phc->expire_timeout_usec > phc->block_timeout_usec)
phc->expire_timeout_usec = phc->block_timeout_usec;
 
-   /* Prepare PHC feature command with PHC output address */
+   /* Prepare PHC config feature command */
memset(&set_feat_cmd, 0x0, sizeof(set_feat_cmd));
set_feat_cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
set_feat_cmd.feat_common.feature_id = ENA_ADMIN_PHC_CONFIG;
@@ -1840,13 +1847,16 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
 void ena_com_phc_destroy(struct ena_com_dev *ena_dev)
 {
struct ena_com_phc_info *phc = &ena_dev->phc;
-
-   phc->active = false;
+   unsigned long flags = 0;
 
/* In case PHC is not supported by the device, silently exiting */
if (!phc->virt_addr)
return;
 
+   ENA_SPINLOCK_LOCK(phc->lock, flags);
+   phc->active = false;
+   ENA_SPINLOCK_UNLOCK(phc->lock, flags);
+
ENA_MEM_FREE_COHERENT(ena_dev->dmadev,
  sizeof(*phc->virt_addr),
  phc->virt_addr,
@@ -1857,15 +1867,14 @@ void ena_com_phc_destroy(struct ena_com_dev *ena_dev)
ENA_SPINLOCK_DESTROY(phc->lock);
 }
 
-int ena_com_phc_get(struct ena_com_dev *ena_dev, u64 *timestamp)
+int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
 {
volatile struct ena_admin_phc_resp *read_resp = ena_dev->phc.virt_addr;
+   const ena_time_high_res_t zero_system_time = ENA_TIME_INIT_HIGH_RES();
struct ena_com_phc_info *phc = &ena_dev->phc;
-   ena_time_high_res_t initial_time = ENA_TIME_INIT_HIGH_RES();
-   static ena_time_high_res_t start_time;
-   unsigned long flags = 0;
ena_time_high_res_t expire_time;
ena_time_high_res_t block_time;
+   unsigned long flags = 0;
int ret = ENA_COM_OK;
 
if 

[PATCH 22/33] net/ena/hal: rework Rx ring submission queue

2024-03-04 Thread shaibran
From: Shai Brandes 

RX ring submission queue descriptors are always located in host memory.
This optimization replaces the generic update tail method with a
tailored method for host memory type descriptors to avoid an unnecessary
if statement.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c b/drivers/net/ena/hal/ena_eth_com.c
index b9123f84c3..ebad38d15a 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -210,11 +210,8 @@ static int ena_com_sq_update_llq_tail(struct ena_com_io_sq *io_sq)
return ENA_COM_OK;
 }
 
-static int ena_com_sq_update_tail(struct ena_com_io_sq *io_sq)
+static int ena_com_sq_update_reqular_queue_tail(struct ena_com_io_sq *io_sq)
 {
-   if (io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
-   return ena_com_sq_update_llq_tail(io_sq);
-
io_sq->tail++;
 
/* Switch phase bit in case of wrap around */
@@ -224,6 +221,14 @@ static int ena_com_sq_update_tail(struct ena_com_io_sq *io_sq)
return ENA_COM_OK;
 }
 
+static int ena_com_sq_update_tail(struct ena_com_io_sq *io_sq)
+{
+   if (io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
+   return ena_com_sq_update_llq_tail(io_sq);
+
+   return ena_com_sq_update_reqular_queue_tail(io_sq);
+}
+
 static struct ena_eth_io_rx_cdesc_base *
ena_com_rx_cdesc_idx_to_ptr(struct ena_com_io_cq *io_cq, u16 idx)
 {
@@ -662,7 +667,7 @@ int ena_com_add_single_rx_desc(struct ena_com_io_sq *io_sq,
desc->buff_addr_hi =
((ena_buf->paddr & GENMASK_ULL(io_sq->dma_addr_bits - 1, 32)) >> 32);
 
-   return ena_com_sq_update_tail(io_sq);
+   return ena_com_sq_update_reqular_queue_tail(io_sq);
 }
 
 bool ena_com_cq_empty(struct ena_com_io_cq *io_cq)
-- 
2.17.1
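
The host-memory tail update that this patch splits out can be sketched roughly as follows. This is only an illustration, not the HAL code: `struct sketch_sq` and `sketch_sq_update_host_tail` are hypothetical stand-ins, and the wrap-around test (masked tail equal to zero, with a power-of-two queue depth) is an assumption, since the diff above shows only the `tail++` line and the "switch phase bit" comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ena_com_io_sq. */
struct sketch_sq {
    uint16_t tail;    /* free-running producer index */
    uint16_t q_depth; /* number of entries, assumed to be a power of two */
    uint8_t phase;    /* phase bit stamped into each new descriptor */
};

/*
 * Advance the tail by one descriptor. There is no placement-policy
 * branch here: the caller already knows the descriptors live in host
 * memory, which is the point of the patch.
 */
static int sketch_sq_update_host_tail(struct sketch_sq *sq)
{
    assert(sq->q_depth != 0 && (sq->q_depth & (sq->q_depth - 1)) == 0);

    sq->tail++;

    /* Flip the phase bit when the masked index wraps to slot 0 (assumed rule). */
    if ((sq->tail & (sq->q_depth - 1)) == 0)
        sq->phase ^= 1;

    return 0;
}
```

With a depth of 4, the phase bit stays unchanged for the first three updates and flips on the fourth, when the masked tail returns to slot 0.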



[PATCH 24/33] net/ena/hal: handle command abort

2024-03-04 Thread shaibran
From: Shai Brandes 

Currently, the admin_queue->stats.aborted_cmd counter is incremented for
an admin command with status ENA_CMD_ABORTED only if the admin queue is
in polling mode.
This commit increments admin_queue->stats.aborted_cmd when the admin
queue is in interrupt mode as well.
It also adds a verification that the command status is a valid
completion status, which is currently verified only if the admin queue
is in polling mode.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 7f0e0b2449..1c88a82ffc 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -824,8 +824,19 @@ static int ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
ret = ENA_COM_TIMER_EXPIRED;
goto err;
}
+   } else if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
+   ena_trc_err(admin_queue->ena_dev, "Command was aborted\n");
+   ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
+   admin_queue->stats.aborted_cmd++;
+   ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
+   ret = ENA_COM_NO_DEVICE;
+   goto err;
}
 
+   ENA_WARN(comp_ctx->status != ENA_CMD_COMPLETED,
+admin_queue->ena_dev, "Invalid comp status %d\n",
+comp_ctx->status);
+
ret = ena_com_comp_status_to_errno(admin_queue, comp_ctx->comp_status);
 err:
comp_ctxt_release(admin_queue, comp_ctx);
-- 
2.17.1
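
The status handling that this patch adds to the interrupt path could be sketched as below. All names (`sketch_cmd_status`, `sketch_rc`, `sketch_check_irq_completion`) are hypothetical stand-ins for the ENA_CMD_* and ENA_COM_* values. The assumption that a still-submitted command maps to a timeout is inferred from the context lines of the hunk. The spinlock around the counter is left out, and the `invalid_status` counter stands in for the ENA_WARN print.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's command states and return codes. */
enum sketch_cmd_status {
    SKETCH_CMD_SUBMITTED,
    SKETCH_CMD_COMPLETED,
    SKETCH_CMD_ABORTED,
};

enum sketch_rc {
    SKETCH_OK = 0,
    SKETCH_TIMER_EXPIRED = -1,
    SKETCH_NO_DEVICE = -2,
};

struct sketch_admin_stats {
    uint64_t aborted_cmd;    /* mirrors admin_queue->stats.aborted_cmd */
    uint64_t invalid_status; /* stand-in for the ENA_WARN message */
};

/* Classify a completion context after the interrupt wait has returned. */
static int sketch_check_irq_completion(int status,
                                       struct sketch_admin_stats *stats)
{
    assert(stats != NULL);

    if (status == SKETCH_CMD_SUBMITTED)
        return SKETCH_TIMER_EXPIRED; /* no completion arrived in time */

    if (status == SKETCH_CMD_ABORTED) {
        stats->aborted_cmd++; /* now counted in interrupt mode too */
        return SKETCH_NO_DEVICE;
    }

    /* Any other non-completed status is unexpected: warn, then carry on. */
    if (status != SKETCH_CMD_COMPLETED)
        stats->invalid_status++;

    return SKETCH_OK;
}
```

In the real driver the final step maps `comp_ctx->comp_status` to an errno value; the sketch returns success at that point to stay self-contained.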



[PATCH 23/33] net/ena/hal: remove operating system type enum

2024-03-04 Thread shaibran
From: Shai Brandes 

Remove all other operating system enumerations, as they
are unrelated to DPDK. Use a constant value instead.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h | 13 +
 drivers/net/ena/hal/ena_plat_dpdk.h   |  1 +
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index ce8a26721e..c3910c50cc 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -933,19 +933,8 @@ struct ena_admin_feature_rss_flow_hash_input {
uint16_t enabled_input_sort;
 };
 
-enum ena_admin_os_type {
-   ENA_ADMIN_OS_LINUX  = 1,
-   ENA_ADMIN_OS_WIN= 2,
-   ENA_ADMIN_OS_DPDK   = 3,
-   ENA_ADMIN_OS_FREEBSD= 4,
-   ENA_ADMIN_OS_IPXE   = 5,
-   ENA_ADMIN_OS_ESXI   = 6,
-   ENA_ADMIN_OS_MACOS  = 7,
-   ENA_ADMIN_OS_GROUPS_NUM = 7,
-};
-
 struct ena_admin_host_info {
-   /* defined in enum ena_admin_os_type */
+   /* Host OS type defined as ENA_ADMIN_OS_* */
uint32_t os_type;
 
/* os distribution string format */
diff --git a/drivers/net/ena/hal/ena_plat_dpdk.h b/drivers/net/ena/hal/ena_plat_dpdk.h
index 5f7cbd1ee7..aa8fbb0cd9 100644
--- a/drivers/net/ena/hal/ena_plat_dpdk.h
+++ b/drivers/net/ena/hal/ena_plat_dpdk.h
@@ -341,5 +341,6 @@ static __rte_always_inline int ena_bits_per_u64(uint64_t bitmap)
return count;
 }
 
+#define ENA_ADMIN_OS_DPDK 3
 
 #endif /* DPDK_ENA_COM_ENA_PLAT_DPDK_H_ */
-- 
2.17.1



[PATCH 26/33] net/ena: cosmetic changes

2024-03-04 Thread shaibran
From: Shai Brandes 

This patch makes several changes to improve
the style and readability of the code.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 1c88a82ffc..2db21e7895 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -1808,7 +1808,7 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
 
/* Supporting only PHC V0 (readless mode with error bound) */
if (get_feat_resp.u.phc.version != ENA_ADMIN_PHC_FEATURE_VERSION_0) {
-   ena_trc_err(ena_dev, "Unsupprted PHC version (0x%X), error: %d\n",
+   ena_trc_err(ena_dev, "Unsupported PHC version (0x%X), error: %d\n",
get_feat_resp.u.phc.version,
ENA_COM_UNSUPPORTED);
return ENA_COM_UNSUPPORTED;
@@ -1914,15 +1914,14 @@ int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
 
/* PHC is in active state, update statistics according to req_id and error_flags */
if ((READ_ONCE16(read_resp->req_id) != phc->req_id) ||
-   (read_resp->error_flags & ENA_PHC_ERROR_FLAGS)) {
+   (read_resp->error_flags & ENA_PHC_ERROR_FLAGS))
/* Device didn't update req_id during blocking time or timestamp is invalid,
 * this indicates on a device error
 */
phc->stats.phc_err++;
-   } else {
+   else
/* Device updated req_id during blocking time with valid timestamp */
phc->stats.phc_exp++;
-   }
}
 
/* Setting relative timeouts */
@@ -2431,7 +2430,7 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *ena_dev, void *data)
timestamp = (u64)aenq_common->timestamp_low |
((u64)aenq_common->timestamp_high << 32);
 
-   ena_trc_dbg(ena_dev, "AENQ! Group[%x] Syndrome[%x] timestamp: [%" ENA_PRIU64 "s]\n",
+   ena_trc_dbg(ena_dev, "AENQ! Group[%x] Syndrome[%x] timestamp: [%" ENA_PRIu64 "s]\n",
aenq_common->group,
aenq_common->syndrome,
timestamp);
@@ -3233,16 +3232,15 @@ int ena_com_allocate_customer_metrics_buffer(struct ena_com_dev *ena_dev)
 {
struct ena_customer_metrics *customer_metrics = &ena_dev->customer_metrics;
 
+   customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev,
   customer_metrics->buffer_len,
   customer_metrics->buffer_virt_addr,
   customer_metrics->buffer_dma_addr,
   customer_metrics->buffer_dma_handle);
-   if (unlikely(customer_metrics->buffer_virt_addr == NULL))
+   if (unlikely(!customer_metrics->buffer_virt_addr))
return ENA_COM_NO_MEM;
 
-   customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
-
return 0;
 }
 
@@ -3285,7 +3283,6 @@ void ena_com_delete_customer_metrics_buffer(struct ena_com_dev *ena_dev)
  customer_metrics->buffer_dma_addr,
  customer_metrics->buffer_dma_handle);
customer_metrics->buffer_virt_addr = NULL;
-   customer_metrics->buffer_len = 0;
}
 }
 
-- 
2.17.1



[PATCH 25/33] net/ena/hal: add support for device reset request

2024-03-04 Thread shaibran
From: Shai Brandes 

Add support for a reset request message from the device to the driver,
sent over AENQ, which in turn should cause the driver to trigger a reset.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h | 3 ++-
 drivers/net/ena/hal/ena_defs/ena_regs_defs.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index c3910c50cc..2adce75ed3 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -1213,7 +1213,8 @@ enum ena_admin_aenq_group {
ENA_ADMIN_KEEP_ALIVE= 4,
ENA_ADMIN_REFRESH_CAPABILITIES  = 5,
ENA_ADMIN_CONF_NOTIFICATIONS= 6,
-   ENA_ADMIN_AENQ_GROUPS_NUM   = 7,
+   ENA_ADMIN_DEVICE_REQUEST_RESET  = 7,
+   ENA_ADMIN_AENQ_GROUPS_NUM   = 8,
 };
 
 enum ena_admin_aenq_notification_syndrome {
diff --git a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
index db6a97d675..dd9b629f10 100644
--- a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
@@ -25,6 +25,7 @@ enum ena_regs_reset_reason_types {
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED  = 16,
ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED  = 17,
ENA_REGS_RESET_MISSING_ADMIN_INTERRUPT  = 18,
+   ENA_REGS_RESET_DEVICE_REQUEST   = 19,
ENA_REGS_RESET_LAST,
 };
 
-- 
2.17.1



[PATCH 27/33] net/ena/hal: modify customer metrics memory management

2024-03-04 Thread shaibran
From: Shai Brandes 

1. Set buffer length to zero in case memory allocation failed
   and after memory is released.
2. The driver checks buffer_virt_addr for customer allocation
   success. In case the allocation fails, buffer_virt_addr
   may not necessarily be NULL.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 2db21e7895..24756e5e76 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -3233,13 +3233,17 @@ int ena_com_allocate_customer_metrics_buffer(struct ena_com_dev *ena_dev)
struct ena_customer_metrics *customer_metrics = &ena_dev->customer_metrics;
 
customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
+   customer_metrics->buffer_virt_addr = NULL;
+
ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev,
   customer_metrics->buffer_len,
   customer_metrics->buffer_virt_addr,
   customer_metrics->buffer_dma_addr,
   customer_metrics->buffer_dma_handle);
-   if (unlikely(!customer_metrics->buffer_virt_addr))
+   if (unlikely(!customer_metrics->buffer_virt_addr)) {
+   customer_metrics->buffer_len = 0;
return ENA_COM_NO_MEM;
+   }
 
return 0;
 }
@@ -3283,6 +3287,7 @@ void ena_com_delete_customer_metrics_buffer(struct ena_com_dev *ena_dev)
  customer_metrics->buffer_dma_addr,
  customer_metrics->buffer_dma_handle);
customer_metrics->buffer_virt_addr = NULL;
+   customer_metrics->buffer_len = 0;
}
 }
 
-- 
2.17.1
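
The allocation pattern described in this commit message can be illustrated with a small standalone sketch. Everything here is hypothetical: `sketch_alloc` merely imitates an allocation macro that leaves its output pointer untouched on failure (the situation item 2 guards against), and `struct sketch_metrics` stands in for the customer metrics bookkeeping. The real code uses ENA_MEM_ALLOC_COHERENT with DMA handles, which are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the customer metrics buffer bookkeeping. */
struct sketch_metrics {
    void *buf;
    size_t len;
};

/* Imitates an allocator that does not write *out when it fails. */
static void sketch_alloc(size_t len, void **out, int simulate_failure)
{
    if (simulate_failure)
        return; /* *out keeps whatever value it had before */
    *out = calloc(1, len);
}

static int sketch_metrics_alloc(struct sketch_metrics *m, size_t len,
                                int simulate_failure)
{
    assert(m != NULL);

    m->len = len;
    m->buf = NULL; /* pre-clear so a stale pointer cannot pass the check */

    sketch_alloc(len, &m->buf, simulate_failure);
    if (m->buf == NULL) {
        m->len = 0; /* keep length consistent with "no buffer" */
        return -1;
    }
    return 0;
}

static void sketch_metrics_free(struct sketch_metrics *m)
{
    if (m->buf != NULL) {
        free(m->buf);
        m->buf = NULL;
        m->len = 0; /* length is reset on release as well */
    }
}
```

Without the pre-clear, a structure that still held an old pointer value would pass the NULL check even though nothing was allocated.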



[PATCH 28/33] net/ena/hal: cosmetic changes

2024-03-04 Thread shaibran
From: Shai Brandes 

1. Modify log prints to use the correct format specifier
   for unsigned variables.
2. Remove line breaks from lines that do not exceed the
   maximal line length.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c   | 22 +++---
 drivers/net/ena/hal/ena_plat_dpdk.h |  5 ++---
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c b/drivers/net/ena/hal/ena_eth_com.c
index ebad38d15a..87a2dbfba1 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -64,7 +64,7 @@ static int ena_com_write_bounce_buffer_to_dev(struct ena_com_io_sq *io_sq,
 
io_sq->entries_in_tx_burst_left--;
ena_trc_dbg(ena_com_io_sq_to_ena_dev(io_sq),
-   "Decreasing entries_in_tx_burst_left of queue %d to %d\n",
+   "Decreasing entries_in_tx_burst_left of queue %u to %u\n",
io_sq->qid, io_sq->entries_in_tx_burst_left);
}
 
@@ -259,7 +259,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq *io_cq,
if (unlikely((status & ENA_ETH_IO_RX_CDESC_BASE_FIRST_MASK) >>
ENA_ETH_IO_RX_CDESC_BASE_FIRST_SHIFT && count != 0)) {
ena_trc_err(dev,
-   "First bit is on in descriptor #%d on q_id: %d, req_id: %u\n",
+   "First bit is on in descriptor #%u on q_id: %u, req_id: %u\n",
count, io_cq->qid, cdesc->req_id);
return ENA_COM_FAULT;
}
@@ -268,7 +268,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq *io_cq,
ENA_ETH_IO_RX_CDESC_BASE_MBZ17_MASK)) &&
  ena_com_get_cap(dev, ENA_ADMIN_CDESC_MBZ))) {
ena_trc_err(dev,
-   "Corrupted RX descriptor #%d on q_id: %d, req_id: %u\n",
+   "Corrupted RX descriptor #%u on q_id: %u, req_id: %u\n",
count, io_cq->qid, cdesc->req_id);
return ENA_COM_FAULT;
}
@@ -288,7 +288,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq *io_cq,
io_cq->cur_rx_pkt_cdesc_start_idx = head_masked;
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "ENA q_id: %d packets were completed. first desc idx %u descs# %d\n",
+   "ENA q_id: %u packets were completed. first desc idx %u descs# %u\n",
io_cq->qid, *first_cdesc_idx, count);
} else {
io_cq->cur_rx_pkt_cdesc_count = count;
@@ -394,7 +394,7 @@ static void ena_com_rx_set_flags(struct ena_com_io_cq *io_cq,
ENA_ETH_IO_RX_CDESC_BASE_IPV4_FRAG_SHIFT;
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "l3_proto %d l4_proto %d l3_csum_err %d l4_csum_err %d hash %d frag %d cdesc_status %x\n",
+   "l3_proto %d l4_proto %d l3_csum_err %d l4_csum_err %d hash %u frag %d cdesc_status %x\n",
ena_rx_ctx->l3_proto,
ena_rx_ctx->l4_proto,
ena_rx_ctx->l3_csum_err,
@@ -434,7 +434,7 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq,
 
if (unlikely(header_len > io_sq->tx_max_header_size)) {
ena_trc_err(ena_com_io_sq_to_ena_dev(io_sq),
-   "Header size is too large %d max header: %d\n",
+   "Header size is too large %u max header: %u\n",
header_len, io_sq->tx_max_header_size);
return ENA_COM_INVAL;
}
@@ -592,12 +592,12 @@ int ena_com_rx_pkt(struct ena_com_io_cq *io_cq,
}
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "Fetch rx packet: queue %d completed desc: %d\n",
+   "Fetch rx packet: queue %u completed desc: %u\n",
io_cq->qid, nb_hw_desc);
 
if (unlikely(nb_hw_desc > ena_rx_ctx->max_bufs)) {
ena_trc_err(ena_com_io_cq_to_ena_dev(io_cq),
-   "Too many RX cdescs (%d) > MAX(%d)\n",
+   "Too many RX cdescs (%u) > MAX(%u)\n",
nb_hw_desc, ena_rx_ctx->max_bufs);
return ENA_COM_NO_SPACE;
}
@@ -622,7 +622,7 @@ int ena_com_rx_pkt(struct ena_com_io_cq *io_cq,
io_sq->next_to_comp += nb_hw_desc;
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "[%s][QID#%d] Updating SQ head to: %d\n", __func__,
+   "Updating Queue %u, SQ head to: %u\n",
io_sq->qid, io_sq->next_to_comp);
 
/* Get rx flags from the last pkt */
@@ -660,8 +660,8 @@ int ena_com_add_single_rx_

[PATCH 31/33] net/ena: support max large llq depth from the device

2024-03-04 Thread shaibran
From: Shai Brandes 

Selected AWS instances from later generations enable
large LLQ by default, allowing the transmission of
packets with headers exceeding 96 bytes.

Due to the overall ENA memory BAR size limitation,
large LLQ has the side effect of halving the maximum
number of LLQ entries (from 1024 to 512).

ENA-Express, powered by AWS Scalable Reliable Datagram
(SRD) technology, requires Tx queue with 1024 entries.
Selected AWS instances from upcoming generations will
have double the size of the ENA memory BAR, enabling ENA-Express
to work with a large LLQ of 1024 entries.

The initial default large LLQ size will remain 512.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst|  2 +
 drivers/net/ena/ena_ethdev.c  | 38 ---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h |  4 +-
 3 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 2a22bb07ed..9823616eeb 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -107,6 +107,8 @@ New Features
   * Added support for sub-optimal configuration notifications from the device.
   * Restructured fast release of mbufs when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE optimization is enabled.
   * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg `llq_policy`.
+  * Added support for LLQ header size recommendation from the device.
+  * Allowed large LLQ with 1024 entries when the device supports enlarged memory BAR.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index d73e321d0f..43693ee2ee 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -42,6 +42,8 @@
 
 #define DECIMAL_BASE 10
 
+#define MAX_WIDE_LLQ_DEPTH_UNSUPPORTED 0
+
 /*
  * We should try to keep ENA_CLEANUP_BUF_SIZE lower than
  * RTE_MEMPOOL_CACHE_MAX_SIZE, so we can fit this in mempool local cache.
@@ -1071,7 +1073,7 @@ static int
 ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx,
   bool use_large_llq_hdr)
 {
-   struct ena_admin_feature_llq_desc *llq = &ctx->get_feat_ctx->llq;
+   struct ena_admin_feature_llq_desc *dev = &ctx->get_feat_ctx->llq;
struct ena_com_dev *ena_dev = ctx->ena_dev;
uint32_t max_tx_queue_size;
uint32_t max_rx_queue_size;
@@ -1086,7 +1088,7 @@ ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx,
if (ena_dev->tx_mem_queue_type ==
ENA_ADMIN_PLACEMENT_POLICY_DEV) {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
-   llq->max_llq_depth);
+   dev->max_llq_depth);
} else {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
max_queue_ext->max_tx_sq_depth);
@@ -1106,7 +1108,7 @@ ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx,
if (ena_dev->tx_mem_queue_type ==
ENA_ADMIN_PLACEMENT_POLICY_DEV) {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
-   llq->max_llq_depth);
+   dev->max_llq_depth);
} else {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
max_queues->max_sq_depth);
@@ -1122,18 +1124,28 @@ ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx,
max_rx_queue_size = rte_align32prevpow2(max_rx_queue_size);
max_tx_queue_size = rte_align32prevpow2(max_tx_queue_size);
 
-   if (use_large_llq_hdr) {
-   if ((llq->entry_size_ctrl_supported &
-ENA_ADMIN_LIST_ENTRY_SIZE_256B) &&
-   (ena_dev->tx_mem_queue_type ==
-ENA_ADMIN_PLACEMENT_POLICY_DEV)) {
-   max_tx_queue_size /= 2;
-   PMD_INIT_LOG(INFO,
-   "Forcing large headers and decreasing maximum Tx queue size to %d\n",
+   if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV && use_large_llq_hdr) {
+   /* intersection between driver configuration and device capabilities */
+   if (dev->entry_size_ctrl_supported & ENA_ADMIN_LIST_ENTRY_SIZE_256B) {
+   if (dev->max_wide_llq_depth == MAX_WIDE_LLQ_DEPTH_UNSUPPORTED) {
+   /* Devices that do not support the double-sized ENA memory BAR will
+* report max_wide_llq_depth as 0. In such case, driver halves the
+* queue depth when working in large llq policy.
+*/
+*/
+   max_tx_queue_size >>= 1;
+   PMD_INIT_LOG(INFO,
+   

[PATCH 30/33] net/ena: exhaust interrupt callbacks in device close

2024-03-04 Thread shaibran
From: Shai Brandes 

Change rte_intr_callback_unregister to its synchronous variant to
ensure all active interrupt callbacks are completed before proceeding
with the flow. Relocate the interrupt deregistration to precede the
release of stats memory, thereby preventing the interrupt handler
from accessing memory that has already been freed.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 2a7b7c0cba..d73e321d0f 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -871,6 +871,7 @@ static int ena_close(struct rte_eth_dev *dev)
struct rte_intr_handle *intr_handle = pci_dev->intr_handle;
struct ena_adapter *adapter = dev->data->dev_private;
int ret = 0;
+   int rc;
 
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
@@ -879,17 +880,17 @@ static int ena_close(struct rte_eth_dev *dev)
ret = ena_stop(dev);
adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
+   rte_intr_disable(intr_handle);
+   rc = rte_intr_callback_unregister_sync(intr_handle, ena_interrupt_handler_rte, dev);
+   if (unlikely(rc != 0))
+   PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+
ena_rx_queue_release_all(dev);
ena_tx_queue_release_all(dev);
 
rte_free(adapter->drv_stats);
adapter->drv_stats = NULL;
 
-   rte_intr_disable(intr_handle);
-   rte_intr_callback_unregister(intr_handle,
-ena_interrupt_handler_rte,
-dev);
-
/*
 * MAC is not allocated dynamically. Setting NULL should prevent from
 * release of the resource in the rte_eth_dev_release_port().
-- 
2.17.1



[PATCH 32/33] net/ena: control path pure polling mode

2024-03-04 Thread shaibran
From: Shai Brandes 

This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in
the control path. This mode is not activated by default and can be
toggled using the "control_poll_interval" devarg. When operating in
this mode, periodic alarms are used to monitor the control queues.

A non-zero value for this devarg is mandatory for control path
functionality when binding ports to uio_pci_generic kernel module which
lacks interrupt support.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/nics/ena.rst|  49 ---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c   | 108 -
 drivers/net/ena/ena_ethdev.h   |   5 ++
 4 files changed, 130 insertions(+), 34 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 53c9341859..a94397f9d3 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -109,12 +109,16 @@ Runtime Configuration
 
* **llq_policy** (default 1)
 
- Controls whether use device recommended header policy or override it.
+ Controls whether use device recommended header policy or override it:
+
  0 - Disable LLQ.
- **Use with extreme caution as it leads to a huge performance
- degradation on AWS instances from 6th generation onwards.**
+ **Use with extreme caution as it leads to a huge performance
+ degradation on AWS instances from 6th generation onwards.**
+
  1 - Accept device recommended LLQ policy (Default).
+
  2 - Enforce normal LLQ policy.
+
  3 - Enforce large LLQ policy.
 
* **miss_txc_to** (default 5)
@@ -126,6 +130,18 @@ Runtime Configuration
  timer service. Setting this parameter to 0 disables this feature. Maximum
  allowed value is 60 seconds.
 
+   * **control_poll_interval** (default 0)
+
+ Enable polling-based functionality of the admin queues, eliminating the
+ need for interrupts in the control-path:
+
+ 0 - Disable (Admin queue will work in interrupt mode).
+
+ [1..1000] - Number of milliseconds to wait between periodic inspection of the admin queues.
+
+ **A non-zero value for this devarg is mandatory for control path functionality
+ when binding ports to uio_pci_generic kernel module which lacks interrupt support.**
+
 ENA Configuration Parameters
 
 
@@ -164,23 +180,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite.  This includes environment
variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.
 
(*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
reduces the latency of the packets by pushing the header directly through
the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
In DPDK ``igb_uio`` it must be enabled by loading module with
``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
If vfio-pci is used user should follow `AWS ENA PMD documentation

`_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``
Please make sure that ``IOMMU`` is enabled in your system,
or use ``vfio`` driver in ``noiommu`` mode::
 
@@ -189,7 +205,14 @@ Prerequisites
To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
+
+   Note that when launching the application, the ``control_poll_interval`` devarg must be used with a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control-path (admin queues) of the ENA require poll-mode
+   to process command completion and asynchronous notification from the device.
+
+#. Bind the intended ENA device to ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.
 
 At this point the system should be ready to run DPDK applications. Once the
 application ru

[PATCH 33/33] net/ena: upgrade driver version to 2.9.0

2024-03-04 Thread shaibran
From: Shai Brandes 

Upgrade the driver version to 2.9.0.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index af1f6d6d05..f47f585611 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -22,7 +22,7 @@
 #include 
 
 #define DRV_MODULE_VER_MAJOR   2
-#define DRV_MODULE_VER_MINOR   8
+#define DRV_MODULE_VER_MINOR   9
 #define DRV_MODULE_VER_SUBMINOR 0
 
 #define __MERGE_64B_H_L(h, l) (((uint64_t)h << 32) | l)
-- 
2.17.1



[PATCH 29/33] net/ena: update device-preferred size of rings

2024-03-04 Thread shaibran
From: Shai Brandes 

Update the device-preferred size of the Tx ring to fall within the
valid range when a large LLQ is enabled. For consistency, align the
device-preferred size of the Rx ring accordingly.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 2414f631c8..2a7b7c0cba 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -2595,8 +2595,10 @@ static int ena_infos_get(struct rte_eth_dev *dev,
dev_info->tx_desc_lim.nb_mtu_seg_max = RTE_MIN(ENA_PKT_MAX_BUFS,
adapter->max_tx_sgl_size);
 
-   dev_info->default_rxportconf.ring_size = ENA_DEFAULT_RING_SIZE;
-   dev_info->default_txportconf.ring_size = ENA_DEFAULT_RING_SIZE;
+   dev_info->default_rxportconf.ring_size = RTE_MIN(ENA_DEFAULT_RING_SIZE,
+   dev_info->rx_desc_lim.nb_max);
+   dev_info->default_txportconf.ring_size = RTE_MIN(ENA_DEFAULT_RING_SIZE,
+   dev_info->tx_desc_lim.nb_max);
 
dev_info->err_handle_mode = RTE_ETH_ERROR_HANDLE_MODE_PASSIVE;
 
-- 
2.17.1
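
The effect of the RTE_MIN clamp in this patch can be shown with a standalone sketch. `SKETCH_DEFAULT_RING_SIZE` is an assumed value standing in for ENA_DEFAULT_RING_SIZE (the patch does not show its definition), and `sketch_preferred_ring_size` is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for ENA_DEFAULT_RING_SIZE; the real value is not shown above. */
#define SKETCH_DEFAULT_RING_SIZE 1024u

/*
 * Return the ring size to advertise as the device-preferred default:
 * the driver default, unless the reported descriptor limit is smaller
 * (for example when a large LLQ halves the Tx queue depth).
 */
static uint16_t sketch_preferred_ring_size(uint16_t nb_max)
{
    assert(nb_max != 0); /* a zero limit is treated as a caller bug here */

    return nb_max < SKETCH_DEFAULT_RING_SIZE ?
           nb_max : (uint16_t)SKETCH_DEFAULT_RING_SIZE;
}
```

Under these assumptions, a Tx limit of 512 yields a preferred size of 512, while a limit of 4096 still yields the default of 1024, so the advertised default never exceeds what the queue can hold.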



[PATCH v5] net/cnxk: support Tx queue descriptor count

2024-03-04 Thread skoteshwar
From: Satha Rao 

Add CNXK APIs to get the used Tx queue descriptor count.

Signed-off-by: Satha Rao 
---
 doc/guides/nics/features/cnxk.ini  |  1 +
 doc/guides/rel_notes/release_24_03.rst |  1 +
 drivers/net/cnxk/cn10k_tx_select.c | 22 ++
 drivers/net/cnxk/cn9k_tx_select.c  | 23 +++
 drivers/net/cnxk/cnxk_ethdev.h | 25 +
 5 files changed, 72 insertions(+)

diff --git a/doc/guides/nics/features/cnxk.ini b/doc/guides/nics/features/cnxk.ini
index b5d9f7e..1c8db1a 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -40,6 +40,7 @@ Timesync = Y
 Timestamp offload= Y
 Rx descriptor status = Y
 Tx descriptor status = Y
+Tx queue count   = Y
 Basic stats  = Y
 Stats per queue  = Y
 Extended stats   = Y
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 2b160cf..b1942b5 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -113,6 +113,7 @@ New Features
   * Added support for Rx inject.
   * Optimized SW external mbuf free for better performance and avoid SQ corruption.
   * Added support for port representors.
+  * Added support for ``rte_eth_tx_queue_count``.
 
 * **Updated Marvell OCTEON EP driver.**
 
diff --git a/drivers/net/cnxk/cn10k_tx_select.c b/drivers/net/cnxk/cn10k_tx_select.c
index 404f5ba..aa0620e 100644
--- a/drivers/net/cnxk/cn10k_tx_select.c
+++ b/drivers/net/cnxk/cn10k_tx_select.c
@@ -20,6 +20,24 @@
eth_dev->tx_pkt_burst;
 }
 
+#if defined(RTE_ARCH_ARM64)
+static int
+cn10k_nix_tx_queue_count(void *tx_queue)
+{
+   struct cn10k_eth_txq *txq = (struct cn10k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn10k_nix_tx_queue_sec_count(void *tx_queue)
+{
+   struct cn10k_eth_txq *txq = (struct cn10k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, txq->cpt_fc);
+}
+#endif
+
 void
 cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 {
@@ -63,6 +81,10 @@
if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
}
+   if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+   eth_dev->tx_queue_count = cn10k_nix_tx_queue_sec_count;
+   else
+   eth_dev->tx_queue_count = cn10k_nix_tx_queue_count;
 
rte_mb();
 #else
diff --git a/drivers/net/cnxk/cn9k_tx_select.c 
b/drivers/net/cnxk/cn9k_tx_select.c
index e08883f..5ecf919 100644
--- a/drivers/net/cnxk/cn9k_tx_select.c
+++ b/drivers/net/cnxk/cn9k_tx_select.c
@@ -20,6 +20,24 @@
eth_dev->tx_pkt_burst;
 }
 
+#if defined(RTE_ARCH_ARM64)
+static int
+cn9k_nix_tx_queue_count(void *tx_queue)
+{
+   struct cn9k_eth_txq *txq = (struct cn9k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn9k_nix_tx_queue_sec_count(void *tx_queue)
+{
+   struct cn9k_eth_txq *txq = (struct cn9k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, 
txq->cpt_fc);
+}
+#endif
+
 void
 cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 {
@@ -59,6 +77,11 @@
if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
}
+   if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+   eth_dev->tx_queue_count = cn9k_nix_tx_queue_sec_count;
+   else
+   eth_dev->tx_queue_count = cn9k_nix_tx_queue_count;
+
 
rte_mb();
 #else
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 5d42e13..b42af88 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -464,6 +464,31 @@ struct cnxk_eth_txq_sp {
return ((struct cnxk_eth_txq_sp *)__txq) - 1;
 }
 
+static inline int
+cnxk_nix_tx_queue_count(uint64_t *mem, uint16_t sqes_per_sqb_log2)
+{
+   uint64_t val;
+
+   val = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t) *)mem, 
rte_memory_order_relaxed);
+   val = (val << sqes_per_sqb_log2) - val;
+
+   return (val & 0x);
+}
+
+static inline int
+cnxk_nix_tx_queue_sec_count(uint64_t *mem, uint16_t sqes_per_sqb_log2, 
uint64_t *sec_fc)
+{
+   uint64_t sq_cnt, sec_cnt, val;
+
+   sq_cnt = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t) *)mem, 
rte_memory_order_relaxed);
+   sq_cnt = (sq_cnt << sqes_per_sqb_log2) - sq_cnt;
+   sec_cnt = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t) *)sec_fc,
+  rte_memory_order_relaxed);
+   val = RTE_MAX(sq_cnt, sec_cnt);
+
+   return (val & 0x);
+}
+
 /* Common ethdev ops */
 extern struct eth_dev_ops cnxk_eth_dev_ops;
 
-- 
1.
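
For readers following the arithmetic in cnxk_nix_tx_queue_count() and
cnxk_nix_tx_queue_sec_count() above, here is a minimal standalone sketch of
the same computation with no DPDK dependencies. The function names
model_tx_queue_count and model_tx_queue_sec_count are hypothetical and exist
only for this illustration. The final mask applied by the patch is not
reproduced, since its value is not legible in the archived text. The point
is that (v << k) - v equals v * (2^k - 1), and that the security variant
returns the larger of the two counters.

```c
#include <stdint.h>

/* Model of the non-security path: scale the flow-control value by
 * (2^sqes_per_sqb_log2 - 1) using a shift and a subtract.
 * Hypothetical helper, not part of the driver. */
static inline uint64_t
model_tx_queue_count(uint64_t fc_val, uint16_t sqes_per_sqb_log2)
{
	return (fc_val << sqes_per_sqb_log2) - fc_val;
}

/* Model of the security path: take the larger of the scaled SQ count
 * and the value read from the second (security) counter. */
static inline uint64_t
model_tx_queue_sec_count(uint64_t fc_val, uint16_t sqes_per_sqb_log2,
			 uint64_t sec_fc_val)
{
	uint64_t sq_cnt = model_tx_queue_count(fc_val, sqes_per_sqb_log2);

	return sq_cnt > sec_fc_val ? sq_cnt : sec_fc_val;
}
```

With fc_val = 3 and sqes_per_sqb_log2 = 5, the first helper gives
(3 << 5) - 3 = 93, which is 3 * 31.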

[PATCH v6] net/i40e: add diagnostic support in Tx path

2024-03-04 Thread Mingjin Ye
Implemented a Tx wrapper to perform a thorough check on mbufs,
categorizing and counting invalid cases by type for diagnostic
purposes. The count of invalid cases is accessible through xstats_get.

Also, the devarg option "mbuf_check" was introduced to configure the
diagnostic parameters to enable the appropriate diagnostic features.

supported cases: mbuf, size, segment, offload.
 1. mbuf: Check for corrupted mbuf.
 2. size: Check min/max packet length according to HW spec.
 3. segment: Check number of mbuf segments does not exceed HW limits.
 4. offload: Check for use of an unsupported offload flag.

parameter format: "mbuf_check=<case>" or "mbuf_check=[<case1>,<case2>]"
eg: dpdk-testpmd -a :87:00.0,mbuf_check=[mbuf,size] -- -i

Signed-off-by: Mingjin Ye 
---
v2: remove strict.
---
v3: optimised.
---
v4: rebase.
---
v5: fix ci error.
---
v6: Changes the commit log.
---
 doc/guides/nics/i40e.rst   |  14 +++
 drivers/net/i40e/i40e_ethdev.c | 138 -
 drivers/net/i40e/i40e_ethdev.h |  28 ++
 drivers/net/i40e/i40e_rxtx.c   | 153 +++--
 drivers/net/i40e/i40e_rxtx.h   |   2 +
 5 files changed, 327 insertions(+), 8 deletions(-)

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 15689ac958..91b45e1d40 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -275,6 +275,20 @@ Runtime Configuration
 
   -a 84:00.0,vf_msg_cfg=80@120:180
 
+- ``Support TX diagnostics`` (default ``not enabled``)
+
+  Set the ``devargs`` parameter ``mbuf_check`` to enable TX diagnostics.
+  For example, ``-a 87:00.0,mbuf_check=<case>`` or ``-a 87:00.0,mbuf_check=[<case1>,<case2>...]``.
+  Thereafter, ``rte_eth_xstats_get()`` can be used to get the error counts,
+  which are collected in ``tx_mbuf_error_packets`` xstats.
+  In testpmd these can be shown via: ``testpmd> show port xstats all``.
+  Supported values for the ``case`` parameter are:
+
+  *   mbuf: Check for corrupted mbuf.
+  *   size: Check min/max packet length according to HW spec.
+  *   segment: Check number of mbuf segments does not exceed HW limits.
+  *   offload: Check for use of an unsupported offload flag.
+
 Vector RX Pre-conditions
 
 For Vector RX it is assumed that the number of descriptor rings will be a power
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4d21341382..3e2ddcaa3e 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -48,6 +48,7 @@
 #define ETH_I40E_SUPPORT_MULTI_DRIVER  "support-multi-driver"
 #define ETH_I40E_QUEUE_NUM_PER_VF_ARG  "queue-num-per-vf"
 #define ETH_I40E_VF_MSG_CFG"vf_msg_cfg"
+#define ETH_I40E_MBUF_CHECK_ARG   "mbuf_check"
 
 #define I40E_CLEAR_PXE_WAIT_MS 200
 #define I40E_VSI_TSR_QINQ_STRIP0x4010
@@ -412,6 +413,7 @@ static const char *const valid_keys[] = {
ETH_I40E_SUPPORT_MULTI_DRIVER,
ETH_I40E_QUEUE_NUM_PER_VF_ARG,
ETH_I40E_VF_MSG_CFG,
+   ETH_I40E_MBUF_CHECK_ARG,
NULL};
 
 static const struct rte_pci_id pci_id_i40e_map[] = {
@@ -545,6 +547,14 @@ static const struct rte_i40e_xstats_name_off 
rte_i40e_stats_strings[] = {
 #define I40E_NB_ETH_XSTATS (sizeof(rte_i40e_stats_strings) / \
sizeof(rte_i40e_stats_strings[0]))
 
+static const struct rte_i40e_xstats_name_off i40e_mbuf_strings[] = {
+   {"tx_mbuf_error_packets", offsetof(struct i40e_mbuf_stats,
+   tx_pkt_errors)},
+};
+
+#define I40E_NB_MBUF_XSTATS (sizeof(i40e_mbuf_strings) / \
+   sizeof(i40e_mbuf_strings[0]))
+
 static const struct rte_i40e_xstats_name_off rte_i40e_hw_port_strings[] = {
{"tx_link_down_dropped", offsetof(struct i40e_hw_port_stats,
tx_dropped_link_down)},
@@ -1373,6 +1383,88 @@ read_vf_msg_config(__rte_unused const char *key,
return 0;
 }
 
+static int
+read_mbuf_check_config(__rte_unused const char *key, const char *value, void 
*args)
+{
+   char *cur;
+   char *tmp;
+   int str_len;
+   int valid_len;
+
+   int ret = 0;
+   uint64_t *mc_flags = args;
+   char *str2 = strdup(value);
+   if (str2 == NULL)
+   return -1;
+
+   str_len = strlen(str2);
+   if (str2[0] == '[' && str2[str_len - 1] == ']') {
+   if (str_len < 3) {
+   ret = -1;
+   goto mdd_end;
+   }
+   valid_len = str_len - 2;
+   memmove(str2, str2 + 1, valid_len);
+   memset(str2 + valid_len, '\0', 2);
+   }
+   cur = strtok_r(str2, ",", &tmp);
+   while (cur != NULL) {
+   if (!strcmp(cur, "mbuf"))
+   *mc_flags |= I40E_MBUF_CHECK_F_TX_MBUF;
+   else if (!strcmp(cur, "size"))
+   *mc_flags |= I40E_MBUF_CHECK_F_TX_SIZE;
+   else if (!strcmp(cur, "segment"))
+   *mc_flags |= I40E_MBUF_CHECK_F_TX_SEGMENT;
+   else if (!strcmp(cu
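
The parsing steps visible in the patch above (copy the devarg value, strip
an optional pair of outer square brackets, split on commas, OR a flag per
recognised case) can be sketched as a standalone function. This is only an
illustration under stated assumptions: the CHECK_F_* values and the name
parse_mbuf_check are hypothetical, and rejecting an unknown case name is a
choice made for this sketch, not something the text above confirms for the
driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define CHECK_F_TX_MBUF    (1ULL << 0)
#define CHECK_F_TX_SIZE    (1ULL << 1)
#define CHECK_F_TX_SEGMENT (1ULL << 2)
#define CHECK_F_TX_OFFLOAD (1ULL << 3)

/* Parse "size" or "[mbuf,size]" and OR the matching flags into *flags.
 * Returns 0 on success, -1 on error; *flags is left untouched on error. */
static int
parse_mbuf_check(const char *value, uint64_t *flags)
{
	char *copy, *list, *cur, *save = NULL;
	uint64_t mask = 0;
	size_t len;
	int ret = 0;

	if (value == NULL || flags == NULL)
		return -1;

	copy = strdup(value);
	if (copy == NULL)
		return -1;

	len = strlen(copy);
	if (len == 0) {
		ret = -1;
		goto out;
	}

	/* Strip the outer square brackets, if present. */
	list = copy;
	if (copy[0] == '[' && copy[len - 1] == ']') {
		if (len < 3) {
			ret = -1;
			goto out;
		}
		copy[len - 1] = '\0';
		list = copy + 1;
	}

	for (cur = strtok_r(list, ",", &save); cur != NULL;
	     cur = strtok_r(NULL, ",", &save)) {
		if (!strcmp(cur, "mbuf"))
			mask |= CHECK_F_TX_MBUF;
		else if (!strcmp(cur, "size"))
			mask |= CHECK_F_TX_SIZE;
		else if (!strcmp(cur, "segment"))
			mask |= CHECK_F_TX_SEGMENT;
		else if (!strcmp(cur, "offload"))
			mask |= CHECK_F_TX_OFFLOAD;
		else {
			/* Assumption of this sketch: unknown names are errors. */
			ret = -1;
			goto out;
		}
	}
	*flags |= mask;
out:
	free(copy);
	return ret;
}
```

Accumulating into a local mask and writing it back only on success keeps
the caller's flags unchanged when the string is malformed.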

[PATCH v4] net/ice: add diagnostic support in Tx path

2024-03-04 Thread Mingjin Ye
Implemented a Tx wrapper to perform a thorough check on mbufs,
categorizing and counting invalid cases by type for diagnostic
purposes. The count of invalid cases is accessible through xstats_get.

Also, the devarg option "mbuf_check" was introduced to configure the
diagnostic parameters to enable the appropriate diagnostic features.

supported cases: mbuf, size, segment, offload.
 1. mbuf: Check for corrupted mbuf.
 2. size: Check min/max packet length according to HW spec.
 3. segment: Check number of mbuf segments does not exceed HW limits.
 4. offload: Check for use of an unsupported offload flag.

parameter format: "mbuf_check=<case>" or "mbuf_check=[<case1>,<case2>]"
eg: dpdk-testpmd -a :81:00.0,mbuf_check=[mbuf,size] -- -i

Signed-off-by: Mingjin Ye 
---
v2: rebase.
---
v3: Modify comment log.
---
v4: Changes the commit log.
---
 doc/guides/nics/ice.rst  |  14 
 drivers/net/ice/ice_ethdev.c | 108 +++-
 drivers/net/ice/ice_ethdev.h |  23 +
 drivers/net/ice/ice_rxtx.c   | 158 ---
 drivers/net/ice/ice_rxtx.h   |  20 +
 5 files changed, 312 insertions(+), 11 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 8f33751577..53b4a79095 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -257,6 +257,20 @@ Runtime Configuration
   As a trade-off, this configuration may cause the packet processing 
performance
   degradation due to the PCI bandwidth limitation.
 
+- ``Tx diagnostics`` (default ``not enabled``)
+
+  Set the ``devargs`` parameter ``mbuf_check`` to enable TX diagnostics.
+  For example, ``-a 81:00.0,mbuf_check=<case>`` or ``-a 81:00.0,mbuf_check=[<case1>,<case2>...]``.
+  Thereafter, ``rte_eth_xstats_get()`` can be used to get the error counts,
+  which are collected in ``tx_mbuf_error_packets`` xstats.
+  In testpmd these can be shown via: ``testpmd> show port xstats all``.
+  Supported values for the ``case`` parameter are:
+
+  *   mbuf: Check for corrupted mbuf.
+  *   size: Check min/max packet length according to HW spec.
+  *   segment: Check number of mbuf segments does not exceed HW limits.
+  *   offload: Check for use of an unsupported offload flag.
+
 Driver compilation and testing
 --
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index f07b236ad4..b2e1a664d4 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -12,6 +12,7 @@
 #include 
 
 #include 
+#include 
 
 #include "eal_firmware.h"
 
@@ -34,6 +35,7 @@
 #define ICE_HW_DEBUG_MASK_ARG "hw_debug_mask"
 #define ICE_ONE_PPS_OUT_ARG   "pps_out"
 #define ICE_RX_LOW_LATENCY_ARG"rx_low_latency"
+#define ICE_MBUF_CHECK_ARG   "mbuf_check"
 
 #define ICE_CYCLECOUNTER_MASK  0xULL
 
@@ -49,6 +51,7 @@ static const char * const ice_valid_args[] = {
ICE_ONE_PPS_OUT_ARG,
ICE_RX_LOW_LATENCY_ARG,
ICE_DEFAULT_MAC_DISABLE,
+   ICE_MBUF_CHECK_ARG,
NULL
 };
 
@@ -320,6 +323,14 @@ static const struct ice_xstats_name_off 
ice_stats_strings[] = {
 #define ICE_NB_ETH_XSTATS (sizeof(ice_stats_strings) / \
sizeof(ice_stats_strings[0]))
 
+static const struct ice_xstats_name_off ice_mbuf_strings[] = {
+   {"tx_mbuf_error_packets", offsetof(struct ice_mbuf_stats,
+   tx_pkt_errors)},
+};
+
+#define ICE_NB_MBUF_XSTATS (sizeof(ice_mbuf_strings) / \
+   sizeof(ice_mbuf_strings[0]))
+
 static const struct ice_xstats_name_off ice_hw_port_strings[] = {
{"tx_link_down_dropped", offsetof(struct ice_hw_port_stats,
tx_dropped_link_down)},
@@ -2062,6 +2073,58 @@ handle_pps_out_arg(__rte_unused const char *key, const 
char *value,
return 0;
 }
 
+static int
+ice_parse_mbuf_check(__rte_unused const char *key, const char *value, void 
*args)
+{
+   char *cur;
+   char *tmp;
+   int str_len;
+   int valid_len;
+
+   int ret = 0;
+   uint64_t *mc_flags = args;
+   char *str2 = strdup(value);
+   if (str2 == NULL)
+   return -1;
+
+   str_len = strlen(str2);
+   if (str_len == 0) {
+   ret = -1;
+   goto err_end;
+   }
+
+   /* Try stripping the outer square brackets of the parameter string. */
+   str_len = strlen(str2);
+   if (str2[0] == '[' && str2[str_len - 1] == ']') {
+   if (str_len < 3) {
+   ret = -1;
+   goto err_end;
+   }
+   valid_len = str_len - 2;
+   memmove(str2, str2 + 1, valid_len);
+   memset(str2 + valid_len, '\0', 2);
+   }
+
+   cur = strtok_r(str2, ",", &tmp);
+   while (cur != NULL) {
+   if (!strcmp(cur, "mbuf"))
+   *mc_flags |= ICE_MBUF_CHECK_F_TX_MBUF;
+   else if (!strcmp(cur, "size"))
+   *mc_flags |= ICE_MBUF_CHECK_F_TX_SIZE;
+   else if (!strcmp(cur, "segment"))
+  
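
The patch above registers the new counter through a name/offset table
(ice_mbuf_strings, using offsetof into struct ice_mbuf_stats). As a rough
illustration of how such a table is typically consumed, here is a
standalone sketch. The struct and function names (mbuf_stats,
xstat_name_off, lookup_xstat) are hypothetical stand-ins rather than the
driver's own types, and the lookup loop is an assumption about usage, not
code taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's per-device mbuf statistics. */
struct mbuf_stats {
	uint64_t tx_pkt_errors;
};

/* One table entry: xstat name plus byte offset of its counter. */
struct xstat_name_off {
	const char *name;
	size_t offset;
};

static const struct xstat_name_off mbuf_strings[] = {
	{"tx_mbuf_error_packets", offsetof(struct mbuf_stats, tx_pkt_errors)},
};

#define NB_MBUF_XSTATS (sizeof(mbuf_strings) / sizeof(mbuf_strings[0]))

/* Find a counter by name and copy its value out via the stored offset.
 * Returns 0 if found, -1 otherwise. */
static int
lookup_xstat(const struct mbuf_stats *stats, const char *name, uint64_t *value)
{
	size_t i;

	for (i = 0; i < NB_MBUF_XSTATS; i++) {
		if (strcmp(mbuf_strings[i].name, name) == 0) {
			memcpy(value,
			       (const char *)stats + mbuf_strings[i].offset,
			       sizeof(*value));
			return 0;
		}
	}
	return -1;
}
```

Keeping names and offsets in one table means a new counter needs only a
new struct field and a new table row; the lookup code stays unchanged.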

[PATCH v6] net/cnxk: support Tx queue descriptor count

2024-03-04 Thread skoteshwar
From: Satha Rao 

Added CNXK APIs to get used txq descriptor count.

Signed-off-by: Satha Rao 
---

Depends-on: series-30833 ("ethdev: support Tx queue used count")

v2:
  Updated release notes and fixed API for CPT queues.
v3:
  Addressed review comments
v5:
  Fixed compilation errors
v6:
  Fixed checkpatch

 doc/guides/nics/features/cnxk.ini  |  1 +
 doc/guides/rel_notes/release_24_03.rst |  1 +
 drivers/net/cnxk/cn10k_tx_select.c | 22 ++
 drivers/net/cnxk/cn9k_tx_select.c  | 23 +++
 drivers/net/cnxk/cnxk_ethdev.h | 25 +
 5 files changed, 72 insertions(+)

diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index b5d9f7e..1c8db1a 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -40,6 +40,7 @@ Timesync = Y
 Timestamp offload= Y
 Rx descriptor status = Y
 Tx descriptor status = Y
+Tx queue count   = Y
 Basic stats  = Y
 Stats per queue  = Y
 Extended stats   = Y
diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 2b160cf..b1942b5 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -113,6 +113,7 @@ New Features
   * Added support for Rx inject.
   * Optimized SW external mbuf free for better performance and avoid SQ 
corruption.
   * Added support for port representors.
+  * Added support for ``rte_eth_tx_queue_count``.
 
 * **Updated Marvell OCTEON EP driver.**
 
diff --git a/drivers/net/cnxk/cn10k_tx_select.c 
b/drivers/net/cnxk/cn10k_tx_select.c
index 404f5ba..aa0620e 100644
--- a/drivers/net/cnxk/cn10k_tx_select.c
+++ b/drivers/net/cnxk/cn10k_tx_select.c
@@ -20,6 +20,24 @@
eth_dev->tx_pkt_burst;
 }
 
+#if defined(RTE_ARCH_ARM64)
+static int
+cn10k_nix_tx_queue_count(void *tx_queue)
+{
+   struct cn10k_eth_txq *txq = (struct cn10k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn10k_nix_tx_queue_sec_count(void *tx_queue)
+{
+   struct cn10k_eth_txq *txq = (struct cn10k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, 
txq->cpt_fc);
+}
+#endif
+
 void
 cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 {
@@ -63,6 +81,10 @@
if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
}
+   if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+   eth_dev->tx_queue_count = cn10k_nix_tx_queue_sec_count;
+   else
+   eth_dev->tx_queue_count = cn10k_nix_tx_queue_count;
 
rte_mb();
 #else
diff --git a/drivers/net/cnxk/cn9k_tx_select.c 
b/drivers/net/cnxk/cn9k_tx_select.c
index e08883f..5ecf919 100644
--- a/drivers/net/cnxk/cn9k_tx_select.c
+++ b/drivers/net/cnxk/cn9k_tx_select.c
@@ -20,6 +20,24 @@
eth_dev->tx_pkt_burst;
 }
 
+#if defined(RTE_ARCH_ARM64)
+static int
+cn9k_nix_tx_queue_count(void *tx_queue)
+{
+   struct cn9k_eth_txq *txq = (struct cn9k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn9k_nix_tx_queue_sec_count(void *tx_queue)
+{
+   struct cn9k_eth_txq *txq = (struct cn9k_eth_txq *)tx_queue;
+
+   return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, 
txq->cpt_fc);
+}
+#endif
+
 void
 cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 {
@@ -59,6 +77,11 @@
if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
}
+   if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+   eth_dev->tx_queue_count = cn9k_nix_tx_queue_sec_count;
+   else
+   eth_dev->tx_queue_count = cn9k_nix_tx_queue_count;
+
 
rte_mb();
 #else
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 5d42e13..5e04064 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -464,6 +464,31 @@ struct cnxk_eth_txq_sp {
return ((struct cnxk_eth_txq_sp *)__txq) - 1;
 }
 
+static inline int
+cnxk_nix_tx_queue_count(uint64_t *mem, uint16_t sqes_per_sqb_log2)
+{
+   uint64_t val;
+
+   val = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t)*)mem, 
rte_memory_order_relaxed);
+   val = (val << sqes_per_sqb_log2) - val;
+
+   return (val & 0x);
+}
+
+static inline int
+cnxk_nix_tx_queue_sec_count(uint64_t *mem, uint16_t sqes_per_sqb_log2, 
uint64_t *sec_fc)
+{
+   uint64_t sq_cnt, sec_cnt, val;
+
+   sq_cnt = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t)*)mem, 
rte_memory_order_relaxed);
+   sq_cnt = (sq_cnt << sqes_per_sqb_log2) - sq_cnt;
+   sec_cnt = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t)*)sec_fc,
+

Re: [PATCH v5 3/3] event/cnxk: support DMA event functions

2024-03-04 Thread Jerin Jacob
On Mon, Mar 4, 2024 at 2:38 AM Amit Prakash Shukla
 wrote:
>
> Added support of dma driver callback assignment to eventdev
> enqueue and dequeue. The change also defines dma adapter
> capabilities function.
>
> Depends-on: series-30612 ("lib/dmadev: get DMA device using device ID")
>
> Signed-off-by: Amit Prakash Shukla 


Cleaned up the release notes and applied the series to
dpdk-next-net-eventdev/for-main. Thanks


RE: [EXTERNAL] [PATCH] crypto/mlx5: add virtual function device ID

2024-03-04 Thread Akhil Goyal
> > Subject: RE: [EXTERNAL] [PATCH] crypto/mlx5: add virtual function device ID
> >
> > > Subject: [EXTERNAL] [PATCH] crypto/mlx5: add virtual function device
> > > ID
> > >
> > > This adds the virtual function device ID to the list of supported
> > > NVIDIA devices that run the MLX5 compress PMD.
> >
> > Compress PMD or crypto PMD?
> 
> Sorry, it should be crypto.
Applied to dpdk-next-crypto with the above change.
Thanks.


Re: [PATCH v2] net/ice: fix null pointer dereferences

2024-03-04 Thread Bruce Richardson
On Mon, Mar 04, 2024 at 01:37:51PM +0800, Wenwu Ma wrote:
> This patch fixes two null pointer dereferences detected by
> coverity scan.
> 
> Coverity issue: 414096
> Fixes: 6ccef90ff5d3 ("net/ice: support VSI level bandwidth config")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Wenwu Ma 

Reviewed-by: Bruce Richardson 

Applied to dpdk-next-net-intel with expanded commit log message.

thanks,
/Bruce


RE: [EXTERNAL] [PATCH] crypto/mlx5: add max segment assert

2024-03-04 Thread Akhil Goyal
> Currently, for multi-segment mbuf, before crypto WQE an extra
> UMR WQE will be introduced to build the contiguous memory space.
> Crypto WQE uses that contiguous memory space key as input.
> 
> This commit adds assert for maximum supported segments in debug
> mode in case the segments exceed UMR's limitation.
> 
> Signed-off-by: Suanming Mou 
> Acked-by: Matan Azrad 
Applied to dpdk-next-crypto
Thanks.


RE: [EXTERNAL] [PATCH] examples/l3fwd: fix Rx over not ready port

2024-03-04 Thread Konstantin Ananyev


> > > > From: Konstantin Ananyev 
> > > > Sent: Friday, March 1, 2024 10:10 PM
> > > > To: dev@dpdk.org
> > > > Cc: Jerin Jacob ; Pavan Nikhilesh Bhagavatula
> > > > ; Konstantin Ananyev
> > > > ; sta...@dpdk.org
> > > > Subject: [EXTERNAL] [PATCH] examples/l3fwd: fix Rx over not ready port
> > > >
> > > >
> > > > From: Konstantin Ananyev 
> > > >
> > > > Running l3fwd in event mode with SW eventdev, service cores
> > > > can start RX before main thread is finished with PMD installation.
> > > > to reproduce:
> > > > ./dpdk-l3fwd --lcores=49,51 -n 6 -a ca:00.0 -s 0x8 \
> > > > --vdev event_sw0 -- \
> > > > -L -P -p 1  --mode eventdev --eventq-sched=ordered \
> > > > --rule_ipv4=test/l3fwd_lpm_v4_u1.cfg --rule_ipv6=test/l3fwd_lpm_v6_u1.cfg \
> > > > --no-numa
> > > >
> > > > At the init stage, the user will most likely see an error message like this:
> > > > ETHDEV: lcore 51 called rx_pkt_burst for not ready port 0
> > > > 0: ./dpdk-l3fwd (rte_dump_stack+0x1f) [15de723]
> > > > ...
> > > > 9: ./dpdk-l3fwd (eal_thread_loop+0x5a2) [15c1324]
> > > > ...
> > > >
> > > > And then it all depends on how lucky/unlucky you are.
> > > > If there are some actual packets in the HW RX queue, then the app will most
> > > > likely crash, otherwise it might survive.
> > > > As the error message suggests, the problem is that services are started
> > > > before the main thread has finished NIC setup and initialization.
> > > > The suggested fix moves services startup after the NIC setup phase.
> > > >
> > > > Bugzilla ID: 1390
> > > > Fixes: 8bd537e9c6cf ("examples/l3fwd: add service core setup based on
> > > > caps")
> > > > Cc: sta...@dpdk.org
> > > >
> > > > Signed-off-by: Konstantin Ananyev 
> > > > Signed-off-by: Konstantin Ananyev 
> 
> Acked-by: Pavan Nikhilesh 
> 
> > > > ---
> > > >  examples/l3fwd/main.c | 6 +-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> > > > index 3bf28aec0c..d4fb5d1971 100644
> > > > --- a/examples/l3fwd/main.c
> > > > +++ b/examples/l3fwd/main.c
> > > > @@ -1577,7 +1577,6 @@ main(int argc, char **argv)
> > > > l3fwd_lkp.main_loop = 
> > > > evt_rsrc->ops.fib_event_loop;
> > > > else
> > > > l3fwd_lkp.main_loop = evt_rsrc-
> > > > >ops.lpm_event_loop;
> > > > -   l3fwd_event_service_setup();
> > > > } else
> > > >  #endif
> > > > l3fwd_poll_resource_setup();
> > > > @@ -1609,6 +1608,11 @@ main(int argc, char **argv)
> > > > }
> > > > }
> > > >
> > > > +#ifdef RTE_LIB_EVENTDEV
> > >
> > > Is the ifdef required?
> >
> > Well, right now l3fwd_event_service_setup() is defined only when
> > RTE_LIB_EVENTDEV is defined, see examples/l3fwd/main.c.
> > So, I suppose, yes.
> >
> 
> My bad, I was looking at the wrong DPDK version (22.11).

NP, thank you for taking a look.
As an FYI, I filed 2 more bugs on a similar subject (l3fwd event mode):
https://bugs.dpdk.org/show_bug.cgi?id=1393
https://bugs.dpdk.org/show_bug.cgi?id=1391
If you have the bandwidth to take a look and provide some feedback,
that would be great.

> 
> > >
> > > > +   if (evt_rsrc->enabled)
> > > > +   l3fwd_event_service_setup();
> > > > +#endif
> > > > +
> > > > printf("\n");
> > > >
> > > > for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> > > > --
> > > > 2.35.3



Re: [PATCH 0/7] vhost: FD manager improvements

2024-03-04 Thread David Marchand
On Thu, Feb 29, 2024 at 1:25 PM Maxime Coquelin
 wrote:
>
> This series aims at improving the Vhost FD manager.
>
> The first patch is a fix needed for VDUSE device
> destruction to work. I expect it to be taken into the v24.03
> release.
>
> The rest of the series consists of various improvements to the
> FD manager that can wait for the v24.07 release.
>
> Maxime Coquelin (7):
>   vhost: fix VDUSE device destruction failure
>   vhost: rename polling mutex
>   vhost: make use of FD manager init function
>   vhost: hide synchronization within FD manager
>   vhost: improve fdset initialization
>   vhost: convert fdset sync to eventfd
>   vhost: improve FD manager logging
>
>  lib/vhost/fd_man.c  | 313 +--
>  lib/vhost/fd_man.c.orig | 538 
>  lib/vhost/fd_man.h  |  41 +--
>  lib/vhost/socket.c  |  37 +--
>  lib/vhost/vduse.c   |  51 +---
>  5 files changed, 800 insertions(+), 180 deletions(-)
>  create mode 100644 lib/vhost/fd_man.c.orig

I marked all but the first patch as Deferred.
I'll send a new revision of the fix to address the deadlock revealed by CI.


-- 
David Marchand



[PATCH v2] vhost: fix VDUSE device destruction failure

2024-03-04 Thread David Marchand
From: Maxime Coquelin 

VDUSE_DESTROY_DEVICE ioctl can fail because the device's
chardev is not released despite close syscall having been
called. It happens because the events handler thread is
still polling the file descriptor.

fdset_pipe_notify() is not enough because it does not
ensure the notification has been handled by the event
thread, it just returns once the notification is sent.

To fix this, this patch introduces a synchronization
mechanism based on pthread's condition, so that
fdset_pipe_notify_sync() only returns once the pipe's
read callback has been executed.

Fixes: 51d018fdac4e ("vhost: add VDUSE events handler")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
Signed-off-by: David Marchand 
---
Changes since v1:
- sync'd only when in VDUSE destruction path,
- added explicit init of sync_mutex,

---
 lib/vhost/fd_man.c | 23 +--
 lib/vhost/fd_man.h |  6 ++
 lib/vhost/socket.c |  1 +
 lib/vhost/vduse.c  |  3 ++-
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
index 79a8d2c006..481e6b900a 100644
--- a/lib/vhost/fd_man.c
+++ b/lib/vhost/fd_man.c
@@ -309,10 +309,11 @@ fdset_event_dispatch(void *arg)
 }
 
 static void
-fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
+fdset_pipe_read_cb(int readfd, void *dat,
   int *remove __rte_unused)
 {
char charbuf[16];
+   struct fdset *fdset = dat;
int r = read(readfd, charbuf, sizeof(charbuf));
/*
 * Just an optimization, we don't care if read() failed
@@ -320,6 +321,11 @@ fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
 * compiler happy
 */
RTE_SET_USED(r);
+
+   pthread_mutex_lock(&fdset->sync_mutex);
+   fdset->sync = true;
+   pthread_cond_broadcast(&fdset->sync_cond);
+   pthread_mutex_unlock(&fdset->sync_mutex);
 }
 
 void
@@ -342,7 +348,7 @@ fdset_pipe_init(struct fdset *fdset)
}
 
ret = fdset_add(fdset, fdset->u.readfd,
-   fdset_pipe_read_cb, NULL, NULL);
+   fdset_pipe_read_cb, NULL, fdset);
 
if (ret < 0) {
VHOST_FDMAN_LOG(ERR,
@@ -366,5 +372,18 @@ fdset_pipe_notify(struct fdset *fdset)
 * compiler happy
 */
RTE_SET_USED(r);
+}
+
+void
+fdset_pipe_notify_sync(struct fdset *fdset)
+{
+   pthread_mutex_lock(&fdset->sync_mutex);
+
+   fdset->sync = false;
+   fdset_pipe_notify(fdset);
+
+   while (!fdset->sync)
+   pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex);
 
+   pthread_mutex_unlock(&fdset->sync_mutex);
 }
diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
index 6315904c8e..7816fb11ac 100644
--- a/lib/vhost/fd_man.h
+++ b/lib/vhost/fd_man.h
@@ -6,6 +6,7 @@
 #define _FD_MAN_H_
 #include 
 #include 
+#include 
 
 #define MAX_FDS 1024
 
@@ -35,6 +36,10 @@ struct fdset {
int writefd;
};
} u;
+
+   pthread_mutex_t sync_mutex;
+   pthread_cond_t sync_cond;
+   bool sync;
 };
 
 
@@ -53,5 +58,6 @@ int fdset_pipe_init(struct fdset *fdset);
 void fdset_pipe_uninit(struct fdset *fdset);
 
 void fdset_pipe_notify(struct fdset *fdset);
+void fdset_pipe_notify_sync(struct fdset *fdset);
 
 #endif
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index a2fdac30a4..96b3ab5595 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -93,6 +93,7 @@ static struct vhost_user vhost_user = {
.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+   .sync_mutex = PTHREAD_MUTEX_INITIALIZER,
.num = 0
},
.vsocket_cnt = 0,
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d462428d2c..e0c6991b69 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -36,6 +36,7 @@ static struct vduse vduse = {
.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+   .sync_mutex = PTHREAD_MUTEX_INITIALIZER,
.num = 0
},
 };
@@ -618,7 +619,7 @@ vduse_device_destroy(const char *path)
vduse_device_stop(dev);
 
fdset_del(&vduse.fdset, dev->vduse_dev_fd);
-   fdset_pipe_notify(&vduse.fdset);
+   fdset_pipe_notify_sync(&vduse.fdset);
 
if (dev->vduse_dev_fd >= 0) {
close(dev->vduse_dev_fd);
-- 
2.43.0
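
The synchronization described in the commit message above (the notifier
returns only once the pipe's read callback has run) can be sketched
outside of vhost as follows. This is a minimal model under stated
assumptions, not the library code: struct notifier, event_loop,
notify_sync, notifier_start and notifier_stop are hypothetical names, a
blocking read() stands in for the fdset poll loop and its callback, and a
single notifying thread is assumed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

struct notifier {
	int readfd;
	int writefd;
	pthread_mutex_t sync_mutex;
	pthread_cond_t sync_cond;
	bool sync;      /* set by the event thread once a wakeup is handled */
	int handled;    /* number of wakeups handled, for inspection */
};

/* Event thread: consume wakeup bytes and signal that each was handled. */
static void *
event_loop(void *arg)
{
	struct notifier *n = arg;
	char buf[16];

	while (read(n->readfd, buf, sizeof(buf)) > 0) {
		pthread_mutex_lock(&n->sync_mutex);
		n->sync = true;
		n->handled++;
		pthread_cond_broadcast(&n->sync_cond);
		pthread_mutex_unlock(&n->sync_mutex);
	}
	return NULL;
}

/* Send a wakeup and block until the event thread has processed it. */
static void
notify_sync(struct notifier *n)
{
	pthread_mutex_lock(&n->sync_mutex);
	n->sync = false;
	if (write(n->writefd, "1", 1) == 1) {
		while (!n->sync)
			pthread_cond_wait(&n->sync_cond, &n->sync_mutex);
	}
	pthread_mutex_unlock(&n->sync_mutex);
}

static int
notifier_start(struct notifier *n, pthread_t *tid)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	n->readfd = fds[0];
	n->writefd = fds[1];
	n->sync = false;
	n->handled = 0;
	pthread_mutex_init(&n->sync_mutex, NULL);
	pthread_cond_init(&n->sync_cond, NULL);
	if (pthread_create(tid, NULL, event_loop, n) != 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	return 0;
}

static void
notifier_stop(struct notifier *n, pthread_t tid)
{
	close(n->writefd);      /* read() returns 0 and the loop exits */
	pthread_join(tid, NULL);
	close(n->readfd);
}
```

The flag is rechecked in a loop around pthread_cond_wait() because
condition variables may wake spuriously. Resetting the flag while holding
the mutex, before writing to the pipe, ensures the waiter cannot miss the
broadcast.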



Re: [PATCH v6] net/cnxk: support Tx queue descriptor count

2024-03-04 Thread Jerin Jacob
On Mon, Mar 4, 2024 at 3:30 PM  wrote:
>
> From: Satha Rao 
>
> Added CNXK APIs to get used txq descriptor count.
>
> Signed-off-by: Satha Rao 

Applied to dpdk-next-net-mrvl/for-main. Thanks


> ---
>
> Depends-on: series-30833 ("ethdev: support Tx queue used count")
>
> v2:
>   Updated release notes and fixed API for CPT queues.
> v3:
>   Addressed review comments
> v5:
>   Fixed compilation errors
> v6:
>   Fixed checkpatch
>
>  doc/guides/nics/features/cnxk.ini  |  1 +
>  doc/guides/rel_notes/release_24_03.rst |  1 +
>  drivers/net/cnxk/cn10k_tx_select.c | 22 ++
>  drivers/net/cnxk/cn9k_tx_select.c  | 23 +++
>  drivers/net/cnxk/cnxk_ethdev.h | 25 +
>  5 files changed, 72 insertions(+)
>
> diff --git a/doc/guides/nics/features/cnxk.ini 
> b/doc/guides/nics/features/cnxk.ini
> index b5d9f7e..1c8db1a 100644
> --- a/doc/guides/nics/features/cnxk.ini
> +++ b/doc/guides/nics/features/cnxk.ini
> @@ -40,6 +40,7 @@ Timesync = Y
>  Timestamp offload= Y
>  Rx descriptor status = Y
>  Tx descriptor status = Y
> +Tx queue count   = Y
>  Basic stats  = Y
>  Stats per queue  = Y
>  Extended stats   = Y
> diff --git a/doc/guides/rel_notes/release_24_03.rst 
> b/doc/guides/rel_notes/release_24_03.rst
> index 2b160cf..b1942b5 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -113,6 +113,7 @@ New Features
>* Added support for Rx inject.
>* Optimized SW external mbuf free for better performance and avoid SQ 
> corruption.
>* Added support for port representors.
> +  * Added support for ``rte_eth_tx_queue_count``.
>
>  * **Updated Marvell OCTEON EP driver.**
>
> diff --git a/drivers/net/cnxk/cn10k_tx_select.c 
> b/drivers/net/cnxk/cn10k_tx_select.c
> index 404f5ba..aa0620e 100644
> --- a/drivers/net/cnxk/cn10k_tx_select.c
> +++ b/drivers/net/cnxk/cn10k_tx_select.c
> @@ -20,6 +20,24 @@
> eth_dev->tx_pkt_burst;
>  }
>
> +#if defined(RTE_ARCH_ARM64)
> +static int
> +cn10k_nix_tx_queue_count(void *tx_queue)
> +{
> +   struct cn10k_eth_txq *txq = (struct cn10k_eth_txq *)tx_queue;
> +
> +   return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
> +}
> +
> +static int
> +cn10k_nix_tx_queue_sec_count(void *tx_queue)
> +{
> +   struct cn10k_eth_txq *txq = (struct cn10k_eth_txq *)tx_queue;
> +
> +   return cnxk_nix_tx_queue_sec_count(txq->fc_mem, 
> txq->sqes_per_sqb_log2, txq->cpt_fc);
> +}
> +#endif
> +
>  void
>  cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
>  {
> @@ -63,6 +81,10 @@
> if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
> pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
> }
> +   if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
> +   eth_dev->tx_queue_count = cn10k_nix_tx_queue_sec_count;
> +   else
> +   eth_dev->tx_queue_count = cn10k_nix_tx_queue_count;
>
> rte_mb();
>  #else
> diff --git a/drivers/net/cnxk/cn9k_tx_select.c 
> b/drivers/net/cnxk/cn9k_tx_select.c
> index e08883f..5ecf919 100644
> --- a/drivers/net/cnxk/cn9k_tx_select.c
> +++ b/drivers/net/cnxk/cn9k_tx_select.c
> @@ -20,6 +20,24 @@
> eth_dev->tx_pkt_burst;
>  }
>
> +#if defined(RTE_ARCH_ARM64)
> +static int
> +cn9k_nix_tx_queue_count(void *tx_queue)
> +{
> +   struct cn9k_eth_txq *txq = (struct cn9k_eth_txq *)tx_queue;
> +
> +   return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
> +}
> +
> +static int
> +cn9k_nix_tx_queue_sec_count(void *tx_queue)
> +{
> +   struct cn9k_eth_txq *txq = (struct cn9k_eth_txq *)tx_queue;
> +
> +   return cnxk_nix_tx_queue_sec_count(txq->fc_mem, 
> txq->sqes_per_sqb_log2, txq->cpt_fc);
> +}
> +#endif
> +
>  void
>  cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
>  {
> @@ -59,6 +77,11 @@
> if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
> pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
> }
> +   if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
> +   eth_dev->tx_queue_count = cn9k_nix_tx_queue_sec_count;
> +   else
> +   eth_dev->tx_queue_count = cn9k_nix_tx_queue_count;
> +
>
> rte_mb();
>  #else
> diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
> index 5d42e13..5e04064 100644
> --- a/drivers/net/cnxk/cnxk_ethdev.h
> +++ b/drivers/net/cnxk/cnxk_ethdev.h
> @@ -464,6 +464,31 @@ struct cnxk_eth_txq_sp {
> return ((struct cnxk_eth_txq_sp *)__txq) - 1;
>  }
>
> +static inline int
> +cnxk_nix_tx_queue_count(uint64_t *mem, uint16_t sqes_per_sqb_log2)
> +{
> +   uint64_t val;
> +
> +   val = rte_atomic_load_explicit((RTE_ATOMIC(uint64_t)*)mem, 
> rte_memory_order_relaxed);
> +   val = (val << sqes_per_sqb_log2) - val;
> +
> +   return (val & 0x);
> +}
> +
> +static inlin

Re: [PATCH v6] net/i40e: add diagnostic support in Tx path

2024-03-04 Thread Bruce Richardson
On Mon, Mar 04, 2024 at 09:33:21AM +, Mingjin Ye wrote:
> Implemented a Tx wrapper to perform a thorough check on mbufs,
> categorizing and counting invalid cases by type for diagnostic
> purposes. The count of invalid cases is accessible through xstats_get.
> 
> Also, the devarg option "mbuf_check" was introduced to configure the
> diagnostic parameters to enable the appropriate diagnostic features.
> 
> supported cases: mbuf, size, segment, offload.
>  1. mbuf: Check for corrupted mbuf.
>  2. size: Check min/max packet length according to HW spec.
>  3. segment: Check number of mbuf segments does not exceed HW limits.
>  4. offload: Check for use of an unsupported offload flag.
> 
> parameter format: "mbuf_check=" or "mbuf_check=[,]"
> eg: dpdk-testpmd -a :87:00.0,mbuf_check=[mbuf,size] -- -i
> 
> Signed-off-by: Mingjin Ye 

Review comments inline below, thanks.

This implementation seems more complex than the iavf one that I previously
reviewed and merged. It includes more changes to the TX path selection
logic, so it would be worthwhile including a note about that in the commit
log.

/Bruce
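
For readers following the "mbuf_check" discussion, the devarg value format
described in the commit log (a single case name, or a bracketed
comma-separated list such as [mbuf,size]) could be parsed roughly as below.
This is only a sketch, not the code from the patch: the flag names, the
helper names and the exact error handling are all assumptions made for
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits, one per diagnostic case named in the commit log. */
#define CHK_MBUF    (UINT64_C(1) << 0)
#define CHK_SIZE    (UINT64_C(1) << 1)
#define CHK_SEGMENT (UINT64_C(1) << 2)
#define CHK_OFFLOAD (UINT64_C(1) << 3)

/* Map one case name of the given length to its flag; -1 if unknown. */
static int case_to_flag(const char *name, size_t len, uint64_t *flag)
{
	static const struct { const char *name; uint64_t flag; } cases[] = {
		{"mbuf", CHK_MBUF}, {"size", CHK_SIZE},
		{"segment", CHK_SEGMENT}, {"offload", CHK_OFFLOAD},
	};
	size_t i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		if (strlen(cases[i].name) == len &&
		    strncmp(name, cases[i].name, len) == 0) {
			*flag = cases[i].flag;
			return 0;
		}
	}
	return -1;
}

/* Parse "case" or "[case1,case2,...]" into a bitmask; -1 on any error. */
static int parse_mbuf_check(const char *value, uint64_t *flags)
{
	size_t len = strlen(value);
	const char *cur = value;
	const char *end = value + len;
	uint64_t acc = 0, flag;

	/* Strip one pair of surrounding brackets if present. */
	if (len >= 2 && value[0] == '[' && value[len - 1] == ']') {
		cur++;
		end--;
	}
	if (cur == end)
		return -1; /* empty value or empty list */

	for (;;) {
		const char *sep = memchr(cur, ',', (size_t)(end - cur));
		const char *tok_end = (sep != NULL) ? sep : end;

		if (case_to_flag(cur, (size_t)(tok_end - cur), &flag) != 0)
			return -1; /* unknown or empty case name */
		acc |= flag;
		if (sep == NULL)
			break;
		cur = sep + 1;
	}
	*flags = acc;
	return 0;
}
```

Working on the original string with explicit lengths avoids the strdup()
and the cleanup label, at the cost of a length-aware name comparison.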

> ---
> v2: remove strict.
> ---
> v3: optimised.
> ---
> v4: rebase.
> ---
> v5: fix ci error.
> ---
> v6: Changes the commit log.
> ---
>  doc/guides/nics/i40e.rst   |  14 +++
>  drivers/net/i40e/i40e_ethdev.c | 138 -
>  drivers/net/i40e/i40e_ethdev.h |  28 ++
>  drivers/net/i40e/i40e_rxtx.c   | 153 +++--
>  drivers/net/i40e/i40e_rxtx.h   |   2 +
>  5 files changed, 327 insertions(+), 8 deletions(-)
> 
> diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
> index 15689ac958..91b45e1d40 100644
> --- a/doc/guides/nics/i40e.rst
> +++ b/doc/guides/nics/i40e.rst
> @@ -275,6 +275,20 @@ Runtime Configuration
>  
>-a 84:00.0,vf_msg_cfg=80@120:180
>  
> +- ``Support TX diagnostics`` (default ``not enabled``)
> +
> +  Set the ``devargs`` parameter ``mbuf_check`` to enable TX diagnostics.
> +  For example, ``-a 87:00.0,mbuf_check=`` or ``-a 
> 87:00.0,mbuf_check=[,...]``.
> +  Thereafter, ``rte_eth_xstats_get()`` can be used to get the error counts,
> +  which are collected in ``tx_mbuf_error_packets`` xstats.
> +  In testpmd these can be shown via: ``testpmd> show port xstats all``.
> +  Supported values for the ``case`` parameter are:
> +
> +  *   mbuf: Check for corrupted mbuf.
> +  *   size: Check min/max packet length according to HW spec.
> +  *   segment: Check number of mbuf segments does not exceed HW limits.
> +  *   offload: Check for use of an unsupported offload flag.
> +
>  Vector RX Pre-conditions
>  
>  For Vector RX it is assumed that the number of descriptor rings will be a 
> power
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 4d21341382..3e2ddcaa3e 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -48,6 +48,7 @@
>  #define ETH_I40E_SUPPORT_MULTI_DRIVER"support-multi-driver"
>  #define ETH_I40E_QUEUE_NUM_PER_VF_ARG"queue-num-per-vf"
>  #define ETH_I40E_VF_MSG_CFG  "vf_msg_cfg"
> +#define ETH_I40E_MBUF_CHECK_ARG   "mbuf_check"
>  
>  #define I40E_CLEAR_PXE_WAIT_MS 200
>  #define I40E_VSI_TSR_QINQ_STRIP  0x4010
> @@ -412,6 +413,7 @@ static const char *const valid_keys[] = {
>   ETH_I40E_SUPPORT_MULTI_DRIVER,
>   ETH_I40E_QUEUE_NUM_PER_VF_ARG,
>   ETH_I40E_VF_MSG_CFG,
> + ETH_I40E_MBUF_CHECK_ARG,
>   NULL};
>  
>  static const struct rte_pci_id pci_id_i40e_map[] = {
> @@ -545,6 +547,14 @@ static const struct rte_i40e_xstats_name_off 
> rte_i40e_stats_strings[] = {
>  #define I40E_NB_ETH_XSTATS (sizeof(rte_i40e_stats_strings) / \
>   sizeof(rte_i40e_stats_strings[0]))
>  
> +static const struct rte_i40e_xstats_name_off i40e_mbuf_strings[] = {
> + {"tx_mbuf_error_packets", offsetof(struct i40e_mbuf_stats,
> + tx_pkt_errors)},
> +};
> +
> +#define I40E_NB_MBUF_XSTATS (sizeof(i40e_mbuf_strings) / \
> + sizeof(i40e_mbuf_strings[0]))
> +
>  static const struct rte_i40e_xstats_name_off rte_i40e_hw_port_strings[] = {
>   {"tx_link_down_dropped", offsetof(struct i40e_hw_port_stats,
>   tx_dropped_link_down)},
> @@ -1373,6 +1383,88 @@ read_vf_msg_config(__rte_unused const char *key,
>   return 0;
>  }
>  
> +static int
> +read_mbuf_check_config(__rte_unused const char *key, const char *value, void 
> *args)
> +{
> + char *cur;
> + char *tmp;
> + int str_len;
> + int valid_len;
> +
> + int ret = 0;
> + uint64_t *mc_flags = args;
> + char *str2 = strdup(value);
> + if (str2 == NULL)
> + return -1;
> +
> + str_len = strlen(str2);
> + if (str2[0] == '[' && str2[str_len - 1] == ']') {
> + if (str_len < 3) {
> + ret = -1;
> + goto mdd_end;
> + }
> + valid_len = str_len - 2;
> +   

[PATCH v2 00/33] net/ena: v2.9.0 driver release

2024-03-04 Thread shaibran
From: Shai Brandes 

Hi all, the ena v2.9.0 release introduces:
1. HAL upgrade:
   - renamed the 'base' folder to be 'hal'
   - separated the HAL patches instead of a bulk update.
2. Restructured ena stats and metrics.
3. Restructured the LLQ configuration:
   - configurable via devarg.
   - support device recommendation.
   - restructure the logic in driver.
4. Added support for the admin queue to work only in poll-mode
   - configurable via devarg.
   - allows binding ports to the uio_pci_generic kernel driver.
5. Reworked the device close to exhaust interrupt callbacks and alarms.
6. Fixed a bug in fast mbuf free.
Best regards.

---
v2:
* Fixed minor spelling issues from checkpatch

Shai Brandes (33):
  net/ena: rework the metrics multi-process functions
  net/ena: report new supported link speed capabilities
  net/ena: update imissed stat with Rx overruns
  net/ena: sub-optimal configuration notifications support
  net/ena: fix fast mbuf free
  net/ena: rename base folder to hal
  net/ena: restructure the llq policy setting process
  net/ena/hal: exponential backoff exp limit
  net/ena/hal: add a new csum offload bit
  net/ena/hal: added a bus parameter to ena memcpy macro
  net/ena/hal: optimize Rx ring submission queue
  net/ena/hal: rename fields in completion descriptors
  net/ena/hal: use correct read once on u8 field
  net/ena/hal: add completion descriptor corruption check
  net/ena/hal: malformed Tx descriptor error reason
  net/ena/hal: phc feature modifications
  net/ena/hal: restructure interrupt handling
  net/ena/hal: add unlikely to error checks
  net/ena/hal: missing admin interrupt reset reason
  net/ena/hal: check for existing keep alive notification
  net/ena/hal: modify memory barrier comment
  net/ena/hal: rework Rx ring submission queue
  net/ena/hal: remove operating system type enum
  net/ena/hal: handle command abort
  net/ena/hal: add support for device reset request
  net/ena: cosmetic changes
  net/ena/hal: modify customer metrics memory management
  net/ena/hal: cosmetic changes
  net/ena: update device-preferred size of rings
  net/ena: exhaust interrupt callbacks in device close
  net/ena: support max large llq depth from the device
  net/ena: control path pure polling mode
  net/ena: upgrade driver version to 2.9.0

 doc/guides/nics/ena.rst   |  61 ++--
 doc/guides/rel_notes/release_24_03.rst|  11 +
 drivers/net/ena/ena_ethdev.c  | 316 --
 drivers/net/ena/ena_ethdev.h  |  17 +-
 drivers/net/ena/{base => hal}/ena_com.c   | 240 +
 drivers/net/ena/{base => hal}/ena_com.h   |  53 ++-
 .../{base => hal}/ena_defs/ena_admin_defs.h   |  92 +++--
 .../{base => hal}/ena_defs/ena_common_defs.h  |   0
 .../{base => hal}/ena_defs/ena_eth_io_defs.h  |  49 ++-
 .../ena/{base => hal}/ena_defs/ena_gen_info.h |   0
 .../ena/{base => hal}/ena_defs/ena_includes.h |   0
 .../{base => hal}/ena_defs/ena_regs_defs.h|   3 +
 drivers/net/ena/{base => hal}/ena_eth_com.c   |  56 ++--
 drivers/net/ena/{base => hal}/ena_eth_com.h   |  14 +-
 drivers/net/ena/{base => hal}/ena_plat.h  |   0
 drivers/net/ena/{base => hal}/ena_plat_dpdk.h |   9 +-
 drivers/net/ena/meson.build   |   6 +-
 17 files changed, 669 insertions(+), 258 deletions(-)
 rename drivers/net/ena/{base => hal}/ena_com.c (94%)
 rename drivers/net/ena/{base => hal}/ena_com.h (96%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_admin_defs.h (96%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_common_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_eth_io_defs.h (95%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_gen_info.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_includes.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_regs_defs.h (97%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.c (93%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.h (94%)
 rename drivers/net/ena/{base => hal}/ena_plat.h (100%)
 rename drivers/net/ena/{base => hal}/ena_plat_dpdk.h (97%)

-- 
2.17.1



[PATCH v2 01/33] net/ena: rework the metrics multi-process functions

2024-03-04 Thread shaibran
From: Shai Brandes 

1. Changed the rte_memcpy call to use the precomputed buf_size.
2. Removed redundant address operators (ampersand symbol)
   when providing memcpy source address parameter.
3. Code style related change.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index beb17c4125..6d500bfa78 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -531,8 +531,8 @@ __extension__ ({
 __extension__ ({
ENA_TOUCH(rsp);
ENA_TOUCH(ena_dev);
-   if (stats != (struct ena_admin_eni_stats *)&adapter->metrics_stats)
-   rte_memcpy(stats, &adapter->metrics_stats, sizeof(*stats));
+   if (stats != (struct ena_admin_eni_stats *)adapter->metrics_stats)
+   rte_memcpy(stats, adapter->metrics_stats, sizeof(*stats));
 }),
struct ena_com_dev *ena_dev, struct ena_admin_eni_stats *stats);
 
@@ -590,9 +590,8 @@ __extension__ ({
 __extension__ ({
ENA_TOUCH(rsp);
ENA_TOUCH(ena_dev);
-   ENA_TOUCH(buf_size);
-   if (buf != (char *)&adapter->metrics_stats)
-   rte_memcpy(buf, &adapter->metrics_stats, adapter->metrics_num * sizeof(uint64_t));
+   if (buf != (char *)adapter->metrics_stats)
+   rte_memcpy(buf, adapter->metrics_stats, buf_size);
 }),
struct ena_com_dev *ena_dev, char *buf, size_t buf_size);
 
@@ -4088,7 +4087,7 @@ ena_mp_primary_handle(const struct rte_mp_msg *mp_msg, 
const void *peer)
case ENA_MP_CUSTOMER_METRICS_GET:
res = ena_com_get_customer_metrics(ena_dev,
(char *)adapter->metrics_stats,
-   sizeof(uint64_t) * adapter->metrics_num);
+   adapter->metrics_num * sizeof(uint64_t));
break;
case ENA_MP_SRD_STATS_GET:
res = ena_com_get_ena_srd_info(ena_dev,
-- 
2.17.1
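
As a side note on item 2 of the commit log above: the address operator is
redundant only if metrics_stats is an array member, in which case the array
and its address refer to the same location. The sketch below assumes that
layout; struct toy_adapter, TOY_METRICS_NUM and copy_metrics() are
hypothetical names, and plain memcpy stands in for rte_memcpy.

```c
#include <stdint.h>
#include <string.h>

#define TOY_METRICS_NUM 4 /* hypothetical size, for illustration only */

struct toy_adapter {
	uint64_t metrics_stats[TOY_METRICS_NUM]; /* assumed to be an array */
};

/*
 * Copy the cached metrics into buf unless buf already is the cache.
 * For an array member, a->metrics_stats decays to a pointer to its first
 * element, which is the same address as &a->metrics_stats, so no '&' is
 * needed. The caller supplies the precomputed buf_size.
 */
static void copy_metrics(char *buf, const struct toy_adapter *a, size_t buf_size)
{
	if (buf != (const char *)a->metrics_stats)
		memcpy(buf, a->metrics_stats, buf_size);
}
```

The self-copy guard matters because memcpy with overlapping source and
destination is undefined.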



[PATCH v2 03/33] net/ena: update imissed stat with Rx overruns

2024-03-04 Thread shaibran
From: Shai Brandes 

Depending on its acceleration support, the device updates
a different statistic when an ingress packet is dropped
because no buffers are available to hold it.
- In AWS instance types from later generations
'rx_overruns' is updated.
- Otherwise, in legacy instance types,
'rx_dropped_cnt' is updated.

That is, there is no need to report rx_overruns separately
as an xstat and the driver can simply sum up the two
self-contained counters as the 'imissed' statistic.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst | 4 
 drivers/net/ena/ena_ethdev.c   | 8 +---
 drivers/net/ena/ena_ethdev.h   | 1 -
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 879bb4944c..fb66d67d32 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -101,6 +101,10 @@ New Features
   * ``rte_flow_template_table_resize_complete()``.
 Complete table resize.
 
+* **Updated Amazon ena (Elastic Network Adapter) net driver.**
+
+  * Removed the reporting of `rx_overruns` errors from xstats and instead 
updated `imissed` stat with its value.
+
 * **Updated Atomic Rules' Arkville driver.**
 
   * Added support for Atomic Rules' TK242 packet-capture family of devices
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index b1e7de0541..d3f395a832 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -93,7 +93,6 @@ static const struct ena_stats ena_stats_global_strings[] = {
ENA_STAT_GLOBAL_ENTRY(dev_start),
ENA_STAT_GLOBAL_ENTRY(dev_stop),
ENA_STAT_GLOBAL_ENTRY(tx_drops),
-   ENA_STAT_GLOBAL_ENTRY(rx_overruns),
 };
 
 /*
@@ -4014,9 +4013,12 @@ static void ena_keep_alive(void *adapter_data,
tx_drops = ((uint64_t)desc->tx_drops_high << 32) | desc->tx_drops_low;
rx_overruns = ((uint64_t)desc->rx_overruns_high << 32) | 
desc->rx_overruns_low;
 
-   adapter->drv_stats->rx_drops = rx_drops;
+   /*
+* Depending on its acceleration support, the device updates a 
different statistic when
+* Rx packet is dropped because there are no available buffers to 
accommodate it.
+*/
+   adapter->drv_stats->rx_drops = rx_drops + rx_overruns;
adapter->dev_stats.tx_drops = tx_drops;
-   adapter->dev_stats.rx_overruns = rx_overruns;
 }
 
 /**
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 4988fbffb5..20b8307836 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -219,7 +219,6 @@ struct ena_stats_dev {
 * As a workaround it is being published as an extended statistic.
 */
u64 tx_drops;
-   u64 rx_overruns;
 };
 
 struct ena_stats_metrics {
-- 
2.17.1



[PATCH v2 02/33] net/ena: report new supported link speed capabilities

2024-03-04 Thread shaibran
From: Shai Brandes 

Updated the rte_eth_dev_info device supported speed
bitmap to include 200Gbps and 400Gbps capabilities.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 6d500bfa78..b1e7de0541 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -2542,7 +2542,9 @@ static int ena_infos_get(struct rte_eth_dev *dev,
RTE_ETH_LINK_SPEED_25G  |
RTE_ETH_LINK_SPEED_40G  |
RTE_ETH_LINK_SPEED_50G  |
-   RTE_ETH_LINK_SPEED_100G;
+   RTE_ETH_LINK_SPEED_100G |
+   RTE_ETH_LINK_SPEED_200G |
+   RTE_ETH_LINK_SPEED_400G;
 
/* Inform framework about available features */
dev_info->rx_offload_capa = ena_get_rx_port_offloads(adapter);
-- 
2.17.1



[PATCH v2 05/33] net/ena: fix fast mbuf free

2024-03-04 Thread shaibran
From: Shai Brandes 

If the application enables the fast mbuf release optimization,
the driver releases 256 TX mbufs in bulk upon reaching the
TX free threshold.
The existing implementation uses rte_mempool_put_bulk for bulk
freeing TX mbufs, which supports only direct mbufs.
If the application transmits indirect mbufs, the driver must
also decrement the mbuf reference count and unlink the mbuf segment.
In such cases, the driver should use rte_pktmbuf_free_bulk.

Fixes: c339f53823f3 ("net/ena: support fast mbuf free")
Cc: sta...@dpdk.org

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst | 1 +
 drivers/net/ena/ena_ethdev.c   | 6 ++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index f47073c7dc..6b73d4fedf 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -105,6 +105,7 @@ New Features
 
   * Removed the reporting of `rx_overruns` errors from xstats and instead 
updated `imissed` stat with its value.
   * Added support for sub-optimal configuration notifications from the device.
+  * Restructured fast release of mbufs when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE 
optimization is enabled.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 3157237c0d..537ee9f8c3 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3122,8 +3122,7 @@ ena_tx_cleanup_mbuf_fast(struct rte_mbuf **mbufs_to_clean,
m_next = mbuf->next;
mbufs_to_clean[mbuf_cnt++] = mbuf;
if (mbuf_cnt == buf_size) {
-   rte_mempool_put_bulk(mbufs_to_clean[0]->pool, (void 
**)mbufs_to_clean,
-   (unsigned int)mbuf_cnt);
+   rte_pktmbuf_free_bulk(mbufs_to_clean, mbuf_cnt);
mbuf_cnt = 0;
}
mbuf = m_next;
@@ -3191,8 +3190,7 @@ static int ena_tx_cleanup(void *txp, uint32_t 
free_pkt_cnt)
}
 
if (mbuf_cnt != 0)
-   rte_mempool_put_bulk(mbufs_to_clean[0]->pool,
-   (void **)mbufs_to_clean, mbuf_cnt);
+   rte_pktmbuf_free_bulk(mbufs_to_clean, mbuf_cnt);
 
/* Notify completion handler that full cleanup was performed */
if (free_pkt_cnt == 0 || total_tx_pkts < cleanup_budget)
-- 
2.17.1



[PATCH v2 06/33] net/ena: rename base folder to hal

2024-03-04 Thread shaibran
From: Shai Brandes 

Renamed the HAL folder from 'base' to 'hal'.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/{base => hal}/ena_com.c  | 0
 drivers/net/ena/{base => hal}/ena_com.h  | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_admin_defs.h  | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_common_defs.h | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_eth_io_defs.h | 0
 drivers/net/ena/{base => hal}/ena_defs/ena_gen_info.h| 0
 drivers/net/ena/{base => hal}/ena_defs/ena_includes.h| 0
 drivers/net/ena/{base => hal}/ena_defs/ena_regs_defs.h   | 0
 drivers/net/ena/{base => hal}/ena_eth_com.c  | 0
 drivers/net/ena/{base => hal}/ena_eth_com.h  | 0
 drivers/net/ena/{base => hal}/ena_plat.h | 0
 drivers/net/ena/{base => hal}/ena_plat_dpdk.h| 0
 drivers/net/ena/meson.build  | 6 +++---
 13 files changed, 3 insertions(+), 3 deletions(-)
 rename drivers/net/ena/{base => hal}/ena_com.c (100%)
 rename drivers/net/ena/{base => hal}/ena_com.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_admin_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_common_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_eth_io_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_gen_info.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_includes.h (100%)
 rename drivers/net/ena/{base => hal}/ena_defs/ena_regs_defs.h (100%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.c (100%)
 rename drivers/net/ena/{base => hal}/ena_eth_com.h (100%)
 rename drivers/net/ena/{base => hal}/ena_plat.h (100%)
 rename drivers/net/ena/{base => hal}/ena_plat_dpdk.h (100%)

diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/hal/ena_com.c
similarity index 100%
rename from drivers/net/ena/base/ena_com.c
rename to drivers/net/ena/hal/ena_com.c
diff --git a/drivers/net/ena/base/ena_com.h b/drivers/net/ena/hal/ena_com.h
similarity index 100%
rename from drivers/net/ena/base/ena_com.h
rename to drivers/net/ena/hal/ena_com.h
diff --git a/drivers/net/ena/base/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_admin_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_admin_defs.h
diff --git a/drivers/net/ena/base/ena_defs/ena_common_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_common_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_common_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_common_defs.h
diff --git a/drivers/net/ena/base/ena_defs/ena_eth_io_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_eth_io_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
diff --git a/drivers/net/ena/base/ena_defs/ena_gen_info.h 
b/drivers/net/ena/hal/ena_defs/ena_gen_info.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_gen_info.h
rename to drivers/net/ena/hal/ena_defs/ena_gen_info.h
diff --git a/drivers/net/ena/base/ena_defs/ena_includes.h 
b/drivers/net/ena/hal/ena_defs/ena_includes.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_includes.h
rename to drivers/net/ena/hal/ena_defs/ena_includes.h
diff --git a/drivers/net/ena/base/ena_defs/ena_regs_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
similarity index 100%
rename from drivers/net/ena/base/ena_defs/ena_regs_defs.h
rename to drivers/net/ena/hal/ena_defs/ena_regs_defs.h
diff --git a/drivers/net/ena/base/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
similarity index 100%
rename from drivers/net/ena/base/ena_eth_com.c
rename to drivers/net/ena/hal/ena_eth_com.c
diff --git a/drivers/net/ena/base/ena_eth_com.h 
b/drivers/net/ena/hal/ena_eth_com.h
similarity index 100%
rename from drivers/net/ena/base/ena_eth_com.h
rename to drivers/net/ena/hal/ena_eth_com.h
diff --git a/drivers/net/ena/base/ena_plat.h b/drivers/net/ena/hal/ena_plat.h
similarity index 100%
rename from drivers/net/ena/base/ena_plat.h
rename to drivers/net/ena/hal/ena_plat.h
diff --git a/drivers/net/ena/base/ena_plat_dpdk.h 
b/drivers/net/ena/hal/ena_plat_dpdk.h
similarity index 100%
rename from drivers/net/ena/base/ena_plat_dpdk.h
rename to drivers/net/ena/hal/ena_plat_dpdk.h
diff --git a/drivers/net/ena/meson.build b/drivers/net/ena/meson.build
index d02ed3f64f..c41f1b04a0 100644
--- a/drivers/net/ena/meson.build
+++ b/drivers/net/ena/meson.build
@@ -10,10 +10,10 @@ endif
 sources = files(
 'ena_ethdev.c',
 'ena_rss.c',
-'base/ena_com.c',
-'base/ena_eth_com.c',
+'hal/ena_com.c',
+'hal/ena_eth_com.c',
 )
 
 deps += ['timer']
 
-includes += include_directories('base', 'base/ena_defs')
+includes += include_directories('hal', 'hal/ena_defs')
-- 
2.17.1



[PATCH v2 07/33] net/ena: restructure the llq policy setting process

2024-03-04 Thread shaibran
From: Shai Brandes 

The driver will set the LLQ header size according to the
recommendation from the device.
Replaced `enable_llq` and `large_llq_hdr` devargs with
a new devarg `llq_policy` that accepts the following values:
0 - Disable LLQ.
Use with extreme caution as it leads to a huge performance
degradation on AWS instances from 6th generation onwards.
1 - Accept device recommended LLQ policy (Default).
Device can recommend normal or large LLQ policy.
2 - Enforce normal LLQ policy.
3 - Enforce large LLQ policy.
Required for packets with headers that exceed 96 bytes on
AWS instances prior to 5th generation.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/nics/ena.rst|  21 ++---
 doc/guides/rel_notes/release_24_03.rst |   1 +
 drivers/net/ena/ena_ethdev.c   | 110 +
 drivers/net/ena/ena_ethdev.h   |  11 ++-
 4 files changed, 77 insertions(+), 66 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index b039e75ead..53c9341859 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -107,11 +107,15 @@ Configuration
 Runtime Configuration
 ^
 
-   * **large_llq_hdr** (default 0)
+   * **llq_policy** (default 1)
 
- Enables or disables usage of large LLQ headers. This option will have
- effect only if the device also supports large LLQ headers. Otherwise, the
- default value will be used.
+ Controls whether use device recommended header policy or override it.
+ 0 - Disable LLQ.
+ **Use with extreme caution as it leads to a huge performance
+ degradation on AWS instances from 6th generation onwards.**
+ 1 - Accept device recommended LLQ policy (Default).
+ 2 - Enforce normal LLQ policy.
+ 3 - Enforce large LLQ policy.
 
* **miss_txc_to** (default 5)
 
@@ -122,15 +126,6 @@ Runtime Configuration
  timer service. Setting this parameter to 0 disables this feature. Maximum
  allowed value is 60 seconds.
 
-   * **enable_llq** (default 1)
-
- Determines whenever the driver should use the LLQ (if it's available) or
- not.
-
- **NOTE: On the 6th generation AWS instances disabling LLQ may lead to a
- huge performance degradation. In general disabling LLQ is highly not
- recommended!**
-
 ENA Configuration Parameters
 
 
diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 6b73d4fedf..2a22bb07ed 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -106,6 +106,7 @@ New Features
   * Removed the reporting of `rx_overruns` errors from xstats and instead 
updated `imissed` stat with its value.
   * Added support for sub-optimal configuration notifications from the device.
   * Restructured fast release of mbufs when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE 
optimization is enabled.
+  * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg 
`llq_policy`.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 537ee9f8c3..2414f631c8 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -40,6 +40,8 @@
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
 
+#define DECIMAL_BASE 10
+
 /*
  * We should try to keep ENA_CLEANUP_BUF_SIZE lower than
  * RTE_MEMPOOL_CACHE_MAX_SIZE, so we can fit this in mempool local cache.
@@ -74,17 +76,23 @@ struct ena_stats {
ENA_STAT_ENTRY(stat, srd)
 
 /* Device arguments */
-#define ENA_DEVARG_LARGE_LLQ_HDR "large_llq_hdr"
+/* Controls whether to disable LLQ, use device recommended header policy
+ * or overriding the device recommendation.
+ * 0 - Disable LLQ.
+ * Use with extreme caution as it leads to a huge performance
+ * degradation on AWS instances from 6th generation onwards.
+ * 1 - Accept device recommended LLQ policy (Default).
+ * Device can recommend normal or large LLQ policy.
+ * 2 - Enforce normal LLQ policy.
+ * 3 - Enforce large LLQ policy.
+ * Required for packets with header that exceed 96 bytes on
+ * AWS instances prior to 5th generation.
+ */
+#define ENA_DEVARG_LLQ_POLICY "llq_policy"
 /* Timeout in seconds after which a single uncompleted Tx packet should be
  * considered as a missing.
  */
 #define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
-/*
- * Controls whether LLQ should be used (if available). Enabled by default.
- * NOTE: It's highly not recommended to disable the LLQ, as it may lead to a
- * huge performance degradation on 6th generation AWS instances.
- */
-#define ENA_DEVARG_ENABLE_LLQ "enable_llq"
 
 /*
  * Each rte_memzone should have unique name.
@@ -279,9 +287,9 @@ static int ena_xstats_get_by_id(struct rte_eth_dev *dev,
const uint64_t *ids,
uint64_t *values,
   

[PATCH v2 08/33] net/ena/hal: exponential backoff exp limit

2024-03-04 Thread shaibran
From: Shai Brandes 

Limit the exponent in the exponential backoff
mechanism to prevent the delay value from overflowing.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 6953a1fa33..31c37b0ab3 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -34,6 +34,8 @@
 
 #define ENA_REGS_ADMIN_INTR_MASK 1
 
+#define ENA_MAX_BACKOFF_DELAY_EXP 16U
+
 #define ENA_MIN_ADMIN_POLL_US 100
 
 #define ENA_MAX_ADMIN_POLL_US 5000
@@ -545,8 +547,9 @@ static int ena_com_comp_status_to_errno(struct 
ena_com_admin_queue *admin_queue,
 
 static void ena_delay_exponential_backoff_us(u32 exp, u32 delay_us)
 {
+   exp = ENA_MIN32(ENA_MAX_BACKOFF_DELAY_EXP, exp);
delay_us = ENA_MAX32(ENA_MIN_ADMIN_POLL_US, delay_us);
-   delay_us = ENA_MIN32(delay_us * (1U << exp), ENA_MAX_ADMIN_POLL_US);
+   delay_us = ENA_MIN32(ENA_MAX_ADMIN_POLL_US, delay_us * (1U << exp));
ENA_USLEEP(delay_us);
 }
 
-- 
2.17.1
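
The effect of the exponent clamp in this patch can be checked with a small
standalone version of the computation. The constants mirror the ones in the
diff; the function name and the min/max helpers are stand-ins for
ENA_MIN32/ENA_MAX32, and the function returns the delay instead of sleeping
so the result can be inspected.

```c
#include <stdint.h>

#define MAX_BACKOFF_DELAY_EXP 16U
#define MIN_ADMIN_POLL_US 100U
#define MAX_ADMIN_POLL_US 5000U

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * Compute the backoff delay in microseconds.
 * Clamping exp to 16 keeps the shift below the 32-bit width (a shift by
 * 32 or more would be undefined) and keeps delay_us * 2^exp within 32 bits
 * for any base delay below 65536 us.
 */
static uint32_t backoff_delay_us(uint32_t exp, uint32_t delay_us)
{
	exp = min_u32(MAX_BACKOFF_DELAY_EXP, exp);
	delay_us = max_u32(MIN_ADMIN_POLL_US, delay_us);
	return min_u32(MAX_ADMIN_POLL_US, delay_us * (1U << exp));
}
```

For example, a base delay of 100 us with exp = 3 yields 800 us, while any
exp large enough to push the product past 5000 us saturates at the maximum
poll interval.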



[PATCH v2 09/33] net/ena/hal: add a new csum offload bit

2024-03-04 Thread shaibran
From: Shai Brandes 

Add a new driver supported feature bit for TX IPv6 checksum offload.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index 4172916551..670e794c98 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -985,7 +985,8 @@ struct ena_admin_host_info {
 * 4 : rss_configurable_function_key
 * 5 : reserved
 * 6 : rx_page_reuse
-* 31:7 : reserved
+* 7 : tx_ipv6_csum_offload
+* 31:8 : reserved
 */
uint32_t driver_supported_features;
 };
@@ -1377,6 +1378,8 @@ struct ena_admin_phc_resp {
 #define ENA_ADMIN_HOST_INFO_RSS_CONFIGURABLE_FUNCTION_KEY_MASK BIT(4)
 #define ENA_ADMIN_HOST_INFO_RX_PAGE_REUSE_SHIFT 6
 #define ENA_ADMIN_HOST_INFO_RX_PAGE_REUSE_MASK  BIT(6)
+#define ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_SHIFT  7
+#define ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_MASK   BIT(7)
 
 /* feature_rss_ind_table */
 #define ENA_ADMIN_FEATURE_RSS_IND_TABLE_ONE_ENTRY_UPDATE_MASK BIT(0)
@@ -1851,6 +1854,20 @@ static inline void 
set_ena_admin_host_info_rx_page_reuse(struct ena_admin_host_i
ENA_ADMIN_HOST_INFO_RX_PAGE_REUSE_MASK;
 }
 
+static inline
+uint32_t get_ena_admin_host_info_tx_ipv6_csum_offload(const struct 
ena_admin_host_info *p)
+{
+   return (p->driver_supported_features & 
ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_MASK) >>
+   ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_SHIFT;
+}
+
+static inline void set_ena_admin_host_info_tx_ipv6_csum_offload(struct 
ena_admin_host_info *p,
+uint32_t val)
+{
+   p->driver_supported_features |= (val << ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_SHIFT) &
+   ENA_ADMIN_HOST_INFO_TX_IPV6_CSUM_OFFLOAD_MASK;
+}
+
 static inline uint8_t 
get_ena_admin_feature_rss_ind_table_one_entry_update(const struct 
ena_admin_feature_rss_ind_table *p)
 {
return p->flags & ENA_ADMIN_FEATURE_RSS_IND_TABLE_ONE_ENTRY_UPDATE_MASK;
-- 
2.17.1
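
The accessor pattern added by this patch can be exercised in isolation as
below. The shift and mask values match the diff (bit 7); the shortened
macro, struct and function names are hypothetical stand-ins so that the
snippet compiles without the ENA headers.

```c
#include <stdint.h>

#define TOY_TX_IPV6_CSUM_OFFLOAD_SHIFT 7
#define TOY_TX_IPV6_CSUM_OFFLOAD_MASK  (UINT32_C(1) << 7)

struct toy_host_info {
	uint32_t driver_supported_features;
};

/* Extract bit 7 and return it as 0 or 1. */
static inline uint32_t toy_get_tx_ipv6_csum_offload(const struct toy_host_info *p)
{
	return (p->driver_supported_features & TOY_TX_IPV6_CSUM_OFFLOAD_MASK) >>
		TOY_TX_IPV6_CSUM_OFFLOAD_SHIFT;
}

/* OR the value into bit 7; as in the patch, this can set the bit but
 * cannot clear it. */
static inline void toy_set_tx_ipv6_csum_offload(struct toy_host_info *p, uint32_t val)
{
	p->driver_supported_features |= (val << TOY_TX_IPV6_CSUM_OFFLOAD_SHIFT) &
		TOY_TX_IPV6_CSUM_OFFLOAD_MASK;
}
```

Because the setter only ORs, calling it with 0 after the bit has been set
leaves the feature advertised. That is fine for a structure that is built
once from zeroed memory.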



[PATCH v2 10/33] net/ena/hal: added a bus parameter to ena memcpy macro

2024-03-04 Thread shaibran
From: Shai Brandes 

The ENA_MEMCPY_TO_DEVICE_64 macro needs the PCI bus ID in order
to write to the device memory when using LLQ.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c   | 3 ++-
 drivers/net/ena/hal/ena_plat_dpdk.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index 32090259cd..d6811c7b48 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -74,7 +74,8 @@ static int ena_com_write_bounce_buffer_to_dev(struct 
ena_com_io_sq *io_sq,
wmb();
 
/* The line is completed. Copy it to dev */
-   ENA_MEMCPY_TO_DEVICE_64(io_sq->desc_addr.pbuf_dev_addr + dst_offset,
+   ENA_MEMCPY_TO_DEVICE_64(io_sq->bus,
+   io_sq->desc_addr.pbuf_dev_addr + dst_offset,
bounce_buffer,
llq_info->desc_list_entry_size);
 
diff --git a/drivers/net/ena/hal/ena_plat_dpdk.h 
b/drivers/net/ena/hal/ena_plat_dpdk.h
index 14bf582a45..5f7cbd1ee7 100644
--- a/drivers/net/ena/hal/ena_plat_dpdk.h
+++ b/drivers/net/ena/hal/ena_plat_dpdk.h
@@ -301,11 +301,12 @@ ena_mem_alloc_coherent(struct rte_eth_dev_data *data, 
size_t size,
 #define ENA_WAIT_EVENTS_DESTROY(admin_queue) ((void)(admin_queue))
 
 /* The size must be 8 byte align */
-#define ENA_MEMCPY_TO_DEVICE_64(dst, src, size)   \
+#define ENA_MEMCPY_TO_DEVICE_64(bus, dst, src, size)  \
do {   \
int count, i;  \
uint64_t *to = (uint64_t *)(dst);  \
const uint64_t *from = (const uint64_t *)(src);\
+   (void)(bus);   \
count = (size) / 8;\
for (i = 0; i < count; i++, from++, to++)  \
rte_write64_relaxed(*from, to);\
-- 
2.17.1



[PATCH v2 04/33] net/ena: sub-optimal configuration notifications support

2024-03-04 Thread shaibran
From: Shai Brandes 

The ENA device will send asynchronous notifications to the
driver in order to notify users about sub-optimal configurations
and refer them to public AWS documentation for further action.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst|  1 +
 .../net/ena/base/ena_defs/ena_admin_defs.h| 11 +++-
 drivers/net/ena/ena_ethdev.c  | 26 +--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index fb66d67d32..f47073c7dc 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -104,6 +104,7 @@ New Features
 * **Updated Amazon ena (Elastic Network Adapter) net driver.**
 
   * Removed the reporting of `rx_overruns` errors from xstats and instead 
updated `imissed` stat with its value.
+  * Added support for sub-optimal configuration notifications from the device.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/base/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/base/ena_defs/ena_admin_defs.h
index fa43e22918..4172916551 100644
--- a/drivers/net/ena/base/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/base/ena_defs/ena_admin_defs.h
@@ -1214,7 +1214,8 @@ enum ena_admin_aenq_group {
ENA_ADMIN_NOTIFICATION  = 3,
ENA_ADMIN_KEEP_ALIVE= 4,
ENA_ADMIN_REFRESH_CAPABILITIES  = 5,
-   ENA_ADMIN_AENQ_GROUPS_NUM   = 6,
+   ENA_ADMIN_CONF_NOTIFICATIONS= 6,
+   ENA_ADMIN_AENQ_GROUPS_NUM   = 7,
 };
 
 enum ena_admin_aenq_notification_syndrome {
@@ -1251,6 +1252,14 @@ struct ena_admin_aenq_keep_alive_desc {
uint32_t rx_overruns_high;
 };
 
+struct ena_admin_aenq_conf_notifications_desc {
+   struct ena_admin_aenq_common_desc aenq_common_desc;
+
+   uint64_t notifications_bitmap;
+
+   uint64_t reserved;
+};
+
 struct ena_admin_ena_mmio_req_read_less_resp {
uint16_t req_id;
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index d3f395a832..3157237c0d 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -36,6 +36,10 @@
 
 #define ENA_MIN_RING_DESC  128
 
+#define BITS_PER_BYTE 8
+
+#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
+
 /*
  * We should try to keep ENA_CLEANUP_BUF_SIZE lower than
  * RTE_MEMPOOL_CACHE_MAX_SIZE, so we can fit this in mempool local cache.
@@ -1842,7 +1846,8 @@ static int ena_device_init(struct ena_adapter *adapter,
  BIT(ENA_ADMIN_NOTIFICATION) |
  BIT(ENA_ADMIN_KEEP_ALIVE) |
  BIT(ENA_ADMIN_FATAL_ERROR) |
- BIT(ENA_ADMIN_WARNING);
+ BIT(ENA_ADMIN_WARNING) |
+ BIT(ENA_ADMIN_CONF_NOTIFICATIONS);
 
aenq_groups &= get_feat_ctx->aenq.supported_groups;
 
@@ -4021,6 +4026,22 @@ static void ena_keep_alive(void *adapter_data,
adapter->dev_stats.tx_drops = tx_drops;
 }
 
+static void ena_suboptimal_configuration(__rte_unused void *adapter_data,
+struct ena_admin_aenq_entry *aenq_e)
+{
+   struct ena_admin_aenq_conf_notifications_desc *desc;
+   int bit, num_bits;
+
+   desc = (struct ena_admin_aenq_conf_notifications_desc *)aenq_e;
+   num_bits = BITS_PER_TYPE(desc->notifications_bitmap);
+   for (bit = 0; bit < num_bits; bit++) {
+   if (desc->notifications_bitmap & RTE_BIT64(bit)) {
+   PMD_DRV_LOG(WARNING,
+   "Sub-optimal configuration notification code: %d\n", bit + 1);
+   }
+   }
+}
+
 /**
  * This handler will called for unknown event group or unimplemented handlers
  **/
@@ -4035,7 +4056,8 @@ static struct ena_aenq_handlers aenq_handlers = {
.handlers = {
[ENA_ADMIN_LINK_CHANGE] = ena_update_on_link_change,
[ENA_ADMIN_NOTIFICATION] = ena_notification,
-   [ENA_ADMIN_KEEP_ALIVE] = ena_keep_alive
+   [ENA_ADMIN_KEEP_ALIVE] = ena_keep_alive,
+   [ENA_ADMIN_CONF_NOTIFICATIONS] = ena_suboptimal_configuration
},
.unimplemented_handler = unimplemented_aenq_handler
 };
-- 
2.17.1



[PATCH v2 11/33] net/ena/hal: optimize Rx ring submission queue

2024-03-04 Thread shaibran
From: Shai Brandes 

RX ring submission queue descriptors are always located in host memory.
This optimization replaces the generic descriptor retrieval method
with a method tailored to host memory descriptors, avoiding an
unnecessary if statement.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index d6811c7b48..dc2935a53e 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -631,9 +631,8 @@ int ena_com_add_single_rx_desc(struct ena_com_io_sq *io_sq,
if (unlikely(!ena_com_sq_have_enough_space(io_sq, 1)))
return ENA_COM_NO_SPACE;
 
-   desc = get_sq_desc(io_sq);
-   if (unlikely(!desc))
-   return ENA_COM_FAULT;
+   /* virt_addr allocation success is checked before calling this function */
+   desc = get_sq_desc_regular_queue(io_sq);
 
memset(desc, 0x0, sizeof(struct ena_eth_io_rx_desc));
 
-- 
2.17.1



[PATCH v2 14/33] net/ena/hal: add completion descriptor corruption check

2024-03-04 Thread shaibran
From: Shai Brandes 

Add a check of the MBZ (Must Be Zero) fields in the
incoming Tx and Rx completion descriptors in order to
identify corrupted descriptors.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c | 13 +++--
 drivers/net/ena/hal/ena_eth_com.h | 14 +-
 2 files changed, 24 insertions(+), 3 deletions(-)
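
Note: the Rx check in this patch reports a descriptor as corrupted
when any MBZ bit is set and the device advertises the
ENA_ADMIN_CDESC_MBZ capability. A minimal sketch of that predicate,
assuming the bit positions from the rx_cdesc_base layout in this series
(bits 7 and 17); the function name and the boolean capability argument
are hypothetical stand-ins for ena_com_get_cap():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Bit positions taken from the rx_cdesc_base layout in this series. */
#define RX_CDESC_MBZ7_MASK  BIT(7)
#define RX_CDESC_MBZ17_MASK BIT(17)

/*
 * Return true when the status word has a Must-Be-Zero bit set and the
 * device declared that it zeroes those bits (mbz_cap_enabled).
 */
bool rx_cdesc_is_corrupted(uint32_t status, bool mbz_cap_enabled)
{
	const uint32_t mbz_mask = RX_CDESC_MBZ7_MASK | RX_CDESC_MBZ17_MASK;

	return mbz_cap_enabled && (status & mbz_mask) != 0;
}
```

Gating on the capability keeps the check inert on devices that do not
guarantee zeroed MBZ bits.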

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index dc2935a53e..988fa013a7 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -237,6 +237,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
u16 *first_cdesc_idx,
u16 *num_descs)
 {
+   struct ena_com_dev *dev = ena_com_io_cq_to_ena_dev(io_cq);
u16 count = io_cq->cur_rx_pkt_cdesc_count, head_masked;
struct ena_eth_io_rx_cdesc_base *cdesc;
u32 last = 0;
@@ -252,13 +253,21 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
ena_com_cq_inc_head(io_cq);
if (unlikely((status & ENA_ETH_IO_RX_CDESC_BASE_FIRST_MASK) >>
ENA_ETH_IO_RX_CDESC_BASE_FIRST_SHIFT && count != 0)) {
-   struct ena_com_dev *dev = 
ena_com_io_cq_to_ena_dev(io_cq);
-
ena_trc_err(dev,
"First bit is on in descriptor #%d on q_id: 
%d, req_id: %u\n",
count, io_cq->qid, cdesc->req_id);
return ENA_COM_FAULT;
}
+
+   if (unlikely((status & (ENA_ETH_IO_RX_CDESC_BASE_MBZ7_MASK |
+   ENA_ETH_IO_RX_CDESC_BASE_MBZ17_MASK)) &&
+ ena_com_get_cap(dev, ENA_ADMIN_CDESC_MBZ))) {
+   ena_trc_err(dev,
+   "Corrupted RX descriptor #%d on q_id: %d, 
req_id: %u\n",
+   count, io_cq->qid, cdesc->req_id);
+   return ENA_COM_FAULT;
+   }
+
count++;
last = (status & ENA_ETH_IO_RX_CDESC_BASE_LAST_MASK) >>
ENA_ETH_IO_RX_CDESC_BASE_LAST_SHIFT;
diff --git a/drivers/net/ena/hal/ena_eth_com.h 
b/drivers/net/ena/hal/ena_eth_com.h
index 6a7c17f84f..2fac10e678 100644
--- a/drivers/net/ena/hal/ena_eth_com.h
+++ b/drivers/net/ena/hal/ena_eth_com.h
@@ -204,9 +204,11 @@ static inline void ena_com_cq_inc_head(struct 
ena_com_io_cq *io_cq)
 static inline int ena_com_tx_comp_req_id_get(struct ena_com_io_cq *io_cq,
 u16 *req_id)
 {
+   struct ena_com_dev *dev = ena_com_io_cq_to_ena_dev(io_cq);
u8 expected_phase, cdesc_phase;
struct ena_eth_io_tx_cdesc *cdesc;
u16 masked_head;
+   u8 flags;
 
masked_head = io_cq->head & (io_cq->q_depth - 1);
expected_phase = io_cq->phase;
@@ -215,14 +217,24 @@ static inline int ena_com_tx_comp_req_id_get(struct 
ena_com_io_cq *io_cq,
((uintptr_t)io_cq->cdesc_addr.virt_addr +
(masked_head * io_cq->cdesc_entry_size_in_bytes));
 
+   flags = READ_ONCE8(cdesc->flags);
+
/* When the current completion descriptor phase isn't the same as the
 * expected, it mean that the device still didn't update
 * this completion.
 */
-   cdesc_phase = READ_ONCE8(cdesc->flags) & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
+   cdesc_phase = flags & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
if (cdesc_phase != expected_phase)
return ENA_COM_TRY_AGAIN;
 
+   if (unlikely((flags & ENA_ETH_IO_TX_CDESC_MBZ6_MASK) &&
+ ena_com_get_cap(dev, ENA_ADMIN_CDESC_MBZ))) {
+   ena_trc_err(dev,
+   "Corrupted TX descriptor on q_id: %d, req_id: %u\n",
+   io_cq->qid, cdesc->req_id);
+   return ENA_COM_FAULT;
+   }
+
dma_rmb();
 
*req_id = READ_ONCE16(cdesc->req_id);
-- 
2.17.1



[PATCH v2 12/33] net/ena/hal: rename fields in completion descriptors

2024-03-04 Thread shaibran
From: Shai Brandes 

Several reserved bits in ena_eth_io_tx_cdesc and
ena_eth_io_rx_cdesc_base have been renamed explicitly to
MBZ (Must Be Zero).
These bits are set by the device to zero before being sent
to the driver. The fields are used as an integrity check in
order to ensure that the received descriptor is not corrupted.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h |  1 +
 .../net/ena/hal/ena_defs/ena_eth_io_defs.h| 49 +--
 2 files changed, 47 insertions(+), 3 deletions(-)
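
Note: the new mbz6 accessors follow the usual mask-and-shift pattern.
A minimal sketch, assuming a simplified descriptor layout (struct
demo_tx_cdesc is hypothetical and is not the real ena_eth_io_tx_cdesc)
and a local GENMASK definition valid for 32-bit values:

```c
#include <assert.h>
#include <stdint.h>

/* GENMASK(h, l): bits h..l set, for 0 <= l <= h <= 31. */
#define GENMASK(h, l) \
	(((~UINT32_C(0)) >> (31 - (h))) & ((~UINT32_C(0)) << (l)))

#define TX_CDESC_MBZ6_SHIFT 6
#define TX_CDESC_MBZ6_MASK  GENMASK(7, 6)

/* Hypothetical, simplified descriptor: only the flags byte matters here. */
struct demo_tx_cdesc {
	uint16_t req_id;
	uint8_t flags; /* 0: phase, 5:1: reserved1, 7:6: mbz6 */
	uint8_t pad;
};

uint8_t get_tx_cdesc_mbz6(const struct demo_tx_cdesc *p)
{
	return (uint8_t)((p->flags & TX_CDESC_MBZ6_MASK) >> TX_CDESC_MBZ6_SHIFT);
}

/* Like the setter in the patch, this ORs the value in without clearing. */
void set_tx_cdesc_mbz6(struct demo_tx_cdesc *p, uint8_t val)
{
	p->flags |= (uint8_t)((val << TX_CDESC_MBZ6_SHIFT) & TX_CDESC_MBZ6_MASK);
}
```

With GENMASK(7, 6) the mask evaluates to 0xC0, so the phase bit in
bit 0 is never touched by either accessor.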

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index 670e794c98..438e4a1085 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -84,6 +84,7 @@ enum ena_admin_aq_caps_id {
ENA_ADMIN_ENA_SRD_INFO  = 1,
ENA_ADMIN_CUSTOMER_METRICS  = 2,
ENA_ADMIN_EXTENDED_RESET_REASONS= 3,
+   ENA_ADMIN_CDESC_MBZ = 4,
 };
 
 enum ena_admin_placement_policy_type {
diff --git a/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
index 2107d17fdf..f811dd261e 100644
--- a/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_eth_io_defs.h
@@ -152,7 +152,8 @@ struct ena_eth_io_tx_cdesc {
 
/* flags
 * 0 : phase
-* 7:1 : reserved1
+* 5:1 : reserved1
+* 7:6 : mbz6 - MBZ
 */
uint8_t flags;
 
@@ -198,7 +199,7 @@ struct ena_eth_io_rx_desc {
 struct ena_eth_io_rx_cdesc_base {
/* 4:0 : l3_proto_idx
 * 6:5 : src_vlan_cnt
-* 7 : reserved7 - MBZ
+* 7 : mbz7 - MBZ
 * 12:8 : l4_proto_idx
 * 13 : l3_csum_err - when set, either the L3
 *checksum error detected, or, the controller didn't
@@ -214,7 +215,8 @@ struct ena_eth_io_rx_cdesc_base {
 * 16 : l4_csum_checked - L4 checksum was verified
 *(could be OK or error), when cleared the status of
 *checksum is unknown
-* 23:17 : reserved17 - MBZ
+* 17 : mbz17 - MBZ
+* 23:18 : reserved18
 * 24 : phase
 * 25 : l3_csum2 - second checksum engine result
 * 26 : first - Indicates first descriptor in
@@ -341,6 +343,8 @@ struct ena_eth_io_numa_node_cfg_reg {
 
 /* tx_cdesc */
 #define ENA_ETH_IO_TX_CDESC_PHASE_MASK  BIT(0)
+#define ENA_ETH_IO_TX_CDESC_MBZ6_SHIFT  6
+#define ENA_ETH_IO_TX_CDESC_MBZ6_MASK   GENMASK(7, 6)
 
 /* rx_desc */
 #define ENA_ETH_IO_RX_DESC_PHASE_MASK   BIT(0)
@@ -355,6 +359,8 @@ struct ena_eth_io_numa_node_cfg_reg {
 #define ENA_ETH_IO_RX_CDESC_BASE_L3_PROTO_IDX_MASK  GENMASK(4, 0)
 #define ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_SHIFT 5
 #define ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_MASK  GENMASK(6, 5)
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ7_SHIFT 7
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ7_MASK  BIT(7)
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_PROTO_IDX_SHIFT 8
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_PROTO_IDX_MASK  GENMASK(12, 8)
 #define ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM_ERR_SHIFT  13
@@ -365,6 +371,8 @@ struct ena_eth_io_numa_node_cfg_reg {
 #define ENA_ETH_IO_RX_CDESC_BASE_IPV4_FRAG_MASK BIT(15)
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_CHECKED_SHIFT  16
 #define ENA_ETH_IO_RX_CDESC_BASE_L4_CSUM_CHECKED_MASK   BIT(16)
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ17_SHIFT17
+#define ENA_ETH_IO_RX_CDESC_BASE_MBZ17_MASK BIT(17)
 #define ENA_ETH_IO_RX_CDESC_BASE_PHASE_SHIFT24
 #define ENA_ETH_IO_RX_CDESC_BASE_PHASE_MASK BIT(24)
 #define ENA_ETH_IO_RX_CDESC_BASE_L3_CSUM2_SHIFT 25
@@ -731,6 +739,15 @@ static inline void set_ena_eth_io_tx_cdesc_phase(struct 
ena_eth_io_tx_cdesc *p,
p->flags |= val & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
 }
 
+static inline uint8_t get_ena_eth_io_tx_cdesc_mbz6(const struct 
ena_eth_io_tx_cdesc *p)
+{
+   return (p->flags & ENA_ETH_IO_TX_CDESC_MBZ6_MASK) >> 
ENA_ETH_IO_TX_CDESC_MBZ6_SHIFT;
+}
+static inline void set_ena_eth_io_tx_cdesc_mbz6(struct ena_eth_io_tx_cdesc *p, 
uint8_t val)
+{
+   p->flags |= (val << ENA_ETH_IO_TX_CDESC_MBZ6_SHIFT) & 
ENA_ETH_IO_TX_CDESC_MBZ6_MASK;
+}
+
 static inline uint8_t get_ena_eth_io_rx_desc_phase(const struct 
ena_eth_io_rx_desc *p)
 {
return p->ctrl & ENA_ETH_IO_RX_DESC_PHASE_MASK;
@@ -791,6 +808,19 @@ static inline void 
set_ena_eth_io_rx_cdesc_base_src_vlan_cnt(struct ena_eth_io_r
p->status |= (val << ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_SHIFT) & 
ENA_ETH_IO_RX_CDESC_BASE_SRC_VLAN_CNT_MASK;
 }
 
+static inline uint32_t get_ena_eth_io_rx_cdesc_base_mbz7(const struct 
ena_eth_io_rx_cdesc_base *p)
+{
+   return (p-

[PATCH v2 13/33] net/ena/hal: use correct read once on u8 field

2024-03-04 Thread shaibran
From: Shai Brandes 

The flags field in ena_eth_io_tx_cdesc is 8 bits long,
but it is currently read with READ_ONCE16.
Switch to READ_ONCE8 to avoid reading extra data.
Because the assignment implicitly casts to u8, the value
read was already correct; this change makes the access
width match the field.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
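
Note: a minimal sketch of why the access width matters, assuming a
hypothetical two-byte layout (struct demo_cdesc) and a local
READ_ONCE8 written as a volatile load; the real ENA macros and
descriptor layout may differ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Volatile single-byte load; one common way to write a READ_ONCE8. */
#define READ_ONCE8(x) (*(volatile const uint8_t *)&(x))

#define PHASE_MASK 0x1u

/* Hypothetical layout: flags is followed by an unrelated byte. */
struct demo_cdesc {
	uint8_t flags;     /* bit 0 holds the phase */
	uint8_t neighbour; /* not part of flags */
};

/* Read exactly one byte and extract the phase bit. */
uint8_t read_phase(const struct demo_cdesc *d)
{
	return (uint8_t)(READ_ONCE8(d->flags) & PHASE_MASK);
}

/* What a 16-bit wide load starting at flags would fetch. */
uint16_t read_wide(const struct demo_cdesc *d)
{
	uint16_t wide;

	memcpy(&wide, d, sizeof(wide));
	return wide;
}
```

The wide load also picks up the neighbouring byte, which is the "extra
data" the commit message refers to; masking or truncation hides it but
the access itself is wider than the field.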

diff --git a/drivers/net/ena/hal/ena_eth_com.h 
b/drivers/net/ena/hal/ena_eth_com.h
index cee4f35124..6a7c17f84f 100644
--- a/drivers/net/ena/hal/ena_eth_com.h
+++ b/drivers/net/ena/hal/ena_eth_com.h
@@ -219,7 +219,7 @@ static inline int ena_com_tx_comp_req_id_get(struct 
ena_com_io_cq *io_cq,
 * expected, it mean that the device still didn't update
 * this completion.
 */
-   cdesc_phase = READ_ONCE16(cdesc->flags) & 
ENA_ETH_IO_TX_CDESC_PHASE_MASK;
+   cdesc_phase = READ_ONCE8(cdesc->flags) & ENA_ETH_IO_TX_CDESC_PHASE_MASK;
if (cdesc_phase != expected_phase)
return ENA_COM_TRY_AGAIN;
 
-- 
2.17.1



[PATCH v2 15/33] net/ena/hal: malformed Tx descriptor error reason

2024-03-04 Thread shaibran
From: Shai Brandes 

Adding ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED to identify
cases where the returned TX completion descriptors are
corrupted.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_regs_defs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
index 6a33f74812..a94025dc77 100644
--- a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
@@ -23,6 +23,7 @@ enum ena_regs_reset_reason_types {
ENA_REGS_RESET_MISS_INTERRUPT   = 14,
ENA_REGS_RESET_SUSPECTED_POLL_STARVATION= 15,
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED  = 16,
+   ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED  = 17,
ENA_REGS_RESET_LAST,
 };
 
-- 
2.17.1



[PATCH v2 16/33] net/ena/hal: phc feature modifications

2024-03-04 Thread shaibran
From: Shai Brandes 

1. PHC algorithm is updated to support reading new PHC values.
2. Update default PHC expiration timeout.
3. Fix a theoretical PHC destroy race.
4. Adjust PHC for multiple devices.
5. PHC activation version check point.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 111 --
 drivers/net/ena/hal/ena_com.h |  31 +++--
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h |  45 +--
 3 files changed, 135 insertions(+), 52 deletions(-)
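
Note: the timeout handling in ena_com_phc_config() can be sketched as
below. The default values come from this patch (10 and 1000 usec). The
rule that a zero device value selects the default is an assumption,
since the condition is not visible in the quoted hunk; struct
phc_timeouts and phc_pick_timeouts are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 10
#define PHC_DEFAULT_BLOCK_TIMEOUT_USEC  1000

struct phc_timeouts {
	uint32_t expire_timeout_usec;
	uint32_t block_timeout_usec;
};

struct phc_timeouts phc_pick_timeouts(uint32_t dev_expire_usec,
				      uint32_t dev_block_usec)
{
	struct phc_timeouts t;

	/* Assumption: zero from the device means "use the default". */
	t.expire_timeout_usec = dev_expire_usec ?
		dev_expire_usec : PHC_DEFAULT_EXPIRE_TIMEOUT_USEC;
	t.block_timeout_usec = dev_block_usec ?
		dev_block_usec : PHC_DEFAULT_BLOCK_TIMEOUT_USEC;

	/* Sanity check: expire timeout must not exceed block timeout. */
	if (t.expire_timeout_usec > t.block_timeout_usec)
		t.expire_timeout_usec = t.block_timeout_usec;

	return t;
}
```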

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 31c37b0ab3..fb3ad27d0a 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -41,10 +41,12 @@
 #define ENA_MAX_ADMIN_POLL_US 5000
 
 /* PHC definitions */
-#define ENA_PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 20
+#define ENA_PHC_DEFAULT_EXPIRE_TIMEOUT_USEC 10
 #define ENA_PHC_DEFAULT_BLOCK_TIMEOUT_USEC 1000
-#define ENA_PHC_TIMESTAMP_ERROR 0x
+#define ENA_PHC_MAX_ERROR_BOUND 0x
 #define ENA_PHC_REQ_ID_OFFSET 0xDEAD
+#define ENA_PHC_ERROR_FLAGS (ENA_ADMIN_PHC_ERROR_FLAG_TIMESTAMP | \
+ENA_ADMIN_PHC_ERROR_FLAG_ERROR_BOUND)
 
 /*/
 /*/
@@ -1778,16 +1780,21 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
struct ena_admin_set_feat_cmd set_feat_cmd;
int ret = 0;
 
-   /* Get device PHC default configuration */
-   ret = ena_com_get_feature(ena_dev, &get_feat_resp, 
ENA_ADMIN_PHC_CONFIG, 0);
+   /* Get default device PHC configuration */
+   ret = ena_com_get_feature(ena_dev,
+ &get_feat_resp,
+ ENA_ADMIN_PHC_CONFIG,
+ ENA_ADMIN_PHC_FEATURE_VERSION_0);
if (unlikely(ret)) {
ena_trc_err(ena_dev, "Failed to get PHC feature configuration, 
error: %d\n", ret);
return ret;
}
 
-   /* Supporting only readless PHC retrieval */
-   if (get_feat_resp.u.phc.type != ENA_ADMIN_PHC_TYPE_READLESS) {
-   ena_trc_err(ena_dev, "Unsupported PHC type, error: %d\n", 
ENA_COM_UNSUPPORTED);
+   /* Supporting only PHC V0 (readless mode with error bound) */
+   if (get_feat_resp.u.phc.version != ENA_ADMIN_PHC_FEATURE_VERSION_0) {
+   ena_trc_err(ena_dev, "Unsupported PHC version (0x%X), error: 
%d\n",
+   get_feat_resp.u.phc.version,
+   ENA_COM_UNSUPPORTED);
return ENA_COM_UNSUPPORTED;
}
 
@@ -1804,11 +1811,11 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
   get_feat_resp.u.phc.block_timeout_usec :
   ENA_PHC_DEFAULT_BLOCK_TIMEOUT_USEC;
 
-   /* Sanity check - expire timeout must not be above skip timeout */
+   /* Sanity check - expire timeout must not exceed block timeout */
if (phc->expire_timeout_usec > phc->block_timeout_usec)
phc->expire_timeout_usec = phc->block_timeout_usec;
 
-   /* Prepare PHC feature command with PHC output address */
+   /* Prepare PHC config feature command */
memset(&set_feat_cmd, 0x0, sizeof(set_feat_cmd));
set_feat_cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
set_feat_cmd.feat_common.feature_id = ENA_ADMIN_PHC_CONFIG;
@@ -1840,13 +1847,16 @@ int ena_com_phc_config(struct ena_com_dev *ena_dev)
 void ena_com_phc_destroy(struct ena_com_dev *ena_dev)
 {
struct ena_com_phc_info *phc = &ena_dev->phc;
-
-   phc->active = false;
+   unsigned long flags = 0;
 
/* In case PHC is not supported by the device, silently exiting */
if (!phc->virt_addr)
return;
 
+   ENA_SPINLOCK_LOCK(phc->lock, flags);
+   phc->active = false;
+   ENA_SPINLOCK_UNLOCK(phc->lock, flags);
+
ENA_MEM_FREE_COHERENT(ena_dev->dmadev,
  sizeof(*phc->virt_addr),
  phc->virt_addr,
@@ -1857,15 +1867,14 @@ void ena_com_phc_destroy(struct ena_com_dev *ena_dev)
ENA_SPINLOCK_DESTROY(phc->lock);
 }
 
-int ena_com_phc_get(struct ena_com_dev *ena_dev, u64 *timestamp)
+int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
 {
volatile struct ena_admin_phc_resp *read_resp = ena_dev->phc.virt_addr;
+   const ena_time_high_res_t zero_system_time = ENA_TIME_INIT_HIGH_RES();
struct ena_com_phc_info *phc = &ena_dev->phc;
-   ena_time_high_res_t initial_time = ENA_TIME_INIT_HIGH_RES();
-   static ena_time_high_res_t start_time;
-   unsigned long flags = 0;
ena_time_high_res_t expire_time;
ena_time_high_res_t block_time;
+   unsigned long flags = 0;
int ret = ENA_COM_OK;
 
if

[PATCH v2 17/33] net/ena/hal: restructure interrupt handling

2024-03-04 Thread shaibran
From: Shai Brandes 

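When invoking an admin command in interrupt mode, if the interrupt
is received after the timeout and after the calling function has
finished running, the response is written into memory that is no
longer valid.
Fix this by clearing user_cqe when the completion context is released
and by ignoring completions for a context that is no longer occupied.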

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 4 
 1 file changed, 4 insertions(+)
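
Note: a minimal sketch of the guard this patch adds, assuming a
simplified completion context (struct demo_comp_ctx and both function
names are hypothetical, and an int stands in for the response
descriptor):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_cmd_status { DEMO_CMD_SUBMITTED, DEMO_CMD_COMPLETED };

struct demo_comp_ctx {
	bool occupied;
	enum demo_cmd_status status;
	int *user_cqe; /* caller-owned response buffer */
};

/* Release the context; drop the pointer so it cannot be written later. */
void demo_comp_ctx_release(struct demo_comp_ctx *ctx)
{
	ctx->user_cqe = NULL;
	ctx->occupied = false;
}

/* Returns true if the completion was delivered, false if it was ignored. */
bool demo_handle_completion(struct demo_comp_ctx *ctx, int response)
{
	/* A late completion for a released context must not touch memory. */
	if (!ctx->occupied)
		return false;

	ctx->status = DEMO_CMD_COMPLETED;
	if (ctx->user_cqe != NULL)
		*ctx->user_cqe = response;

	return true;
}
```

The sketch leaves out locking and the atomic outstanding-commands
counter that the real release path also updates.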

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index fb3ad27d0a..a0c88b1a0e 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -181,6 +181,7 @@ static int ena_com_admin_init_aenq(struct ena_com_dev 
*ena_dev,
 static void comp_ctxt_release(struct ena_com_admin_queue *queue,
 struct ena_comp_ctx *comp_ctx)
 {
+   comp_ctx->user_cqe = NULL;
comp_ctx->occupied = false;
ATOMIC32_DEC(&queue->outstanding_cmds);
 }
@@ -474,6 +475,9 @@ static void ena_com_handle_single_admin_completion(struct 
ena_com_admin_queue *a
return;
}
 
+   if (!comp_ctx->occupied)
+   return;
+
comp_ctx->status = ENA_CMD_COMPLETED;
comp_ctx->comp_status = cqe->acq_common_descriptor.status;
 
-- 
2.17.1



[PATCH v2 18/33] net/ena/hal: add unlikely to error checks

2024-03-04 Thread shaibran
From: Shai Brandes 

The unlikely mechanism is used to reduce pipeline flushes
caused by branch mispredictions.
It also improves readability by marking error paths as unexpected.
This commit adds unlikely to error checks that rarely trigger.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 56 +++
 drivers/net/ena/hal/ena_eth_com.c |  2 +-
 2 files changed, 29 insertions(+), 29 deletions(-)
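
Note: a minimal sketch of the pattern, assuming a GCC or Clang style
__builtin_expect and a local unlikely() definition (DPDK provides its
own in rte_branch_prediction.h); DEMO_OK, DEMO_NO_MEM and
demo_check_alloc are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x) /* fallback: no hint, same semantics */
#endif

#define DEMO_OK     0
#define DEMO_NO_MEM (-1)

/* Error path is marked as the cold branch; behaviour is unchanged. */
int demo_check_alloc(const void *entries)
{
	if (unlikely(!entries))
		return DEMO_NO_MEM;

	return DEMO_OK;
}
```

The hint only influences code layout and static prediction; the result
of the condition is the same with or without it.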

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index a0c88b1a0e..d2de5e172d 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -79,7 +79,7 @@ static int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
   struct ena_common_mem_addr *ena_addr,
   dma_addr_t addr)
 {
-   if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
+   if (unlikely((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != 
addr)) {
ena_trc_err(ena_dev, "DMA address has more bits than the device 
supports\n");
return ENA_COM_INVAL;
}
@@ -99,7 +99,7 @@ static int ena_com_admin_init_sq(struct ena_com_admin_queue 
*admin_queue)
ENA_MEM_ALLOC_COHERENT(admin_queue->q_dmadev, size, sq->entries, 
sq->dma_addr,
   sq->mem_handle);
 
-   if (!sq->entries) {
+   if (unlikely(!sq->entries)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -122,7 +122,7 @@ static int ena_com_admin_init_cq(struct ena_com_admin_queue 
*admin_queue)
ENA_MEM_ALLOC_COHERENT(admin_queue->q_dmadev, size, cq->entries, 
cq->dma_addr,
   cq->mem_handle);
 
-   if (!cq->entries)  {
+   if (unlikely(!cq->entries))  {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -147,7 +147,7 @@ static int ena_com_admin_init_aenq(struct ena_com_dev 
*ena_dev,
aenq->dma_addr,
aenq->mem_handle);
 
-   if (!aenq->entries) {
+   if (unlikely(!aenq->entries)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -233,7 +233,7 @@ static struct ena_comp_ctx 
*__ena_com_submit_admin_cmd(struct ena_com_admin_queu
 
/* In case of queue FULL */
cnt = (u16)ATOMIC32_READ(&admin_queue->outstanding_cmds);
-   if (cnt >= admin_queue->q_depth) {
+   if (unlikely(cnt >= admin_queue->q_depth)) {
ena_trc_dbg(admin_queue->ena_dev, "Admin queue is full.\n");
admin_queue->stats.out_of_space++;
return ERR_PTR(ENA_COM_NO_SPACE);
@@ -357,7 +357,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
   io_sq->desc_addr.mem_handle);
}
 
-   if (!io_sq->desc_addr.virt_addr) {
+   if (unlikely(!io_sq->desc_addr.virt_addr)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -382,7 +382,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
if (!io_sq->bounce_buf_ctrl.base_buffer)
io_sq->bounce_buf_ctrl.base_buffer = 
ENA_MEM_ALLOC(ena_dev->dmadev, size);
 
-   if (!io_sq->bounce_buf_ctrl.base_buffer) {
+   if (unlikely(!io_sq->bounce_buf_ctrl.base_buffer)) {
ena_trc_err(ena_dev, "Bounce buffer memory allocation 
failed\n");
return ENA_COM_NO_MEM;
}
@@ -447,7 +447,7 @@ static int ena_com_init_io_cq(struct ena_com_dev *ena_dev,
   ENA_CDESC_RING_SIZE_ALIGNMENT);
}
 
-   if (!io_cq->cdesc_addr.virt_addr) {
+   if (unlikely(!io_cq->cdesc_addr.virt_addr)) {
ena_trc_err(ena_dev, "Memory allocation failed\n");
return ENA_COM_NO_MEM;
}
@@ -577,7 +577,7 @@ static int ena_com_wait_and_process_admin_cq_polling(struct 
ena_comp_ctx *comp_c
if (comp_ctx->status != ENA_CMD_SUBMITTED)
break;
 
-   if (ENA_TIME_EXPIRE(timeout)) {
+   if (unlikely(ENA_TIME_EXPIRE(timeout))) {
ena_trc_err(admin_queue->ena_dev,
"Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
@@ -776,7 +776,7 @@ static int ena_com_config_llq_info(struct ena_com_dev 
*ena_dev,
llq_default_cfg->llq_ring_entry_size_value;
 
rc = ena_com_set_llq(ena_dev);
-   if (rc)
+   if (unlikely(rc))
ena_trc_err(ena_dev, "Cannot set LLQ configuration: %d\n", rc)

[PATCH v2 19/33] net/ena/hal: missing admin interrupt reset reason

2024-03-04 Thread shaibran
From: Shai Brandes 

The driver may trigger a reset when an admin interrupt
is missing.
To identify this case specifically,
this commit adds a new reset reason.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c|  2 ++
 drivers/net/ena/hal/ena_com.h| 12 
 drivers/net/ena/hal/ena_defs/ena_regs_defs.h |  1 +
 3 files changed, 15 insertions(+)
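
Note: this patch adds the flag, its getter and the reset reason, but
the quoted hunks do not show the consumer. A hedged sketch of how a
caller might use them; demo_pick_reset_reason and the fallback argument
are hypothetical, and only the value 18 comes from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Value of ENA_REGS_RESET_MISSING_ADMIN_INTERRUPT in this patch. */
#define DEMO_RESET_MISSING_ADMIN_INTERRUPT 18

struct demo_admin_queue {
	bool is_missing_admin_interrupt;
};

bool demo_get_missing_admin_interrupt(const struct demo_admin_queue *q)
{
	return q->is_missing_admin_interrupt;
}

/* Hypothetical consumer: prefer the specific reason when the flag is set. */
int demo_pick_reset_reason(const struct demo_admin_queue *q,
			   int fallback_reason)
{
	if (demo_get_missing_admin_interrupt(q))
		return DEMO_RESET_MISSING_ADMIN_INTERRUPT;

	return fallback_reason;
}
```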

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index d2de5e172d..8e9c112715 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -803,6 +803,7 @@ static int 
ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
 
if (comp_ctx->status == ENA_CMD_COMPLETED) {
+   admin_queue->is_missing_admin_interrupt = true;
ena_trc_err(admin_queue->ena_dev,
"The ena device sent a completion but the 
driver didn't receive a MSI-X interrupt (cmd %d), autopolling mode is %s\n",
comp_ctx->cmd_opcode, 
admin_queue->auto_polling ? "ON" : "OFF");
@@ -2138,6 +2139,7 @@ int ena_com_admin_init(struct ena_com_dev *ena_dev,
 
admin_queue->ena_dev = ena_dev;
admin_queue->running_state = true;
+   admin_queue->is_missing_admin_interrupt = false;
 
return 0;
 error:
diff --git a/drivers/net/ena/hal/ena_com.h b/drivers/net/ena/hal/ena_com.h
index c62016cc06..c999cd2381 100644
--- a/drivers/net/ena/hal/ena_com.h
+++ b/drivers/net/ena/hal/ena_com.h
@@ -237,6 +237,8 @@ struct ena_com_admin_queue {
 */
bool running_state;
 
+   bool is_missing_admin_interrupt;
+
/* Count the number of outstanding admin commands */
ena_atomic32_t outstanding_cmds;
 
@@ -1089,6 +1091,16 @@ int ena_com_config_dev_mode(struct ena_com_dev *ena_dev,
struct ena_admin_feature_llq_desc *llq_features,
struct ena_llq_configurations *llq_default_config);
 
+/* ena_com_get_missing_admin_interrupt - Return if there is a missing admin 
interrupt
+ * @ena_dev: ENA communication layer struct
+ *
+ * @return - true if there is a missing admin interrupt or false otherwise
+ */
+static inline bool ena_com_get_missing_admin_interrupt(struct ena_com_dev 
*ena_dev)
+{
+   return ena_dev->admin_queue.is_missing_admin_interrupt;
+}
+
 /* ena_com_io_sq_to_ena_dev - Extract ena_com_dev using contained field io_sq.
  * @io_sq: IO submit queue struct
  *
diff --git a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
index a94025dc77..db6a97d675 100644
--- a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
@@ -24,6 +24,7 @@ enum ena_regs_reset_reason_types {
ENA_REGS_RESET_SUSPECTED_POLL_STARVATION= 15,
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED  = 16,
ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED  = 17,
+   ENA_REGS_RESET_MISSING_ADMIN_INTERRUPT  = 18,
ENA_REGS_RESET_LAST,
 };
 
-- 
2.17.1



[PATCH v2 20/33] net/ena/hal: check for existing keep alive notification

2024-03-04 Thread shaibran
From: Shai Brandes 

This commit adds an API to query the aenq on whether
there is a pending keep alive notification.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 39 +++
 drivers/net/ena/hal/ena_com.h | 10 +
 2 files changed, 49 insertions(+)
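
Note: the scan in ena_com_aenq_has_keep_alive() relies on the phase
bit flipping each time the ring wraps. A minimal sketch of that walk,
assuming a simplified entry layout (struct demo_aenq_entry and struct
demo_aenq are hypothetical) and omitting the dma_rmb() the real code
issues after the phase check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PHASE_MASK       0x1u
#define DEMO_GROUP_KEEP_ALIVE 4 /* ENA_ADMIN_KEEP_ALIVE in the AENQ enum */

struct demo_aenq_entry {
	uint16_t group;
	uint8_t flags; /* bit 0: phase written by the device */
};

struct demo_aenq {
	struct demo_aenq_entry *entries;
	uint16_t head;
	uint16_t q_depth; /* must be a power of two */
	uint8_t phase;    /* phase expected for new entries */
};

/* Peek at pending entries without consuming them. */
bool demo_aenq_has_keep_alive(const struct demo_aenq *q)
{
	uint16_t idx = q->head & (q->q_depth - 1);
	uint8_t phase = q->phase;

	/* An entry is pending while its phase bit matches the expected one. */
	while ((q->entries[idx].flags & DEMO_PHASE_MASK) == phase) {
		if (q->entries[idx].group == DEMO_GROUP_KEEP_ALIVE)
			return true;

		idx++;
		if (idx == q->q_depth) {
			/* Wrapped: the device writes the next lap inverted. */
			idx = 0;
			phase = !phase;
		}
	}

	return false;
}
```

Because head and phase are copied into locals, the queue state is left
untouched and the regular AENQ handler still sees every entry.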

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 8e9c112715..f9613f7807 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -2456,6 +2456,45 @@ void ena_com_aenq_intr_handler(struct ena_com_dev 
*ena_dev, void *data)
mmiowb();
 }
 
+bool ena_com_aenq_has_keep_alive(struct ena_com_dev *ena_dev)
+{
+   struct ena_admin_aenq_common_desc *aenq_common;
+   struct ena_com_aenq *aenq = &ena_dev->aenq;
+   struct ena_admin_aenq_entry *aenq_e;
+   u8 phase = aenq->phase;
+   u16 masked_head;
+
+   masked_head = aenq->head & (aenq->q_depth - 1);
+   aenq_e = &aenq->entries[masked_head]; /* Get first entry */
+   aenq_common = &aenq_e->aenq_common_desc;
+
+   /* Go over all the events */
+   while ((READ_ONCE8(aenq_common->flags) &
+   ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) == phase) {
+   /* Make sure the device finished writing the rest of the 
descriptor
+* before reading it.
+*/
+   dma_rmb();
+
+   if (aenq_common->group == ENA_ADMIN_KEEP_ALIVE)
+   return true;
+
+   /* Get next event entry */
+   masked_head++;
+
+   if (unlikely(masked_head == aenq->q_depth)) {
+   masked_head = 0;
+   phase = !phase;
+   }
+
+   aenq_e = &aenq->entries[masked_head];
+   aenq_common = &aenq_e->aenq_common_desc;
+   }
+
+   return false;
+}
+
+
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
  enum ena_regs_reset_reason_types reset_reason)
 {
diff --git a/drivers/net/ena/hal/ena_com.h b/drivers/net/ena/hal/ena_com.h
index c999cd2381..737747f64b 100644
--- a/drivers/net/ena/hal/ena_com.h
+++ b/drivers/net/ena/hal/ena_com.h
@@ -639,6 +639,16 @@ void ena_com_admin_q_comp_intr_handler(struct ena_com_dev 
*ena_dev);
  */
 void ena_com_aenq_intr_handler(struct ena_com_dev *ena_dev, void *data);
 
+/* ena_com_aenq_has_keep_alive - Retrieve if there is a keep alive 
notification in the aenq
+ * @ena_dev: ENA communication layer struct
+ *
+ * This method goes over the async event notification queue and returns if 
there
+ * is a keep alive notification.
+ *
+ * @return - true if there is a keep alive notification in the aenq or false 
otherwise
+ */
+bool ena_com_aenq_has_keep_alive(struct ena_com_dev *ena_dev);
+
 /* ena_com_abort_admin_commands - Abort all the outstanding admin commands.
  * @ena_dev: ENA communication layer struct
  *
-- 
2.17.1



[PATCH v2 21/33] net/ena/hal: modify memory barrier comment

2024-03-04 Thread shaibran
From: Shai Brandes 

The dma_rmb() memory barrier guarantees that the device set the
phase bit before continuing to read the rest of the descriptor.
Because the phase bit and the rest of the descriptor are in the same
cache line this ensures coherency of the data from the descriptor.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index f9613f7807..053e095585 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -2412,8 +2412,8 @@ void ena_com_aenq_intr_handler(struct ena_com_dev 
*ena_dev, void *data)
/* Go over all the events */
while ((READ_ONCE8(aenq_common->flags) &
ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK) == phase) {
-   /* Make sure the phase bit (ownership) is as expected before
-* reading the rest of the descriptor.
+   /* Make sure the device finished writing the rest of the 
descriptor
+* before reading it.
 */
dma_rmb();
 
-- 
2.17.1



[PATCH v2 23/33] net/ena/hal: remove operating system type enum

2024-03-04 Thread shaibran
From: Shai Brandes 

Remove all other operating system enumerations, as they
are unrelated to DPDK. Use a constant value instead.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h | 13 +
 drivers/net/ena/hal/ena_plat_dpdk.h   |  1 +
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index ce8a26721e..c3910c50cc 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -933,19 +933,8 @@ struct ena_admin_feature_rss_flow_hash_input {
uint16_t enabled_input_sort;
 };
 
-enum ena_admin_os_type {
-   ENA_ADMIN_OS_LINUX  = 1,
-   ENA_ADMIN_OS_WIN= 2,
-   ENA_ADMIN_OS_DPDK   = 3,
-   ENA_ADMIN_OS_FREEBSD= 4,
-   ENA_ADMIN_OS_IPXE   = 5,
-   ENA_ADMIN_OS_ESXI   = 6,
-   ENA_ADMIN_OS_MACOS  = 7,
-   ENA_ADMIN_OS_GROUPS_NUM = 7,
-};
-
 struct ena_admin_host_info {
-   /* defined in enum ena_admin_os_type */
+   /* Host OS type defined as ENA_ADMIN_OS_* */
uint32_t os_type;
 
/* os distribution string format */
diff --git a/drivers/net/ena/hal/ena_plat_dpdk.h 
b/drivers/net/ena/hal/ena_plat_dpdk.h
index 5f7cbd1ee7..aa8fbb0cd9 100644
--- a/drivers/net/ena/hal/ena_plat_dpdk.h
+++ b/drivers/net/ena/hal/ena_plat_dpdk.h
@@ -341,5 +341,6 @@ static __rte_always_inline int ena_bits_per_u64(uint64_t 
bitmap)
return count;
 }
 
+#define ENA_ADMIN_OS_DPDK 3
 
 #endif /* DPDK_ENA_COM_ENA_PLAT_DPDK_H_ */
-- 
2.17.1



[PATCH v2 27/33] net/ena/hal: modify customer metrics memory management

2024-03-04 Thread shaibran
From: Shai Brandes 

1. Set the buffer length to zero when memory allocation fails
   and after the memory is released.
2. The driver checks buffer_virt_addr to determine whether the customer
   metrics allocation succeeded. If the allocation fails, buffer_virt_addr
   is not necessarily NULL, so set it to NULL before allocating.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 2db21e7895..24756e5e76 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -3233,13 +3233,17 @@ int ena_com_allocate_customer_metrics_buffer(struct 
ena_com_dev *ena_dev)
struct ena_customer_metrics *customer_metrics = 
&ena_dev->customer_metrics;
 
customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
+   customer_metrics->buffer_virt_addr = NULL;
+
ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev,
   customer_metrics->buffer_len,
   customer_metrics->buffer_virt_addr,
   customer_metrics->buffer_dma_addr,
   customer_metrics->buffer_dma_handle);
-   if (unlikely(!customer_metrics->buffer_virt_addr))
+   if (unlikely(!customer_metrics->buffer_virt_addr)) {
+   customer_metrics->buffer_len = 0;
return ENA_COM_NO_MEM;
+   }
 
return 0;
 }
@@ -3283,6 +3287,7 @@ void ena_com_delete_customer_metrics_buffer(struct 
ena_com_dev *ena_dev)
  customer_metrics->buffer_dma_addr,
  customer_metrics->buffer_dma_handle);
customer_metrics->buffer_virt_addr = NULL;
+   customer_metrics->buffer_len = 0;
}
 }
 
-- 
2.17.1
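
The two points in the commit message can be sketched as follows. This is only an illustration under stated assumptions: `alloc_may_skip_out()` is a hypothetical allocator that leaves its output pointer untouched on failure (standing in for whatever `ENA_MEM_ALLOC_COHERENT` does on a given platform), and the struct and function names are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>

struct metrics_buf {             /* hypothetical, reduced to two fields */
	void *virt_addr;
	size_t len;
};

/* Hypothetical allocator: on failure it returns without writing *out. */
static void alloc_may_skip_out(size_t len, void **out, int fail)
{
	if (fail)
		return;
	*out = malloc(len);
}

static int metrics_buf_alloc(struct metrics_buf *b, size_t len,
			     int simulate_failure)
{
	b->len = len;
	/* Clear the pointer first so that a stale value cannot be
	 * mistaken for a successful allocation.
	 */
	b->virt_addr = NULL;

	alloc_may_skip_out(len, &b->virt_addr, simulate_failure);
	if (b->virt_addr == NULL) {
		b->len = 0;          /* keep length consistent with "no buffer" */
		return -1;
	}
	return 0;
}

static void metrics_buf_free(struct metrics_buf *b)
{
	if (b->virt_addr != NULL) {
		free(b->virt_addr);
		b->virt_addr = NULL;
		b->len = 0;          /* length is zero once memory is released */
	}
}
```

Keeping `len` at zero whenever `virt_addr` is NULL means callers can rely on either field to tell whether a buffer exists.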



[PATCH v2 26/33] net/ena: cosmetic changes

2024-03-04 Thread shaibran
From: Shai Brandes 

This patch makes several changes to improve
the style and readability of the code.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index b98540ba63..2db21e7895 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -1914,15 +1914,14 @@ int ena_com_phc_get_timestamp(struct ena_com_dev 
*ena_dev, u64 *timestamp)
 
/* PHC is in active state, update statistics according to 
req_id and error_flags */
if ((READ_ONCE16(read_resp->req_id) != phc->req_id) ||
-   (read_resp->error_flags & ENA_PHC_ERROR_FLAGS)) {
+   (read_resp->error_flags & ENA_PHC_ERROR_FLAGS))
/* Device didn't update req_id during blocking time or 
timestamp is invalid,
 * this indicates on a device error
 */
phc->stats.phc_err++;
-   } else {
+   else
/* Device updated req_id during blocking time with 
valid timestamp */
phc->stats.phc_exp++;
-   }
}
 
/* Setting relative timeouts */
@@ -2431,7 +2430,7 @@ void ena_com_aenq_intr_handler(struct ena_com_dev 
*ena_dev, void *data)
timestamp = (u64)aenq_common->timestamp_low |
((u64)aenq_common->timestamp_high << 32);
 
-   ena_trc_dbg(ena_dev, "AENQ! Group[%x] Syndrome[%x] timestamp: 
[%" ENA_PRIU64 "s]\n",
+   ena_trc_dbg(ena_dev, "AENQ! Group[%x] Syndrome[%x] timestamp: 
[%" ENA_PRIu64 "s]\n",
aenq_common->group,
aenq_common->syndrome,
timestamp);
@@ -3233,16 +3232,15 @@ int ena_com_allocate_customer_metrics_buffer(struct 
ena_com_dev *ena_dev)
 {
struct ena_customer_metrics *customer_metrics = 
&ena_dev->customer_metrics;
 
+   customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev,
   customer_metrics->buffer_len,
   customer_metrics->buffer_virt_addr,
   customer_metrics->buffer_dma_addr,
   customer_metrics->buffer_dma_handle);
-   if (unlikely(customer_metrics->buffer_virt_addr == NULL))
+   if (unlikely(!customer_metrics->buffer_virt_addr))
return ENA_COM_NO_MEM;
 
-   customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
-
return 0;
 }
 
@@ -3285,7 +3283,6 @@ void ena_com_delete_customer_metrics_buffer(struct 
ena_com_dev *ena_dev)
  customer_metrics->buffer_dma_addr,
  customer_metrics->buffer_dma_handle);
customer_metrics->buffer_virt_addr = NULL;
-   customer_metrics->buffer_len = 0;
}
 }
 
-- 
2.17.1



[PATCH v2 22/33] net/ena/hal: rework Rx ring submission queue

2024-03-04 Thread shaibran
From: Shai Brandes 

Rx ring submission queue descriptors are always located in host memory.
This optimization replaces the generic update tail method with a
method tailored to host memory descriptors, avoiding an unnecessary if
statement.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index b9123f84c3..ebad38d15a 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -210,11 +210,8 @@ static int ena_com_sq_update_llq_tail(struct ena_com_io_sq 
*io_sq)
return ENA_COM_OK;
 }
 
-static int ena_com_sq_update_tail(struct ena_com_io_sq *io_sq)
+static int ena_com_sq_update_reqular_queue_tail(struct ena_com_io_sq *io_sq)
 {
-   if (io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
-   return ena_com_sq_update_llq_tail(io_sq);
-
io_sq->tail++;
 
/* Switch phase bit in case of wrap around */
@@ -224,6 +221,14 @@ static int ena_com_sq_update_tail(struct ena_com_io_sq 
*io_sq)
return ENA_COM_OK;
 }
 
+static int ena_com_sq_update_tail(struct ena_com_io_sq *io_sq)
+{
+   if (io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
+   return ena_com_sq_update_llq_tail(io_sq);
+
+   return ena_com_sq_update_reqular_queue_tail(io_sq);
+}
+
 static struct ena_eth_io_rx_cdesc_base *
ena_com_rx_cdesc_idx_to_ptr(struct ena_com_io_cq *io_cq, u16 idx)
 {
@@ -662,7 +667,7 @@ int ena_com_add_single_rx_desc(struct ena_com_io_sq *io_sq,
desc->buff_addr_hi =
((ena_buf->paddr & GENMASK_ULL(io_sq->dma_addr_bits - 1, 32)) 
>> 32);
 
-   return ena_com_sq_update_tail(io_sq);
+   return ena_com_sq_update_reqular_queue_tail(io_sq);
 }
 
 bool ena_com_cq_empty(struct ena_com_io_cq *io_cq)
-- 
2.17.1
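
The refactoring in this patch can be sketched roughly as below: a generic tail update that dispatches on the queue placement, and a direct call to the host-memory variant on the Rx path, where the placement is already known. All names and the struct layout are hypothetical, and the wrap-around test (tail masked by a power-of-two depth) is an assumption, since the diff context does not show that condition.

```c
#include <stdint.h>

enum placement { PLACEMENT_HOST, PLACEMENT_DEV };

struct sq {                      /* hypothetical, reduced submission queue */
	uint16_t tail;
	uint16_t q_depth;            /* assumed to be a power of two */
	uint8_t phase;
	enum placement placement;
	unsigned int llq_updates;    /* counts calls to the LLQ stub */
};

/* Stub for the device-memory (LLQ) path; the real logic is out of scope. */
static int sq_update_llq_tail(struct sq *q)
{
	q->llq_updates++;
	return 0;
}

/* Host-memory path: advance the tail and flip the phase bit on wrap. */
static int sq_update_host_tail(struct sq *q)
{
	q->tail++;
	if ((q->tail & (q->q_depth - 1)) == 0)
		q->phase ^= 1;
	return 0;
}

/* Generic path: still needs the branch, since the placement can vary. */
static int sq_update_tail(struct sq *q)
{
	if (q->placement == PLACEMENT_DEV)
		return sq_update_llq_tail(q);
	return sq_update_host_tail(q);
}

/* Rx path: descriptors always live in host memory, so skip the branch. */
static int rx_add_desc(struct sq *q)
{
	return sq_update_host_tail(q);
}
```

The saving is a single predictable branch per Rx descriptor, which only matters because this runs in the fast path.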



[PATCH v2 25/33] net/ena/hal: add support for device reset request

2024-03-04 Thread shaibran
From: Shai Brandes 

Add support for a reset request message sent from the device to the driver
over AENQ, which in turn should cause the driver to trigger a reset.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h | 3 ++-
 drivers/net/ena/hal/ena_defs/ena_regs_defs.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
index c3910c50cc..2adce75ed3 100644
--- a/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_admin_defs.h
@@ -1213,7 +1213,8 @@ enum ena_admin_aenq_group {
ENA_ADMIN_KEEP_ALIVE= 4,
ENA_ADMIN_REFRESH_CAPABILITIES  = 5,
ENA_ADMIN_CONF_NOTIFICATIONS= 6,
-   ENA_ADMIN_AENQ_GROUPS_NUM   = 7,
+   ENA_ADMIN_DEVICE_REQUEST_RESET  = 7,
+   ENA_ADMIN_AENQ_GROUPS_NUM   = 8,
 };
 
 enum ena_admin_aenq_notification_syndrome {
diff --git a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h 
b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
index db6a97d675..dd9b629f10 100644
--- a/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
+++ b/drivers/net/ena/hal/ena_defs/ena_regs_defs.h
@@ -25,6 +25,7 @@ enum ena_regs_reset_reason_types {
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED  = 16,
ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED  = 17,
ENA_REGS_RESET_MISSING_ADMIN_INTERRUPT  = 18,
+   ENA_REGS_RESET_DEVICE_REQUEST   = 19,
ENA_REGS_RESET_LAST,
 };
 
-- 
2.17.1



[PATCH v2 28/33] net/ena/hal: cosmetic changes

2024-03-04 Thread shaibran
From: Shai Brandes 

1. Modify log prints to use the correct format specifier
   for unsigned variables.
2. Remove line breaks from lines that do not exceed the
   maximum line length.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_eth_com.c   | 22 +++---
 drivers/net/ena/hal/ena_plat_dpdk.h |  5 ++---
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ena/hal/ena_eth_com.c 
b/drivers/net/ena/hal/ena_eth_com.c
index ebad38d15a..87a2dbfba1 100644
--- a/drivers/net/ena/hal/ena_eth_com.c
+++ b/drivers/net/ena/hal/ena_eth_com.c
@@ -64,7 +64,7 @@ static int ena_com_write_bounce_buffer_to_dev(struct 
ena_com_io_sq *io_sq,
 
io_sq->entries_in_tx_burst_left--;
ena_trc_dbg(ena_com_io_sq_to_ena_dev(io_sq),
-   "Decreasing entries_in_tx_burst_left of queue %d to 
%d\n",
+   "Decreasing entries_in_tx_burst_left of queue %u to 
%u\n",
io_sq->qid, io_sq->entries_in_tx_burst_left);
}
 
@@ -259,7 +259,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
if (unlikely((status & ENA_ETH_IO_RX_CDESC_BASE_FIRST_MASK) >>
ENA_ETH_IO_RX_CDESC_BASE_FIRST_SHIFT && count != 0)) {
ena_trc_err(dev,
-   "First bit is on in descriptor #%d on q_id: 
%d, req_id: %u\n",
+   "First bit is on in descriptor #%u on q_id: 
%u, req_id: %u\n",
count, io_cq->qid, cdesc->req_id);
return ENA_COM_FAULT;
}
@@ -268,7 +268,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
ENA_ETH_IO_RX_CDESC_BASE_MBZ17_MASK)) &&
  ena_com_get_cap(dev, ENA_ADMIN_CDESC_MBZ))) {
ena_trc_err(dev,
-   "Corrupted RX descriptor #%d on q_id: %d, 
req_id: %u\n",
+   "Corrupted RX descriptor #%u on q_id: %u, 
req_id: %u\n",
count, io_cq->qid, cdesc->req_id);
return ENA_COM_FAULT;
}
@@ -288,7 +288,7 @@ static int ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq 
*io_cq,
io_cq->cur_rx_pkt_cdesc_start_idx = head_masked;
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "ENA q_id: %d packets were completed. first desc 
idx %u descs# %d\n",
+   "ENA q_id: %u packets were completed. first desc 
idx %u descs# %u\n",
io_cq->qid, *first_cdesc_idx, count);
} else {
io_cq->cur_rx_pkt_cdesc_count = count;
@@ -394,7 +394,7 @@ static void ena_com_rx_set_flags(struct ena_com_io_cq 
*io_cq,
ENA_ETH_IO_RX_CDESC_BASE_IPV4_FRAG_SHIFT;
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "l3_proto %d l4_proto %d l3_csum_err %d l4_csum_err %d hash 
%d frag %d cdesc_status %x\n",
+   "l3_proto %d l4_proto %d l3_csum_err %d l4_csum_err %d hash 
%u frag %d cdesc_status %x\n",
ena_rx_ctx->l3_proto,
ena_rx_ctx->l4_proto,
ena_rx_ctx->l3_csum_err,
@@ -434,7 +434,7 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq,
 
if (unlikely(header_len > io_sq->tx_max_header_size)) {
ena_trc_err(ena_com_io_sq_to_ena_dev(io_sq),
-   "Header size is too large %d max header: %d\n",
+   "Header size is too large %u max header: %u\n",
header_len, io_sq->tx_max_header_size);
return ENA_COM_INVAL;
}
@@ -592,12 +592,12 @@ int ena_com_rx_pkt(struct ena_com_io_cq *io_cq,
}
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "Fetch rx packet: queue %d completed desc: %d\n",
+   "Fetch rx packet: queue %u completed desc: %u\n",
io_cq->qid, nb_hw_desc);
 
if (unlikely(nb_hw_desc > ena_rx_ctx->max_bufs)) {
ena_trc_err(ena_com_io_cq_to_ena_dev(io_cq),
-   "Too many RX cdescs (%d) > MAX(%d)\n",
+   "Too many RX cdescs (%u) > MAX(%u)\n",
nb_hw_desc, ena_rx_ctx->max_bufs);
return ENA_COM_NO_SPACE;
}
@@ -622,7 +622,7 @@ int ena_com_rx_pkt(struct ena_com_io_cq *io_cq,
io_sq->next_to_comp += nb_hw_desc;
 
ena_trc_dbg(ena_com_io_cq_to_ena_dev(io_cq),
-   "[%s][QID#%d] Updating SQ head to: %d\n", __func__,
+   "Updating Queue %u, SQ head to: %u\n",
io_sq->qid, io_sq->next_to_comp);
 
/* Get rx flags from the last pkt */
@@ -660,8 +660,8 @@ int ena_com_add_single_rx_

[PATCH v2 24/33] net/ena/hal: handle command abort

2024-03-04 Thread shaibran
From: Shai Brandes 

Currently the admin_queue->stats.aborted_cmd counter is incremented when
an admin command status is ENA_CMD_ABORTED only if the admin queue is
in polling mode.
This commit also increments admin_queue->stats.aborted_cmd when the
admin queue is in interrupt mode.
It also adds a check that the command status is a valid
completion status, which is currently verified only if the admin queue
is in polling mode.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/hal/ena_com.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ena/hal/ena_com.c b/drivers/net/ena/hal/ena_com.c
index 053e095585..b98540ba63 100644
--- a/drivers/net/ena/hal/ena_com.c
+++ b/drivers/net/ena/hal/ena_com.c
@@ -824,8 +824,19 @@ static int 
ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
ret = ENA_COM_TIMER_EXPIRED;
goto err;
}
+   } else if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
+   ena_trc_err(admin_queue->ena_dev, "Command was aborted\n");
+   ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
+   admin_queue->stats.aborted_cmd++;
+   ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
+   ret = ENA_COM_NO_DEVICE;
+   goto err;
}
 
+   ENA_WARN(comp_ctx->status != ENA_CMD_COMPLETED,
+admin_queue->ena_dev, "Invalid comp status %d\n",
+comp_ctx->status);
+
ret = ena_com_comp_status_to_errno(admin_queue, comp_ctx->comp_status);
 err:
comp_ctxt_release(admin_queue, comp_ctx);
-- 
2.17.1



[PATCH v2 29/33] net/ena: update device-preferred size of rings

2024-03-04 Thread shaibran
From: Shai Brandes 

Update the device-preferred size of the Tx ring to fall within the
valid range when a large LLQ is enabled. For consistency, align the
device-preferred size of the Rx ring accordingly.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 2414f631c8..2a7b7c0cba 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -2595,8 +2595,10 @@ static int ena_infos_get(struct rte_eth_dev *dev,
dev_info->tx_desc_lim.nb_mtu_seg_max = RTE_MIN(ENA_PKT_MAX_BUFS,
adapter->max_tx_sgl_size);
 
-   dev_info->default_rxportconf.ring_size = ENA_DEFAULT_RING_SIZE;
-   dev_info->default_txportconf.ring_size = ENA_DEFAULT_RING_SIZE;
+   dev_info->default_rxportconf.ring_size = RTE_MIN(ENA_DEFAULT_RING_SIZE,
+
dev_info->rx_desc_lim.nb_max);
+   dev_info->default_txportconf.ring_size = RTE_MIN(ENA_DEFAULT_RING_SIZE,
+
dev_info->tx_desc_lim.nb_max);
 
dev_info->err_handle_mode = RTE_ETH_ERROR_HANDLE_MODE_PASSIVE;
 
-- 
2.17.1
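
The clamping applied in this patch amounts to taking the smaller of the driver's preferred default and the maximum the device currently allows. A minimal sketch, with a hypothetical helper name and example values (1024 and 512 are used only because they match the large LLQ figures quoted elsewhere in this series):

```c
#include <stdint.h>

/* Return the preferred ring size, capped at the device limit. */
static inline uint32_t default_ring_size(uint32_t preferred, uint32_t nb_max)
{
	return preferred < nb_max ? preferred : nb_max;
}
```

For example, `default_ring_size(1024, 512)` yields 512, so the advertised default never exceeds the valid range, while `default_ring_size(1024, 8192)` leaves the preferred value of 1024 unchanged.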



[PATCH v2 31/33] net/ena: support max large llq depth from the device

2024-03-04 Thread shaibran
From: Shai Brandes 

Selected AWS instances from later generations enable
large LLQ by default, allowing the transmission of
packets with headers exceeding 96 bytes.

Due to the overall ENA memory BAR size limitation,
large LLQ has the side effect of halving the maximum
number of LLQ entries (from 1024 to 512).

ENA-Express, powered by AWS Scalable Reliable Datagram
(SRD) technology, requires a Tx queue with 1024 entries.
Selected AWS instances from upcoming generations will
have double the size of the ENA memory BAR, enabling ENA-Express
to work with a large LLQ of 1024 entries.

The initial default large LLQ size will remain 512.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/rel_notes/release_24_03.rst|  2 +
 drivers/net/ena/ena_ethdev.c  | 38 ---
 drivers/net/ena/hal/ena_defs/ena_admin_defs.h |  4 +-
 3 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 2a22bb07ed..9823616eeb 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -107,6 +107,8 @@ New Features
   * Added support for sub-optimal configuration notifications from the device.
   * Restructured fast release of mbufs when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE 
optimization is enabled.
   * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg 
`llq_policy`.
+  * Added support for LLQ header size recommendation from the device.
+  * Allowed large LLQ with 1024 entries when the device supports enlarged 
memory BAR.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index d73e321d0f..43693ee2ee 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -42,6 +42,8 @@
 
 #define DECIMAL_BASE 10
 
+#define MAX_WIDE_LLQ_DEPTH_UNSUPPORTED 0
+
 /*
  * We should try to keep ENA_CLEANUP_BUF_SIZE lower than
  * RTE_MEMPOOL_CACHE_MAX_SIZE, so we can fit this in mempool local cache.
@@ -1071,7 +1073,7 @@ static int
 ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx,
   bool use_large_llq_hdr)
 {
-   struct ena_admin_feature_llq_desc *llq = &ctx->get_feat_ctx->llq;
+   struct ena_admin_feature_llq_desc *dev = &ctx->get_feat_ctx->llq;
struct ena_com_dev *ena_dev = ctx->ena_dev;
uint32_t max_tx_queue_size;
uint32_t max_rx_queue_size;
@@ -1086,7 +1088,7 @@ ena_calc_io_queue_size(struct ena_calc_queue_size_ctx 
*ctx,
if (ena_dev->tx_mem_queue_type ==
ENA_ADMIN_PLACEMENT_POLICY_DEV) {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
-   llq->max_llq_depth);
+   dev->max_llq_depth);
} else {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
max_queue_ext->max_tx_sq_depth);
@@ -1106,7 +1108,7 @@ ena_calc_io_queue_size(struct ena_calc_queue_size_ctx 
*ctx,
if (ena_dev->tx_mem_queue_type ==
ENA_ADMIN_PLACEMENT_POLICY_DEV) {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
-   llq->max_llq_depth);
+   dev->max_llq_depth);
} else {
max_tx_queue_size = RTE_MIN(max_tx_queue_size,
max_queues->max_sq_depth);
@@ -1122,18 +1124,28 @@ ena_calc_io_queue_size(struct ena_calc_queue_size_ctx 
*ctx,
max_rx_queue_size = rte_align32prevpow2(max_rx_queue_size);
max_tx_queue_size = rte_align32prevpow2(max_tx_queue_size);
 
-   if (use_large_llq_hdr) {
-   if ((llq->entry_size_ctrl_supported &
-ENA_ADMIN_LIST_ENTRY_SIZE_256B) &&
-   (ena_dev->tx_mem_queue_type ==
-ENA_ADMIN_PLACEMENT_POLICY_DEV)) {
-   max_tx_queue_size /= 2;
-   PMD_INIT_LOG(INFO,
-   "Forcing large headers and decreasing maximum 
Tx queue size to %d\n",
+   if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV && 
use_large_llq_hdr) {
+   /* intersection between driver configuration and device 
capabilities */
+   if (dev->entry_size_ctrl_supported & 
ENA_ADMIN_LIST_ENTRY_SIZE_256B) {
+   if (dev->max_wide_llq_depth == 
MAX_WIDE_LLQ_DEPTH_UNSUPPORTED) {
+   /* Devices that do not support the double-sized 
ENA memory BAR will
+* report max_wide_llq_depth as 0. In such 
case, driver halves the
+* queue depth when working in large llq policy.
+*/
+   max_tx_queue_size >>= 1;
+   PMD_INIT_LOG(INFO,
+   

[PATCH v2 32/33] net/ena: control path pure polling mode

2024-03-04 Thread shaibran
From: Shai Brandes 

This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in
the control path. This mode is not activated by default and can be
toggled using the "control_poll_interval" devarg. When operating in
this mode, periodic alarms are used to monitor the control queues.

A non-zero value for this devarg is mandatory for control path
functionality when binding ports to the uio_pci_generic kernel module,
which lacks interrupt support.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 doc/guides/nics/ena.rst|  52 +---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c   | 108 -
 drivers/net/ena/ena_ethdev.h   |   5 ++
 4 files changed, 133 insertions(+), 34 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 53c9341859..d2dd4fa4a0 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -109,12 +109,16 @@ Runtime Configuration
 
* **llq_policy** (default 1)
 
- Controls whether use device recommended header policy or override it.
+ Controls whether use device recommended header policy or override it:
+
  0 - Disable LLQ.
- **Use with extreme caution as it leads to a huge performance
- degradation on AWS instances from 6th generation onwards.**
+ **Use with extreme caution as it leads to a huge performance
+ degradation on AWS instances from 6th generation onwards.**
+
  1 - Accept device recommended LLQ policy (Default).
+
  2 - Enforce normal LLQ policy.
+
  3 - Enforce large LLQ policy.
 
* **miss_txc_to** (default 5)
@@ -126,6 +130,18 @@ Runtime Configuration
  timer service. Setting this parameter to 0 disables this feature. Maximum
  allowed value is 60 seconds.
 
+   * **control_poll_interval** (default 0)
+
+ Enable polling-based functionality of the admin queues, eliminating the
+ need for interrupts in the control-path:
+
+ 0 - Disable (Admin queue will work in interrupt mode).
+
+ [1..1000] - Number of milliseconds to wait between periodic inspection of 
the admin queues.
+
+ **A non-zero value for this devarg is mandatory for control path 
functionality
+ when binding ports to uio_pci_generic kernel module which lacks interrupt 
support.**
+
 ENA Configuration Parameters
 
 
@@ -164,23 +180,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite.  This includes environment
variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or 
``uio_pci_generic`` driver.
 
(*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
reduces the latency of the packets by pushing the header directly through
the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
In DPDK ``igb_uio`` it must be enabled by loading module with
``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
If vfio-pci is used user should follow `AWS ENA PMD documentation

`_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert ``igb_uio`` kernel module using the command ``modprobe uio; insmod 
igb_uio.ko wc_activate=1``
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``
Please make sure that ``IOMMU`` is enabled in your system,
or use ``vfio`` driver in ``noiommu`` mode::
 
@@ -189,7 +205,17 @@ Prerequisites
To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe 
uio_pci_generic``.
+   Make sure that the IOMMU is disabled or is in passthrough mode.
+   For example: ``modprobe uio_pci_generic intel_iommu=off``.
+
+   Note that when launching the application, the ``control_poll_interval`` 
devarg must be used with a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control-path (admin 
queues) of the ENA require poll-mode
+   to process command completion and asynchronous notification from the device.
+   For example: ``dpdk-app -a "00:06.0,control_path_poll_

[PATCH v2 33/33] net/ena: upgrade driver version to 2.9.0

2024-03-04 Thread shaibran
From: Shai Brandes 

Upgrade driver version to 2.9.0.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index af1f6d6d05..f47f585611 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -22,7 +22,7 @@
 #include 
 
 #define DRV_MODULE_VER_MAJOR   2
-#define DRV_MODULE_VER_MINOR   8
+#define DRV_MODULE_VER_MINOR   9
 #define DRV_MODULE_VER_SUBMINOR0
 
 #define __MERGE_64B_H_L(h, l) (((uint64_t)h << 32) | l)
-- 
2.17.1



[PATCH v2 30/33] net/ena: exhaust interrupt callbacks in device close

2024-03-04 Thread shaibran
From: Shai Brandes 

Change rte_intr_callback_unregister to its synchronous variant to
ensure all active interrupt callbacks are completed before proceeding
with the flow. Relocate the interrupt deregistration to precede the
release of stats memory, thereby preventing the interrupt handler
from accessing memory that has already been freed.

Signed-off-by: Shai Brandes 
Reviewed-by: Amit Bernstein 
---
 drivers/net/ena/ena_ethdev.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 2a7b7c0cba..d73e321d0f 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -871,6 +871,7 @@ static int ena_close(struct rte_eth_dev *dev)
struct rte_intr_handle *intr_handle = pci_dev->intr_handle;
struct ena_adapter *adapter = dev->data->dev_private;
int ret = 0;
+   int rc;
 
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
@@ -879,17 +880,17 @@ static int ena_close(struct rte_eth_dev *dev)
ret = ena_stop(dev);
adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
+   rte_intr_disable(intr_handle);
+   rc = rte_intr_callback_unregister_sync(intr_handle, 
ena_interrupt_handler_rte, dev);
+   if (unlikely(rc != 0))
+   PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+
ena_rx_queue_release_all(dev);
ena_tx_queue_release_all(dev);
 
rte_free(adapter->drv_stats);
adapter->drv_stats = NULL;
 
-   rte_intr_disable(intr_handle);
-   rte_intr_callback_unregister(intr_handle,
-ena_interrupt_handler_rte,
-dev);
-
/*
 * MAC is not allocated dynamically. Setting NULL should prevent from
 * release of the resource in the rte_eth_dev_release_port().
-- 
2.17.1



[PATCH] common/cnxk: fix loopback port dataflow issue

2024-03-04 Thread Rahul Bhansali
With a loopback interface and IPsec inbound traffic, a
NIX_CQERRINT_CPT_DROP interrupt is raised and dataflow stops.
This happens because the flow control configuration is skipped, as
roc_nix_is_esw() returns true for the loopback device as well.

Fixes: 978dc3a13f7b ("common/cnxk: base support for eswitch VF")
Fixes: f812768a9e66 ("net/cnxk: support eswitch VF as ethernet device")

Signed-off-by: Rahul Bhansali 
---
 drivers/common/cnxk/roc_nix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
index 20202788b5..041621dfaa 100644
--- a/drivers/common/cnxk/roc_nix.c
+++ b/drivers/common/cnxk/roc_nix.c
@@ -385,8 +385,9 @@ sdp_lbk_id_update(struct plt_pci_device *pci_dev, struct 
nix *nix)
nix->sdp_link = true;
break;
case PCI_DEVID_CNXK_RVU_AF_VF:
-   case PCI_DEVID_CNXK_RVU_ESWITCH_VF:
nix->lbk_link = true;
+   break;
+   case PCI_DEVID_CNXK_RVU_ESWITCH_VF:
nix->esw_link = true;
break;
default:
-- 
2.25.1
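
The root cause is a shared case label in a switch statement. The sketch below reproduces the before and after behaviour shown in the diff, using hypothetical enum and struct names rather than the cnxk definitions.

```c
#include <stdbool.h>

enum dev_id { DEV_AF_VF, DEV_ESWITCH_VF, DEV_OTHER };  /* hypothetical IDs */

struct link_flags {
	bool lbk_link;
	bool esw_link;
};

/* Before the fix: both device IDs share one body, so both flags are set. */
static void classify_before(enum dev_id id, struct link_flags *f)
{
	switch (id) {
	case DEV_AF_VF:
	case DEV_ESWITCH_VF:
		f->lbk_link = true;
		f->esw_link = true;
		break;
	default:
		break;
	}
}

/* After the fix: each device ID sets only its own flag. */
static void classify_after(enum dev_id id, struct link_flags *f)
{
	switch (id) {
	case DEV_AF_VF:
		f->lbk_link = true;
		break;
	case DEV_ESWITCH_VF:
		f->esw_link = true;
		break;
	default:
		break;
	}
}
```

With the old layout a loopback device also ends up marked as an eswitch link, which matches the symptom described in the commit message.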



Re: [PATCH v12] net/iavf: add diagnostic support in TX path

2024-03-04 Thread Bruce Richardson
On Thu, Feb 29, 2024 at 06:38:47PM +, Bruce Richardson wrote:
> On Mon, Feb 19, 2024 at 09:55:14AM +, Mingjin Ye wrote:
> > Implemented a Tx wrapper to perform a thorough check on mbufs,
> > categorizing and counting invalid cases by types for diagnostic
> > purposes. The count of invalid cases is accessible through xstats_get.
> > 
> > Also, the devarg option "mbuf_check" was introduced to configure the
> > diagnostic parameters to enable the appropriate diagnostic features.
> > 
> > supported cases: mbuf, size, segment, offload.
> >  1. mbuf: check for corrupted mbuf.
> >  2. size: check min/max packet length according to hw spec.
> >  3. segment: check number of mbuf segments not exceed hw limitation.
> >  4. offload: check any unsupported offload flag.
> > 
> > parameter format: "mbuf_check=" or "mbuf_check=[,]"
> > eg: dpdk-testpmd -a :81:01.0,mbuf_check=[mbuf,size] -- -i
> > 
> > Signed-off-by: Mingjin Ye 
> > ---
> Carrying ack from v11:
> 
> Acked-by: Anatoly Burakov 
> 
> Patch applied, with some minor rework of the docs, to dpdk-next-net-intel.
> 
Also updated some line wrapping following additional review. Corrected
version pushed now.

/Bruce


[DPDK/vhost/virtio Bug 1394] vq_assert_lock__ fail in vhost_user_set_vring_addr during live migration with HW vDPA

2024-03-04 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1394

Bug ID: 1394
   Summary: vq_assert_lock__ fail in vhost_user_set_vring_addr
during live migration with HW vDPA
   Product: DPDK
   Version: unspecified
  Hardware: x86
OS: Linux
Status: UNCONFIRMED
  Severity: critical
  Priority: Normal
 Component: vhost/virtio
  Assignee: dev@dpdk.org
  Reporter: yaj...@nvidia.com
  Target Milestone: ---

DPDK version: v24.03-rc1

1. Boot vDPA and QEMU (8.1)
build/examples/dpdk-vdpa -a 5e:00.2,class=vdpa --file-prefix vf0
--log-level=.,info -- --client --iface /tmp/vfe-net 

2. Live-migrate the VM to another server
sudo virsh migrate --verbose --live --persistent gen-l-vrt-295-005-CentOS-7.4
qemu+ssh://gen-l-vrt-294/system  --unsafe

3. DPDK crashes

Once the device is configured, vhost_user_lock_all_queue_pairs is no longer
called, and vq_assert_lock__ then fails in vhost_user_set_vring_addr in the
vDPA case.

Related to commit:
commit 741dc052eaf9459cc576b0d87e96a40069485c32 (HEAD)
Author: David Marchand david.march...@redhat.com
Date:   Tue Dec 5 10:45:34 2023 +0100

vhost: annotate virtqueue access checks

Modifying vq->access_ok should be done with a write lock taken.
Annotate vring_translate() and vring_invalidate().



new port /tmp/vfe-net0, device : 5e:00.2
mlx5_vdpa: MTU cannot be set on device 5e:00.2.
mlx5_vdpa: Region 0: HVA 0x7fff, GPA 0x0, size 0xc000.
mlx5_vdpa: Region 1: HVA 0x7ffdc000, GPA 0x1, size 0x14000.
mlx5_vdpa: Indirect mkey mode is KLM Fixed Buffer Size.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 0.
mlx5_vdpa: Virtq 0 notifier state is enabled.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 1.
mlx5_vdpa: Virtq 1 notifier state is enabled.
[New Thread 0x7fffe7b9b400 (LWP 962699)]
mlx5_vdpa: vDPA device 0 was configured.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 0
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 1
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 2
mlx5_vdpa: Update virtq 2 status disable -> enable.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 2.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 3
mlx5_vdpa: Update virtq 3 status disable -> enable.
mlx5_vdpa: Virtq 2 notifier state is enabled.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 3.
mlx5_vdpa: Virtq 3 notifier state is enabled.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_LOG_BASE
VHOST_CONFIG: (/tmp/vfe-net0) log mmap size: 294912, offset: 0
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: (/tmp/vfe-net0) negotiated Virtio features: 0x144601803
mlx5_vdpa: mlx5 vdpa: enabling dirty logging...
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_GET_STATUS
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ADDR
EAL: PANIC in vq_assert_lock__():
VHOST_CONFIG: (/tmp/vfe-net0) vhost_user_set_vring_addr() called without access
lock taken.
0: /images/vdpa/dpdk/build/examples/dpdk-vdpa (rte_dump_stack+0x1f) [aadfca]
1: /images/vdpa/dpdk/build/examples/dpdk-vdpa (__rte_panic+0xe2) [a8032b]
2: /images/vdpa/dpdk/build/examples/dpdk-vdpa (40+0x3bac83) [7bac83]
3: /images/vdpa/dpdk/build/examples/dpdk-vdpa (40+0x3bd0ef) [7bd0ef]
4: /images/vdpa/dpdk/build/examples/dpdk-vdpa (vhost_user_msg_handler+0x508) [7c264f]
5: /images/vdpa/dpdk/build/examples/dpdk-vdpa (40+0x36b797) [76b797]
6: /images/vdpa/dpdk/build/examples/dpdk-vdpa (fdset_event_dispatch+0x1cd) [769793]
7: /images/vdpa/dpdk/build/examples/dpdk-vdpa (40+0x6970ca) [a970ca]
8: /images/vdpa/dpdk/build/examples/dpdk-vdpa (40+0x6af41c) [aaf41c]
9: /lib64/libpthread.so.0 (76426000+0x814a) [7642e14a]
10: /lib64/libc.so.6 (clone+0x43) [7615ddc3]

Thread 29 "dpdk-vhost-evt" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe839c400 (LWP 962487)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	return ret;
Missing separate debuginfos, use: yum debuginfo-install
elfutils-libelf-0.182-3.el8.x86_64 libgcc-8.4.1-1.el8.x86_64
libibverbs-2307mlnx47-1.2310007.x86_64 libnl3-3.5.0-1.el8.x86_64
libpcap-1.9.1-5.el8.x86_64 numactl-libs-2.0.12-11.el8.x86_64
openssl-libs-1.1.1k-5.el8_5.x86_64 zlib-1.2.11-17.el8.x86_64
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x76082db5 in __GI_abort () at abort.c:79
#2  0x00a80330 in __rte_panic (funcname=0x3461330 <__func__.33032>
"vq_assert_

Re: [PATCH v2 0/3] reload the firmware as needed

2024-03-04 Thread Ferruh Yigit
On 3/1/2024 8:42 AM, Chaoyong He wrote:
> Add the necessary logic to get firmware version from firmware file, and
> only reload the firmware when the firmware version changed.
> 
> Also add a device argument which can force reload the firmware and
> ignore the firmware version.
> 
> ---
> v2:
> * Update commit log to explain what 'MIP' is.
> * Document about the new add 'force_reload_fw' device argument.
> ---
> 
> Peng Zhang (3):
>   net/nfp: add the elf module
>   net/nfp: reload the firmware only when firmware changed
>   net/nfp: add force reload firmware option
>

Series applied to dpdk-next-net/main, thanks.


RE: [PATCH v5 4/4] hash: add SVE support for bulk key lookup

2024-03-04 Thread Konstantin Ananyev


> >> - Implemented SVE code for comparing signatures in bulk lookup.
> >> - Added Defines in code for SVE code support.
> >> - Optimise NEON code
> >> - New SVE code is ~5% slower than optimized NEON for N2 processor.
> >>
> >> Signed-off-by: Yoan Picchi 
> >> Signed-off-by: Harjot Singh 
> >> Reviewed-by: Nathan Brown 
> >> Reviewed-by: Ruifeng Wang 
> >> ---
> >>   lib/hash/rte_cuckoo_hash.c | 196 -
> >>   lib/hash/rte_cuckoo_hash.h |   1 +
> >>   2 files changed, 151 insertions(+), 46 deletions(-)
> >>
> >> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> >> index a07dd3a28d..231d6d6ded 100644
> >> --- a/lib/hash/rte_cuckoo_hash.c
> >> +++ b/lib/hash/rte_cuckoo_hash.c
> >> @@ -442,8 +442,11 @@ rte_hash_create(const struct rte_hash_parameters 
> >> *params)
> >>h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
> >>else
> >>   #elif defined(RTE_ARCH_ARM64)
> >> -  if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
> >> +  if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
> >>h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
> >> +  if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
> >> +  h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
> >> +  }
> >>else
> >>   #endif
> >>h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
> >> @@ -1860,37 +1863,103 @@ rte_hash_free_key_with_position(const struct 
> >> rte_hash *h,
> >>   #if defined(__ARM_NEON)
> >>
> >>   static inline void
> >> -compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t 
> >> *sec_hash_matches,
> >> -  const struct rte_hash_bucket *prim_bkt,
> >> -  const struct rte_hash_bucket *sec_bkt,
> >> +compare_signatures_dense(uint16_t *hitmask_buffer,
> >> +  const uint16_t *prim_bucket_sigs,
> >> +  const uint16_t *sec_bucket_sigs,
> >>uint16_t sig,
> >>enum rte_hash_sig_compare_function sig_cmp_fn)
> >>   {
> >>unsigned int i;
> >>
> >> +  static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
> >> +  "The hitmask must be exactly wide enough to accept the whole hitmask if 
> >> it is dense");
> >> +
> >>/* For match mask every bits indicates the match */
> >>switch (sig_cmp_fn) {
> >
> > Can I ask to move arch specific comparison code into some arch-specific 
> > headers or so?
> > It is getting really hard to read and understand the generic code with all 
> > these ifdefs and arch specific instructions...
> >

Hi, apologies for long delay in response. 

 
> I can easily enough move the compare_signatures into an arm/x86
> directory, and have a default version in the code.

Yes, that's what I thought about.
 
> The problem would be for bulk lookup. The function is already duplicated
>   2 times (the l and lf version). If I remove the #ifdefs, I'll need to
> duplicate them again into 4 nearly identical versions (dense and
> sparse). The only third options I see would be some preprocessor macro
> to patch the function, but that looks even dirtier to me.

Not sure I understood you here: from looking at the code I don't see any
arch specific ifdefs in bulk_lookup() routines.
What I am missing here?
 

> I think duplicating the code would be bad, but I can do it if you want.
> Unless you have a better solution?
> 
> >> +#if RTE_HASH_BUCKET_ENTRIES <= 8
> >>case RTE_HASH_COMPARE_NEON: {
> >> -  uint16x8_t vmat, x;
> >> +  uint16x8_t vmat, hit1, hit2;
> >>const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 
> >> 0x80};
> >>const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
> >>
> >>/* Compare all signatures in the primary bucket */
> >> -  vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const 
> >> *)prim_bkt->sig_current));
> >> -  x = vandq_u16(vmat, mask);
> >> -  *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
> >> +  vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
> >> +  hit1 = vandq_u16(vmat, mask);
> >> +
> >>/* Compare all signatures in the secondary bucket */
> >> -  vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const 
> >> *)sec_bkt->sig_current));
> >> -  x = vandq_u16(vmat, mask);
> >> -  *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
> >> +  vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
> >> +  hit2 = vandq_u16(vmat, mask);
> >> +
> >> +  hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
> >> +  hit2 = vorrq_u16(hit1, hit2);
> >> +  *hitmask_buffer = vaddvq_u16(hit2);
> >> +  }
> >> +  break;
> >> +#endif
> >> +#if defined(RTE_HAS_SVE_ACLE)
> >> +  case RTE_HASH_COMPARE_SVE: {
> >> +  svuint16_t vsign, shift, sv_matches;
> >> +  svbool_t pred, match, bucket_wide_pred;
> >> +  int i = 0;
> >> +  uint64_t vl = svcnth();
> >> +
> >> +  vsign = svdup_u16(sig);
> >> +  s
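
For reference while reading the NEON and SVE hunks above, the "dense" hitmask they build can be written as a plain scalar loop. This is only a sketch under assumptions: the `demo_` names and the fixed bucket size of 8 are hypothetical and not part of the patch. It follows the layout used by the NEON case, where bit i reports a match in the primary bucket and bit (i + bucket entries) reports a match in the secondary bucket.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bucket size for this sketch (stands in for RTE_HASH_BUCKET_ENTRIES). */
#define DEMO_BUCKET_ENTRIES 8

/*
 * Scalar sketch of the dense hitmask layout:
 *   bit i                         -> prim_sigs[i] == sig
 *   bit (i + DEMO_BUCKET_ENTRIES) -> sec_sigs[i] == sig
 * Hypothetical helper, not the function from the patch.
 */
static inline uint16_t
demo_compare_signatures_dense(const uint16_t *prim_sigs,
		const uint16_t *sec_sigs, uint16_t sig)
{
	uint16_t hitmask = 0;
	unsigned int i;

	/* Both buckets must fit into one hitmask word. */
	static_assert(sizeof(hitmask) * 8 >= 2 * DEMO_BUCKET_ENTRIES,
		"hitmask too narrow for a dense mask of two buckets");

	for (i = 0; i < DEMO_BUCKET_ENTRIES; i++) {
		if (prim_sigs[i] == sig)
			hitmask |= (uint16_t)(1u << i);
		if (sec_sigs[i] == sig)
			hitmask |= (uint16_t)(1u << (i + DEMO_BUCKET_ENTRIES));
	}

	return hitmask;
}
```

A per-architecture header could then provide a vectorised function with the same signature and leave a loop like this one as the generic fallback, which is roughly the split discussed in this thread.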

[PATCH v1] doc: fix aging poll frequency devargs information

2024-03-04 Thread Ankur Dwivedi
The information for CNXK NPC MCAM aging poll frequency devargs is moved to
runtime config options section for ethdev device. Initially it was
incorrectly placed in runtime config options section for inline device.

Fixes: a44775505911 ("net/cnxk: support flow aging")
Cc: sta...@dpdk.org

Signed-off-by: Ankur Dwivedi 
---
 doc/guides/nics/cnxk.rst | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 9ec52e380f..67cfe001ab 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -416,6 +416,18 @@ Runtime Config Options
With the above configuration, PMD would allocate meta buffers of size 512 
for
inline inbound IPsec processing second pass.
 
+- ``NPC MCAM Aging poll frequency in seconds`` (default ``10``)
+
+   Poll frequency for aging control thread can be specified by
+   ``aging_poll_freq`` ``devargs`` parameter.
+
+   For example::
+
+  -a 0002:01:00.2,aging_poll_freq=50
+
+   With the above configuration, driver would poll for aging flows every 50
+   seconds.
+
 .. note::
 
Above devarg parameters are configurable per device, user needs to pass the
@@ -601,18 +613,6 @@ Runtime Config Options for inline device
With the above configuration, driver would poll for soft expiry events every
1000 usec.
 
-- ``NPC MCAM Aging poll frequency in seconds`` (default ``10``)
-
-   Poll frequency for aging control thread can be specified by
-   ``aging_poll_freq`` ``devargs`` parameter.
-
-   For example::
-
-  -a 0002:01:00.2,aging_poll_freq=50
-
-   With the above configuration, driver would poll for aging flows every 50
-   seconds.
-
 Debugging Options
 -
 
-- 
2.25.1



RE: [PATCH v5 0/4] add pointer compression API

2024-03-04 Thread Konstantin Ananyev


> > On Mar 1, 2024, at 5:16 AM, Morten Brørup  
> > wrote:
> >
> >> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
> >> Sent: Thursday, 22 February 2024 17.16
> >>
> >>> For some reason your email is not visible to me, even though it's in the
> >>> archive.
> >>
> >> No worries.
> >>
> >>>
> >>> On 02/11/202416:32,Konstantin Ananyev konstantin.v.ananyev  wrote:
> >>>
> >>>> From one side the code itself is very small and straightforward, from
> >>>> other side - it is not clear to me what is intended usage for it
> >>>> within DPDK and its appliances?
> >>>> Konstantin
> >>>
> >>> The intended usage is explained in the cover email (see below) and
> >> demonstrated
> >>> in the test supplied in the following patch - when sending arrays of
> >> pointers
> >>> between cores as it happens in a forwarding example.
> >>
> >> Yes, I saw that. The thing is that test is a 'synthetic' one.
> >> My question was about how do you expect people to use it in more realistic
> >> scenarios?
> >> Let say user has a bunch of mbuf pointers, possibly from different 
> >> mempools.
> >> How he can use this API: how to deduce the base pointer for all of them and
> >> what to
> >> do if it can't be done?
> >
> > I share Konstantin's concerns with this feature.
> >
> > If we want to compress mbuf pointers in applications with a few mbuf pools, 
> > e.g. an mbuf pool per CPU socket, the compression
> algorithm would be different.
> This feature is targeted for pipeline mode of applications. We see many 
> customers using pipeline mode. This feature helps in reducing
> the cost of transferring the packets between cores by reducing the copies 
> involved.

I do understand the intention, and I am not arguing about usefulness of the 
pipeline model. 
My point is you are introducing new API: compress/decompress pointers,
but don't provide (or even describe) any proper way for the developer to use it 
in a safe and predictable manner.
Which from my perspective make it nearly useless and misleading.

> For an application with multiple pools, it depends on how the applications 
> are using multiple pools. But, if there is a bunch of packets
> belonging to multiple mempools, compressing those mbufs may not be possible. 
> But if those mbufs are grouped per mempool and
> are transferred on different queues, then it is possible. Hence the APIs are 
> implemented very generically.

Ok, let's consider even more simplistic scenario - all pointers belong to one 
mempool.
AFAIK, even one mempool can contain elements from different memzones,
and these memzones are not guaranteed to have consecutive VAs.
So even one mempool, with total size <=4GB can contain elements with distances 
between them more than 4GB. 
Now let say at startup user created a mempool, how he can determine 
programmatically
can he apply your compress API safely on it or not?
I presume that if you are serious about this API usage, then such ability has 
to be provided.
Something like:

int compress_pointer_deduce_mempool_base(const struct rte_mempool *mp[],
uint32_t nb_mp, uint32_t compress_size, uintptr_t *base_ptr);

Or probably even more generic one:

struct mem_buf {uintptr_t base; size_t len;};
int compress_pointer_deduce_base(const struct mem_buf *mem_buf[],
uint32_t nb_membuf, uint32_t compress_size, uintptr_t *base_ptr);

Even with these functions in-place, user has to be extra careful:
 - he can't add new memory chunks to these mempools (or he'll need to 
re-calculate the new base_ptr)
 - he needs to make sure that pointers from only these mempools will be used by 
compress/decompress.
But at least it provides some ability to use this feature in real apps.
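
To make the idea of a base-deduction helper more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `demo_` names, the chunk descriptor and the plain "offset from base" encoding are hypothetical and are not the API proposed in the series. It picks the lowest chunk address as the base, refuses the set if any byte would fall outside the offset range, and shows a matching 32-bit compress/decompress pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one virtually contiguous memory chunk. */
struct demo_mem_chunk {
	uintptr_t base;
	size_t len;
};

/*
 * Pick the lowest chunk address as base and verify that every byte of
 * every chunk is reachable with an offset of 'offset_bits' bits.
 * Returns 0 and stores the base on success, -1 otherwise.
 * Address overflow of (base + len) is not handled in this sketch.
 */
static int
demo_deduce_base(const struct demo_mem_chunk *chunks, uint32_t nb_chunks,
		uint32_t offset_bits, uintptr_t *base_ptr)
{
	uintptr_t lo, hi;
	uint32_t i;

	if (chunks == NULL || base_ptr == NULL || nb_chunks == 0 ||
			offset_bits == 0 || offset_bits > 32)
		return -1;

	lo = chunks[0].base;
	hi = chunks[0].base + chunks[0].len; /* one past the last byte */
	for (i = 1; i < nb_chunks; i++) {
		if (chunks[i].base < lo)
			lo = chunks[i].base;
		if (chunks[i].base + chunks[i].len > hi)
			hi = chunks[i].base + chunks[i].len;
	}

	/* The largest offset is (hi - lo - 1); it must be below 2^offset_bits. */
	if ((uint64_t)(hi - lo) > ((uint64_t)1 << offset_bits))
		return -1;

	*base_ptr = lo;
	return 0;
}

/* Store a pointer as a 32-bit offset from the deduced base. */
static inline uint32_t
demo_ptr_compress(uintptr_t base, const void *ptr)
{
	return (uint32_t)((uintptr_t)ptr - base);
}

/* Rebuild the pointer from the base and the 32-bit offset. */
static inline void *
demo_ptr_decompress(uintptr_t base, uint32_t offset)
{
	return (void *)(base + offset);
}
```

With a check like this an application could decide once, at startup, whether compression is usable for a given set of mempools, and fall back to full pointers when the helper returns an error. The two caveats listed above still apply: the base has to be re-deduced if memory is added, and only pointers from the checked chunks may be compressed.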

With such API in place it should be possible to make the auto-test more 
realistic:
- allocate mempool 
- deduce base_pointer
- then we can have a loop with producer/consumer to mimic realistic workload.
As an example:
 producer(s):  mempool_alloc(); ; 
ring_enqueue();  
 consumer(s): ring_dequeue(); ; free_mbuf();
- free mempool

Or probably you can go even further: take some existing pipeline sample app and 
make it use compress/decompress API.
That will provide people with some ability to test it and measure its perf
impact.
Again, it will provide an example of the amount of changes required to enable 
it.
My speculation here that majority of users will find the effort too big, 
while the gain way too limited and fragile.
But at least, there would be some realistic reference point for it and users 
can decide themselves is it worth it or not. 

> >
> > I would like to add:
> > If we want to offer optimizations specifically for applications with a 
> > single mbuf pool, I think it should be considered in a system-wide
> context to determine if performance could be improved in more areas.
> > E.g. removing the pool field from the rte_mbuf structure might free up 
> > space to move hot fields from the second cache line to the
> first, so the second cache line rare

Re: [PATCH v2] vhost: fix VDUSE device destruction failure

2024-03-04 Thread Maxime Coquelin
Le lun. 4 mars 2024, 11:36, David Marchand  a
écrit :

> From: Maxime Coquelin 
>
> VDUSE_DESTROY_DEVICE ioctl can fail because the device's
> chardev is not released despite close syscall having been
> called. It happens because the events handler thread is
> still polling the file descriptor.
>
> fdset_pipe_notify() is not enough because it does not
> ensure the notification has been handled by the event
> thread, it just returns once the notification is sent.
>
> To fix this, this patch introduces a synchronization
> mechanism based on pthread's condition, so that
> fdset_pipe_notify_sync() only returns once the pipe's
> read callback has been executed.
>
> Fixes: 51d018fdac4e ("vhost: add VDUSE events handler")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Maxime Coquelin 
> Signed-off-by: David Marchand 
> ---
> Changes since v1:
> - sync'd only when in VDUSE destruction path,
> - added explicit init of sync_mutex,
>
> ---
>  lib/vhost/fd_man.c | 23 +--
>  lib/vhost/fd_man.h |  6 ++
>  lib/vhost/socket.c |  1 +
>  lib/vhost/vduse.c  |  3 ++-
>  4 files changed, 30 insertions(+), 3 deletions(-)
>

Reviewed-by: Maxime Coquelin 

Thanks for improving the patch,
Maxime


> diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c
> index 79a8d2c006..481e6b900a 100644
> --- a/lib/vhost/fd_man.c
> +++ b/lib/vhost/fd_man.c
> @@ -309,10 +309,11 @@ fdset_event_dispatch(void *arg)
>  }
>
>  static void
> -fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
> +fdset_pipe_read_cb(int readfd, void *dat,
>int *remove __rte_unused)
>  {
> char charbuf[16];
> +   struct fdset *fdset = dat;
> int r = read(readfd, charbuf, sizeof(charbuf));
> /*
>  * Just an optimization, we don't care if read() failed
> @@ -320,6 +321,11 @@ fdset_pipe_read_cb(int readfd, void *dat __rte_unused,
>  * compiler happy
>  */
> RTE_SET_USED(r);
> +
> +   pthread_mutex_lock(&fdset->sync_mutex);
> +   fdset->sync = true;
> +   pthread_cond_broadcast(&fdset->sync_cond);
> +   pthread_mutex_unlock(&fdset->sync_mutex);
>  }
>
>  void
> @@ -342,7 +348,7 @@ fdset_pipe_init(struct fdset *fdset)
> }
>
> ret = fdset_add(fdset, fdset->u.readfd,
> -   fdset_pipe_read_cb, NULL, NULL);
> +   fdset_pipe_read_cb, NULL, fdset);
>
> if (ret < 0) {
> VHOST_FDMAN_LOG(ERR,
> @@ -366,5 +372,18 @@ fdset_pipe_notify(struct fdset *fdset)
>  * compiler happy
>  */
> RTE_SET_USED(r);
> +}
> +
> +void
> +fdset_pipe_notify_sync(struct fdset *fdset)
> +{
> +   pthread_mutex_lock(&fdset->sync_mutex);
> +
> +   fdset->sync = false;
> +   fdset_pipe_notify(fdset);
> +
> +   while (!fdset->sync)
> +   pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex);
>
> +   pthread_mutex_unlock(&fdset->sync_mutex);
>  }
> diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h
> index 6315904c8e..7816fb11ac 100644
> --- a/lib/vhost/fd_man.h
> +++ b/lib/vhost/fd_man.h
> @@ -6,6 +6,7 @@
>  #define _FD_MAN_H_
>  #include 
>  #include 
> +#include 
>
>  #define MAX_FDS 1024
>
> @@ -35,6 +36,10 @@ struct fdset {
> int writefd;
> };
> } u;
> +
> +   pthread_mutex_t sync_mutex;
> +   pthread_cond_t sync_cond;
> +   bool sync;
>  };
>
>
> @@ -53,5 +58,6 @@ int fdset_pipe_init(struct fdset *fdset);
>  void fdset_pipe_uninit(struct fdset *fdset);
>
>  void fdset_pipe_notify(struct fdset *fdset);
> +void fdset_pipe_notify_sync(struct fdset *fdset);
>
>  #endif
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index a2fdac30a4..96b3ab5595 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -93,6 +93,7 @@ static struct vhost_user vhost_user = {
> .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
> .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
> .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
> +   .sync_mutex = PTHREAD_MUTEX_INITIALIZER,
> .num = 0
> },
> .vsocket_cnt = 0,
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index d462428d2c..e0c6991b69 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -36,6 +36,7 @@ static struct vduse vduse = {
> .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
> .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
> .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
> +   .sync_mutex = PTHREAD_MUTEX_INITIALIZER,
> .num = 0
> },
>  };
> @@ -618,7 +619,7 @@ vduse_device_destroy(const char *path)
> vduse_device_stop(dev);
>
> fdset_del(&vduse.fdset, dev->vduse_dev_fd);
> -   fdset_pipe_notify(&vduse.fdset);
> +   fdset_pipe_notify_sync(&vduse.fdset);
>
> if (dev->vduse_dev_fd >= 0) {
> close(dev->vduse_dev_fd);
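
The handshake described in the commit message can be tried out in isolation with a small pthread program. The sketch below is not the fd_man code: the `demo_` names are hypothetical, the event thread handles a single notification instead of running a dispatch loop, and, unlike the patch, the notifier returns an error if the write to the pipe fails so that it cannot wait forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the pipe and sync fields of struct fdset. */
struct demo_notifier {
	int readfd;
	int writefd;
	pthread_mutex_t sync_mutex;
	pthread_cond_t sync_cond;
	bool sync;
};

static int
demo_notifier_init(struct demo_notifier *n)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	n->readfd = fds[0];
	n->writefd = fds[1];
	pthread_mutex_init(&n->sync_mutex, NULL);
	pthread_cond_init(&n->sync_cond, NULL);
	n->sync = false;
	return 0;
}

/* Event thread: drains the pipe once, then reports that it has done so. */
static void *
demo_event_thread(void *arg)
{
	struct demo_notifier *n = arg;
	char buf[16];
	ssize_t r = read(n->readfd, buf, sizeof(buf));

	(void)r; /* the payload itself is irrelevant */

	pthread_mutex_lock(&n->sync_mutex);
	n->sync = true;
	pthread_cond_broadcast(&n->sync_cond);
	pthread_mutex_unlock(&n->sync_mutex);
	return NULL;
}

/* Returns only after the event thread has handled the notification. */
static int
demo_notify_sync(struct demo_notifier *n)
{
	pthread_mutex_lock(&n->sync_mutex);
	n->sync = false;

	if (write(n->writefd, "1", 1) != 1) {
		pthread_mutex_unlock(&n->sync_mutex);
		return -1;
	}

	/* pthread_cond_wait() releases the mutex while waiting. */
	while (!n->sync)
		pthread_cond_wait(&n->sync_cond, &n->sync_mutex);

	pthread_mutex_unlock(&n->sync_mutex);
	return 0;
}
```

Because the flag is cleared and the pipe is written while the mutex is held, the event thread cannot set the flag before the caller is ready to observe it, and the `while` loop guards against spurious wakeups.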

Re: Email based retest request process: proposal for new pull/re-apply feature

2024-03-04 Thread Aaron Conole
zhoumin  writes:

> On Wed, Feb 21, 2024 at 2:24AM, Patrick Robb wrote:
>
>  On Tue, Feb 20, 2024 at 1:12 PM Aaron Conole  wrote:
>
>  Why not something like:
>
>  Recheck-request: [attribute-list],[test-list]...
>
>  For example, then we can do:
>
>  Recheck-request: rebase=[identifier],
>
>  where identifier is a branch specifier (or the word 'latest')?
>
>  I hadn't thought about the option of allowing branch specifiers. Agree that 
> allowing a human correction process for
>  the pw_maintainer_cli.py script choosing the wrong branch sounds helpful.
>
>  My original idea was offering 2 options (test original artifact, or re-apply 
> on latest). Do we want to support for
>  checking out to a specific commit and re-applying there? I figured that 
> would not be worth it (too much of a niche
>  case), but your comments are making me reconsider. 
>
> I agree with you that allowing developers to correct the target branch is 
> useful. But, the developer should just provide
> the name of branch instead of commit ID, which is more reasonable. Of course, 
> the rebasing option is more important.
> So, I consider we can allow developers to submit a request as following 
> format:
>
> Recheck-request: rebase=True|branch=main|contexts=iol-compile-amd64-testing, 
> iol-broadcom-Performance,...
>
> We can use "|" as the separator, for example. `rebase` and `branch` can be 
> optional and we can use the default values
> if the developer doesn't provide them. The default is not rebasing for 
> `rebase` option. The default is the branch chosen
> by pw_maintainer_cli.py script for `branch` option. The `contexts` option is 
> required.

Interesting approach.  But I don't know about contexts= or something
like that.  It means there are two passes through the regex.

Also, I don't know about contexts either - if the series was requested
to rebase, every lab that can re-test probably should since the results
aren't going to be valid from the old tests.

>  Just spit-balling on syntax.
>
>  That said, I agree - if a rebase has been requested, all tests need to
>  be rerun.  Maybe we should consider that the test labels should be added
>  with a run number or something?  Or we could also include that the run
>  is a rerun.  That way for labs that don't currently support the recheck
>  request framework, we can easily tell that they weren't re-tried.
>
>  so re-report with a modified test label? That is good in that it shows the 
> behavior more clearly. But, it also means
>  we will not overwrite any fails. So the fail will still be there, and the 
> patchwork patch page will grow a huge table.
>  Maybe this is fine.  
>
> Re-report with a modified test label may be better. That can tell people more 
> information about the CI testings, such as
> that the retest indeed happened.

Just back from PTO - actually, I don't think we need to adjust the
label, but rather the description.  That would allow the mechanism that
overwrites the existing test to keep the "checks" page tidy, but also
making the retest information clear.  WDYT?

>  Also raises the point of getting more coverage for the retest framework at 
> other labs. I will email Min Zhou
>  regarding how he uses the dpdk-ci project for the loongson build jobs and 
> see how well that can integrate with the
>  get_reruns.py script.

That would be great!



Re: [PATCH v4 12/12] eventdev: fix doxygen processing of event vector struct

2024-03-04 Thread Thomas Monjalon
21/02/2024 11:32, Bruce Richardson:
> The event vector struct was missing comments on two members, and also
> was inadvertently creating a local variable called "__rte_aligned" in
> the doxygen output.
> 
> Correct the comment markers to fix the former issue, and fix the latter
> by putting "#ifdef __DOXYGEN" around the alignment constraint.
[..]
> +#ifndef __DOXYGEN__
>  } __rte_aligned(16);
> +#else
> +};
> +#endif

Would it be possible to make __rte_aligned empty in rte_common.h
instead of each call? Does it fix Doxygen bug?
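
For illustration, the "define it empty once" variant asked about here could look roughly like the sketch below. It uses a hypothetical `demo_aligned()` macro and struct rather than the real `__rte_aligned` from rte_common.h, assumes a GCC/Clang style attribute, and assumes Doxygen runs with `__DOXYGEN__` predefined. Whether this actually avoids the Doxygen issue is the open question in this mail, so treat it as a sketch only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/*
 * Hypothetical central definition: the attribute disappears when the
 * documentation tool parses the header, and is kept for real builds.
 */
#ifdef __DOXYGEN__
#define demo_aligned(a)
#else
#define demo_aligned(a) __attribute__((__aligned__(a)))
#endif

/* Hypothetical struct using the trailing alignment macro. */
struct demo_event_vector {
	uint16_t nb_elem;  /* number of valid elements */
	uint64_t u64s[4];  /* payload */
} demo_aligned(16);

#ifndef __DOXYGEN__
static_assert(alignof(struct demo_event_vector) == 16,
	"alignment attribute must still apply in real builds");
#endif
```

The structs themselves would then need no per-site `#ifndef __DOXYGEN__` blocks.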




Re: [RFC 2/7] eal: add generic bit manipulation macros

2024-03-04 Thread Mattias Rönnblom



On 2024-03-04 09:16, Heng Wang wrote:

Hi Mattias,
   I have a comment about the _Generic. What if the user gives uint8_t * or 
uint16_t * as the address. One improvement is that we could add a default 
branch in _Generic to throw a compiler error or assert false.


If the user passes an incompatible pointer, the compiler will generate an
error.



   Another question is what if nr >= sizeof(type) ? What if you do, for example, 
(uint32_t)1 << 35? Maybe we could add an assert in the implementation?



There are already such asserts in the functions the macro delegates to.

That said, DPDK RTE_ASSERT()s are disabled even in debug builds, so I'm 
not sure it's going to help anyone.



Regards,
Heng

-Original Message-
From: Mattias Rönnblom 
Sent: Saturday, March 2, 2024 2:53 PM
To: dev@dpdk.org
Cc: hof...@lysator.liu.se; Heng Wang ; Mattias Rönnblom 

Subject: [RFC 2/7] eal: add generic bit manipulation macros

Add bit-level test/set/clear/assign macros operating on both 32-bit and 64-bit 
words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom 
---
  lib/eal/include/rte_bitops.h | 81 
  1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
  #define RTE_FIELD_GET64(mask, reg) \
((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
  
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr) \
+   _Generic((addr),\
+uint32_t *: rte_bit_test32,\
+uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)  \
+   _Generic((addr),\
+uint32_t *: rte_bit_set32, \
+uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)\
+   _Generic((addr),\
+uint32_t *: rte_bit_clear32,   \
+uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)\
+   _Generic((addr),\
+uint32_t *: rte_bit_assign32,  \
+uint64_t *: rte_bit_assign64)(addr, nr, value)
+
  /**
   * Test if a particular bit in a 32-bit word is set.
   *
--
2.34.1
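
The two points raised in this thread (what happens with an unsupported pointer type, and what happens with an out-of-range bit index) can be seen in a small stand-alone sketch. The `demo_` functions below are hypothetical simplifications, not the rte_bitops.h implementations, and they use the standard `assert()` where DPDK would use `RTE_ASSERT()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit variant; shifting by >= 32 would be undefined. */
static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

/* Hypothetical 64-bit variant; shifting by >= 64 would be undefined. */
static inline bool
demo_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr & (UINT64_C(1) << nr)) != 0;
}

/*
 * C11 generic selection on the pointer type. There is no default
 * association, so a pointer of any other type (for example uint16_t *)
 * matches nothing and the compiler rejects the call.
 */
#define demo_bit_test(addr, nr) \
	_Generic((addr), \
		uint32_t *: demo_bit_test32, \
		uint64_t *: demo_bit_test64)(addr, nr)
```

For example, with `uint32_t w = 1u << 5;` the call `demo_bit_test(&w, 5)` selects the 32-bit function, while `uint16_t h; demo_bit_test(&h, 1);` fails to compile. Note that in this sketch a `const uint32_t *` argument is also rejected, since only the non-const pointer types are listed.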



Re: [PATCH v4 12/12] eventdev: fix doxygen processing of event vector struct

2024-03-04 Thread Bruce Richardson
On Mon, Mar 04, 2024 at 04:35:41PM +0100, Thomas Monjalon wrote:
> 21/02/2024 11:32, Bruce Richardson:
> > The event vector struct was missing comments on two members, and also
> > was inadvertently creating a local variable called "__rte_aligned" in
> > the doxygen output.
> > 
> > Correct the comment markers to fix the former issue, and fix the latter
> > by putting "#ifdef __DOXYGEN" around the alignment constraint.
> [..]
> > +#ifndef __DOXYGEN__
> >  } __rte_aligned(16);
> > +#else
> > +};
> > +#endif
> 
> Would it be possible to make __rte_aligned empty in rte_common.h
> instead of each call? Does it fix Doxygen bug?
> 
I think that should be fixed globally by Tyler's series for "alignas"[1]
With the new placement for the alignment macro, I don't think this doxygen
issue will occur again.

[1] https://patches.dpdk.org/project/dpdk/list/?series=31229



Re: [PATCH v3] ethdev: add Linux ethtool link mode conversion

2024-03-04 Thread Ferruh Yigit
On 3/3/2024 9:56 AM, Thomas Monjalon wrote:
> Speed capabilities of a NIC may be discovered through its Linux
> kernel driver. It is especially useful for bifurcated drivers,
> so they don't have to duplicate the same logic in the DPDK driver.
> 
> Parsing ethtool speed capabilities is made easy thanks to
> the functions added in ethdev for internal usage only.
> Of course these functions work only on Linux,
> so they are not compiled in other environments.
> 
> In order to ease parsing, the ethtool macro names are parsed
> externally in a shell command which generates a C array
> included in this patch.
> It also avoids to depend on a kernel version.
> This C array should be updated in future to get latest ethtool bits.
> Note it is easier to update this array than adding new cases
> in a parsing code.
> 
> The types in the functions are following the ethtool type:
> uint32_t for bitmaps, and int8_t for the number of 32-bitmaps.
> 
> Signed-off-by: Thomas Monjalon 
>

Acked-by: Ferruh Yigit 

Applied to dpdk-next-net/main, thanks.


Re: [PATCH 1/3] MAINTAINERS: add maintainer for TAP device

2024-03-04 Thread Ferruh Yigit
On 2/29/2024 5:31 PM, Stephen Hemminger wrote:
> Add myself as maintainer for TAP device.
> 
> Signed-off-by: Stephen Hemminger 
>

Acked-by: Ferruh Yigit 


Re: [RFC 1/7] eal: extend bit manipulation functions

2024-03-04 Thread Tyler Retzlaff
On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
> On 2024-03-02 18:05, Stephen Hemminger wrote:
> >On Sat, 2 Mar 2024 14:53:22 +0100
> >Mattias Rönnblom  wrote:
> >
> >>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>index 449565eeae..9a368724d5 100644
> >>--- a/lib/eal/include/rte_bitops.h
> >>+++ b/lib/eal/include/rte_bitops.h
> >>@@ -2,6 +2,7 @@
> >>   * Copyright(c) 2020 Arm Limited
> >>   * Copyright(c) 2010-2019 Intel Corporation
> >>   * Copyright(c) 2023 Microsoft Corporation
> >>+ * Copyright(c) 2024 Ericsson AB
> >>   */
> >
> >Unless this is coming from another project code base, the common
> >practice is not to add copyright for each contributor in later versions.
> >
> 
> Unless it's a large contribution (compared to the rest of the file)?
> 
> I guess that's why the 916c50d commit adds the Microsoft copyright notice.
> 
> >>+/**
> >>+ * Test if a particular bit in a 32-bit word is set.
> >>+ *
> >>+ * This function does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the 32-bit word to query.
> >>+ * @param nr
> >>+ *   The index of the bit (0-31).
> >>+ * @return
> >>+ *   Returns true if the bit is set, and false otherwise.
> >>+ */
> >>+static inline bool
> >>+rte_bit_test32(const uint32_t *addr, unsigned int nr);
> >
> >Is it possible to reorder these inlines to avoid having
> >forward declarations?
> >
> 
> Yes, but I'm not sure it's a net gain.
> 
> A statement expression macro seems like a perfect tool for the job,
> but then MSVC doesn't support statement expressions. You could also
> > have a macro that just generates the function body, as opposed to the
> whole function.

statement expressions can be used even with MSVC when using C. but GCC
documentation discourages their use for C++. since the header is
consumed by C++ in addition to C it's preferable to avoid them.

> 
> I'll consider if I should just bite the bullet and expand all the
> macros. 4x duplication.
> 
> >Also, new functions should be marked __rte_experimental
> >for a release or two.
> 
> Yes, thanks.


Re: [PATCH] common/cnxk: fix loopback port dataflow issue

2024-03-04 Thread Jerin Jacob
On Mon, Mar 4, 2024 at 6:10 PM Rahul Bhansali  wrote:
>
> With loopback interface and IPsec Inbound traffic, getting
> NIX_CQERRINT_CPT_DROP interrupt and dataflow is stopped.
> This is due to flow control configuration is skipped as
> roc_nix_is_esw() returns true for loopback device also.
>
> Fixes: 978dc3a13f7b ("common/cnxk: base support for eswitch VF")
> Fixes: f812768a9e66 ("net/cnxk: support eswitch VF as ethernet device")
>
> Signed-off-by: Rahul Bhansali 

Applied to dpdk-next-net-mrvl/for-main. Thanks


> ---
>  drivers/common/cnxk/roc_nix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
> index 20202788b5..041621dfaa 100644
> --- a/drivers/common/cnxk/roc_nix.c
> +++ b/drivers/common/cnxk/roc_nix.c
> @@ -385,8 +385,9 @@ sdp_lbk_id_update(struct plt_pci_device *pci_dev, struct 
> nix *nix)
> nix->sdp_link = true;
> break;
> case PCI_DEVID_CNXK_RVU_AF_VF:
> -   case PCI_DEVID_CNXK_RVU_ESWITCH_VF:
> nix->lbk_link = true;
> +   break;
> +   case PCI_DEVID_CNXK_RVU_ESWITCH_VF:
> nix->esw_link = true;
> break;
> default:
> --
> 2.25.1
>

