Re: [RFC PATCH v4 1/4] dts: code adjustments for sphinx

2023-10-22 Thread Yoan Picchi

On 8/31/23 11:04, Juraj Linkeš wrote:

sphinx-build only imports the Python modules when building the
documentation; it doesn't run DTS. This requires changes that make the
code importable without running it. This means:
* properly guarding argument parsing in the if __name__ == '__main__'
   block.
* the logger used by the DTS runner underwent the same treatment so that it
   doesn't create unnecessary log files.
* however, DTS uses the arguments to construct an object holding global
   variables. The defaults for the global variables needed to be moved
   out of the argument parsing code.
* importing the remote_session module from framework resulted in
   circular imports because modules in the package imported each other.
   This is fixed by more granular imports.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/config/__init__.py  |  3 -
  dts/framework/dts.py  | 34 ++-
  dts/framework/remote_session/__init__.py  | 41 -
  .../interactive_remote_session.py |  0
  .../{remote => }/interactive_shell.py |  0
  .../{remote => }/python_shell.py  |  0
  .../remote_session/remote/__init__.py | 27 --
  .../{remote => }/remote_session.py|  0
  .../{remote => }/ssh_session.py   |  0
  .../{remote => }/testpmd_shell.py |  0
  dts/framework/settings.py | 92 +++
  dts/framework/test_suite.py   |  3 +-
  dts/framework/testbed_model/__init__.py   | 12 +--
  dts/framework/testbed_model/common.py | 29 ++
  dts/framework/testbed_model/{hw => }/cpu.py   | 13 +++
  dts/framework/testbed_model/hw/__init__.py| 27 --
  .../linux_session.py  |  4 +-
  dts/framework/testbed_model/node.py   | 22 -
  .../os_session.py | 14 +--
  dts/framework/testbed_model/{hw => }/port.py  |  0
  .../posix_session.py  |  2 +-
  dts/framework/testbed_model/sut_node.py   |  8 +-
  dts/framework/testbed_model/tg_node.py| 30 +-
  .../traffic_generator/__init__.py | 24 +
  .../capturing_traffic_generator.py|  2 +-
  .../{ => traffic_generator}/scapy.py  | 17 +---
  .../traffic_generator.py  | 16 +++-
  .../testbed_model/{hw => }/virtual_device.py  |  0
  dts/framework/utils.py| 53 +--
  dts/main.py   |  3 +-
  30 files changed, 229 insertions(+), 247 deletions(-)
  rename dts/framework/remote_session/{remote => }/interactive_remote_session.py (100%)
  rename dts/framework/remote_session/{remote => }/interactive_shell.py (100%)
  rename dts/framework/remote_session/{remote => }/python_shell.py (100%)
  delete mode 100644 dts/framework/remote_session/remote/__init__.py
  rename dts/framework/remote_session/{remote => }/remote_session.py (100%)
  rename dts/framework/remote_session/{remote => }/ssh_session.py (100%)
  rename dts/framework/remote_session/{remote => }/testpmd_shell.py (100%)
  create mode 100644 dts/framework/testbed_model/common.py
  rename dts/framework/testbed_model/{hw => }/cpu.py (95%)
  delete mode 100644 dts/framework/testbed_model/hw/__init__.py
  rename dts/framework/{remote_session => testbed_model}/linux_session.py (98%)
  rename dts/framework/{remote_session => testbed_model}/os_session.py (97%)
  rename dts/framework/testbed_model/{hw => }/port.py (100%)
  rename dts/framework/{remote_session => testbed_model}/posix_session.py (99%)
  create mode 100644 dts/framework/testbed_model/traffic_generator/__init__.py
  rename dts/framework/testbed_model/{ => traffic_generator}/capturing_traffic_generator.py (99%)
  rename dts/framework/testbed_model/{ => traffic_generator}/scapy.py (96%)
  rename dts/framework/testbed_model/{ => traffic_generator}/traffic_generator.py (80%)
  rename dts/framework/testbed_model/{hw => }/virtual_device.py (100%)

diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index cb7e00ba34..5de8b54bcf 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -324,6 +324,3 @@ def load_config() -> Configuration:
      config: dict[str, Any] = warlock.model_factory(schema, name="_Config")(config_data)
      config_obj: Configuration = Configuration.from_dict(dict(config))
      return config_obj
-
-
-CONFIGURATION = load_config()
diff --git a/dts/framework/dts.py b/dts/framework/dts.py
index f773f0c38d..925a212210 100644
--- a/dts/framework/dts.py
+++ b/dts/framework/dts.py
@@ -3,22 +3,23 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
+import logging
  import sys
  
  from .config import (
-    CONFIGURATION,
      BuildTargetConfiguration,
      ExecutionConfiguration,
      TestSuiteConfig,
+    load_config,
  )
  from .exception import BlockingTestSuiteError
  f

[PATCH v2 0/4] hash: add SVE support for bulk key lookup

2023-10-23 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to have the
primary and secondary hitmasks interleaved instead of each being in its
own array.
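A portable scalar sketch of the dense, interleaved layout may help here. It assumes, as the v2 4/4 NEON code below does (hit2 shifted left by RTE_HASH_BUCKET_ENTRIES and ORed into hit1), that primary-bucket matches occupy the low bits and secondary-bucket matches the bits directly above them. BUCKET_ENTRIES and dense_hitmask() are hypothetical stand-ins, not DPDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_HASH_BUCKET_ENTRIES. */
#define BUCKET_ENTRIES 8

static_assert(2 * BUCKET_ENTRIES <= 16,
	"primary and secondary matches must fit in one uint16_t");

/*
 * Dense, interleaved hitmask for one key:
 *   bit i                  -> slot i of the primary bucket matches
 *   bit BUCKET_ENTRIES + i -> slot i of the secondary bucket matches
 */
static inline uint16_t
dense_hitmask(const uint16_t prim_sigs[BUCKET_ENTRIES],
	const uint16_t sec_sigs[BUCKET_ENTRIES], uint16_t sig)
{
	uint16_t hitmask = 0;
	unsigned int i;

	for (i = 0; i < BUCKET_ENTRIES; i++) {
		hitmask |= (uint16_t)((prim_sigs[i] == sig) << i);
		hitmask |= (uint16_t)((sec_sigs[i] == sig) << (BUCKET_ENTRIES + i));
	}
	return hitmask;
}
```

With this layout a single uint16_t per key answers both "does slot i of the primary bucket match?" (bit i) and the same question for the secondary bucket (bit BUCKET_ENTRIES + i), instead of two separate hitmask arrays.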

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton 3
  cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton 3
  cloud instance)

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap   |   2 +
 app/test/test_hash.c   |  99 ++
 lib/hash/rte_cuckoo_hash.c | 264 +
 lib/hash/rte_cuckoo_hash.h |   1 +
 4 files changed, 287 insertions(+), 79 deletions(-)

-- 
2.25.1



[PATCH v2 1/4] hash: pack the hitmask for hash in bulk lookup

2023-10-23 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail of
Intel's SIMD code. This patch allows non-Intel SIMD implementations to
benefit from a dense hitmask.
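To make the padding concrete: a byte-wise movemask over 16-bit compare lanes reports each bucket entry twice, and keeping only the even bits leaves entry i at bit 2*i. The sketch below (hypothetical helpers, not DPDK code) compresses that sparse form into the dense one and shows one practical difference: recovering the slot index from the lowest set bit needs an extra shift in the sparse layout. It relies on the GCC/Clang __builtin_ctz builtin.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compress a sparse hitmask (entry i reported at bit 2*i) into a dense one
 * (entry i reported at bit i).
 */
static inline uint32_t
sparse_to_dense(uint32_t sparse, unsigned int entries)
{
	uint32_t dense = 0;
	unsigned int i;

	assert(entries <= 16); /* 2 * entries bits must fit in 32 bits */
	for (i = 0; i < entries; i++)
		dense |= ((sparse >> (2 * i)) & 1u) << i;
	return dense;
}

/* Slot index of the lowest match in each layout; mask must be non-zero. */
static inline unsigned int
first_hit_sparse(uint32_t mask)
{
	return (unsigned int)__builtin_ctz(mask) >> 1;
}

static inline unsigned int
first_hit_dense(uint32_t mask)
{
	return (unsigned int)__builtin_ctz(mask);
}
```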

Signed-off-by: Yoan Picchi 
---
 .mailmap   |   2 +
 lib/hash/rte_cuckoo_hash.c | 118 ++---
 2 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/.mailmap b/.mailmap
index 3f5bab26a8..b9c49aa7f6 100644
--- a/.mailmap
+++ b/.mailmap
@@ -485,6 +485,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1602,6 +1603,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 19b23f2a97..2aa96eb862 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1850,8 +1850,50 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 
 }
 
+#if defined(__ARM_NEON)
+
+static inline void
+compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   }
+   break;
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << i);
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << i);
+   }
+   }
+}
+
+#else
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
const struct rte_hash_bucket *prim_bkt,
const struct rte_hash_bucket *sec_bkt,
uint16_t sig,
@@ -1878,25 +1920,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
/* Extract the even-index bits only */
*sec_hash_matches &= 0x;
break;
-#elif defined(__ARM_NEON)
-   case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
-
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
-   /* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
-   /* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
-   }
-   break;
-#endif
+#endif /* defined(__SSE2__) */
default:
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
*prim_hash_matches |=
@@ -1907,6 +1931,8 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
}
 }
 
+#endif /* defined(__ARM_NEON) */
+
 static inline void
 __bulk_lookup_l(const struct rte_hash *h, const void **keys,
const struct rte_hash_bucket **primary_bkt,
@@ -1921,18 +1947,30 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
struct rte_hash_bucket *cur_bkt, *next_bkt;
 
+#if defined(__ARM_NEON)
+ 

[PATCH v2 2/4] hash: optimize compare signature for NEON

2023-10-23 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1,
so we can skip the shift and simply AND each lane with its own bit mask.
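The equivalence can be checked without Arm hardware by emulating the lanes in plain C. In this sketch (a model, not the patch code) a matching lane holds 0xFFFF, as vceqq_u16 produces; the old path keeps bit 0 and shifts it left by the lane index, while the new path ANDs lane i with 1 << i directly. Because every lane contributes a distinct bit, the horizontal add performed by vaddvq_u16 behaves like an OR, so both reductions yield the same dense mask.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* 8 x 16-bit lanes, as in uint16x8_t */

static_assert(LANES == 8, "the mask table below has one bit per lane");

/* Model of vceqq_u16: a matching lane becomes all ones (0xFFFF). */
static void
ceq_lanes(const uint16_t sigs[LANES], uint16_t sig, uint16_t out[LANES])
{
	unsigned int i;

	for (i = 0; i < LANES; i++)
		out[i] = (sigs[i] == sig) ? 0xFFFF : 0x0000;
}

/* Previous approach: keep bit 0 of each lane, then shift lane i left by i. */
static uint16_t
reduce_with_shift(const uint16_t cmp[LANES])
{
	uint16_t sum = 0;
	unsigned int i;

	for (i = 0; i < LANES; i++)
		sum += (uint16_t)((cmp[i] & 0x0001) << i);
	return sum;
}

/* New approach: lane i is all ones on a match, so AND with (1 << i) suffices. */
static uint16_t
reduce_with_mask(const uint16_t cmp[LANES])
{
	static const uint16_t mask[LANES] = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
	uint16_t sum = 0;
	unsigned int i;

	/* Models vandq_u16 followed by the horizontal add of vaddvq_u16. */
	for (i = 0; i < LANES; i++)
		sum += cmp[i] & mask[i];
	return sum;
}
```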

Signed-off-by: Yoan Picchi 
---
 lib/hash/rte_cuckoo_hash.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 2aa96eb862..a4b907c45c 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1864,19 +1864,17 @@ compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16x8_t vmat, x;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*prim_hash_matches = (uint32_t)(vaddvq_u16(x));
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*sec_hash_matches = (uint32_t)(vaddvq_u16(x));
}
break;
-- 
2.25.1



[PATCH v2 3/4] test/hash: check bulk lookup of keys after collision

2023-10-23 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.
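The test relies on pseudo_hash() returning a constant so that every key collides. The sketch below shows how that constant would be split, assuming the scheme used by recent rte_cuckoo_hash.c: the short signature is the upper 16 bits of the hash, the primary bucket index is the hash masked with the bucket bitmask, and the alternative bucket is the primary index XORed with the signature. The helper names and the 0xFF bitmask are hypothetical. Note that 3 | 3 << 16 parses as 3 | (3 << 16), i.e. 0x30003.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed split of a 32-bit hash (hypothetical helper names):
 *   hash >> 16                      -> short signature stored in the bucket
 *   hash & bitmask                  -> primary bucket index
 *   (primary ^ signature) & bitmask -> alternative bucket index
 */
static inline uint16_t
short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16);
}

static inline uint32_t
prim_bucket(uint32_t hash, uint32_t bucket_bitmask)
{
	return hash & bucket_bitmask;
}

static inline uint32_t
alt_bucket(uint32_t prim_idx, uint16_t sig, uint32_t bucket_bitmask)
{
	return (prim_idx ^ sig) & bucket_bitmask;
}
```

Under these assumptions the old value 3 gives a zero signature and an alternative bucket equal to the primary one, while 3 | 3 << 16 gives signature 3 and two distinct buckets (3 and 0).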

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..b6e22c5ecc 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | 3 << 16;
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(¶ms_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i < KEY_PER_BUCKET+1; i++) {
pos[i] = rte_hash

[PATCH v2 4/4] hash: add SVE support for bulk key lookup

2023-10-23 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- Added defines in the code for SVE support.
- Optimised the NEON code.
- The new SVE code is ~3% slower than the optimized NEON code on the N2 processor.
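The SVE path in this patch picks a strategy from the runtime vector length: when svcnth() reports at least 2 * RTE_HASH_BUCKET_ENTRIES 16-bit lanes, both buckets are spliced into one vector and compared in a single predicated pass. The portable model below mimics that, with vl standing in for svcnth(). How the patch handles shorter vectors is cut off in this excerpt, so the fallback here (walking the spliced signatures vl lanes at a time) is an assumption, and sve_style_hitmask() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* stand-in for RTE_HASH_BUCKET_ENTRIES */

static_assert(2 * BUCKET_ENTRIES <= 16, "dense hitmask must fit in uint16_t");

/*
 * Portable model: vl is the number of 16-bit lanes per vector (svcnth() on
 * real hardware). Each outer iteration models one predicated compare
 * (svcmpeq) plus setting bit (1 << lane) for every matching lane.
 */
static uint16_t
sve_style_hitmask(const uint16_t prim[BUCKET_ENTRIES],
	const uint16_t sec[BUCKET_ENTRIES], uint16_t sig,
	unsigned int vl, unsigned int *passes)
{
	uint16_t spliced[2 * BUCKET_ENTRIES];
	uint16_t hitmask = 0;
	unsigned int base, lane;

	assert(vl > 0);
	/* Models svsplice: primary signatures first, secondary right after. */
	for (lane = 0; lane < BUCKET_ENTRIES; lane++) {
		spliced[lane] = prim[lane];
		spliced[BUCKET_ENTRIES + lane] = sec[lane];
	}

	*passes = 0;
	for (base = 0; base < 2 * BUCKET_ENTRIES; base += vl) {
		for (lane = 0; lane < vl && base + lane < 2 * BUCKET_ENTRIES; lane++)
			if (spliced[base + lane] == sig)
				hitmask |= (uint16_t)(1u << (base + lane));
		(*passes)++;
	}
	return hitmask;
}
```

In this model 8 lanes (128-bit SVE) take two passes and 16 lanes (256-bit SVE) take one; the resulting hitmask is the same dense, interleaved layout either way.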

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
---
 lib/hash/rte_cuckoo_hash.c | 196 -
 lib/hash/rte_cuckoo_hash.h |   1 +
 2 files changed, 151 insertions(+), 46 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index a4b907c45c..cda39d1441 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -435,8 +435,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+   }
else
 #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
@@ -1853,37 +1856,103 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 #if defined(__ARM_NEON)
 
 static inline void
-compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
 {
unsigned int i;
 
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
+#if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, x;
+   uint16x8_t vmat, hit1, hit2;
const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
+
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
+   }
+   break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparison at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+ 

Re: [RFC PATCH v4 1/4] dts: code adjustments for sphinx

2023-10-23 Thread Yoan Picchi

On 10/23/23 07:44, Juraj Linkeš wrote:





My only nitpick would be the name of the file common.py, which
only contains the MesonArgs class. Looks good otherwise.


Could you elaborate a bit more, Yoan? The common.py module is supposed
to be extended with code common to all other modules in the
testbed_model package. Right now we only have MesonArgs which fits in
common.py, but we could also move something else into common.py. We
also could rename common.py to something else, but then the above
purpose would not be clear.

I'm finishing the docstrings soon so expect a new version where things
like these will be clearer. :-)


My issue with the name is that it doesn't make clear what the purpose of
this file is. It only tells, to some extent, how it is used.

If we start adding more things in this file, then I see two options:
	- Either this is related to the current class, and thus the file could 
be named meson_arg or something along those lines.
	- Or it is unrelated to the current class, and we end up with a file 
coalescing random bits of code, which is usually a bit dirty in OOP.


Like I said, it's a bit of a nitpick, but given it's an RFC I hope 
you'll give it a thought in the next version.


Re: [RFC PATCH v4 1/4] dts: code adjustments for sphinx

2023-10-24 Thread Yoan Picchi

On 10/24/23 07:39, Juraj Linkeš wrote:

On Mon, Oct 23, 2023 at 1:52 PM Yoan Picchi  wrote:


On 10/23/23 07:44, Juraj Linkeš wrote:





My only nitpick would be the name of the file common.py, which
only contains the MesonArgs class. Looks good otherwise.


Could you elaborate a bit more, Yoan? The common.py module is supposed
to be extended with code common to all other modules in the
testbed_model package. Right now we only have MesonArgs which fits in
common.py, but we could also move something else into common.py. We
also could rename common.py to something else, but then the above
purpose would not be clear.

I'm finishing the docstrings soon so expect a new version where things
like these will be clearer. :-)


My issue with the name is that it doesn't make clear what the purpose of
this file is. It only tells, to some extent, how it is used.


Well, the name suggests it's code that's common to other modules, as
in code that other modules use, just like the MesonArgs class, which
is used in three different modules. I've chosen common.py as that's
what some of the DPDK libs (such as EAL) seem to be using for this
purpose. Maybe there's a better name though or we could move the class
elsewhere.


If we start adding more things in this file, then I see two options:
 - Either this is related to the current class, and thus the file could
be named meson_arg or something along those lines.
 - Or it is unrelated to the current class, and we end up with a file
coalescing random bits of code, which is usually a bit dirty in OOP.



It's code that's reused in multiple places, I'm not sure whether that
qualifies as random bits of code. It could be in os_session.py (as
that would work import-wise), but that's not a good place to put it,
as the logic is actually utilized in sut_node.py. But putting it into
sut_node.py doesn't work because of imports. Maybe we could just put
it into utils.py in the framework dir, which is a very similar file,
if not the same. My original thoughts were to have a file with common
code in each package (if needed), depending on where the code is used
(package level-wise), but it looks like we can just have this sort of
common utilities on the top level.


Like I said, it's a bit of a nitpick, but given it's an RFC I hope
you'll give it a thought in the next version.


I thought a lot about this before submitting this RFC, but I wanted
someone to have a look at this exact thing - whether the common.py
file makes sense and what is the better name, common.py or utils.py
(which is why I have both in this patch). I'll move the MesonArgs
class to the top level utils.py and remove the common.py file as that
makes the most sense to me now.

If you have any recommendations we may be able to make this even better.


I didn't mean to imply you hadn't thought a lot about it; sorry if it
came across that way.
I prefer the idea of utils.py to common.py, be it at package level or
above. There might also be the option of __init__.py, but I'm not sure
about it.
That being said, I'm relatively new to DPDK and didn't know common.py
was a common convention in EAL, so I'll leave it up to you.


Re: [RFC PATCH v4 3/4] dts: add doc generation

2023-10-26 Thread Yoan Picchi

On 8/31/23 11:04, Juraj Linkeš wrote:

The tool used to generate developer docs is sphinx, which is already
used in DPDK. The configuration is kept the same to preserve the style.

Sphinx generates the documentation from Python docstrings. The docstring
format most suitable for DTS seems to be the Google format [0] which
requires the sphinx.ext.napoleon extension.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
   code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš 
---
  buildtools/call-sphinx-build.py | 29 ---
  doc/api/meson.build |  1 +
  doc/guides/conf.py  | 32 +
  doc/guides/meson.build  |  1 +
  doc/guides/tools/dts.rst| 29 +++
  dts/doc/doc-index.rst   | 17 +++
  dts/doc/meson.build | 50 +
  dts/meson.build | 16 +++
  meson.build |  1 +
  9 files changed, 161 insertions(+), 15 deletions(-)
  create mode 100644 dts/doc/doc-index.rst
  create mode 100644 dts/doc/meson.build
  create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 39a60d09fa..c2f3acfb1d 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,37 +3,46 @@
  # Copyright(c) 2019 Intel Corporation
  #
  
+import argparse
  import sys
  import os
  from os.path import join
  from subprocess import run, PIPE, STDOUT
  from packaging.version import Version
  
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default='.')
+args, extra_args = parser.parse_known_args()
  
  # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+os.environ['DTS_ROOT'] = args.dts_root
  
  # for sphinx version >= 1.7 add parallelism using "-j auto"
-ver = run([sphinx, '--version'], stdout=PIPE,
+ver = run([args.sphinx, '--version'], stdout=PIPE,
            stderr=STDOUT).stdout.decode().split()[-1]
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
  if Version(ver) >= Version('1.7'):
      sphinx_cmd += ['-j', 'auto']
  
  # find all the files sphinx will process so we can write them as dependencies
  srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
      srcfiles.extend([join(root, f) for f in files])
  
  # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )
  
  # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
      d.write('html: ' + ' '.join(srcfiles) + '\n')
  
  sys.exit(process.returncode)

diff --git a/doc/api/meson.build b/doc/api/meson.build
index 2876a78a7e..1f0c725a94 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2018 Luca Boccassi 
  
+doc_api_build_dir = meson.current_build_dir()
  doxygen = find_program('doxygen', required: get_option('enable_docs'))
  
  if not doxygen.found()

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..737e5a5688 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
  from sphinx import __version__ as sphinx_version
  from os import listdir
  from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
  from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
  
  import configparser
  
@@ -24,6 +23,29 @@

file=stderr)
  pass
  
+extensions = ['sphinx.ext.napoleon']
+
+# Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_use_ivar = True
+napoleon_use_rtype = False
+add_module_names = False
+toc_object_entries = False
+
+# Sidebar config
+html_theme_options = {
+    'collapse_navigation': False,
+    'navigation_dept

[PATCH v4 0/4] hash: add SVE support for bulk key lookup

2024-02-26 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to have the
primary and secondary hitmasks interleaved instead of each being in its
own array.

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton 3
  cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton 3
  cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple of ints to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap   |   2 +
 app/test/test_hash.c   |  99 ++
 lib/hash/rte_cuckoo_hash.c | 264 +
 lib/hash/rte_cuckoo_hash.h |   1 +
 4 files changed, 287 insertions(+), 79 deletions(-)

-- 
2.25.1



[PATCH v4 1/4] hash: pack the hitmask for hash in bulk lookup

2024-02-26 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail of
Intel's SIMD code. This patch allows non-Intel SIMD implementations to
benefit from a dense hitmask.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap   |   2 +
 lib/hash/rte_cuckoo_hash.c | 118 ++---
 2 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/.mailmap b/.mailmap
index 12d2875641..60500bbe36 100644
--- a/.mailmap
+++ b/.mailmap
@@ -492,6 +492,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1625,6 +1626,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 9cf94645f6..0550165584 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1857,8 +1857,50 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 
 }
 
+#if defined(__ARM_NEON)
+
+static inline void
+compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   }
+   break;
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << i);
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << i);
+   }
+   }
+}
+
+#else
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
const struct rte_hash_bucket *prim_bkt,
const struct rte_hash_bucket *sec_bkt,
uint16_t sig,
@@ -1885,25 +1927,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
/* Extract the even-index bits only */
*sec_hash_matches &= 0x;
break;
-#elif defined(__ARM_NEON)
-   case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
-
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
-   /* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
-   /* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
-   }
-   break;
-#endif
+#endif /* defined(__SSE2__) */
default:
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
*prim_hash_matches |=
@@ -1914,6 +1938,8 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
}
 }
 
+#endif /* defined(__ARM_NEON) */
+
 static inline void
 __bulk_lookup_l(const struct rte_hash *h, const void **keys,
const struct rte_hash_bucket **primary_bkt,
@@ -1928,18 +1954,30 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
struct rte_hash_

[PATCH v4 2/4] hash: optimize compare signature for NEON

2024-02-26 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by masking each lane with its own
single-bit mask.
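Because vceqq_u16() turns every matching lane into 0xFFFF and every other lane into 0, ANDing lane i with 1 << i already leaves exactly bit i set, and vaddvq_u16() then sums the lanes; since each lane contributes a distinct bit, that sum behaves like a bitwise OR. A minimal scalar model of the sequence follows; it is an illustration only, and neon_dense_mask_model() is a hypothetical name, not a DPDK function.

```c
#include <stdint.h>

#define LANES 8 /* one 128-bit NEON vector of uint16_t */

/* Same lane-specific constant as the patch: lane i keeps only bit i. */
static const uint16_t lane_bit[LANES] = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};

/* Scalar model of vceqq_u16(vsig, sigs), then vandq_u16(..., mask),
 * then vaddvq_u16(...). Hypothetical helper, illustration only. */
static uint16_t neon_dense_mask_model(const uint16_t sigs[LANES], uint16_t sig)
{
	uint16_t sum = 0;

	for (unsigned int i = 0; i < LANES; i++) {
		/* vceqq_u16: all ones on a match, zero otherwise */
		uint16_t eq = (sigs[i] == sig) ? 0xFFFF : 0x0000;

		/* vandq_u16 with the mask, then accumulate like vaddvq_u16;
		 * each lane holds a distinct bit, so the sum never carries */
		sum = (uint16_t)(sum + (eq & lane_bit[i]));
	}
	return sum;
}
```

With signatures {3, 1, 3, 4, 5, 6, 7, 3} and sig = 3, lanes 0, 2 and 7 match and the model returns 0x85.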

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/rte_cuckoo_hash.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 0550165584..a07dd3a28d 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1871,19 +1871,17 @@ compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16x8_t vmat, x;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*prim_hash_matches = (uint32_t)(vaddvq_u16(x));
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*sec_hash_matches = (uint32_t)(vaddvq_u16(x));
}
break;
-- 
2.25.1



[PATCH v4 3/4] test/hash: check bulk lookup of keys after collision

2024-02-26 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.
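The change to pseudo_hash() below (3 becomes 3 | 3 << 16, i.e. 0x00030003) matters because the hash value also determines the signature and the alternative bucket. The sketch assumes the derivation used in rte_cuckoo_hash.c is: signature = upper 16 bits of the hash, primary bucket = hash & bucket_bitmask, alternative bucket = (primary ^ signature) & bucket_bitmask. The helper names and the 8-bucket bitmask in the checks are hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror the bucket helpers in rte_cuckoo_hash.c
 * (hypothetical names, illustration only). */
static uint16_t short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16); /* signature = upper 16 bits */
}

static uint32_t prim_bucket(uint32_t hash, uint32_t bucket_bitmask)
{
	return hash & bucket_bitmask; /* primary bucket = low bits */
}

static uint32_t alt_bucket(uint32_t prim, uint16_t sig, uint32_t bucket_bitmask)
{
	return (prim ^ sig) & bucket_bitmask; /* alternative bucket */
}

/* Old and new return values of pseudo_hash() in test_hash.c. */
#define OLD_PSEUDO_HASH ((uint32_t)3)
#define NEW_PSEUDO_HASH ((uint32_t)(3 | 3 << 16)) /* 0x00030003 */
```

Under these assumptions, the old value gives a zero signature and an alternative bucket equal to the primary one, while the new value gives signature 3, primary bucket 3 and alternative bucket 0. That would give the 9th key, which collides with 8 keys in a full bucket, a second bucket to be pushed into.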

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..c4e7f8190e 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | 3 << 16;
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i < KEY_PER

[PATCH v4 4/4] hash: add SVE support for bulk key lookup

2024-02-26 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- Added defines for SVE code support.
- Optimized the NEON code.
- The new SVE code is ~5% slower than the optimized NEON code on the N2
  processor.
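The NEON path in this patch packs both results into one 16-bit value: the secondary-bucket hits are shifted left by RTE_HASH_BUCKET_ENTRIES and ORed with the primary-bucket hits before the horizontal add. A scalar sketch of that layout, assuming 8 entries per bucket, is below; combined_hitmask(), prim_hits() and sec_hits() are hypothetical helpers, not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for RTE_HASH_BUCKET_ENTRIES (assumed to be 8). */
#define BUCKET_ENTRIES 8

_Static_assert(2 * BUCKET_ENTRIES <= 16,
	"both buckets' hit bits must fit in one uint16_t");

/* Bits [0, BUCKET_ENTRIES) hold primary-bucket hits, bits
 * [BUCKET_ENTRIES, 2 * BUCKET_ENTRIES) hold secondary-bucket hits. */
static uint16_t combined_hitmask(const uint16_t prim[BUCKET_ENTRIES],
				 const uint16_t sec[BUCKET_ENTRIES], uint16_t sig)
{
	uint16_t hits = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		hits |= (uint16_t)((prim[i] == sig) << i);
		hits |= (uint16_t)((sec[i] == sig) << (i + BUCKET_ENTRIES));
	}
	return hits;
}

/* Split the combined value back into per-bucket dense masks. */
static uint16_t prim_hits(uint16_t hits)
{
	return (uint16_t)(hits & ((1u << BUCKET_ENTRIES) - 1));
}

static uint16_t sec_hits(uint16_t hits)
{
	return (uint16_t)(hits >> BUCKET_ENTRIES);
}
```

For a primary bucket {1, 2, 3, 4, 5, 6, 7, 8}, a secondary bucket {9, 9, 3, 9, 9, 9, 9, 3} and sig = 3, the combined mask is 0x8404: 0x04 for the primary bucket and 0x84 for the secondary one. The static assertion plays the same role as the one added in the patch, since 2 * 8 bits exactly fill a uint16_t.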

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 

Change-Id: Ief614e2f90fd85484195b8116bfbf56d6dfec71e
---
 lib/hash/rte_cuckoo_hash.c | 196 -
 lib/hash/rte_cuckoo_hash.h |   1 +
 2 files changed, 151 insertions(+), 46 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index a07dd3a28d..231d6d6ded 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -442,8 +442,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+   }
else
 #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
@@ -1860,37 +1863,103 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 #if defined(__ARM_NEON)
 
 static inline void
-compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
 {
unsigned int i;
 
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, x;
+   uint16x8_t vmat, hit1, hit2;
const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
+
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
+   }
+   break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparison at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+ 

[PATCH v4 0/4] hash: add SVE support for bulk key lookup

2024-02-26 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.
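With a dense, interleaved hitmask, a consumer can walk every match with a count-trailing-zeros loop and split each bit position into a bucket and a slot. The sketch below assumes 8 entries per bucket and the GCC/Clang __builtin_ctz() builtin; struct hit and decode_hits() are hypothetical and do not claim to match how __bulk_lookup_l() actually consumes the buffer.

```c
#include <stdint.h>

/* Hypothetical stand-in for RTE_HASH_BUCKET_ENTRIES (assumed to be 8). */
#define BUCKET_ENTRIES 8

struct hit {
	unsigned int secondary; /* 0 = primary bucket, 1 = secondary bucket */
	unsigned int entry;     /* slot index within that bucket */
};

/* Decode every set bit of a combined dense hitmask, lowest bit first.
 * Returns the number of hits written to out (at most 2 * BUCKET_ENTRIES). */
static unsigned int decode_hits(uint16_t hitmask, struct hit out[2 * BUCKET_ENTRIES])
{
	unsigned int n = 0;
	uint32_t m = hitmask;

	while (m != 0) {
		unsigned int bit = (unsigned int)__builtin_ctz(m);

		out[n].secondary = bit / BUCKET_ENTRIES;
		out[n].entry = bit % BUCKET_ENTRIES;
		n++;
		m &= m - 1; /* clear the lowest set bit */
	}
	return n;
}
```

For 0x8404 the loop reports slot 2 of the primary bucket, then slots 2 and 7 of the secondary bucket. With the dense layout the bit index maps straight to a slot; the sparse layout would need an extra shift right by one.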

Running the cuckoo hash perf test, I observed the following effect on speed:
  There are no significant changes on Intel (measured on Sapphire Rapids)
  NEON is up to 7-10% faster (measured on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (measured on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (measured on a Graviton 3 cloud instance)
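Patch 4/4 takes its single-vector SVE path only when svcnth(), the number of 16-bit lanes per vector, is at least 2 * RTE_HASH_BUCKET_ENTRIES. The arithmetic below, assuming 8 entries per bucket, shows that a 128-bit vector has 8 halfword lanes and misses that threshold, while a 256-bit vector has 16 and meets it. This may be one factor behind the gap between the 128b and 256b results, but the cover letter does not say so; halfword_lanes() and single_vector_path() are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical stand-in for RTE_HASH_BUCKET_ENTRIES (assumed to be 8). */
#define BUCKET_ENTRIES 8

/* Number of 16-bit lanes in an SVE vector of vl_bits bits,
 * i.e. the value svcnth() would report on that hardware. */
static unsigned int halfword_lanes(unsigned int vl_bits)
{
	return vl_bits / 16;
}

/* Mirrors the condition in patch 4/4: both buckets' signatures
 * fit side by side in a single vector. */
static int single_vector_path(unsigned int vl_bits)
{
	return halfword_lanes(vl_bits) >= 2 * BUCKET_ENTRIES && BUCKET_ENTRIES <= 8;
}
```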

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap   |   2 +
 app/test/test_hash.c   |  99 ++
 lib/hash/rte_cuckoo_hash.c | 264 +
 lib/hash/rte_cuckoo_hash.h |   1 +
 4 files changed, 287 insertions(+), 79 deletions(-)

-- 
2.25.1



[PATCH v5 2/4] hash: optimize compare signature for NEON

2024-02-27 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by masking each lane with its own
single-bit mask.
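The rewrite relies on vceqq_u16() producing either 0x0000 or 0xFFFF per lane. The scalar check below (hypothetical helper names, not DPDK code) compares the old per-lane computation from patch 1/4, (lane & 0x1) << i, with the new one, lane & (1 << i). They agree for both possible comparison results, but would differ for a lane value that is not all ones.

```c
#include <stdint.h>

/* Old per-lane computation (patch 1/4): keep bit 0, then shift by the lane index. */
static uint16_t lane_shift_way(uint16_t eq_lane, unsigned int lane)
{
	return (uint16_t)((eq_lane & 0x0001u) << lane);
}

/* New per-lane computation (patch 2/4): AND with a lane-specific single-bit mask. */
static uint16_t lane_mask_way(uint16_t eq_lane, unsigned int lane)
{
	return (uint16_t)(eq_lane & (1u << lane));
}

/* Check that both agree for all 8 lanes and both possible vceqq_u16 results. */
static int shift_and_mask_agree(void)
{
	const uint16_t results[2] = {0x0000, 0xFFFF};

	for (unsigned int lane = 0; lane < 8; lane++)
		for (unsigned int r = 0; r < 2; r++)
			if (lane_shift_way(results[r], lane) != lane_mask_way(results[r], lane))
				return 0;
	return 1;
}
```

For a hypothetical lane value of 0x0001 in lane 1, the shift form gives 0x2 and the mask form gives 0, which is why the simplification is only valid for all-ones or all-zeros comparison results.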

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/rte_cuckoo_hash.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 0550165584..a07dd3a28d 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1871,19 +1871,17 @@ compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16x8_t vmat, x;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*prim_hash_matches = (uint32_t)(vaddvq_u16(x));
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*sec_hash_matches = (uint32_t)(vaddvq_u16(x));
}
break;
-- 
2.34.1



[PATCH v5 0/4] hash: add SVE support for bulk key lookup

2024-02-27 Thread Yoan Picchi
From: Yoan Picchi 

This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.

Running the cuckoo hash perf test, I observed the following effect on speed:
  There are no significant changes on Intel (measured on Sapphire Rapids)
  NEON is up to 7-10% faster (measured on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (measured on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (measured on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap   |   2 +
 app/test/test_hash.c   |  99 ++
 lib/hash/rte_cuckoo_hash.c | 264 +
 lib/hash/rte_cuckoo_hash.h |   1 +
 4 files changed, 287 insertions(+), 79 deletions(-)

-- 
2.34.1



[PATCH v5 3/4] test/hash: check bulk lookup of keys after collision

2024-02-27 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to match the current number
of entries in a hash bucket.
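The change to `pseudo_hash()` in the diff below, from `3` to `3 | 3 << 16`, can be understood with a little arithmetic. This reading rests on an assumption not stated in this patch: that the cuckoo hash takes the 16-bit signature from the upper half of the hash, the primary bucket from the lower bits, and the alternative bucket as `(primary ^ sig) & bucket_bitmask`, which is how I read rte_cuckoo_hash.c. The helper names and the 256-bucket mask below are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: these mirror get_short_sig(), get_prim_bucket_index() and
 * get_alt_bucket_index() in rte_cuckoo_hash.c; names are illustrative.
 */
static uint16_t
short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16); /* upper 16 bits become the signature */
}

static uint32_t
prim_bucket(uint32_t hash, uint32_t bucket_bitmask)
{
	return hash & bucket_bitmask; /* lower bits select the primary bucket */
}

static uint32_t
alt_bucket(uint32_t prim_idx, uint16_t sig, uint32_t bucket_bitmask)
{
	return (prim_idx ^ sig) & bucket_bitmask; /* alternative bucket from the signature */
}
```

Under these assumptions, the old value gives signature 0, so the primary and alternative buckets coincide. The new value gives signature 3, with primary bucket 3 and alternative bucket 0. This presumably lets the extra key displace an entry into a genuinely different bucket.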

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..c4e7f8190e 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | 3 << 16;
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i < KEY_PER

[PATCH v5 1/4] hash: pack the hitmask for hash in bulk lookup

2024-02-27 Thread Yoan Picchi
The current hitmask includes padding due to an Intel SIMD
implementation detail. This patch allows non-Intel SIMD
implementations to benefit from a dense hitmask.
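The padding can be illustrated with a scalar model of the SSE path. `_mm_movemask_epi8()` returns one bit per byte, so a match in 16-bit lane i sets both bits 2*i and 2*i+1. The existing code then keeps only the even-index bits. The sketch assumes 8 signatures per bucket, and the function names are hypothetical.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed: 8 x 16-bit signatures per bucket */

/*
 * Scalar model of the SSE result: a matching lane i sets bits 2*i and
 * 2*i+1; masking with the even-bit pattern leaves one padded bit per entry.
 */
static uint32_t
sparse_hitmask(const uint16_t sigs[BUCKET_ENTRIES], uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		if (sigs[i] == sig)
			mask |= 0x3u << (2 * i);
	return mask & 0x5555u; /* entry i now lives at bit 2*i */
}

/* Compact the padded form into the dense form: entry i moves to bit i. */
static uint16_t
sparse_to_dense(uint32_t sparse)
{
	uint16_t dense = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		dense |= (uint16_t)(((sparse >> (2 * i)) & 1u) << i);
	return dense;
}
```

Architectures that can produce the dense form directly, such as NEON, have no reason to pay for the padding or the compaction step.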

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap   |   2 +
 lib/hash/rte_cuckoo_hash.c | 118 ++---
 2 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/.mailmap b/.mailmap
index 12d2875641..60500bbe36 100644
--- a/.mailmap
+++ b/.mailmap
@@ -492,6 +492,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1625,6 +1626,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 9cf94645f6..0550165584 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1857,8 +1857,50 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 
 }
 
+#if defined(__ARM_NEON)
+
+static inline void
+compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   }
+   break;
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << i);
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << i);
+   }
+   }
+}
+
+#else
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
const struct rte_hash_bucket *prim_bkt,
const struct rte_hash_bucket *sec_bkt,
uint16_t sig,
@@ -1885,25 +1927,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
/* Extract the even-index bits only */
*sec_hash_matches &= 0x;
break;
-#elif defined(__ARM_NEON)
-   case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
-
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
-   /* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
-   /* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
-   }
-   break;
-#endif
+#endif /* defined(__SSE2__) */
default:
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
*prim_hash_matches |=
@@ -1914,6 +1938,8 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
}
 }
 
+#endif /* defined(__ARM_NEON) */
+
 static inline void
 __bulk_lookup_l(const struct rte_hash *h, const void **keys,
const struct rte_hash_bucket **primary_bkt,
@@ -1928,18 +1954,30 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
struct rte_hash_

[PATCH v5 4/4] hash: add SVE support for bulk key lookup

2024-02-27 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- Added defines in the code for SVE support.
- Optimized the NEON code.
- The new SVE code is ~5% slower than the optimized NEON code on the N2 processor.
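The Arm selection order in `rte_hash_create()` can be sketched as follows. The code uses NEON when the CPU reports it, upgrades to SVE when SVE is also present, and otherwise falls back to the scalar loop. The enum and function below are hypothetical stand-ins for `enum rte_hash_sig_compare_function` and the `rte_cpu_get_flag_enabled()` checks. The `built_with_sve_acle` argument models the `RTE_HAS_SVE_ACLE` build-time guard that a later revision adds around the SVE check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum rte_hash_sig_compare_function. */
enum sig_cmp { SIG_CMP_SCALAR, SIG_CMP_NEON, SIG_CMP_SVE };

/*
 * Prefer SVE only when the CPU reports it and the SVE comparison code was
 * actually compiled in; otherwise use NEON, otherwise the scalar loop.
 */
static enum sig_cmp
select_sig_cmp(bool cpu_has_neon, bool cpu_has_sve, bool built_with_sve_acle)
{
	if (cpu_has_neon) {
		if (built_with_sve_acle && cpu_has_sve)
			return SIG_CMP_SVE;
		return SIG_CMP_NEON;
	}
	return SIG_CMP_SCALAR;
}
```

As far as I can tell from the diff, the build-time condition matters. The SVE `case` only exists under `RTE_HAS_SVE_ACLE`. Selecting SVE on a build without it would therefore land in the scalar `default:` branch rather than in NEON.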

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/rte_cuckoo_hash.c | 196 -
 lib/hash/rte_cuckoo_hash.h |   1 +
 2 files changed, 151 insertions(+), 46 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index a07dd3a28d..231d6d6ded 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -442,8 +442,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+   }
else
 #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
@@ -1860,37 +1863,103 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 #if defined(__ARM_NEON)
 
 static inline void
-compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
 {
unsigned int i;
 
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, x;
+   uint16x8_t vmat, hit1, hit2;
const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
+
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
+   }
+   break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparison at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *h

Re: [PATCH v5 4/4] hash: add SVE support for bulk key lookup

2024-02-28 Thread Yoan Picchi

On 2/28/24 10:56, Konstantin Ananyev wrote:




- Implemented SVE code for comparing signatures in bulk lookup.
- Added Defines in code for SVE code support.
- Optimise NEON code
- New SVE code is ~5% slower than optimized NEON for N2 processor.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
  lib/hash/rte_cuckoo_hash.c | 196 -
  lib/hash/rte_cuckoo_hash.h |   1 +
  2 files changed, 151 insertions(+), 46 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index a07dd3a28d..231d6d6ded 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -442,8 +442,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
  #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+   }
else
  #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
@@ -1860,37 +1863,103 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
  #if defined(__ARM_NEON)

  static inline void
-compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
  {
unsigned int i;

+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {


Can I ask to move arch specific comparison code into some arch-specific headers or so?
It is getting really hard to read and understand the generic code with all these ifdefs and arch specific instructions...



I can easily move compare_signatures into an arm/x86 directory and keep
a default version in the common code.
The problem is bulk lookup. That function is already duplicated twice
(the l and lf versions). If I remove the #ifdefs, I'll need to
duplicate them again into 4 nearly identical versions (dense and
sparse). The only third option I see would be a preprocessor macro
to patch the function, but that looks even dirtier to me.
I think duplicating the code would be bad, but I can do it if you want,
unless you have a better solution.
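One possible middle ground between four duplicated functions and macro-patching the function body is this: keep a single bulk-lookup body and hide the hitmask layout behind tiny inline accessors chosen at compile time. The sketch below is hypothetical. `struct key_hits`, `prim_hits()`, `sec_hits()` and `count_candidates()` are not DPDK code, and `count_candidates()` only stands in for the real lookup loop. The `DENSE_HASH_BULK_LOOKUP` switch mirrors the macro that the arch-specific headers define later in this series.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed */

#ifndef DENSE_HASH_BULK_LOOKUP
#define DENSE_HASH_BULK_LOOKUP 1 /* would be set by an arch-specific header */
#endif

#if DENSE_HASH_BULK_LOOKUP
/* Dense: one 16-bit mask per key, primary hits low, secondary hits high. */
struct key_hits {
	uint16_t mask;
};

static inline uint32_t
prim_hits(struct key_hits h)
{
	return h.mask & ((1u << BUCKET_ENTRIES) - 1);
}

static inline uint32_t
sec_hits(struct key_hits h)
{
	return (uint32_t)h.mask >> BUCKET_ENTRIES;
}
#else
/* Sparse: separate primary and secondary masks, entry i at bit 2*i. */
struct key_hits {
	uint32_t prim;
	uint32_t sec;
};

static inline uint32_t
compact_even_bits(uint32_t m)
{
	uint32_t out = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		out |= ((m >> (2 * i)) & 1u) << i;
	return out;
}

static inline uint32_t
prim_hits(struct key_hits h)
{
	return compact_even_bits(h.prim);
}

static inline uint32_t
sec_hits(struct key_hits h)
{
	return compact_even_bits(h.sec);
}
#endif

/* Shared body, written once: counts candidate slots regardless of layout. */
static unsigned int
count_candidates(const struct key_hits *hits, unsigned int num_keys)
{
	unsigned int n = 0;

	for (unsigned int k = 0; k < num_keys; k++) {
		for (uint32_t m = prim_hits(hits[k]); m != 0; m &= m - 1)
			n++; /* one primary candidate per set bit */
		for (uint32_t m = sec_hits(hits[k]); m != 0; m &= m - 1)
			n++; /* one secondary candidate per set bit */
	}
	return n;
}
```

The trade-off is that the sparse path pays for a compaction step inside the accessor. Whether the compiler hides that cost would need benchmarking.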



+#if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, x;
+   uint16x8_t vmat, hit1, hit2;
const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);

/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
+
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
+   }
+   break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);

[PATCH v9 0/4] hash: add SVE support for bulk key lookup

2024-04-30 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own array.

Benchmarking with the cuckoo hash perf test, I observed the following effect on speed:
  There are no significant changes on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

V8->V9:
  Use __rte_unused instead of (void)
  Fix an indentation mistake

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 ---
 lib/hash/arch/arm/compare_signatures.h| 117 +
 lib/hash/arch/common/compare_signatures.h |  37 
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 199 --
 lib/hash/rte_cuckoo_hash.h|   1 +
 7 files changed, 393 insertions(+), 115 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

-- 
2.25.1



[PATCH v9 1/4] hash: pack the hitmask for hash in bulk lookup

2024-04-30 Thread Yoan Picchi
The current hitmask includes padding due to an Intel SIMD
implementation detail. This patch allows non-Intel SIMD
implementations to benefit from a dense hitmask.
In addition, the new dense hitmask interleaves the primary
and secondary matches, which allows better cache usage and
enables future improvements for the SIMD implementations.
The default non-SIMD path now uses this dense mask.
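On the consuming side, a single interleaved 16-bit mask per key can be walked with a lowest-set-bit scan. Because primary hits occupy the low byte, such a scan visits the primary bucket before the secondary one. This is a hypothetical sketch that assumes 8 entries per bucket. `dense_hit_slots()` and `slot_to_entry()` are illustrative names, and `__builtin_ctz()` is a GCC/Clang builtin.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed */

/*
 * Write the bit index of every candidate slot in one dense hitmask to out[]
 * and return how many there are. Bits 0..7 are primary-bucket entries and
 * bits 8..15 secondary-bucket entries, so primary hits come out first.
 */
static unsigned int
dense_hit_slots(uint16_t hitmask, unsigned int out[2 * BUCKET_ENTRIES])
{
	unsigned int n = 0;
	uint32_t m = hitmask;

	while (m != 0) {
		unsigned int bit = (unsigned int)__builtin_ctz(m); /* lowest set bit */

		out[n++] = bit;
		m &= m - 1; /* clear the lowest set bit */
	}
	return n;
}

/* Split a slot bit index into bucket (0 = primary, 1 = secondary) and entry. */
static void
slot_to_entry(unsigned int bit, unsigned int *bucket, unsigned int *entry)
{
	*bucket = bit / BUCKET_ENTRIES;
	*entry = bit % BUCKET_ENTRIES;
}
```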

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap  |   2 +
 lib/hash/arch/arm/compare_signatures.h|  61 +++
 lib/hash/arch/common/compare_signatures.h |  37 +
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 192 --
 5 files changed, 254 insertions(+), 91 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

diff --git a/.mailmap b/.mailmap
index 66ebc20666..00b50414d3 100644
--- a/.mailmap
+++ b/.mailmap
@@ -494,6 +494,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1633,6 +1634,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
new file mode 100644
index 00..46d15da89f
--- /dev/null
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+ "hitmask_buffer must be wide enough to fit a dense hitmask");
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
+   }
+   break;
+#endif
+   default:
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |=
+   (sig == prim_bucket_sigs[i]) << i;
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
+   }
+   }
+}
diff --git a/lib/hash/arch/common/compare_signatures.h b/lib/hash/arch/common/compare_signatures.h
new file mode 100644
index 00..f43b367005
--- /dev/null
+++ b/lib/hash/arch/common/compare_signatures.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   __rte_unused enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+

[PATCH v9 2/4] hash: optimize compare signature for NEON

2024-04-30 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by masking each lane with its own bit.
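The trick can be checked with a scalar model of the three NEON steps. This is plain C that only imitates the semantics of `vceqq_u16`, `vandq_u16` and `vaddvq_u16` on 8 lanes. `neon_style_hitmask` is a hypothetical name.

```c
#include <stdint.h>

#define LANES 8 /* 8 x 16-bit lanes in a 128-bit NEON register */

/*
 * Scalar model (not NEON code) of the sequence in this patch:
 * vceqq_u16 -> vandq_u16 with {0x1, 0x2, ..., 0x80} -> vaddvq_u16.
 */
static uint16_t
neon_style_hitmask(const uint16_t sigs[LANES], uint16_t sig)
{
	static const uint16_t lane_bit[LANES] = {
		0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
	};
	uint16_t sum = 0;

	for (unsigned int i = 0; i < LANES; i++) {
		/* vceqq_u16: a matching lane becomes all ones, otherwise all zeros */
		uint16_t lane = (sigs[i] == sig) ? 0xFFFF : 0x0000;

		/* vandq_u16: keep only this lane's own bit, so no shift is needed */
		lane &= lane_bit[i];

		/* vaddvq_u16: horizontal add; the bits are distinct, so add == OR */
		sum = (uint16_t)(sum + lane);
	}
	return sum;
}
```

The previous code reached the same value by ANDing with `0x0001` and then shifting each lane by `{0, 1, ..., 7}` with `vshlq_u16`. Folding the lane position into the AND constant removes that per-bucket shift. The patch additionally shifts the secondary result left by `RTE_HASH_BUCKET_ENTRIES` and ORs it with the primary one, so a single `vaddvq_u16` yields the combined 16-bit mask.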

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/arch/arm/compare_signatures.h | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index 46d15da89f..72bd171484 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -30,23 +30,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
 #if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
 
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
}
break;
 #endif
-- 
2.25.1



[PATCH v9 3/4] test/hash: check bulk lookup of keys after collision

2024-04-30 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to match the current number
of entries in a hash bucket.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..4f871b3499 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | (3 << 16);
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i <

[PATCH v9 4/4] hash: add SVE support for bulk key lookup

2024-04-30 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- The new SVE code is ~5% slower than the optimized NEON code on the N2
processor with 128b vectors.
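Which of the two SVE code paths runs depends only on the vector length. The sketch assumes 8 entries per bucket. A 128-bit vector holds 8 16-bit lanes, too few for both buckets, so the fallback loop runs: one pass with two compares. From 256 bits upward, both buckets are spliced into one vector and compared at once. The arithmetic is sketched below in portable C with hypothetical helper names. It does not call the SVE ACLE; `sve_lanes_u16()` only models what `svcnth()` reports.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed value of RTE_HASH_BUCKET_ENTRIES */

/* 16-bit lanes per SVE vector of the given width (models svcnth()). */
static uint64_t
sve_lanes_u16(unsigned int vector_bits)
{
	return vector_bits / 16;
}

/* Mirrors the patch's condition for the single-compare svsplice() path. */
static bool
uses_splice_path(unsigned int vector_bits)
{
	uint64_t vl = sve_lanes_u16(vector_bits);

	return vl >= 2 * BUCKET_ENTRIES && BUCKET_ENTRIES <= 8;
}

/* Passes made by the fallback do/while loop, which advances by vl lanes. */
static unsigned int
fallback_passes(unsigned int vector_bits)
{
	uint64_t vl = sve_lanes_u16(vector_bits);

	return (unsigned int)((BUCKET_ENTRIES + vl - 1) / vl);
}
```

This may help explain the cover letter's numbers, where 128b SVE trails the optimized NEON code more than 256b SVE does. Only the wider vectors get the merged single comparison.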

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/arch/arm/compare_signatures.h | 58 ++
 lib/hash/rte_cuckoo_hash.c |  7 +++-
 lib/hash/rte_cuckoo_hash.h |  1 +
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index 72bd171484..b4b4cf04e9 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparisons at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   } else {
+   do {
+   pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
+   uint16_t lower_half = 0;
+   uint16_t upper_half = 0;
+   /* Compare all signatures in the primary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &prim_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   lower_half = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   /* Compare all signatures in the secondary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &sec_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   upper_half = svorv_u16(svptrue_b16(), sv_matches)
+   << RTE_HASH_BUCKET_ENTRIES;
+   }
+   hitmask_buffer[i / 8] = upper_half | lower_half;
+   i += vl;
+   } while (i < RTE_HASH_BUCKET_ENTRIES);
+   }
+   }
+   break;
 #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 0697743cdf..75f555ba2c 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -450,8 +450,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+#if defined(RTE_HAS_SVE_ACLE)
+   if (rte_cpu_get_flag_enabled(RTE

[PATCH v10 0/4] hash: add SVE support for bulk key lookup

2024-07-03 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.
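To make the dense layout concrete, here is a hedged sketch of how one 16-bit hitmask per key could be decoded. It assumes 8 entries per bucket, with primary-bucket hits in bits 0-7 and secondary-bucket hits in bits 8-15. That layout matches the scalar `compare_signatures_dense()` loop in patch 1/4. `decode_hitmask()` and `BUCKET_ENTRIES` are hypothetical names, and this is not the lookup code from the series.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed value of RTE_HASH_BUCKET_ENTRIES */

/* Split one dense per-key hitmask into matching primary and secondary slot
 * indexes. Bits [0, BUCKET_ENTRIES) are primary-bucket hits and bits
 * [BUCKET_ENTRIES, 2 * BUCKET_ENTRIES) are secondary-bucket hits.
 * Both output arrays must hold at least BUCKET_ENTRIES elements. */
static void decode_hitmask(uint16_t hitmask,
		unsigned int *prim_slots, unsigned int *n_prim,
		unsigned int *sec_slots, unsigned int *n_sec)
{
	*n_prim = 0;
	*n_sec = 0;
	while (hitmask != 0) {
		/* index of the lowest set bit (GCC/Clang builtin) */
		unsigned int bit = (unsigned int)__builtin_ctz(hitmask);

		if (bit < BUCKET_ENTRIES)
			prim_slots[(*n_prim)++] = bit;
		else
			sec_slots[(*n_sec)++] = bit - BUCKET_ENTRIES;
		hitmask &= (uint16_t)(hitmask - 1); /* clear that bit */
	}
}
```

With this layout, one `uint16_t` per key carries both buckets' results. A single word therefore replaces the two padded 32-bit words of the previous format.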

Benchmarking with the cuckoo hash perf test, I observed this effect on speed:
  No significant change on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

V8->V9:
  Use __rte_unused instead of (void)
  Fix an indentation mistake

V9->V10:
  Fix more formatting and indentation
  Move the new compare signature files directly into hash instead of a new subdir
  Re-order includes
  Remove duplicated static check
  Move rte_hash_sig_compare_function's definition into a private header

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 ---
 lib/hash/compare_signatures_arm_pvt.h | 117 +
 lib/hash/compare_signatures_generic_pvt.h |  37 
 lib/hash/compare_signatures_x86_pvt.h |  49 ++
 lib/hash/hash_sig_cmp_func_pvt.h  |  20 +++
 lib/hash/rte_cuckoo_hash.c| 197 --
 lib/hash/rte_cuckoo_hash.h|  10 +-
 8 files changed, 407 insertions(+), 124 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm_pvt.h
 create mode 100644 lib/hash/compare_signatures_generic_pvt.h
 create mode 100644 lib/hash/compare_signatures_x86_pvt.h
 create mode 100644 lib/hash/hash_sig_cmp_func_pvt.h

-- 
2.25.1



[PATCH v10 2/4] hash: optimize compare signature for NEON

2024-07-03 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift and simply AND each lane with a lane-specific mask.
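The trick can be checked with a scalar model of the eight 16-bit lanes of a 128-bit NEON register. This is a sketch only: no NEON intrinsics are used, and the helper names are hypothetical. A compare such as `vceqq_u16` yields 0xFFFF in matching lanes. ANDing lane l with `1 << l` therefore already leaves bit l set, whereas the old code masked with 1 and then shifted each lane by l.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* a 128-bit NEON register holds eight 16-bit lanes */

/* Models vceqq_u16: each lane becomes 0xFFFF on equality, 0 otherwise. */
static void cmpeq_lanes(uint16_t out[LANES], const uint16_t a[LANES], uint16_t sig)
{
	for (int l = 0; l < LANES; l++)
		out[l] = (a[l] == sig) ? 0xFFFF : 0;
}

/* Old approach: AND every lane with 1, shift lane l left by l, then add. */
static uint16_t bits_by_shift(const uint16_t cmp[LANES])
{
	uint16_t sum = 0;

	for (int l = 0; l < LANES; l++)
		sum += (uint16_t)((cmp[l] & 0x1) << l);
	return sum;
}

/* New approach: AND lane l with the constant 1 << l, then add. Because a
 * matching lane is all ones, the AND alone leaves exactly bit l. */
static uint16_t bits_by_mask(const uint16_t cmp[LANES])
{
	static const uint16_t mask[LANES] = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
	uint16_t sum = 0;

	for (int l = 0; l < LANES; l++)
		sum += (uint16_t)(cmp[l] & mask[l]);
	return sum;
}
```

Each lane contributes a different bit, so the horizontal add (`vaddvq_u16` in the patch) behaves like an OR. The same reasoning covers shifting the secondary result by `RTE_HASH_BUCKET_ENTRIES` and ORing it with the primary result before the add.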

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/compare_signatures_arm_pvt.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
index e83bae9912..1d5464c4ce 100644
--- a/lib/hash/compare_signatures_arm_pvt.h
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -32,21 +32,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
 #if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
 
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
}
break;
 #endif
-- 
2.25.1



[PATCH v10 3/4] test/hash: check bulk lookup of keys after collision

2024-07-03 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.
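The test also changes pseudo_hash() to return 3 | (3 << 16) instead of 3. The sketch below shows how each value splits, under three assumptions taken from current rte_cuckoo_hash.c:

- the 16-bit short signature is the upper half of the 32-bit hash;
- the primary bucket index is the hash ANDed with the bucket bitmask;
- the alternative bucket is (primary ^ signature) & bitmask.

The helper names and the 8-bucket bitmask (0x7) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the assumed split of a 32-bit hash. */
static uint16_t short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16); /* upper 16 bits */
}

static uint32_t prim_bucket(uint32_t hash, uint32_t bitmask)
{
	return hash & bitmask; /* lower bits select the primary bucket */
}

static uint32_t alt_bucket(uint32_t prim, uint16_t sig, uint32_t bitmask)
{
	return (prim ^ sig) & bitmask; /* cuckoo alternative bucket */
}
```

Under these assumptions, both hashes select primary bucket 3. The old value yields a zero signature and an alternative bucket equal to the primary. The new value yields signature 3 and an alternative bucket distinct from the primary.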

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap |  1 +
 app/test/test_hash.c | 99 ++--
 2 files changed, 77 insertions(+), 23 deletions(-)

diff --git a/.mailmap b/.mailmap
index ec525981fe..41a8a99a7c 100644
--- a/.mailmap
+++ b/.mailmap
@@ -505,6 +505,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 24d3b547ad..ab3b37de3f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | (3 << 16);
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+

[PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup

2024-07-03 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail
of Intel's SIMD code. This patch allows non-Intel SIMD
implementations to benefit from a dense hitmask.
In addition, the new dense hitmask interleaves the primary
and secondary matches, which allows better cache usage and
enables future improvements to the SIMD implementations.
The default non-SIMD path now uses this dense mask.
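To illustrate the two layouts, here is a minimal scalar sketch assuming 8 entries per bucket. The sparse variant mirrors the existing scalar loop: it puts the match for entry i at bit 2 * i, in separate primary and secondary words. The dense variant mirrors the new scalar loop: it packs primary hits into bits 0-7 and secondary hits into bits 8-15 of one `uint16_t`. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed value of RTE_HASH_BUCKET_ENTRIES */

/* Sparse layout: one match bit per entry at position 2 * i, with primary
 * and secondary results kept in separate 32-bit words. */
static void compare_sparse(uint32_t *prim, uint32_t *sec,
		const uint16_t *prim_sigs, const uint16_t *sec_sigs, uint16_t sig)
{
	*prim = 0;
	*sec = 0;
	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		*prim |= (uint32_t)(sig == prim_sigs[i]) << (i << 1);
		*sec |= (uint32_t)(sig == sec_sigs[i]) << (i << 1);
	}
}

/* Dense layout: primary hits in bits [0, 8), secondary hits in bits [8, 16),
 * so a single 16-bit word per key carries both results. */
static uint16_t compare_dense(const uint16_t *prim_sigs,
		const uint16_t *sec_sigs, uint16_t sig)
{
	uint16_t hitmask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		hitmask |= (uint16_t)((sig == prim_sigs[i]) << i);
		hitmask |= (uint16_t)((sig == sec_sigs[i]) << (i + BUCKET_ENTRIES));
	}
	return hitmask;
}
```

In the sparse layout, half of the bits of each 32-bit word are unused and two words are needed per key. The dense layout fits the same information into 16 bits.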

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap  |   1 +
 lib/hash/compare_signatures_arm_pvt.h |  60 +++
 lib/hash/compare_signatures_generic_pvt.h |  37 +
 lib/hash/compare_signatures_x86_pvt.h |  49 ++
 lib/hash/hash_sig_cmp_func_pvt.h  |  20 +++
 lib/hash/rte_cuckoo_hash.c| 190 +++---
 lib/hash/rte_cuckoo_hash.h|  10 +-
 7 files changed, 267 insertions(+), 100 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm_pvt.h
 create mode 100644 lib/hash/compare_signatures_generic_pvt.h
 create mode 100644 lib/hash/compare_signatures_x86_pvt.h
 create mode 100644 lib/hash/hash_sig_cmp_func_pvt.h

diff --git a/.mailmap b/.mailmap
index f76037213d..ec525981fe 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1661,6 +1661,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
new file mode 100644
index 00..e83bae9912
--- /dev/null
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+#include "hash_sig_cmp_func_pvt.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+   "hitmask_buffer must be wide enough to fit a dense hitmask");
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
+   }
+   break;
+#endif
+   default:
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i;
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
+   }
+   }
+}
diff --git a/lib/hash/compare_signatures_generic_pvt.h b/lib/hash/compare_signatures_generic_pvt.h
new file mode 100644
index 00..18c2f651c4
--- /dev/null
+++ b/lib/hash/compare_signatures_generic_pvt.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+#include "hash_sig_cmp_func_pvt.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   __rte_unused enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+   &

[PATCH v10 4/4] hash: add SVE support for bulk key lookup

2024-07-03 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- New SVE code is ~5% slower than the optimized NEON code on an N2 processor
with 128b vectors.
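The SVE code in this patch is vector-length agnostic. `svcnth()` reports how many 16-bit lanes a vector holds (8 at 128 bits, 16 at 256 bits), and `svwhilelt_b16()` builds a predicate that disables lanes past the end of the bucket. With 256-bit vectors, both buckets fit in one register (the `svsplice` path); with 128-bit vectors, the loop path is taken.

Below is a scalar model of that idea, not the patch code. The names are hypothetical, and `vl` stands in for `svcnth()`.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed value of RTE_HASH_BUCKET_ENTRIES */

/* Scalar model of a predicated SVE loop producing the dense hitmask.
 * vl is the number of 16-bit lanes per vector and must be at least 1. */
static uint16_t compare_vla_model(const uint16_t *prim_sigs,
		const uint16_t *sec_sigs, uint16_t sig, unsigned int vl)
{
	uint16_t hitmask = 0;

	if (vl >= 2 * BUCKET_ENTRIES) {
		/* Wide vector: place primary then secondary signatures in one
		 * "register" and compare all 2 * BUCKET_ENTRIES lanes at once. */
		uint16_t spliced[2 * BUCKET_ENTRIES];

		for (unsigned int l = 0; l < BUCKET_ENTRIES; l++) {
			spliced[l] = prim_sigs[l];
			spliced[l + BUCKET_ENTRIES] = sec_sigs[l];
		}
		for (unsigned int l = 0; l < 2 * BUCKET_ENTRIES; l++)
			hitmask |= (uint16_t)((spliced[l] == sig) << l);
		return hitmask;
	}

	/* Narrow vector: walk both buckets in chunks of vl lanes. */
	for (unsigned int base = 0; base < BUCKET_ENTRIES; base += vl) {
		for (unsigned int l = 0; l < vl; l++) {
			unsigned int i = base + l;

			if (i >= BUCKET_ENTRIES) /* lane switched off by predicate */
				break;
			hitmask |= (uint16_t)((prim_sigs[i] == sig) << i);
			hitmask |= (uint16_t)((sec_sigs[i] == sig) << (i + BUCKET_ENTRIES));
		}
	}
	return hitmask;
}
```

Whatever the lane count, the model should produce the same dense mask as the scalar loop. Only the number of "vector" iterations changes.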

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/compare_signatures_arm_pvt.h | 57 +++
 lib/hash/rte_cuckoo_hash.c|  7 +++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
index 1d5464c4ce..efec78afb0 100644
--- a/lib/hash/compare_signatures_arm_pvt.h
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -49,6 +49,63 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparisons at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred, primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   } else {
+   do {
+   pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
+   uint16_t lower_half = 0;
+   uint16_t upper_half = 0;
+   /* Compare all signatures in the primary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &prim_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   lower_half = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   /* Compare all signatures in the secondary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &sec_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   upper_half = svorv_u16(svptrue_b16(), sv_matches)
+   << RTE_HASH_BUCKET_ENTRIES;
+   }
+   hitmask_buffer[i / 8] = upper_half | lower_half;
+   i += vl;
+   } while (i < RTE_HASH_BUCKET_ENTRIES);
+   }
+   }
+   break;
 #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 61cc12d83b..e5831ad146 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -452,8 +452,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+#if defined(RTE_HAS_SVE_ACLE)
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+#endif
+   }
   

Re: [PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup

2024-07-05 Thread Yoan Picchi
I'll push a v11 tonight. There are a couple of comments I disagree with,
though:


On 7/4/24 21:31, David Marchand wrote:

Hello Yoan,

On Wed, Jul 3, 2024 at 7:13 PM Yoan Picchi  wrote:


Current hitmask includes padding due to Intel's SIMD
implementation detail. This patch allows non Intel SIMD
implementations to benefit from a dense hitmask.
In addition, the new dense hitmask interweave the primary
and secondary matches which allow a better cache usage and
enable future improvements for the SIMD implementations
The default non SIMD path now use this dense mask.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 


This patch does too many things at the same time.
Code movement and behavior changes are all mixed in.

As there was still no review from the lib maintainer... I am going a
bit more in depth this time.
Please split this patch to make it easier to understand.

I can see the need for at least one patch for isolating the change on
sig_cmp_fn from the exposed API, then one patch for moving the code to
per arch headers with *no behavior change*, and one patch for
introducing/switching to "dense hitmask".

More comments below.



---
  .mailmap  |   1 +
  lib/hash/compare_signatures_arm_pvt.h |  60 +++
  lib/hash/compare_signatures_generic_pvt.h |  37 +
  lib/hash/compare_signatures_x86_pvt.h |  49 ++
  lib/hash/hash_sig_cmp_func_pvt.h  |  20 +++
  lib/hash/rte_cuckoo_hash.c| 190 +++---
  lib/hash/rte_cuckoo_hash.h|  10 +-
  7 files changed, 267 insertions(+), 100 deletions(-)
  create mode 100644 lib/hash/compare_signatures_arm_pvt.h
  create mode 100644 lib/hash/compare_signatures_generic_pvt.h
  create mode 100644 lib/hash/compare_signatures_x86_pvt.h
  create mode 100644 lib/hash/hash_sig_cmp_func_pvt.h

diff --git a/.mailmap b/.mailmap
index f76037213d..ec525981fe 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1661,6 +1661,7 @@ Yixue Wang 
  Yi Yang  
  Yi Zhang 
  Yoann Desmouceaux 
+Yoan Picchi 
  Yogesh Jangra 
  Yogev Chaimovich 
  Yongjie Gu 
diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
new file mode 100644
index 00..e83bae9912
--- /dev/null
+++ b/lib/hash/compare_signatures_arm_pvt.h


I guess pvt stands for private.
No need for such a suffix; this header won't be exported in any case.


pvt does stand for private, yes. I looked at the other libs to see how
they mark a header as private. Several (rcu, ring and stack) use _pvt,
so it looks like that might be the convention? If not, how am I supposed
to differentiate a public header from a private one?






@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */


Please add a header guard.

#ifndef _H
#define _H


+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+#include "hash_sig_cmp_func_pvt.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+   "hitmask_buffer must be wide enough to fit a dense hitmask");
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
+   }
+   break;
+#endif
+   default:
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i;
+   *hitmask_buffer |=
+   

[PATCH v11 1/7] hash: make compare signature function enum private

2024-07-05 Thread Yoan Picchi
enum rte_hash_sig_compare_function is only used internally. This
patch moves it out of the public ABI and into the C file.
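The pattern can be sketched as follows; all names here are hypothetical. The struct visible to other files stores the selector as a plain `unsigned int`, and the enumeration is declared only in the .c file that switches on it. This mirrors the `sig_cmp_fn` change below, but it is not the DPDK code.

```c
#include <assert.h>

/* Part visible to other files: only an unsigned int is stored, so the
 * enumeration can change without changing the exposed type. */
struct demo_hash {
	unsigned int sig_cmp_fn; /* holds a value of the private enum below */
};

/* Part private to the .c file. */
enum demo_sig_compare_function {
	DEMO_COMPARE_SCALAR = 0,
	DEMO_COMPARE_SSE,
	DEMO_COMPARE_NEON,
	DEMO_COMPARE_NUM
};

/* Returns 1 when a vector implementation was selected, 0 for scalar. */
static int demo_uses_vector_compare(const struct demo_hash *h)
{
	switch ((enum demo_sig_compare_function)h->sig_cmp_fn) {
	case DEMO_COMPARE_SSE:
	case DEMO_COMPARE_NEON:
		return 1;
	default:
		return 0;
	}
}
```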

Signed-off-by: Yoan Picchi 
---
 lib/hash/rte_cuckoo_hash.c | 10 ++
 lib/hash/rte_cuckoo_hash.h | 10 +-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d87aa52b5b..e1d50e7d40 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO);
 
 #include "rte_cuckoo_hash.h"
 
+/* Enum used to select the implementation of the signature comparison function to use
+ * e.g.: A system supporting SVE might want to use a NEON or scalar implementation.
+ */
+enum rte_hash_sig_compare_function {
+   RTE_HASH_COMPARE_SCALAR = 0,
+   RTE_HASH_COMPARE_SSE,
+   RTE_HASH_COMPARE_NEON,
+   RTE_HASH_COMPARE_NUM
+};
+
 /* Mask of all flags supported by this version */
 #define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \
   RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index a528f1d1a0..26a992419a 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -134,14 +134,6 @@ struct rte_hash_key {
char key[0];
 };
 
-/* All different signature compare functions */
-enum rte_hash_sig_compare_function {
-   RTE_HASH_COMPARE_SCALAR = 0,
-   RTE_HASH_COMPARE_SSE,
-   RTE_HASH_COMPARE_NEON,
-   RTE_HASH_COMPARE_NUM
-};
-
 /** Bucket structure */
 struct __rte_cache_aligned rte_hash_bucket {
uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
@@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
/**< Custom function used to compare keys. */
enum cmp_jump_table_case cmp_jump_table_idx;
/**< Indicates which compare function to use. */
-   enum rte_hash_sig_compare_function sig_cmp_fn;
+   unsigned int sig_cmp_fn;
/**< Indicates which signature compare function to use. */
uint32_t bucket_bitmask;
/**< Bitmask for getting bucket index from hash signature. */
-- 
2.34.1



[PATCH v11 0/7] hash: add SVE support for bulk key lookup

2024-07-05 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.

Benchmarking with the cuckoo hash perf test, I observed this effect on speed:
  No significant change on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

V8->V9:
  Use __rte_unused instead of (void)
  Fix an indentation mistake

V9->V10:
  Fix more formatting and indentation
  Move the new compare signature files directly into hash instead of a new subdir
  Re-order includes
  Remove duplicated static check
  Move rte_hash_sig_compare_function's definition into a private header

V10->V11:
  Split the "pack the hitmask" commit into four commits:
Move the compare function enum out of the ABI
Move the compare function implementations into arch-specific files
Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
  in the future
Implement the dense hitmask
  Add missing header guards
  Move compare function enum into cuckoo_hash.c instead of its own header.

Yoan Picchi (7):
  hash: make compare signature function enum private
  hash: split compare signature into arch-specific files
  hash: add a check on hash entry max size
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 ---
 lib/hash/compare_signatures_arm_pvt.h | 121 +
 lib/hash/compare_signatures_generic_pvt.h |  40 +
 lib/hash/compare_signatures_x86_pvt.h |  55 ++
 lib/hash/rte_cuckoo_hash.c| 207 --
 lib/hash/rte_cuckoo_hash.h|  10 +-
 7 files changed, 410 insertions(+), 124 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm_pvt.h
 create mode 100644 lib/hash/compare_signatures_generic_pvt.h
 create mode 100644 lib/hash/compare_signatures_x86_pvt.h

-- 
2.34.1



[PATCH v11 2/7] hash: split compare signature into arch-specific files

2024-07-05 Thread Yoan Picchi
Move the compare_signatures function into architecture-specific files.
Each file keeps the default scalar implementation as a fallback for when
vectorisation is disabled.
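Below is a hedged sketch of the structure each arch-specific header follows, assuming 8 entries per bucket. The vector case is compiled only when the ISA macros are defined. The sketch checks AArch64 NEON, because `vaddvq_u16` is an AArch64 intrinsic. Any other request falls through to the scalar loop, including a NEON request on a build without NEON. The names are hypothetical; the NEON body reuses the sequence from compare_signatures_arm_pvt.h in this patch.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

#define BUCKET_ENTRIES 8 /* assumed value of RTE_HASH_BUCKET_ENTRIES */

enum cmp_fn { CMP_SCALAR = 0, CMP_NEON };

/* Portable fallback: sparse mask with the match for entry i at bit 2 * i. */
static uint32_t compare_scalar(const uint16_t *sigs, uint16_t sig)
{
	uint32_t matches = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		matches |= (uint32_t)(sig == sigs[i]) << (i << 1);
	return matches;
}

static uint32_t compare(const uint16_t *sigs, uint16_t sig, enum cmp_fn fn)
{
	switch (fn) {
#if defined(__ARM_NEON) && defined(__aarch64__) && BUCKET_ENTRIES <= 8
	case CMP_NEON: {
		/* Keep bit 15 of each matching lane, shift lane l down to
		 * bit 2 * l, then sum the lanes (bits are disjoint). */
		const int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
		uint16x8_t vmat = vceqq_u16(vdupq_n_u16(sig), vld1q_u16(sigs));
		uint16x8_t x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);

		return vaddvq_u16(x);
	}
#endif
	default:
		/* Also reached for CMP_NEON when the vector code is not built. */
		return compare_scalar(sigs, sig);
	}
}
```

Because the enum value always exists, callers need no `#ifdef` of their own. The choice of implementation is confined to the header.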

Signed-off-by: Yoan Picchi 
---
 .mailmap  |  1 +
 lib/hash/compare_signatures_arm_pvt.h | 55 +++
 lib/hash/compare_signatures_generic_pvt.h | 33 
 lib/hash/compare_signatures_x86_pvt.h | 48 +
 lib/hash/rte_cuckoo_hash.c| 65 +++
 5 files changed, 145 insertions(+), 57 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm_pvt.h
 create mode 100644 lib/hash/compare_signatures_generic_pvt.h
 create mode 100644 lib/hash/compare_signatures_x86_pvt.h

diff --git a/.mailmap b/.mailmap
index f76037213d..ec525981fe 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1661,6 +1661,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
new file mode 100644
index 00..80b6afb7a5
--- /dev/null
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+#ifndef _COMPARE_SIGNATURE_ARM_PVT_H_
+#define _COMPARE_SIGNATURE_ARM_PVT_H_
+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+
+static inline void
+compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask the first bit of every two bits indicates the match */
+   switch (sig_cmp_fn) {
+#if defined(__ARM_NEON)
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
+   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
+   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   }
+   break;
+#endif
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << (i << 1));
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << (i << 1));
+   }
+   }
+}
+
+#endif
diff --git a/lib/hash/compare_signatures_generic_pvt.h b/lib/hash/compare_signatures_generic_pvt.h
new file mode 100644
index 00..43587adcef
--- /dev/null
+++ b/lib/hash/compare_signatures_generic_pvt.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+#ifndef _COMPARE_SIGNATURE_GENERIC_PVT_H_
+#define _COMPARE_SIGNATURE_GENERIC_PVT_H_
+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+
+static inline void
+compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask the first bit of every two bits indicates the match */
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << (i << 1));
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << (i << 1));
+   }
+}
+
+#endif
diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h
new file mode 100644
index 00..11a82aced9
--- /dev/null
+++ b/lib/hash/compare_signatures_x86_pvt.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+#ifndef _COMPARE_SIGNATURE_

[PATCH v11 3/7] hash: add a check on hash entry max size

2024-07-05 Thread Yoan Picchi
If we were to change RTE_HASH_BUCKET_ENTRIES to be over 8, the signatures
would no longer fit in a single vector (8 * 16b = 128b), so some of them
would not be checked. This patch adds a compile-time check to fall back to
the scalar code in this case.
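
For illustration, a minimal sketch of the guard's logic in isolation. BUCKET_ENTRIES, SIG_BITS and VECTOR_BITS are hypothetical stand-ins for RTE_HASH_BUCKET_ENTRIES and the 128-bit register width; the vector branch is only enabled when every 16-bit signature of a bucket fits in one register, otherwise only the scalar loop is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_HASH_BUCKET_ENTRIES and the vector width. */
#define BUCKET_ENTRIES 8
#define SIG_BITS 16
#define VECTOR_BITS 128
#define LANES_PER_VECTOR (VECTOR_BITS / SIG_BITS) /* 8 lanes of 16 bits */

/* The scalar mask below uses one bit per entry in a uint32_t. */
static_assert(BUCKET_ENTRIES <= 32, "one bit per entry must fit in 32 bits");

/* Returns 1 when the vector path would be compiled in, 0 otherwise. */
static int vector_path_enabled(void)
{
#if BUCKET_ENTRIES <= LANES_PER_VECTOR
	return 1; /* one 128-bit load covers every signature of a bucket */
#else
	return 0; /* a single load would skip signatures: scalar only */
#endif
}

/* Scalar fallback: correct for any BUCKET_ENTRIES up to 32. */
static uint32_t scalar_matches(const uint16_t *sigs, uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sigs[i] == sig) << i;
	return mask;
}
```

Raising BUCKET_ENTRIES to 16 in this sketch turns vector_path_enabled() into 0 at compile time, while scalar_matches() keeps checking every entry.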

Signed-off-by: Yoan Picchi 
---
 lib/hash/compare_signatures_arm_pvt.h | 2 +-
 lib/hash/compare_signatures_x86_pvt.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
index 80b6afb7a5..74b3286c95 100644
--- a/lib/hash/compare_signatures_arm_pvt.h
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -23,7 +23,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 
/* For match mask the first bit of every two bits indicates the match */
switch (sig_cmp_fn) {
-#if defined(__ARM_NEON)
+#if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
uint16x8_t vmat, vsig, x;
int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h
index 11a82aced9..f77b37f1cd 100644
--- a/lib/hash/compare_signatures_x86_pvt.h
+++ b/lib/hash/compare_signatures_x86_pvt.h
@@ -23,7 +23,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 
/* For match mask the first bit of every two bits indicates the match */
switch (sig_cmp_fn) {
-#if defined(__SSE2__)
+#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_SSE:
/* Compare all signatures in the bucket */
*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128(
-- 
2.34.1



[PATCH v11 4/7] hash: pack the hitmask for hash in bulk lookup

2024-07-05 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail of
Intel's SIMD code. This patch allows non-Intel SIMD implementations to
benefit from a dense hitmask.
In addition, the new dense hitmask interleaves the primary
and secondary matches, which allows better cache usage and
enables future improvements for the SIMD implementations.
The default non-SIMD path now uses this dense mask.
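
A minimal scalar sketch of the dense layout used by the default path: one uint16_t per lookup, with primary bucket matches in bits 0..N-1 and secondary bucket matches in bits N..2N-1. BUCKET_ENTRIES is an assumed stand-in for RTE_HASH_BUCKET_ENTRIES (8), and prim_part()/sec_part() are hypothetical helpers, not DPDK functions.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed value of RTE_HASH_BUCKET_ENTRIES */

/* Both buckets' bits must fit in one 16-bit hitmask. */
static_assert(2 * BUCKET_ENTRIES <= 16, "dense hitmask must fit in 16 bits");

/* Bit i: primary slot i matched; bit BUCKET_ENTRIES + i: secondary slot i. */
static uint16_t dense_hitmask(const uint16_t *prim_sigs,
			      const uint16_t *sec_sigs, uint16_t sig)
{
	uint16_t hitmask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		hitmask |= (uint16_t)((sig == prim_sigs[i]) << i);
		hitmask |= (uint16_t)((sig == sec_sigs[i]) << (BUCKET_ENTRIES + i));
	}
	return hitmask;
}

/* Recover the per-bucket views from the dense mask. */
static uint16_t prim_part(uint16_t hitmask)
{
	return (uint16_t)(hitmask & ((1u << BUCKET_ENTRIES) - 1));
}

static uint16_t sec_part(uint16_t hitmask)
{
	return (uint16_t)(hitmask >> BUCKET_ENTRIES);
}
```

With 8 entries per bucket, every bit of the uint16_t carries information, which is what "dense" means here compared with the two padded 32-bit masks.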

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/compare_signatures_arm_pvt.h |  47 
 lib/hash/compare_signatures_generic_pvt.h |  31 +++---
 lib/hash/compare_signatures_x86_pvt.h |   9 +-
 lib/hash/rte_cuckoo_hash.c| 124 --
 4 files changed, 145 insertions(+), 66 deletions(-)

diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
index 74b3286c95..0fc657c49b 100644
--- a/lib/hash/compare_signatures_arm_pvt.h
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -6,48 +6,57 @@
 #ifndef _COMPARE_SIGNATURE_ARM_PVT_H_
 #define _COMPARE_SIGNATURE_ARM_PVT_H_
 
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
 #include 
 #include 
 #include 
 
 #include "rte_cuckoo_hash.h"
 
+#define DENSE_HASH_BULK_LOOKUP 1
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
 {
-   unsigned int i;
 
-   /* For match mask the first bit of every two bits indicates the match */
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+   "hitmask_buffer must be wide enough to fit a dense hitmask");
+
+   /* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
 #if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
 
vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
}
break;
 #endif
default:
-   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   *prim_hash_matches |=
-   ((sig == prim_bkt->sig_current[i]) << (i << 1));
-   *sec_hash_matches |=
-   ((sig == sec_bkt->sig_current[i]) << (i << 1));
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i;
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
}
}
 }
diff --git a/lib/hash/compare_signatures_generic_pvt.h b/lib/hash/compare_signatures_generic_pvt.h
index 43587adcef..1d065d4c28 100644
--- a/lib/hash/compare_signatures_generic_pvt.h
+++ b/lib/hash/compare_signatures_generic_pvt.h
@@ -6,27 +6,34 @@
 #ifndef _COMPARE_SIGNATURE_GENERIC_PVT_H_
 #define _COMPARE_SIGNATURE_GENERIC_PVT_H_
 
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
 #include 
 #include 
 #include 
 
 #

[PATCH v11 5/7] hash: optimize compare signature for NEON

2024-07-05 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by simply masking each lane with its own bit.
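
To see why the mask can replace the shift, here is a scalar model of the NEON sequence, one loop iteration per 16-bit lane. It is an illustration only, not NEON code: the intrinsic names in the comments refer to the vector operations being modelled, and neon_style_hitmask() is a hypothetical name. Because each lane contributes a distinct bit, the horizontal add gives the same result as an OR.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* 128-bit vector of 16-bit lanes, as with uint16x8_t */

static_assert(2 * LANES <= 16, "both halves must fit in a 16-bit result");

/* Scalar model of: vceqq_u16 -> vandq_u16(mask) -> vshlq_n_u16
 * -> vorrq_u16 -> vaddvq_u16. */
static uint16_t neon_style_hitmask(const uint16_t prim[LANES],
				   const uint16_t sec[LANES], uint16_t sig)
{
	static const uint16_t mask[LANES] = {0x1, 0x2, 0x4, 0x8,
					     0x10, 0x20, 0x40, 0x80};
	uint16_t sum = 0;

	for (int lane = 0; lane < LANES; lane++) {
		/* vceqq_u16: all ones (0xFFFF) on equality, zero otherwise */
		uint16_t eq1 = (prim[lane] == sig) ? 0xFFFF : 0;
		uint16_t eq2 = (sec[lane] == sig) ? 0xFFFF : 0;
		/* AND with the lane's own bit: no per-lane shift needed */
		uint16_t hit1 = eq1 & mask[lane];
		/* vshlq_n_u16 by a constant moves secondary hits up */
		uint16_t hit2 = (uint16_t)((eq2 & mask[lane]) << LANES);
		/* Bits never overlap, so adding (vaddvq_u16) equals OR-ing */
		sum = (uint16_t)(sum + (hit1 | hit2));
	}
	return sum;
}
```

The constant mask turns a variable per-lane shift into a single AND, and the only remaining shift is by a compile-time constant.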

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/compare_signatures_arm_pvt.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
index 0fc657c49b..0245fec26f 100644
--- a/lib/hash/compare_signatures_arm_pvt.h
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -34,21 +34,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
 #if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
 
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
}
break;
 #endif
-- 
2.34.1



[PATCH v11 7/7] hash: add SVE support for bulk key lookup

2024-07-05 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- The new SVE code is ~5% slower than the optimized NEON code on an N2
processor with 128b vectors.
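
A scalar model of the merged SVE path, assuming vl_lanes plays the role of svcnth() and is at least 2 * BUCKET_ENTRIES. Both buckets are spliced into one vector, compared once under a predicate, each matching lane is turned into 1 << lane_index, and the lanes are OR-reduced. The function name and parameters are hypothetical, and the looping path used for shorter vectors is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed RTE_HASH_BUCKET_ENTRIES */
#define MAX_LANES 128    /* 2048-bit SVE vector / 16-bit lanes */

static uint16_t sve_style_merged_hitmask(const uint16_t *prim,
					 const uint16_t *sec, uint16_t sig,
					 unsigned int vl_lanes)
{
	uint16_t merged[MAX_LANES] = {0};
	uint16_t result = 0;

	/* Merged path only: both buckets must fit in one vector */
	assert(vl_lanes >= 2 * BUCKET_ENTRIES && vl_lanes <= MAX_LANES);

	/* svsplice_u16: primary in lanes [0, N), secondary in [N, 2N) */
	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		merged[i] = prim[i];
		merged[BUCKET_ENTRIES + i] = sec[i];
	}

	for (unsigned int lane = 0; lane < vl_lanes; lane++) {
		/* svwhilelt_b16(0, 2N): lanes past the buckets are inactive */
		int active = lane < 2 * BUCKET_ENTRIES;

		/* svcmpeq_u16, then svlsl_u16_z(match, 1, index), svorv_u16 */
		if (active && merged[lane] == sig)
			result |= (uint16_t)(1u << lane);
	}
	return result;
}
```

The predicate matters: lanes beyond 2 * BUCKET_ENTRIES hold no signatures and must never report a match, whatever the vector length.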

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/compare_signatures_arm_pvt.h | 57 +++
 lib/hash/rte_cuckoo_hash.c|  8 +++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
index 0245fec26f..86843b8a8a 100644
--- a/lib/hash/compare_signatures_arm_pvt.h
+++ b/lib/hash/compare_signatures_arm_pvt.h
@@ -51,6 +51,63 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparisons at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred, primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   } else {
+   do {
+   pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
+   uint16_t lower_half = 0;
+   uint16_t upper_half = 0;
+   /* Compare all signatures in the primary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &prim_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   lower_half = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   /* Compare all signatures in the secondary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &sec_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   upper_half = svorv_u16(svptrue_b16(), sv_matches)
+   << RTE_HASH_BUCKET_ENTRIES;
+   }
+   hitmask_buffer[i / 8] = upper_half | lower_half;
+   i += vl;
+   } while (i < RTE_HASH_BUCKET_ENTRIES);
+   }
+   }
+   break;
 #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 187918a05a..c30ea13000 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -40,6 +40,7 @@ enum rte_hash_sig_compare_function {
RTE_HASH_COMPARE_SCALAR = 0,
RTE_HASH_COMPARE_SSE,
RTE_HASH_COMPARE_NEON,
+   RTE_HASH_COMPARE_SVE,
RTE_HASH_COMPARE_NUM
 };
 
@@ -461,8 +462,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_

[PATCH v11 6/7] test/hash: check bulk lookup of keys after collision

2024-07-05 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.
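
A possible reason for the pseudo_hash() change, which the patch does not spell out: assuming the cuckoo hash takes the 16-bit signature from the top half of the hash, the primary bucket from the low bits, and the alternative bucket as (primary ^ signature) masked by the bucket mask, a hash of 3 yields a zero signature and an alternative bucket equal to the primary one, whereas 3 | (3 << 16) yields two distinct buckets. The helpers below are hypothetical mirrors of that assumed logic, and the 0xFF bucket mask used in the checks is an arbitrary example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the cuckoo hash index helpers (assumed logic). */
static uint16_t short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16); /* signature = upper 16 bits */
}

static uint32_t prim_bucket(uint32_t hash, uint32_t bucket_bitmask)
{
	/* bucket count is assumed to be a power of two */
	assert((bucket_bitmask & (bucket_bitmask + 1)) == 0);
	return hash & bucket_bitmask;
}

static uint32_t alt_bucket(uint32_t cur_bkt_idx, uint16_t sig,
			   uint32_t bucket_bitmask)
{
	/* XOR with the signature: a zero signature maps a bucket to itself */
	return (cur_bkt_idx ^ sig) & bucket_bitmask;
}
```

Under these assumptions, hash 3 gives signature 0 and alternative bucket 3 (the primary bucket itself), while 0x00030003 gives signature 3, primary bucket 3 and alternative bucket 0.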

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap |  1 +
 app/test/test_hash.c | 99 ++--
 2 files changed, 77 insertions(+), 23 deletions(-)

diff --git a/.mailmap b/.mailmap
index ec525981fe..41a8a99a7c 100644
--- a/.mailmap
+++ b/.mailmap
@@ -505,6 +505,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 24d3b547ad..ab3b37de3f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | (3 << 16);
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(¶ms_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+

[PATCH v12 0/7] hash: add SVE support for bulk key lookup

2024-07-08 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.
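
To make the format change concrete, a minimal sketch converting the old sparse layout (two uint32_t masks, entry i reported at bit 2 * i, with bit 2 * i + 1 as padding or as the duplicate produced by a 16-bit SSE movemask) into the new dense, interleaved uint16_t. BUCKET_ENTRIES and sparse_to_dense() are hypothetical; this illustrates the two layouts and is not code from the series.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed RTE_HASH_BUCKET_ENTRIES */

static_assert(2 * BUCKET_ENTRIES <= 16, "dense mask must fit in 16 bits");

/* Old: one uint32_t per bucket, entry i at bit 2 * i.
 * New: one uint16_t, primary at bit i, secondary at bit N + i. */
static uint16_t sparse_to_dense(uint32_t prim_sparse, uint32_t sec_sparse)
{
	uint16_t dense = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		/* only the first bit of each 2-bit group is meaningful */
		dense |= (uint16_t)(((prim_sparse >> (2 * i)) & 1u) << i);
		dense |= (uint16_t)(((sec_sparse >> (2 * i)) & 1u)
				    << (BUCKET_ENTRIES + i));
	}
	return dense;
}
```

For 8 entries, two 32-bit sparse masks (64 bits, half of them padding) shrink to one 16-bit value holding both buckets' results.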

Running the cuckoo hash perf test, I observed the following effect on speed:
  There are no significant changes on Intel (ran on Sapphire Rapids)
  Neon is up to 7-10% faster (ran on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized neon (ran on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized neon (ran on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

V8->V9:
  Use __rte_unused instead of (void)
  Fix an indentation mistake

V9->V10:
  Fix more formatting and indentation
  Move the new compare signature file directly in hash instead of being in a new subdir
  Re-order includes
  Remove duplicated static check
  Move rte_hash_sig_compare_function's definition into a private header

V10->V11:
  Split the "pack the hitmask" commit into four commits:
Move the compare function enum out of the ABI
Move the compare function implementations into arch-specific files
Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
  in the future
Implement the dense hitmask
  Add missing header guards
  Move compare function enum into cuckoo_hash.c instead of its own header.

V11->V12:
  Change the name of the compare function file (remove the _pvt suffix)

Yoan Picchi (7):
  hash: make compare signature function enum private
  hash: split compare signature into arch-specific files
  hash: add a check on hash entry max size
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 +---
 lib/hash/compare_signatures_arm.h | 121 +++
 lib/hash/compare_signatures_generic.h |  40 +
 lib/hash/compare_signatures_x86.h |  55 +++
 lib/hash/rte_cuckoo_hash.c| 207 ++
 lib/hash/rte_cuckoo_hash.h|  10 +-
 7 files changed, 410 insertions(+), 124 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm.h
 create mode 100644 lib/hash/compare_signatures_generic.h
 create mode 100644 lib/hash/compare_signatures_x86.h

-- 
2.25.1



[PATCH v12 3/7] hash: add a check on hash entry max size

2024-07-08 Thread Yoan Picchi
If we were to change RTE_HASH_BUCKET_ENTRIES to be over 8, the signatures
would no longer fit in a single vector (8 * 16b = 128b), so some of them
would not be checked. This patch adds a compile-time check to fall back to
the scalar code in this case.

Signed-off-by: Yoan Picchi 
---
 lib/hash/compare_signatures_arm.h | 2 +-
 lib/hash/compare_signatures_x86.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/hash/compare_signatures_arm.h b/lib/hash/compare_signatures_arm.h
index 80b6afb7a5..74b3286c95 100644
--- a/lib/hash/compare_signatures_arm.h
+++ b/lib/hash/compare_signatures_arm.h
@@ -23,7 +23,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 
/* For match mask the first bit of every two bits indicates the match */
switch (sig_cmp_fn) {
-#if defined(__ARM_NEON)
+#if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
uint16x8_t vmat, vsig, x;
int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
diff --git a/lib/hash/compare_signatures_x86.h b/lib/hash/compare_signatures_x86.h
index 11a82aced9..f77b37f1cd 100644
--- a/lib/hash/compare_signatures_x86.h
+++ b/lib/hash/compare_signatures_x86.h
@@ -23,7 +23,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 
/* For match mask the first bit of every two bits indicates the match */
switch (sig_cmp_fn) {
-#if defined(__SSE2__)
+#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_SSE:
/* Compare all signatures in the bucket */
*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128(
-- 
2.25.1



[PATCH v12 1/7] hash: make compare signature function enum private

2024-07-08 Thread Yoan Picchi
enum rte_hash_sig_compare_function is only used internally. This
patch moves it out of the public ABI and into the C file.
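
A minimal sketch of the pattern, with hypothetical names: the struct visible in the header stores the selector as a plain unsigned int, while the enum that gives the values their meaning is only defined in the .c file, so the enum type no longer appears in the header. In a real build the two halves would live in separate files.

```c
#include <assert.h>

/* --- would live in the header: no enum type exposed --- */
struct table {
	unsigned int sig_cmp_fn; /* holds an enum compare_fn value */
};

/* --- would live in the .c file only --- */
enum compare_fn {
	COMPARE_SCALAR = 0,
	COMPARE_SSE,
	COMPARE_NEON,
	COMPARE_NUM
};

static const char *compare_fn_name(const struct table *t)
{
	assert(t->sig_cmp_fn < COMPARE_NUM);
	switch ((enum compare_fn)t->sig_cmp_fn) {
	case COMPARE_SSE:
		return "sse";
	case COMPARE_NEON:
		return "neon";
	case COMPARE_SCALAR:
	default:
		return "scalar";
	}
}
```

Adding a new value such as an SVE variant then only touches the .c file, which is what the later SVE patch in this series relies on.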

Signed-off-by: Yoan Picchi 
---
 lib/hash/rte_cuckoo_hash.c | 10 ++
 lib/hash/rte_cuckoo_hash.h | 10 +-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d87aa52b5b..e1d50e7d40 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO);
 
 #include "rte_cuckoo_hash.h"
 
+/* Enum used to select the implementation of the signature comparison function to use
+ * eg: A system supporting SVE might want to use a NEON or scalar implementation.
+ */
+enum rte_hash_sig_compare_function {
+   RTE_HASH_COMPARE_SCALAR = 0,
+   RTE_HASH_COMPARE_SSE,
+   RTE_HASH_COMPARE_NEON,
+   RTE_HASH_COMPARE_NUM
+};
+
 /* Mask of all flags supported by this version */
 #define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \
   RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index a528f1d1a0..26a992419a 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -134,14 +134,6 @@ struct rte_hash_key {
char key[0];
 };
 
-/* All different signature compare functions */
-enum rte_hash_sig_compare_function {
-   RTE_HASH_COMPARE_SCALAR = 0,
-   RTE_HASH_COMPARE_SSE,
-   RTE_HASH_COMPARE_NEON,
-   RTE_HASH_COMPARE_NUM
-};
-
 /** Bucket structure */
 struct __rte_cache_aligned rte_hash_bucket {
uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
@@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
/**< Custom function used to compare keys. */
enum cmp_jump_table_case cmp_jump_table_idx;
/**< Indicates which compare function to use. */
-   enum rte_hash_sig_compare_function sig_cmp_fn;
+   unsigned int sig_cmp_fn;
/**< Indicates which signature compare function to use. */
uint32_t bucket_bitmask;
/**< Bitmask for getting bucket index from hash signature. */
-- 
2.25.1



[PATCH v12 2/7] hash: split compare signature into arch-specific files

2024-07-08 Thread Yoan Picchi
Move the compare_signatures function into architecture-specific files.
Each of them keeps the default scalar implementation as a fallback when
vectorisation is disabled.
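
For reference, a minimal sketch of the scalar default being moved, using the sparse layout (entry i reported at bit 2 * i), together with a hypothetical helper showing how such a mask can be walked: count trailing zeros, halve to get the slot, then clear both bits of that slot. __builtin_ctz is a GCC/Clang builtin, and BUCKET_ENTRIES stands in for RTE_HASH_BUCKET_ENTRIES.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed RTE_HASH_BUCKET_ENTRIES */

/* Scalar compare with the sparse layout: entry i is reported at bit 2*i. */
static uint32_t sparse_matches(const uint16_t *bucket_sigs, uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sig == bucket_sigs[i]) << (i << 1);
	return mask;
}

/* Visit matching entries: ctz gives 2*i, so halve it to get the slot.
 * Writes slot indices to out[] and returns how many there were. */
static unsigned int sparse_hits(uint32_t mask, unsigned int *out)
{
	unsigned int n = 0;

	while (mask != 0) {
		unsigned int slot = (unsigned int)__builtin_ctz(mask) >> 1;

		assert(slot < BUCKET_ENTRIES);
		out[n++] = slot;
		mask &= ~(3u << (slot << 1)); /* clear both bits of the slot */
	}
	return n;
}
```

Clearing both bits of each 2-bit group lets the same walker handle masks that set only the first bit (scalar) or both bits (SSE movemask).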

Signed-off-by: Yoan Picchi 
---
 .mailmap  |  1 +
 lib/hash/compare_signatures_arm.h | 55 +++
 lib/hash/compare_signatures_generic.h | 33 ++
 lib/hash/compare_signatures_x86.h | 48 
 lib/hash/rte_cuckoo_hash.c| 65 ---
 5 files changed, 145 insertions(+), 57 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm.h
 create mode 100644 lib/hash/compare_signatures_generic.h
 create mode 100644 lib/hash/compare_signatures_x86.h

diff --git a/.mailmap b/.mailmap
index f76037213d..ec525981fe 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1661,6 +1661,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/compare_signatures_arm.h b/lib/hash/compare_signatures_arm.h
new file mode 100644
index 00..80b6afb7a5
--- /dev/null
+++ b/lib/hash/compare_signatures_arm.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+#ifndef _COMPARE_SIGNATURE_ARM_PVT_H_
+#define _COMPARE_SIGNATURE_ARM_PVT_H_
+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+
+static inline void
+compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask the first bit of every two bits indicates the match */
+   switch (sig_cmp_fn) {
+#if defined(__ARM_NEON)
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
+   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
+   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   }
+   break;
+#endif
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << (i << 1));
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << (i << 1));
+   }
+   }
+}
+
+#endif
diff --git a/lib/hash/compare_signatures_generic.h b/lib/hash/compare_signatures_generic.h
new file mode 100644
index 00..43587adcef
--- /dev/null
+++ b/lib/hash/compare_signatures_generic.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+#ifndef _COMPARE_SIGNATURE_GENERIC_PVT_H_
+#define _COMPARE_SIGNATURE_GENERIC_PVT_H_
+
+#include 
+#include 
+#include 
+
+#include "rte_cuckoo_hash.h"
+
+static inline void
+compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* For match mask the first bit of every two bits indicates the match */
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << (i << 1));
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << (i << 1));
+   }
+}
+
+#endif
diff --git a/lib/hash/compare_signatures_x86.h b/lib/hash/compare_signatures_x86.h
new file mode 100644
index 00..11a82aced9
--- /dev/null
+++ b/lib/hash/compare_signatures_x86.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+#ifndef _COMPARE_SIGNATURE_X86_PVT_H_
+#define _COMPARE_SIGNATURE_X86_PVT_H_
+
+#include 

[PATCH v12 5/7] hash: optimize compare signature for NEON

2024-07-08 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by simply masking each lane with its own bit.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/compare_signatures_arm.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/hash/compare_signatures_arm.h b/lib/hash/compare_signatures_arm.h
index 0fc657c49b..0245fec26f 100644
--- a/lib/hash/compare_signatures_arm.h
+++ b/lib/hash/compare_signatures_arm.h
@@ -34,21 +34,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
 #if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
 
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
}
break;
 #endif
-- 
2.25.1



[PATCH v12 4/7] hash: pack the hitmask for hash in bulk lookup

2024-07-08 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail of
Intel's SIMD code. This patch allows non-Intel SIMD implementations to
benefit from a dense hitmask.
In addition, the new dense hitmask interleaves the primary
and secondary matches, which allows better cache usage and
enables future improvements for the SIMD implementations.
The default non-SIMD path now uses this dense mask.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/compare_signatures_arm.h |  47 ++
 lib/hash/compare_signatures_generic.h |  31 ---
 lib/hash/compare_signatures_x86.h |   9 +-
 lib/hash/rte_cuckoo_hash.c| 124 +++---
 4 files changed, 145 insertions(+), 66 deletions(-)

diff --git a/lib/hash/compare_signatures_arm.h b/lib/hash/compare_signatures_arm.h
index 74b3286c95..0fc657c49b 100644
--- a/lib/hash/compare_signatures_arm.h
+++ b/lib/hash/compare_signatures_arm.h
@@ -6,48 +6,57 @@
 #ifndef _COMPARE_SIGNATURE_ARM_PVT_H_
 #define _COMPARE_SIGNATURE_ARM_PVT_H_
 
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
 #include 
 #include 
 #include 
 
 #include "rte_cuckoo_hash.h"
 
+#define DENSE_HASH_BULK_LOOKUP 1
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
 {
-   unsigned int i;
 
-   /* For match mask the first bit of every two bits indicates the match */
+   static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
+   "hitmask_buffer must be wide enough to fit a dense hitmask");
+
+   /* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
 #if defined(__ARM_NEON) && RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
 
vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
}
break;
 #endif
default:
-   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   *prim_hash_matches |=
-   ((sig == prim_bkt->sig_current[i]) << (i << 1));
-   *sec_hash_matches |=
-   ((sig == sec_bkt->sig_current[i]) << (i << 1));
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i;
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
}
}
 }
diff --git a/lib/hash/compare_signatures_generic.h b/lib/hash/compare_signatures_generic.h
index 43587adcef..1d065d4c28 100644
--- a/lib/hash/compare_signatures_generic.h
+++ b/lib/hash/compare_signatures_generic.h
@@ -6,27 +6,34 @@
 #ifndef _COMPARE_SIGNATURE_GENERIC_PVT_H_
 #define _COMPARE_SIGNATURE_GENERIC_PVT_H_
 
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
 #include 
 #include 
 #include 
 
 #include "rte_cuckoo_hash.h"
 
+#defin
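The dense layout described in the Arm header comment ("every bit is in use") can be modelled in portable C. This sketch mirrors the scalar default branch of compare_signatures_dense above: bit i of the 16-bit result flags a match in primary slot i, and bit RTE_HASH_BUCKET_ENTRIES + i flags a match in secondary slot i. It assumes 8 entries per bucket; BUCKET_ENTRIES and dense_hitmask() are hypothetical names for illustration, not DPDK API.

```c
#include <stdint.h>

/* Assumption for this sketch: 8 entries per bucket. */
#define BUCKET_ENTRIES 8

/*
 * Scalar model of the dense hitmask: primary matches occupy the low
 * BUCKET_ENTRIES bits, secondary matches the next BUCKET_ENTRIES bits.
 * There is no padding between entries.
 */
static uint16_t
dense_hitmask(const uint16_t *prim_sigs, const uint16_t *sec_sigs, uint16_t sig)
{
	uint16_t hitmask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		hitmask |= (uint16_t)((prim_sigs[i] == sig) << i);
		hitmask |= (uint16_t)((sec_sigs[i] == sig) << (i + BUCKET_ENTRIES));
	}
	return hitmask;
}
```

For example, a signature found in primary slot 2 and secondary slots 1 and 7 yields 0x8204: 0x0004 from the primary bucket plus 0x8200 from the secondary one.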

[PATCH v12 6/7] test/hash: check bulk lookup of keys after collision

2024-07-08 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.
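The diff below also changes pseudo_hash() from returning 3 to 3 | (3 << 16). A minimal sketch of why that matters, assuming the bucket derivation used by rte_cuckoo_hash.c (short signature = upper 16 bits of the hash, primary bucket = hash & bucket_bitmask, alternative bucket = (primary ^ signature) & bucket_bitmask). The helpers below are hypothetical re-implementations for illustration, not the library's API, and the bitmask value is an assumption (any mask covering bits 0 and 1 gives the same result).

```c
#include <stdint.h>

/* Hypothetical stand-in: the short signature is the upper 16 bits. */
static uint16_t
short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16);
}

/* Hypothetical stand-in: primary bucket index from the low bits. */
static uint32_t
prim_index(uint32_t hash, uint32_t bucket_bitmask)
{
	return hash & bucket_bitmask;
}

/* Hypothetical stand-in: alternative bucket = primary XOR signature. */
static uint32_t
alt_index(uint32_t prim, uint16_t sig, uint32_t bucket_bitmask)
{
	return (prim ^ sig) & bucket_bitmask;
}
```

With a hash of 3 the signature is 0, so the alternative bucket equals the primary one and a displaced entry has no second bucket to move into. With 3 | (3 << 16) the signature is 3 and the alternative bucket is 3 ^ 3 = 0. This appears to be why the pseudo hash changed now that the test overfills an 8-entry bucket, although the commit message does not state it.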

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap |  1 +
 app/test/test_hash.c | 99 ++--
 2 files changed, 77 insertions(+), 23 deletions(-)

diff --git a/.mailmap b/.mailmap
index ec525981fe..41a8a99a7c 100644
--- a/.mailmap
+++ b/.mailmap
@@ -505,6 +505,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 24d3b547ad..ab3b37de3f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | (3 << 16);
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+

[PATCH v12 7/7] hash: add SVE support for bulk key lookup

2024-07-08 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- New SVE code is ~5% slower than optimized NEON on the N2 processor with
128-bit vectors.
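What the SVE branch in the patch below computes can be checked without SVE hardware. This is a scalar model, not SVE code: plain arrays stand in for svuint16_t vectors, and the loops stand in for svsplice_u16, svcmpeq_u16, svlsl_u16_z over an svindex_u16(0, 1) shift vector, and svorv_u16. It assumes 8 entries per bucket; the function names are hypothetical. Both paths should yield the same dense hitmask, which is what lets the code pick either one based on the vector length.

```c
#include <stdint.h>

/* Assumption for this sketch: 8 entries per bucket. */
#define BUCKET_ENTRIES 8

/*
 * Model of the wide-vector path: both buckets are spliced into one
 * 2 * BUCKET_ENTRIES-lane vector, compared against the signature, each
 * matching lane becomes (1 << lane), and the lanes are OR-reduced.
 */
static uint16_t
model_sve_splice(const uint16_t *prim, const uint16_t *sec, uint16_t sig)
{
	uint16_t lanes[2 * BUCKET_ENTRIES];
	uint16_t hitmask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		lanes[i] = prim[i];
		lanes[BUCKET_ENTRIES + i] = sec[i];
	}
	for (unsigned int lane = 0; lane < 2 * BUCKET_ENTRIES; lane++)
		if (lanes[lane] == sig)
			hitmask |= (uint16_t)(1u << lane);
	return hitmask;
}

/*
 * Model of the per-bucket path: each bucket is compared on its own and
 * the secondary result is shifted into the upper half of the hitmask.
 */
static uint16_t
model_sve_per_bucket(const uint16_t *prim, const uint16_t *sec, uint16_t sig)
{
	uint16_t lower_half = 0;
	uint16_t upper_half = 0;

	for (unsigned int lane = 0; lane < BUCKET_ENTRIES; lane++) {
		if (prim[lane] == sig)
			lower_half |= (uint16_t)(1u << lane);
		if (sec[lane] == sig)
			upper_half |= (uint16_t)(1u << lane);
	}
	return (uint16_t)(lower_half | (upper_half << BUCKET_ENTRIES));
}
```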

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/compare_signatures_arm.h | 57 +++
 lib/hash/rte_cuckoo_hash.c|  8 -
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/lib/hash/compare_signatures_arm.h b/lib/hash/compare_signatures_arm.h
index 0245fec26f..86843b8a8a 100644
--- a/lib/hash/compare_signatures_arm.h
+++ b/lib/hash/compare_signatures_arm.h
@@ -51,6 +51,63 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparisons at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred, primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   } else {
+   do {
+   pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
+   uint16_t lower_half = 0;
+   uint16_t upper_half = 0;
+   /* Compare all signatures in the primary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &prim_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   lower_half = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   /* Compare all signatures in the secondary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &sec_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   upper_half = svorv_u16(svptrue_b16(), sv_matches)
+   << RTE_HASH_BUCKET_ENTRIES;
+   }
+   hitmask_buffer[i / 8] = upper_half | lower_half;
+   i += vl;
+   } while (i < RTE_HASH_BUCKET_ENTRIES);
+   }
+   }
+   break;
 #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 7512861aac..ba4093a887 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -40,6 +40,7 @@ enum rte_hash_sig_compare_function {
RTE_HASH_COMPARE_SCALAR = 0,
RTE_HASH_COMPARE_SSE,
RTE_HASH_COMPARE_NEON,
+   RTE_HASH_COMPARE_SVE,
RTE_HASH_COMPARE_NUM
 };
 
@@ -461,8 +462,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COM

Re: [RFC PATCH v4 2/4] dts: add doc generation dependencies

2023-10-27 Thread Yoan Picchi
", hash = "sha256:6a7e7d8af34eb8fc57d52a09c6b6b9c46ff44aea5951bc831eeb9245378f3689"},
+{file = "sphinx_rtd_theme-1.2.2.tar.gz", hash = "sha256:01c5c5a72e2d025bd23d1f06c59a4831b06e6ce6c01fdd5ebfe9986c0a880fc7"},
+]
+
+[package.dependencies]
+docutils = "<0.19"
+sphinx = ">=1.6,<7"
+sphinxcontrib-jquery = ">=4,<5"
+
+[package.extras]
+dev = ["bump2version", "sphinxcontrib-httpdomain", "transifex-client", "wheel"]
+
+[[package]]
+name = "sphinxcontrib-applehelp"
+version = "1.0.4"
+description = "sphinxcontrib-applehelp is a Sphinx extension which outputs Apple help books"
+optional = false
+python-versions = ">=3.8"
+files = [
+{file = "sphinxcontrib-applehelp-1.0.4.tar.gz", hash = "sha256:828f867945bbe39817c210a1abfd1bc4895c8b73fcaade56d45357a348a07d7e"},
+{file = "sphinxcontrib_applehelp-1.0.4-py3-none-any.whl", hash = "sha256:29d341f67fb0f6f586b23ad80e072c8e6ad0b48417db2bde114a4c9746feb228"},
+]
+
+[package.extras]
+lint = ["docutils-stubs", "flake8", "mypy"]
+test = ["pytest"]
+
+[[package]]
+name = "sphinxcontrib-devhelp"
+version = "1.0.2"
+description = "sphinxcontrib-devhelp is a sphinx extension which outputs Devhelp document."
+optional = false
+python-versions = ">=3.5"
+files = [
+{file = "sphinxcontrib-devhelp-1.0.2.tar.gz", hash = "sha256:ff7f1afa7b9642e7060379360a67e9c41e8f3121f2ce9164266f61b9f4b338e4"},
+{file = "sphinxcontrib_devhelp-1.0.2-py2.py3-none-any.whl", hash = "sha256:8165223f9a335cc1af7ffe1ed31d2871f325254c0423bc0c4c7cd1c1e4734a2e"},
+]
+
+[package.extras]
+lint = ["docutils-stubs", "flake8", "mypy"]
+test = ["pytest"]
+
+[[package]]
+name = "sphinxcontrib-htmlhelp"
+version = "2.0.1"
+description = "sphinxcontrib-htmlhelp is a sphinx extension which renders HTML help files"
+optional = false
+python-versions = ">=3.8"
+files = [
+{file = "sphinxcontrib-htmlhelp-2.0.1.tar.gz", hash = "sha256:0cbdd302815330058422b98a113195c9249825d681e18f11e8b1f78a2f11efff"},
+{file = "sphinxcontrib_htmlhelp-2.0.1-py3-none-any.whl", hash = "sha256:c38cb46dccf316c79de6e5515e1770414b797162b23cd3d06e67020e1d2a6903"},
+]
+
+[package.extras]
+lint = ["docutils-stubs", "flake8", "mypy"]
+test = ["html5lib", "pytest"]
+
+[[package]]
+name = "sphinxcontrib-jquery"
+version = "4.1"
+description = "Extension to include jQuery on newer Sphinx releases"
+optional = false
+python-versions = ">=2.7"
+files = [
+{file = "sphinxcontrib-jquery-4.1.tar.gz", hash = "sha256:1620739f04e36a2c779f1a131a2dfd49b2fd07351bf1968ced074365933abc7a"},
+{file = "sphinxcontrib_jquery-4.1-py2.py3-none-any.whl", hash = "sha256:f936030d7d0147dd026a4f2b5a57343d233f1fc7b363f68b3d4f1cb0993878ae"},
+]
+
+[package.dependencies]
+Sphinx = ">=1.8"
+
+[[package]]
+name = "sphinxcontrib-jsmath"
+version = "1.0.1"
+description = "A sphinx extension which renders display math in HTML via JavaScript"
+optional = false
+python-versions = ">=3.5"
+files = [
+{file = "sphinxcontrib-jsmath-1.0.1.tar.gz", hash = "sha256:a9925e4a4587247ed2191a22df5f6970656cb8ca2bd6284309578f2153e0c4b8"},
+{file = "sphinxcontrib_jsmath-1.0.1-py2.py3-none-any.whl", hash = "sha256:2ec2eaebfb78f3f2078e73666b1415417a116cc848b72e5172e596c871103178"},
+]
+
+[package.extras]
+test = ["flake8", "mypy", "pytest"]
+
+[[package]]
+name = "sphinxcontrib-qthelp"
+version = "1.0.3"
+description = "sphinxcontrib-qthelp is a sphinx extension which outputs QtHelp document."
+optional = false
+python-versions = ">=3.5"
+files = [
+{file = "sphinxcontrib-qthelp-1.0.3.tar.gz", hash = "sha256:4c33767ee058b70dba89a6fc5c1892c0d57a54be67ddd3e7875a18d14cba5a72"},
+{file = "sphinxcontrib_qthelp-1.0.3-py2.py3-none-any.whl", hash = "sha256:bd9fc24bcb748a8d51fd4ecaade681350aa63009a347a8c14e637895444dfab6"},
+]
+
+[package.extras]
+lint = ["docutils-stubs", "flake8", "mypy"]
+test = ["pytest"]
+
+[[package]]
+name = "sphinxcontrib-serializinghtml"
+version = "1.1.5"
+description = "sphinxcontrib-serializinghtml is a sphinx extension which outputs \"serialized\" HTML files (json and pickle)."
+optional = false
+python-versions = ">=3.5"
+files = [
+{file = "sphinxcontrib-serializinghtml-1.1.5.tar.gz", hash = "sha256:aa5f6de5dfdf809ef505c4895e51ef5c9eac17d0f287933eb49ec495280b6952"},
+{file = "sphinxcontrib_serializinghtml-1.1.5-py2.py3-none-any.whl", hash = "sha256:352a9a00ae864471d3a7ead8d7d79f5fc0b57e8b3f95e9867eb9eb28999b92fd"},
+]
+
+[package.extras]
+lint = ["docutils-stubs", "flake8", "mypy"]
+test = ["pytest"]
+
  [[package]]
  name = "toml"
  version = "0.10.2"
@@ -819,6 +1247,23 @@ files = [
  {file = "typing_extensions-4.7.1.tar.gz", hash = "sha256:b75ddc264f0ba5615db7ba217daeb99701ad295353c45f9e95963337ceeeffb2"},
  ]
  
+[[package]]
+name = "urllib3"
+version = "2.0.4"
+description = "HTTP library with thread-safe connection pooling, file post, and more."
+optional = false
+python-versions = ">=3.7"
+files = [
+{file = "urllib3-2.0.4-py3-none-any.whl", hash = "sha256:de7df1803967d2c2a98e4b11bb7d6bd9210474c46e8a0401514e3a42a75ebde4"},
+{file = "urllib3-2.0.4.tar.gz", hash = "sha256:8d22f86aae8ef5e410d4f539fde9ce6b2113a001bb4d189e0aed70642d602b11"},
+]
+
+[package.extras]
+brotli = ["brotli (>=1.0.9)", "brotlicffi (>=0.8.0)"]
+secure = ["certifi", "cryptography (>=1.9)", "idna (>=2.0.0)", "pyopenssl (>=17.1.0)", "urllib3-secure-extra"]
+socks = ["pysocks (>=1.5.6,!=1.5.7,<2.0)"]
+zstd = ["zstandard (>=0.18.0)"]
+
  [[package]]
  name = "warlock"
  version = "2.0.1"
@@ -837,4 +1282,4 @@ jsonschema = ">=4,<5"
  [metadata]
  lock-version = "2.0"
  python-versions = "^3.10"
-content-hash = "0b1e4a1cb8323e17e5ee5951c97e74bde6e60d0413d7b25b1803d5b2bab39639"
+content-hash = "fea1a3eddd1286d2ccd3bdb61c6ce085403f31567dbe4f55b6775bcf1e325372"
diff --git a/dts/pyproject.toml b/dts/pyproject.toml
index 6762edfa6b..159940ce02 100644
--- a/dts/pyproject.toml
+++ b/dts/pyproject.toml
@@ -34,6 +34,13 @@ pylama = "^8.4.1"
  pyflakes = "^2.5.0"
  toml = "^0.10.2"
  
+[tool.poetry.group.docs]
+optional = true
+
+[tool.poetry.group.docs.dependencies]
+sphinx = "<7"
+sphinx-rtd-theme = "^1.2.2"
+
  [build-system]
  requires = ["poetry-core>=1.0.0"]
  build-backend = "poetry.core.masonry.api"

Reviewed-by: Yoan Picchi 



Re: [RFC PATCH v4 4/4] dts: format docstrings to google format

2023-10-31 Thread Yoan Picchi

On 8/31/23 11:04, Juraj Linkeš wrote:

WIP: only one module is reformatted to serve as a demonstration.

The Google format is documented here [0].

[0]: https://google.github.io/styleguide/pyguide.html

Signed-off-by: Juraj Linkeš 
Acked-by: Jeremy Spweock 
---
  dts/framework/testbed_model/node.py | 171 +++-
  1 file changed, 118 insertions(+), 53 deletions(-)

diff --git a/dts/framework/testbed_model/node.py b/dts/framework/testbed_model/node.py
index 23efa79c50..619743ebe7 100644
--- a/dts/framework/testbed_model/node.py
+++ b/dts/framework/testbed_model/node.py
@@ -3,8 +3,13 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
-"""
-A node is a generic host that DTS connects to and manages.
+"""Common functionality for node management.
+
+There's a base class, Node, that's supposed to be extended by other classes


This comment and all the following ones are something of a nitpick.
This sounds too passive. Why not have something simpler, like:
The virtual class Node is meant to be extended by other classes
with functionality specific to that node type.


+with functionality specific to that node type.
+The only part that can be used standalone is the Node.skip_setup static method,
+which is a decorator used to skip method execution if skip_setup is passed
+by the user on the cmdline or in an env variable.


I'd expand "env" to the full word, as this is meant to go in the documentation.


  """
  
  from abc import ABC
@@ -35,10 +40,26 @@
  
  
  class Node(ABC):
-"""
-Basic class for node management. This class implements methods that
-manage a node, such as information gathering (of CPU/PCI/NIC) and
-environment setup.
+"""The base class for node management.
+
+It shouldn't be instantiated, but rather extended.
+It implements common methods to manage any node:
+
+   * connection to the node
+   * information gathering of CPU
+   * hugepages setup
+
+Arguments:
+node_config: The config from the input configuration file.
+
+Attributes:
+main_session: The primary OS-agnostic remote session used
+to communicate with the node.
+config: The configuration used to create the node.
+name: The name of the node.
+lcores: The list of logical cores that DTS can use on the node.
+It's derived from logical cores present on the node and user configuration.
+ports: The ports of this node specified in user configuration.
  """
  
  main_session: OSSession
@@ -77,9 +98,14 @@ def _init_ports(self) -> None:
  self.configure_port_state(port)
  
  def set_up_execution(self, execution_config: ExecutionConfiguration) -> None:
-"""
-Perform the execution setup that will be done for each execution
-this node is part of.
+"""Execution setup steps.
+
+Configure hugepages and call self._set_up_execution where
+the rest of the configuration steps (if any) are implemented.
+
+Args:
+execution_config: The execution configuration according to which
+the setup steps will be taken.
  """
  self._setup_hugepages()
  self._set_up_execution(execution_config)
@@ -88,58 +114,78 @@ def set_up_execution(self, execution_config: ExecutionConfiguration) -> None:
  self.virtual_devices.append(VirtualDevice(vdev))
  
  def _set_up_execution(self, execution_config: ExecutionConfiguration) -> None:
-"""
-This method exists to be optionally overwritten by derived classes and
-is not decorated so that the derived class doesn't have to use the decorator.
+"""Optional additional execution setup steps for derived classes.
+
+Derived classes should overwrite this
+if they want to add additional execution setup steps.


I'd probably use "need" or "require" instead of "want" (it's the developer
who wants, not the class).



  """
  
  def tear_down_execution(self) -> None:
-"""
-Perform the execution teardown that will be done after each execution
-this node is part of concludes.
+"""Execution teardown steps.
+
+There are currently no common execution teardown steps
+common to all DTS node types.
  """
  self.virtual_devices = []
  self._tear_down_execution()
  
  def _tear_down_execution(self) -> None:
-"""
-This method exists to be optionally overwritten by derived classes and
-is not decorated so that the derived class doesn't have to use the decorator.
+"""Optional additional execution teardown steps for derived classes.
+
+Derived classes should overwrite this
+if they want to add additional execution teardown steps.
  """
  
  def set_up_build_target(
  self, build_target_config: BuildTargetConfiguration

[PATCH v3 0/4] hash: add SVE support for bulk key lookup

2023-11-07 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to have the
primary and secondary hitmasks interleaved instead of each being in its
own array.
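With both hitmasks interleaved into one 16-bit word per key, a consumer splits the word back into its primary and secondary halves and walks the set bits to find candidate slots. A minimal sketch, assuming 8 entries per bucket with the primary matches in the low byte and the secondary matches in the high byte; the helper names are hypothetical, and __builtin_ctz is a GCC/Clang builtin that is undefined for 0, hence the loop guard.

```c
#include <stdint.h>

/* Assumption for this sketch: 8 entries per bucket. */
#define BUCKET_ENTRIES 8

/* Primary-bucket matches live in the low BUCKET_ENTRIES bits. */
static inline uint16_t
prim_half(uint16_t hitmask)
{
	return (uint16_t)(hitmask & ((1u << BUCKET_ENTRIES) - 1));
}

/* Secondary-bucket matches live in the next BUCKET_ENTRIES bits. */
static inline uint16_t
sec_half(uint16_t hitmask)
{
	return (uint16_t)(hitmask >> BUCKET_ENTRIES);
}

/*
 * Store the index of every set bit, lowest first, the order in which a
 * lookup loop would visit candidate entries. Returns how many were found.
 */
static unsigned int
hit_slots(uint16_t half, unsigned int *slots)
{
	unsigned int n = 0;

	while (half != 0) {
		slots[n++] = (unsigned int)__builtin_ctz(half);
		half = (uint16_t)(half & (half - 1)); /* clear the lowest set bit */
	}
	return n;
}
```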

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (ran on Sapphire Rapids)
  NEON is up to 7-10% faster (ran on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (ran on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (ran on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple of ints to uint16_t in compare_signatures_dense
  Several coding-style fixes

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap   |   2 +
 app/test/test_hash.c   |  99 ++
 lib/hash/rte_cuckoo_hash.c | 264 +
 lib/hash/rte_cuckoo_hash.h |   1 +
 4 files changed, 287 insertions(+), 79 deletions(-)

-- 
2.25.1



[PATCH v3 2/4] hash: optimize compare signature for NEON

2023-11-07 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1,
so we can skip shifting and simply mask each lane with a lane-specific bit.
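The equivalence this patch relies on can be checked with a scalar model of one 8-lane vector. Because vceqq_u16 sets a matching lane to 0xFFFF, ANDing that lane with its own power-of-two constant gives the same value as keeping bit 0 and shifting it left by the lane index; and since every lane contributes a distinct bit, the horizontal add (vaddvq_u16) behaves like an OR. The helpers below are hypothetical scalar stand-ins for the intrinsics, not NEON code.

```c
#include <stdint.h>

#define LANES 8

/* Stand-in for vceqq_u16: a matching lane becomes all ones, else zero. */
static void
lane_compare(const uint16_t *sigs, uint16_t sig, uint16_t *vmat)
{
	for (int i = 0; i < LANES; i++)
		vmat[i] = (sigs[i] == sig) ? 0xFFFF : 0;
}

/* Previous approach: keep bit 0 of each lane, shift it by the lane index, add lanes. */
static uint16_t
reduce_with_shift(const uint16_t *vmat)
{
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++)
		sum = (uint16_t)(sum + ((vmat[i] & 0x0001) << i));
	return sum;
}

/* New approach: AND each lane with its own power-of-two mask, add lanes. */
static uint16_t
reduce_with_mask(const uint16_t *vmat)
{
	static const uint16_t mask[LANES] = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++)
		sum = (uint16_t)(sum + (vmat[i] & mask[i]));
	return sum;
}
```

For signatures {5, 1, 5, 2, 3, 4, 5, 6} and a lookup signature of 5, lanes 0, 2 and 6 match and both reductions give 0x45; the masked version simply saves the shift instruction.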

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/rte_cuckoo_hash.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 2aa96eb862..a4b907c45c 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1864,19 +1864,17 @@ compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16x8_t vmat, x;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*prim_hash_matches = (uint32_t)(vaddvq_u16(x));
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vandq_u16(vmat, mask);
*sec_hash_matches = (uint32_t)(vaddvq_u16(x));
}
break;
-- 
2.25.1



[PATCH v3 1/4] hash: pack the hitmask for hash in bulk lookup

2023-11-07 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail of
Intel's SIMD code. This patch allows non-Intel SIMD implementations to
benefit from a dense hitmask.
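To make the padding concrete: on x86 the SSE path builds its mask with a byte-granular movemask over 16-bit lanes, so each bucket entry occupies two bits and only the even-index ones are kept. The sketch below compacts such a sparse mask into the dense form (bit i for entry i) that the Arm path now produces directly. It is an illustration only; sparse_to_dense() is a hypothetical helper, not something this patch adds.

```c
#include <stdint.h>

/*
 * Compact a sparse hitmask, where entry i is reported at bit 2*i (its
 * odd neighbour is padding), into a dense one with entry i at bit i.
 */
static uint16_t
sparse_to_dense(uint32_t sparse, unsigned int entries)
{
	uint16_t dense = 0;

	for (unsigned int i = 0; i < entries; i++)
		dense |= (uint16_t)(((sparse >> (2 * i)) & 1u) << i);
	return dense;
}
```

For example, hits on entries 0, 2 and 7 appear as 0xC033 in the byte-wise movemask (both bits of each entry set) and compact to 0x85; the odd-index padding bits are simply ignored.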

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap   |   2 +
 lib/hash/rte_cuckoo_hash.c | 118 ++---
 2 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/.mailmap b/.mailmap
index 3f5bab26a8..b9c49aa7f6 100644
--- a/.mailmap
+++ b/.mailmap
@@ -485,6 +485,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1602,6 +1603,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 19b23f2a97..2aa96eb862 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1850,8 +1850,50 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 
 }
 
+#if defined(__ARM_NEON)
+
+static inline void
+compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   /* In the match mask, every bit indicates a match */
+   switch (sig_cmp_fn) {
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   }
+   break;
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((sig == prim_bkt->sig_current[i]) << i);
+   *sec_hash_matches |=
+   ((sig == sec_bkt->sig_current[i]) << i);
+   }
+   }
+}
+
+#else
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
const struct rte_hash_bucket *prim_bkt,
const struct rte_hash_bucket *sec_bkt,
uint16_t sig,
@@ -1878,25 +1920,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
/* Extract the even-index bits only */
*sec_hash_matches &= 0x;
break;
-#elif defined(__ARM_NEON)
-   case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
-
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
-   /* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
-   /* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
-   }
-   break;
-#endif
+#endif /* defined(__SSE2__) */
default:
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
*prim_hash_matches |=
@@ -1907,6 +1931,8 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
}
 }
 
+#endif /* defined(__ARM_NEON) */
+
 static inline void
 __bulk_lookup_l(const struct rte_hash *h, const void **keys,
const struct rte_hash_bucket **primary_bkt,
@@ -1921,18 +1947,30 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
struct rte_hash_

[PATCH v3 3/4] test/hash: check bulk lookup of keys after collision

2023-11-07 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..c4e7f8190e 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | 3 << 16;
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i < KEY_PER

[PATCH v3 4/4] hash: add SVE support for bulk key lookup

2023-11-07 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- Added defines in code for SVE code support.
- Optimized NEON code.
- New SVE code is ~5% slower than optimized NEON on the N2 processor.
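To make the new dense layout easier to follow, here is a rough Python model (not DPDK code) of the value the NEON path stores in `*hitmask_buffer`, assuming `RTE_HASH_BUCKET_ENTRIES` is 8. Primary-bucket hits occupy the low 8 bits and secondary-bucket hits, shifted left by the bucket size, occupy the next 8 bits. The function name `dense_hitmask` and the sample signatures are hypothetical.

```python
from typing import Sequence

BUCKET_ENTRIES = 8  # assumed value of RTE_HASH_BUCKET_ENTRIES


def dense_hitmask(sig: int, prim_sigs: Sequence[int], sec_sigs: Sequence[int],
                  entries: int = BUCKET_ENTRIES) -> int:
    """Model the dense hitmask: primary hits in the low bits, secondary hits above them."""
    if len(prim_sigs) != entries or len(sec_sigs) != entries:
        raise ValueError("each bucket must hold exactly `entries` signatures")
    prim_hits = 0
    sec_hits = 0
    for i in range(entries):
        # Same effect as vceqq_u16 followed by vandq_u16 with the {0x1, 0x2, ..., 0x80} mask.
        if prim_sigs[i] == sig:
            prim_hits |= 1 << i
        if sec_sigs[i] == sig:
            sec_hits |= 1 << i
    # Same effect as shifting hit2 left by the bucket size, OR-ing it with hit1
    # and summing the lanes: the lane bits never overlap, so the sum equals an OR.
    return prim_hits | (sec_hits << entries)


prim = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
sec = [0x99, 0x33, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x33]
mask = dense_hitmask(0x33, prim, sec)
primary_hits = mask & ((1 << BUCKET_ENTRIES) - 1)
secondary_hits = mask >> BUCKET_ENTRIES
```

Packing both buckets into one 16-bit value is what lets a single `vaddvq_u16` (or the SVE splice of both buckets) produce the result in one step; the caller then splits it back with a mask and a shift, as in the last two lines.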

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/rte_cuckoo_hash.c | 196 -
 lib/hash/rte_cuckoo_hash.h |   1 +
 2 files changed, 151 insertions(+), 46 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index a4b907c45c..61637d02eb 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -435,8 +435,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+   }
else
 #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
@@ -1853,37 +1856,103 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 #if defined(__ARM_NEON)
 
 static inline void
-compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
 {
unsigned int i;
 
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, x;
+   uint16x8_t vmat, hit1, hit2;
const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
+
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
+   }
+   break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparisons at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *h

Re: [PATCH v5 03/23] dts: add basic developer docs

2023-11-07 Thread Yoan Picchi

On 11/6/23 17:15, Juraj Linkeš wrote:

Expand the framework contribution guidelines and add how to document the
code with Python docstrings.
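As an illustration of the conventions listed in the guidelines below, a module following them might look like this. The class, attribute and parameter names are hypothetical and not taken from DTS: instance attributes go in the class docstring's ``Attributes:`` section, class variables and Enum members use ``#:`` comments, parameters are referenced in single backticks and values in double backticks.

```python
from dataclasses import dataclass
from enum import Enum


class LinkSpeed(Enum):
    """Hypothetical link speeds, documented member by member with ``#:``."""

    #:
    SPEED_10G = 10
    #: The description may be added when the meaning is not obvious.
    SPEED_100G = 100


@dataclass
class PortConfig:
    """A hypothetical port configuration.

    Attributes:
        name: The name of the interface.
        mtu: The maximum transmission unit of the port.
    """

    name: str
    mtu: int = 1500


class Greeter:
    """Build greeting messages.

    Attributes:
        prefix: The text placed before each name.
    """

    #: The prefix used when none is given to the constructor.
    default_prefix: str = "Hello"

    def __init__(self, prefix: str = "") -> None:
        """Store the greeting prefix.

        Args:
            prefix: The greeting prefix. If empty, ``Hello`` is used.
        """
        self.prefix = prefix or self.default_prefix

    def greet(self, name: str) -> str:
        """Build a greeting for `name`.

        Args:
            name: The name to greet.

        Returns:
            The greeting, for example ``Hello World``.
        """
        return f"{self.prefix} {name}"
```

Note that `__init__` carries its own docstring, separate from the class docstring, as the guidelines require.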

Signed-off-by: Juraj Linkeš 
---
  doc/guides/tools/dts.rst | 73 
  1 file changed, 73 insertions(+)

diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 32c18ee472..b1e99107c3 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -264,6 +264,65 @@ which be changed with the ``--output-dir`` command line argument.
  The results contain basic statistics of passed/failed test cases and DPDK version.
  
  
+Contributing to DTS
+---
+
+There are two areas of contribution: The DTS framework and DTS test suites.
+
+The framework contains the logic needed to run test cases, such as connecting to nodes,
+running DPDK apps and collecting results.
+
+The test cases call APIs from the framework to test their scenarios. Adding test cases may
+require adding code to the framework as well.
+
+
+Framework Coding Guidelines
+~~~
+
+When adding code to the DTS framework, pay attention to the rest of the code
+and try not to diverge much from it. The :ref:`DTS developer tools ` will issue
+warnings when some of the basics are not met.
+
+The code must be properly documented with docstrings. The style must conform to
+the `Google style `_.
+See an example of the style
+`here `_.
+For cases which are not covered by the Google style, refer
+to `PEP 257 `_. There are some cases which are not covered by
+the two style guides, where we deviate or where some additional clarification is helpful:
+
+   * The __init__() methods of classes are documented separately from the docstring of the class
+     itself.
+   * The docstrings of implemented abstract methods should refer to the superclass's definition
+     if there's no deviation.
+   * Instance variables/attributes should be documented in the docstring of the class
+     in the ``Attributes:`` section.
+   * The dataclass.dataclass decorator changes how the attributes are processed. The dataclass
+     attributes which result in instance variables/attributes should also be recorded
+     in the ``Attributes:`` section.
+   * Class variables/attributes, on the other hand, should be documented with ``#:`` above
+     the type annotated line. The description may be omitted if the meaning is obvious.
+   * The Enum and TypedDict classes also process the attributes in particular ways and should be
+     documented with ``#:`` as well. This is mainly so that the autogenerated docs contain the
+     assigned value.
+   * When referencing a parameter of a function or a method in their docstring, don't use
+     any articles and put the parameter into single backticks. This mimics the style of
+     `Python's documentation `_.
+   * When specifying a value, use double backticks::
+
+        def foo(greet: bool) -> None:
+            """Demonstration of single and double backticks.
+
+            `greet` controls whether ``Hello World`` is printed.
+
+            Args:
+                greet: Whether to print the ``Hello World`` message.
+            """
+            if greet:
+                print("Hello World")
+
+   * The docstring maximum line length is the same as the code maximum line length.
+
+
  How To Write a Test Suite
  -
  
@@ -293,6 +352,18 @@ There are four types of methods that comprise a test suite:

 | These methods don't need to be implemented if there's no need for them in a test suite.
   In that case, nothing will happen when they're is executed.


Not your change, but it does highlight a previous mistake: "they're is".

  
+#. **Configuration, traffic and other logic**
+
+   The ``TestSuite`` class contains a variety of methods for anything that
+   a test suite setup or teardown or a test case may need to do.


This is a three-way "or", so it needs an Oxford comma: "setup, teardown, or a test case".



+
+   The test suites also frequently use a DPDK app, such as testpmd, in interactive mode
+   and use the interactive shell instances directly.
+
+   These are the two main ways to call the framework logic in test suites. If there's any
+   functionality or logic missing from the framework, it should be implemented so that
+   the test suites can use one of these two ways.
+
  #. **Test case verification**
  
 Test case verification should be done with the ``verify`` method, which records the result.

@@ -308,6 +379,8 @@ There are four types of methods that comprise a test suite:
 and used by the test suite via the ``sut_node`` field.
  
  
+.. _dts_dev_tools:
+
  DTS Developer Tools
  ---
  




Re: [PATCH v5 02/23] dts: add docstring checker

2023-11-07 Thread Yoan Picchi

On 11/6/23 17:15, Juraj Linkeš wrote:

Python docstrings are the in-code way to document the code. The
docstring checker of choice is pydocstyle, which we're executing from
Pylama, but the current latest versions are not compatible due to [0],
so pin the pydocstyle version to the latest working version.

[0] https://github.com/klen/pylama/issues/232
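To give a feel for the kind of rule a docstring checker enforces, here is a toy check written with the standard `ast` module. It only approximates one rule (pydocstyle reports it as D415: the summary line should end with a period, question mark or exclamation point) and is not how pydocstyle or Pylama are implemented; the function name is hypothetical.

```python
import ast


def missing_summary_punctuation(source: str) -> list:
    """List functions and classes whose docstring summary lacks end punctuation.

    This is a toy approximation of one pydocstyle rule, not its implementation.
    """
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        doc = ast.get_docstring(node)
        if not doc:
            # Missing or empty docstrings are reported by other rules.
            continue
        summary = doc.strip().splitlines()[0]
        if not summary.endswith((".", "?", "!")):
            offenders.append(node.name)
    return offenders


SAMPLE = '''
def good():
    """Return nothing."""


def bad():
    """Return nothing"""
'''
```

With `convention = "google"` set for the pydocstyle linter, Pylama applies the full Google-convention rule set instead of a single hand-written rule like this one.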

Signed-off-by: Juraj Linkeš 
---
  dts/poetry.lock| 12 ++--
  dts/pyproject.toml |  6 +-
  2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/dts/poetry.lock b/dts/poetry.lock
index f7b3b6d602..a734fa71f0 100644
--- a/dts/poetry.lock
+++ b/dts/poetry.lock
@@ -489,20 +489,20 @@ files = [
  
  [[package]]
  name = "pydocstyle"
-version = "6.3.0"
+version = "6.1.1"
  description = "Python docstring style checker"
  optional = false
  python-versions = ">=3.6"
  files = [
-{file = "pydocstyle-6.3.0-py3-none-any.whl", hash = "sha256:118762d452a49d6b05e194ef344a55822987a462831ade91ec5c06fd2169d019"},
-{file = "pydocstyle-6.3.0.tar.gz", hash = "sha256:7ce43f0c0ac87b07494eb9c0b462c0b73e6ff276807f204d6b53edc72b7e44e1"},
+{file = "pydocstyle-6.1.1-py3-none-any.whl", hash = "sha256:6987826d6775056839940041beef5c08cc7e3d71d63149b48e36727f70144dc4"},
+{file = "pydocstyle-6.1.1.tar.gz", hash = "sha256:1d41b7c459ba0ee6c345f2eb9ae827cab14a7533a88c5c6f7e94923f72df92dc"},
  ]
  
  [package.dependencies]
-snowballstemmer = ">=2.2.0"
+snowballstemmer = "*"
  
  [package.extras]
-toml = ["tomli (>=1.2.3)"]
+toml = ["toml"]
  
  [[package]]
  name = "pyflakes"
@@ -837,4 +837,4 @@ jsonschema = ">=4,<5"
  [metadata]
  lock-version = "2.0"
  python-versions = "^3.10"
-content-hash = "0b1e4a1cb8323e17e5ee5951c97e74bde6e60d0413d7b25b1803d5b2bab39639"
+content-hash = "3501e97b3dadc19fe8ae179fe21b1edd2488001da9a8e86ff2bca0b86b99b89b"
diff --git a/dts/pyproject.toml b/dts/pyproject.toml
index 6762edfa6b..3943c87c87 100644
--- a/dts/pyproject.toml
+++ b/dts/pyproject.toml
@@ -25,6 +25,7 @@ PyYAML = "^6.0"
  types-PyYAML = "^6.0.8"
  fabric = "^2.7.1"
  scapy = "^2.5.0"
+pydocstyle = "6.1.1"
  
  [tool.poetry.group.dev.dependencies]
  mypy = "^0.961"
@@ -39,10 +40,13 @@ requires = ["poetry-core>=1.0.0"]
  build-backend = "poetry.core.masonry.api"
  
  [tool.pylama]
-linters = "mccabe,pycodestyle,pyflakes"
+linters = "mccabe,pycodestyle,pydocstyle,pyflakes"
  format = "pylint"
  max_line_length = 88 # https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length
  
+[tool.pylama.linter.pydocstyle]
+convention = "google"
+
  [tool.mypy]
  python_version = "3.10"
  enable_error_code = ["ignore-without-code"]


Reviewed-by: Yoan Picchi 


Re: [PATCH v5 01/23] dts: code adjustments for doc generation

2023-11-08 Thread Yoan Picchi

On 11/6/23 17:15, Juraj Linkeš wrote:

The standard Python tool for generating API documentation, Sphinx,
imports modules one-by-one when generating the documentation. This
requires code changes:
* properly guarding argument parsing in the if __name__ == '__main__'
   block,
* the logger used by DTS runner underwent the same treatment so that it
   doesn't create log files outside of a DTS run,
* however, DTS uses the arguments to construct an object holding global
   variables. The defaults for the global variables needed to be moved
   from argument parsing elsewhere,
* importing the remote_session module from framework resulted in
   circular imports because of one module trying to import another
   module. This is fixed by reorganizing the code,
* some code reorganization was done because the resulting structure
   makes more sense, improving documentation clarity.

There are some other documentation-related changes:
* added missing type annotations so they appear in the generated docs,
* reordered arguments in some methods,
* removed superfluous arguments and attributes,
* change private functions/methods/attributes to private and vice-versa.

All of the above appear in the generated documentation and, with them,
the documentation is improved.
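A minimal sketch of the import-safety pattern described above, assuming a simplified `Settings` with only three fields (the real DTS settings module has more options and also reads environment variables). The defaults live in a dataclass, and command line parsing only happens from `main()`, which runs under the `if __name__ == "__main__"` guard, so Sphinx can import the module without parsing arguments or creating log files.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Settings:
    """Framework-wide settings whose defaults live outside the argument parser."""

    output_dir: str = "output"
    timeout: float = 15
    verbose: bool = False


# Importing this module only builds an object holding the defaults;
# nothing is parsed and no files are created at import time.
SETTINGS = Settings()


def get_settings(argv: Optional[Sequence[str]] = None) -> Settings:
    """Parse the command line, falling back to the defaults in SETTINGS."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-dir", default=SETTINGS.output_dir)
    parser.add_argument("-t", "--timeout", type=float, default=SETTINGS.timeout)
    parser.add_argument("-v", "--verbose", action="store_true", default=SETTINGS.verbose)
    args = parser.parse_args(argv)
    return Settings(output_dir=args.output_dir, timeout=args.timeout, verbose=args.verbose)


def main() -> None:
    """Entry point: the only place where sys.argv is parsed."""
    global SETTINGS
    SETTINGS = get_settings()
    # ... the test run would start here ...


if __name__ == "__main__":
    main()
```

Keeping the defaults on the dataclass also means they show up in the generated documentation, which they would not if they only existed as `argparse` defaults.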

Signed-off-by: Juraj Linkeš 
---
  dts/framework/config/__init__.py  | 10 ++-
  dts/framework/dts.py  | 33 +--
  dts/framework/exception.py| 54 +---
  dts/framework/remote_session/__init__.py  | 41 -
  .../interactive_remote_session.py |  0
  .../{remote => }/interactive_shell.py |  0
  .../{remote => }/python_shell.py  |  0
  .../remote_session/remote/__init__.py | 27 --
  .../{remote => }/remote_session.py|  0
  .../{remote => }/ssh_session.py   | 12 +--
  .../{remote => }/testpmd_shell.py |  0
  dts/framework/settings.py | 87 +++
  dts/framework/test_result.py  |  4 +-
  dts/framework/test_suite.py   |  7 +-
  dts/framework/testbed_model/__init__.py   | 12 +--
  dts/framework/testbed_model/{hw => }/cpu.py   | 13 +++
  dts/framework/testbed_model/hw/__init__.py| 27 --
  .../linux_session.py  |  6 +-
  dts/framework/testbed_model/node.py   | 26 --
  .../os_session.py | 22 ++---
  dts/framework/testbed_model/{hw => }/port.py  |  0
  .../posix_session.py  |  4 +-
  dts/framework/testbed_model/sut_node.py   |  8 +-
  dts/framework/testbed_model/tg_node.py| 30 +--
  .../traffic_generator/__init__.py | 24 +
  .../capturing_traffic_generator.py|  6 +-
  .../{ => traffic_generator}/scapy.py  | 23 ++---
  .../traffic_generator.py  | 16 +++-
  .../testbed_model/{hw => }/virtual_device.py  |  0
  dts/framework/utils.py| 46 +++---
  dts/main.py   |  9 +-
  31 files changed, 259 insertions(+), 288 deletions(-)
  rename dts/framework/remote_session/{remote => }/interactive_remote_session.py (100%)
  rename dts/framework/remote_session/{remote => }/interactive_shell.py (100%)
  rename dts/framework/remote_session/{remote => }/python_shell.py (100%)
  delete mode 100644 dts/framework/remote_session/remote/__init__.py
  rename dts/framework/remote_session/{remote => }/remote_session.py (100%)
  rename dts/framework/remote_session/{remote => }/ssh_session.py (91%)
  rename dts/framework/remote_session/{remote => }/testpmd_shell.py (100%)
  rename dts/framework/testbed_model/{hw => }/cpu.py (95%)
  delete mode 100644 dts/framework/testbed_model/hw/__init__.py
  rename dts/framework/{remote_session => testbed_model}/linux_session.py (97%)
  rename dts/framework/{remote_session => testbed_model}/os_session.py (95%)
  rename dts/framework/testbed_model/{hw => }/port.py (100%)
  rename dts/framework/{remote_session => testbed_model}/posix_session.py (98%)
  create mode 100644 dts/framework/testbed_model/traffic_generator/__init__.py
  rename dts/framework/testbed_model/{ => traffic_generator}/capturing_traffic_generator.py (96%)
  rename dts/framework/testbed_model/{ => traffic_generator}/scapy.py (95%)
  rename dts/framework/testbed_model/{ => traffic_generator}/traffic_generator.py (80%)
  rename dts/framework/testbed_model/{hw => }/virtual_device.py (100%)

diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index cb7e00ba34..2044c82611 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -17,6 +17,7 @@
  import warlock  # type: ignore[import]
  import yaml
  
+from framework.exception import ConfigurationError
  from framework.settings import SETTINGS
  from framework.utils import StrEnum
  
@@ -89,7 +90,7 @@ class TrafficGeneratorConfig:
  traffi

Re: [PATCH v6 22/23] dts: add doc generation dependencies

2023-11-08 Thread Yoan Picchi

On 11/8/23 12:53, Juraj Linkeš wrote:

Sphinx imports every Python module when generating documentation from
docstrings, meaning all DTS dependencies, including the Python version,
must be satisfied.
By adding Sphinx to the DTS dependencies, we make sure that the proper
Python version and dependencies are used when Sphinx is executed.
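Because Sphinx has to import every module, a missing package or an older interpreter surfaces as an import error in the middle of the docs build. A hypothetical pre-flight check (not part of this patch; `unmet_requirements` and its parameters are illustrative) could report such problems up front:

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple


def unmet_requirements(modules: Iterable[str], min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return the requirements that the current interpreter does not satisfy."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python >= {min_python[0]}.{min_python[1]}")
    for name in modules:
        # For a top-level name, find_spec() locates the module without executing it.
        if importlib.util.find_spec(name) is None:
            problems.append(name)
    return problems
```

Inside the Poetry environment described above, a call such as `unmet_requirements(["sphinx", "yaml"])` should return an empty list; outside of it, the list names what is missing.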

Signed-off-by: Juraj Linkeš 
---
  dts/poetry.lock| 499 -
  dts/pyproject.toml |   7 +
  2 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/dts/poetry.lock b/dts/poetry.lock
index a734fa71f0..dea98f6913 100644
--- a/dts/poetry.lock
+++ b/dts/poetry.lock
@@ -1,5 +1,16 @@
  # This file is automatically @generated by Poetry 1.5.1 and should not be changed by hand.
  
+[[package]]
+name = "alabaster"
+version = "0.7.13"
+description = "A configurable sidebar-enabled Sphinx theme"
+optional = false
+python-versions = ">=3.6"
+files = [
+{file = "alabaster-0.7.13-py3-none-any.whl", hash = "sha256:1ee19aca801bbabb5ba3f5f258e4422dfa86f82f3e9cefb0859b283cdd7f62a3"},
+{file = "alabaster-0.7.13.tar.gz", hash = "sha256:a27a4a084d5e690e16e01e03ad2b2e552c61a65469419b907243193de1a84ae2"},
+]
+
  [[package]]
  name = "attrs"
  version = "23.1.0"
@@ -18,6 +29,23 @@ docs = ["furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-
  tests = ["attrs[tests-no-zope]", "zope-interface"]
  tests-no-zope = ["cloudpickle", "hypothesis", "mypy (>=1.1.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
  
+[[package]]
+name = "babel"
+version = "2.13.1"
+description = "Internationalization utilities"
+optional = false
+python-versions = ">=3.7"
+files = [
+{file = "Babel-2.13.1-py3-none-any.whl", hash = "sha256:7077a4984b02b6727ac10f1f7294484f737443d7e2e66c5e4380e41a3ae0b4ed"},
+{file = "Babel-2.13.1.tar.gz", hash = "sha256:33e0952d7dd6374af8dbf6768cc4ddf3ccfefc244f9986d4074704f2fbd18900"},
+]
+
+[package.dependencies]
+setuptools = {version = "*", markers = "python_version >= \"3.12\""}
+
+[package.extras]
+dev = ["freezegun (>=1.0,<2.0)", "pytest (>=6.0)", "pytest-cov"]
+
  [[package]]
  name = "bcrypt"
  version = "4.0.1"
@@ -86,6 +114,17 @@ d = ["aiohttp (>=3.7.4)"]
  jupyter = ["ipython (>=7.8.0)", "tokenize-rt (>=3.2.0)"]
  uvloop = ["uvloop (>=0.15.2)"]
  
+[[package]]
+name = "certifi"
+version = "2023.7.22"
+description = "Python package for providing Mozilla's CA Bundle."
+optional = false
+python-versions = ">=3.6"
+files = [
+{file = "certifi-2023.7.22-py3-none-any.whl", hash = "sha256:92d6037539857d8206b8f6ae472e8b77db8058fec5937a1ef3f54304089edbb9"},
+{file = "certifi-2023.7.22.tar.gz", hash = "sha256:539cc1d13202e33ca466e88b2807e29f4c13049d6d87031a3c110744495cb082"},
+]
+
  [[package]]
  name = "cffi"
  version = "1.15.1"
@@ -162,6 +201,105 @@ files = [
  [package.dependencies]
  pycparser = "*"
  
+[[package]]
+name = "charset-normalizer"
+version = "3.3.2"
+description = "The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet."
+optional = false
+python-versions = ">=3.7.0"
+files = [
+{file = "charset-normalizer-3.3.2.tar.gz", hash = "sha256:f30c3cb33b24454a82faecaf01b19c18562b1e89558fb6c56de4d9118a032fd5"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:25baf083bf6f6b341f4121c2f3c548875ee6f5339300e08be3f2b2ba1721cdd3"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:06435b539f889b1f6f4ac1758871aae42dc3a8c0e24ac9e60c2384973ad73027"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:9063e24fdb1e498ab71cb7419e24622516c4a04476b17a2dab57e8baa30d6e03"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6897af51655e3691ff853668779c7bad41579facacf5fd7253b0133308cf000d"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1d3193f4a680c64b4b6a9115943538edb896edc190f0b222e73761716519268e"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:cd70574b12bb8a4d2aaa0094515df2463cb429d8536cfb6c7ce983246983e5a6"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8465322196c8b4d7ab6d1e049e4c5cb460d0394da4a27d23cc242fbf0034b6b5"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a9a8e9031d613fd2009c182b69c7b2c1ef8239a0efb1df3f7c8da66d5dd3d537"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:beb58fe5cdb101e3a055192ac291b7a21e3b7ef4f67fa1d74e331a7f2124341c"},
+{file = "charset_normalizer-3.3.2-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:e06ed3eb3218bc64786f7db419

Re: [PATCH v6 05/23] dts: settings docstring update

2023-11-08 Thread Yoan Picchi

On 11/8/23 12:53, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.
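The precedence order documented in the module docstring below (command line value, then environment variable, then default) can be sketched without a custom `argparse.Action` by resolving the environment variable when the parser is built. This is a simplified alternative to the patch's `_env_arg` helper, not its implementation, and only two of the options are shown.

```python
import argparse
import os
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults come from environment variables when they are set."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--output-dir",
        # The environment variable beats the built-in default...
        default=os.environ.get("DTS_OUTPUT_DIR", "output"),
        help="[DTS_OUTPUT_DIR] The directory where DTS logs and results are saved.",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_const",
        const=True,
        # Flags: setting the environment variable to any value enables them.
        default="DTS_VERBOSE" in os.environ,
        help="[DTS_VERBOSE] Log everything to the console.",
    )
    return parser


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse `argv`; an explicit command line value beats both other sources."""
    return build_parser().parse_args(argv)
```

The trade-off is that the environment is read when the parser is created rather than inside an `Action`, which is enough as long as the parser is built right before parsing.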

Signed-off-by: Juraj Linkeš 
---
  dts/framework/settings.py | 101 +-
  1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/dts/framework/settings.py b/dts/framework/settings.py
index 7f5841d073..787db7c198 100644
--- a/dts/framework/settings.py
+++ b/dts/framework/settings.py
@@ -3,6 +3,70 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022 University of New Hampshire
  
+"""Environment variables and command line arguments parsing.
+
+This is a simple module utilizing the built-in argparse module to parse command line arguments,
+augment them with values from environment variables and make them available across the framework.
+
+The command line value takes precedence, followed by the environment variable value,
+followed by the default value defined in this module.
+
+The command line arguments along with the supported environment variables are:
+
+.. option:: --config-file
+.. envvar:: DTS_CFG_FILE
+
+The path to the YAML test run configuration file.
+
+.. option:: --output-dir, --output
+.. envvar:: DTS_OUTPUT_DIR
+
+The directory where DTS logs and results are saved.
+
+.. option:: --compile-timeout
+.. envvar:: DTS_COMPILE_TIMEOUT
+
+The timeout for compiling DPDK.
+
+.. option:: -t, --timeout
+.. envvar:: DTS_TIMEOUT
+
+The timeout for all DTS operations except for compiling DPDK.
+
+.. option:: -v, --verbose
+.. envvar:: DTS_VERBOSE
+
+Set to any value to enable logging everything to the console.
+
+.. option:: -s, --skip-setup
+.. envvar:: DTS_SKIP_SETUP
+
+Set to any value to skip building DPDK.
+
+.. option:: --tarball, --snapshot, --git-ref
+.. envvar:: DTS_DPDK_TARBALL
+
+The path to a DPDK tarball, git commit ID, tag ID or tree ID to test.
+
+.. option:: --test-cases
+.. envvar:: DTS_TESTCASES
+
+A comma-separated list of test cases to execute. Unknown test cases will be silently ignored.
+
+.. option:: --re-run, --re_run
+.. envvar:: DTS_RERUN
+
+Re-run each test case this many times in case of a failure.
+
+Attributes:
+SETTINGS: The module level variable storing framework-wide DTS settings.


In the generated doc, "Attributes" doesn't appear. It ends up looking
like SETTINGS is just another environment variable, with no separation
from the above list.



+
+Typical usage example::
+
+  from framework.settings import SETTINGS
+  foo = SETTINGS.foo
+"""
+
  import argparse
  import os
  from collections.abc import Callable, Iterable, Sequence
@@ -16,6 +80,23 @@
  
  
  def _env_arg(env_var: str) -> Any:
+"""A helper method augmenting the argparse Action with environment variables.
+
+If the supplied environment variable is defined, then the default value
+of the argument is modified. This satisfies the priority order of
+command line argument > environment variable > default value.
+
+Arguments with no values (flags) should be defined using the const keyword argument
+(True or False). When the argument is specified, it will be set to const; if not specified,
+the default will be stored (possibly modified by the corresponding environment variable).
+
+Other arguments work the same as default argparse arguments, that is using
+the default 'store' action.
+
+Returns:
+  The modified argparse.Action.
+"""
+
  class _EnvironmentArgument(argparse.Action):
  def __init__(
  self,
@@ -68,14 +149,28 @@ def __call__(
  
  @dataclass(slots=True)
  class Settings:
+"""Default framework-wide user settings.
+
+The defaults may be modified at the start of the run.
+"""
+
+#:
  config_file_path: Path = Path(__file__).parent.parent.joinpath("conf.yaml")
+#:
  output_dir: str = "output"
+#:
  timeout: float = 15
+#:
  verbose: bool = False
+#:
  skip_setup: bool = False
+#:
  dpdk_tarball_path: Path | str = "dpdk.tar.xz"
+#:
  compile_timeout: float = 1200
+#:
  test_cases: list[str] = field(default_factory=list)
+#:
  re_run: int = 0


For some reason, __init__ also appears in the doc: __init__(config_file_path: ~pathlib.Path = PosixPath('/ho...


  
  
@@ -169,7 +264,7 @@ def _get_parser() -> argparse.ArgumentParser:
  action=_env_arg("DTS_RERUN"),
  default=SETTINGS.re_run,
  type=int,
-help="[DTS_RERUN] Re-run each test case the specified amount of times "
+help="[DTS_RERUN] Re-run each test case the specified number of times "
  "if a test failure occurs",
  )
  
@@ -177,6 +272,10 @@ def _get_parser() -> argparse.ArgumentParser:
  
  
  def get_settings() -> Settings:
+"""Create new settings with inputs from the user.
+
+The inputs are taken from the command line and from environment variables.
+"""
  parsed_args = _get_parser().parse_args()
   

Re: [PATCH v6 06/23] dts: logger and settings docstring update

2023-11-08 Thread Yoan Picchi

On 11/8/23 12:53, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.
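The `getLogger` function in this patch scans a list for an entry with a matching name and node, returning it if found and creating and storing a new one otherwise. A hypothetical equivalent that keys a dictionary by `(name, node)` shows the same behavior with a constant-time lookup; here a plain `logging.LoggerAdapter` stands in for `DTSLOG`, and no handlers are attached.

```python
import logging
from typing import Dict, Tuple

# Cache keyed by (name, node) instead of a list of TypedDicts.
_loggers: Dict[Tuple[str, str], logging.LoggerAdapter] = {}


def get_logger(name: str, node: str = "suite") -> logging.LoggerAdapter:
    """Return the adapter for (name, node), creating and caching it on first use."""
    key = (name, node)
    if key not in _loggers:
        _loggers[key] = logging.LoggerAdapter(logging.getLogger(name), {"node": node})
    return _loggers[key]
```

Adapters for different nodes wrap the same underlying `logging.Logger`, since `logging.getLogger` itself returns one logger per name.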

Signed-off-by: Juraj Linkeš 
---
  dts/framework/logger.py | 72 +--
  dts/framework/utils.py  | 96 ++---
  2 files changed, 121 insertions(+), 47 deletions(-)

diff --git a/dts/framework/logger.py b/dts/framework/logger.py
index bb2991e994..d3eb75a4e4 100644
--- a/dts/framework/logger.py
+++ b/dts/framework/logger.py
@@ -3,9 +3,9 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
-"""
-DTS logger module with several log level. DTS framework and TestSuite logs
-are saved in different log files.
+"""DTS logger module.
+
+DTS framework and TestSuite logs are saved in different log files.
  """
  
  import logging
@@ -18,19 +18,21 @@
  stream_fmt = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  
  
-class LoggerDictType(TypedDict):
-logger: "DTSLOG"
-name: str
-node: str
-
+class DTSLOG(logging.LoggerAdapter):
+"""DTS logger adapter class for framework and testsuites.
  
-# List for saving all using loggers
-Loggers: list[LoggerDictType] = []
+The :option:`--verbose` command line argument and the :envvar:`DTS_VERBOSE` environment
+variable control the verbosity of output. If enabled, all messages will be emitted to the
+console.
  
+The :option:`--output` command line argument and the :envvar:`DTS_OUTPUT_DIR` environment
+variable modify the directory where the logs will be stored.
  
-class DTSLOG(logging.LoggerAdapter):
-"""
-DTS log class for framework and testsuite.
+Attributes:
+node: The additional identifier. Currently unused.
+sh: The handler which emits logs to console.
+fh: The handler which emits logs to a file.
+verbose_fh: Just as fh, but logs with a different, more verbose, format.
  """
  
  _logger: logging.Logger
@@ -40,6 +42,15 @@ class DTSLOG(logging.LoggerAdapter):
  verbose_fh: logging.FileHandler
  
  def __init__(self, logger: logging.Logger, node: str = "suite"):
+"""Extend the constructor with additional handlers.
+
+One handler logs to the console, the other one to a file, with either a regular or verbose
+format.
+
+Args:
+logger: The logger from which to create the logger adapter.
+node: An additional identifier. Currently unused.
+"""
  self._logger = logger
  # 1 means log everything, this will be used by file handlers if their level
  # is not set
@@ -92,26 +103,43 @@ def __init__(self, logger: logging.Logger, node: str = "suite"):
  super(DTSLOG, self).__init__(self._logger, dict(node=self.node))
  
  def logger_exit(self) -> None:
-"""
-Remove stream handler and logfile handler.
-"""
+"""Remove the stream handler and the logfile handler."""
  for handler in (self.sh, self.fh, self.verbose_fh):
  handler.flush()
  self._logger.removeHandler(handler)
  
  
+class _LoggerDictType(TypedDict):
+logger: DTSLOG
+name: str
+node: str
+
+
+# List for saving all loggers in use
+_Loggers: list[_LoggerDictType] = []
+
+
  def getLogger(name: str, node: str = "suite") -> DTSLOG:
+"""Get DTS logger adapter identified by name and node.
+
+An existing logger will be returned if one with the exact name and node already exists.
+A new one will be created and stored otherwise.
+
+Args:
+name: The name of the logger.
+node: An additional identifier for the logger.
+
+Returns:
+A logger uniquely identified by both name and node.
  """
-Get logger handler and if there's no handler for specified Node will create one.
-"""
-global Loggers
+global _Loggers
  # return saved logger
-logger: LoggerDictType
-for logger in Loggers:
+logger: _LoggerDictType
+for logger in _Loggers:
  if logger["name"] == name and logger["node"] == node:
  return logger["logger"]
  
  # return new logger
  dts_logger: DTSLOG = DTSLOG(logging.getLogger(name), node)
-Loggers.append({"logger": dts_logger, "name": name, "node": node})
+_Loggers.append({"logger": dts_logger, "name": name, "node": node})
  return dts_logger
diff --git a/dts/framework/utils.py b/dts/framework/utils.py
index f0c916471c..0613adf7ad 100644
--- a/dts/framework/utils.py
+++ b/dts/framework/utils.py
@@ -3,6 +3,16 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
+"""Various utility classes and functions.
+
+These are used in multiple modules across the framework. They're here because
+they provide some non-specific functionality, greatly simplify imports or just don't
+fit elsewhere.
+
+Attributes:
+REGEX_FOR_PCI_ADDRESS: The regex representing

Re: [PATCH v7 21/21] dts: test suites docstring update

2023-11-16 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.
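The hello world test cases below verify that the app prints a greeting from each core it runs on. Assuming the app prints one `hello from core <id>` line per lcore (as DPDK's helloworld example does), the core of that check could be sketched as a pure function. `cores_without_greeting` is a hypothetical helper, not part of the DTS API.

```python
from typing import Iterable, List


def cores_without_greeting(output: str, cores: Iterable[int]) -> List[int]:
    """Return the cores for which no "hello from core <id>" line was printed."""
    lines = {line.strip() for line in output.splitlines()}
    return [core for core in cores if f"hello from core {core}" not in lines]
```

An empty result means every expected core reported in, which is what both test cases verify.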

Signed-off-by: Juraj Linkeš 
---
  dts/tests/TestSuite_hello_world.py | 16 +
  dts/tests/TestSuite_os_udp.py  | 19 +++
  dts/tests/TestSuite_smoke_tests.py | 53 +++---
  3 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/dts/tests/TestSuite_hello_world.py b/dts/tests/TestSuite_hello_world.py
index 7e3d95c0cf..662a8f8726 100644
--- a/dts/tests/TestSuite_hello_world.py
+++ b/dts/tests/TestSuite_hello_world.py
@@ -1,7 +1,8 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2010-2014 Intel Corporation
  
-"""
+"""The DPDK hello world app test suite.
+
  Run the helloworld example app and verify it prints a message for each used core.
  No other EAL parameters apart from cores are used.
  """
@@ -15,22 +16,25 @@
  
  
  class TestHelloWorld(TestSuite):
+"""DPDK hello world app test suite."""
+
  def set_up_suite(self) -> None:
-"""
+"""Set up the test suite.
+
  Setup:
  Build the app we're about to test - helloworld.
  """
  self.app_helloworld_path = self.sut_node.build_dpdk_app("helloworld")
  
  def test_hello_world_single_core(self) -> None:
-"""
+"""Single core test case.
+
  Steps:
  Run the helloworld app on the first usable logical core.
  Verify:
  The app prints a message from the used core:
  "hello from core "
  """
-
  # get the first usable core
  lcore_amount = LogicalCoreCount(1, 1, 1)
  lcores = LogicalCoreCountFilter(self.sut_node.lcores, lcore_amount).filter()
@@ -44,14 +48,14 @@ def test_hello_world_single_core(self) -> None:
  )
  
  def test_hello_world_all_cores(self) -> None:
-"""
+"""All cores test case.
+
  Steps:
  Run the helloworld app on all usable logical cores.
  Verify:
  The app prints a message from all used cores:
  "hello from core "
  """
-
  # get the maximum logical core number
  eal_para = self.sut_node.create_eal_parameters(
  lcore_filter_specifier=LogicalCoreList(self.sut_node.lcores)
diff --git a/dts/tests/TestSuite_os_udp.py b/dts/tests/TestSuite_os_udp.py
index bf6b93deb5..e0c5239612 100644
--- a/dts/tests/TestSuite_os_udp.py
+++ b/dts/tests/TestSuite_os_udp.py
@@ -1,7 +1,8 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  
-"""
+"""Basic IPv4 OS routing test suite.
+
  Configure SUT node to route traffic from if1 to if2.
  Send a packet to the SUT node, verify it comes back on the second port on the TG node.
  """
@@ -13,24 +14,27 @@
  
  
  class TestOSUdp(TestSuite):
+"""IPv4 UDP OS routing test suite."""
+
  def set_up_suite(self) -> None:
-"""
+"""Set up the test suite.
+
  Setup:
-Configure SUT ports and SUT to route traffic from if1 to if2.
+Bind the SUT ports to the OS driver, configure the ports and configure the SUT
+to route traffic from if1 to if2.
  """
  
-# This test uses kernel drivers
  self.sut_node.bind_ports_to_driver(for_dpdk=False)
  self.configure_testbed_ipv4()
  
  def test_os_udp(self) -> None:
-"""
+"""Basic UDP IPv4 traffic test case.
+
  Steps:
  Send a UDP packet.
  Verify:
  The packet with proper addresses arrives at the other TG port.
  """
-
  packet = Ether() / IP() / UDP()
  
  received_packets = self.send_packet_and_capture(packet)
@@ -40,7 +44,8 @@ def test_os_udp(self) -> None:
  self.verify_packets(expected_packet, received_packets)
  
  def tear_down_suite(self) -> None:
-"""
+"""Tear down the test suite.
+
  Teardown:
  Remove the SUT port configuration configured in setup.
  """
diff --git a/dts/tests/TestSuite_smoke_tests.py b/dts/tests/TestSuite_smoke_tests.py
index e8016d1b54..6fae099a0e 100644
--- a/dts/tests/TestSuite_smoke_tests.py
+++ b/dts/tests/TestSuite_smoke_tests.py
@@ -1,6 +1,17 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2023 University of New Hampshire
  
+"""Smoke test suite.
+
+Smoke tests are a class of tests which are used for validating a minimal set of important features.
+These are the most important features without which (or when they're faulty) the software wouldn't
+work properly. Thus, if any failure occurs while testing these features,
+there isn't that much of a reason to continue testing, as the software is fundamentally broken.
+
+These tests don't have to include only DPDK tests, as the reason for failures could be
+in the infrastructure (a faulty link between NICs or a misconfiguration).
+"""
+
  import re
  
  from fram

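The hello world test cases above verify that every used core printed its greeting. The archive dropped the core number from the quoted message, so this sketch assumes each line has the form `hello from core <N>`. `cores_that_said_hello` and `missing_hello_cores` are hypothetical helpers that isolate only the string check. The real suite performs that check through the framework's `verify` method:

```python
import re


def cores_that_said_hello(output: str) -> set[int]:
    """Collect the core IDs from lines such as ``hello from core 3``."""
    return {int(core) for core in re.findall(r"hello from core (\d+)", output)}


def missing_hello_cores(output: str, lcores: list[int]) -> list[int]:
    """Return the used lcores without a greeting; an empty list means the check passed."""
    greeted = cores_that_said_hello(output)
    return [lcore for lcore in lcores if lcore not in greeted]


if __name__ == "__main__":
    app_output = "hello from core 2\nhello from core 1\n"
    print(missing_hello_cores(app_output, [1, 2]))  # []
    print(missing_hello_cores(app_output, [1, 2, 3]))  # [3]
```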
Re: [PATCH v7 21/21] dts: test suites docstring update

2023-11-20 Thread Yoan Picchi

On 11/20/23 10:17, Juraj Linkeš wrote:

On Thu, Nov 16, 2023 at 6:36 PM Yoan Picchi  wrote:


On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
   dts/tests/TestSuite_hello_world.py | 16 +
   dts/tests/TestSuite_os_udp.py  | 19 +++
   dts/tests/TestSuite_smoke_tests.py | 53 +++---
   3 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/dts/tests/TestSuite_hello_world.py b/dts/tests/TestSuite_hello_world.py
index 7e3d95c0cf..662a8f8726 100644
--- a/dts/tests/TestSuite_hello_world.py
+++ b/dts/tests/TestSuite_hello_world.py
@@ -1,7 +1,8 @@
   # SPDX-License-Identifier: BSD-3-Clause
   # Copyright(c) 2010-2014 Intel Corporation

-"""
+"""The DPDK hello world app test suite.
+
   Run the helloworld example app and verify it prints a message for each used core.
   No other EAL parameters apart from cores are used.
   """
@@ -15,22 +16,25 @@


   class TestHelloWorld(TestSuite):
+"""DPDK hello world app test suite."""
+
   def set_up_suite(self) -> None:
-"""
+"""Set up the test suite.
+
   Setup:
   Build the app we're about to test - helloworld.
   """
   self.app_helloworld_path = self.sut_node.build_dpdk_app("helloworld")

   def test_hello_world_single_core(self) -> None:
-"""
+"""Single core test case.
+
   Steps:
   Run the helloworld app on the first usable logical core.
   Verify:
   The app prints a message from the used core:
   "hello from core "
   """
-
   # get the first usable core
   lcore_amount = LogicalCoreCount(1, 1, 1)
   lcores = LogicalCoreCountFilter(self.sut_node.lcores, lcore_amount).filter()
@@ -44,14 +48,14 @@ def test_hello_world_single_core(self) -> None:
   )

   def test_hello_world_all_cores(self) -> None:
-"""
+"""All cores test case.
+
   Steps:
   Run the helloworld app on all usable logical cores.
   Verify:
   The app prints a message from all used cores:
   "hello from core "
   """
-
   # get the maximum logical core number
   eal_para = self.sut_node.create_eal_parameters(
   lcore_filter_specifier=LogicalCoreList(self.sut_node.lcores)
diff --git a/dts/tests/TestSuite_os_udp.py b/dts/tests/TestSuite_os_udp.py
index bf6b93deb5..e0c5239612 100644
--- a/dts/tests/TestSuite_os_udp.py
+++ b/dts/tests/TestSuite_os_udp.py
@@ -1,7 +1,8 @@
   # SPDX-License-Identifier: BSD-3-Clause
   # Copyright(c) 2023 PANTHEON.tech s.r.o.

-"""
+"""Basic IPv4 OS routing test suite.
+
   Configure SUT node to route traffic from if1 to if2.
   Send a packet to the SUT node, verify it comes back on the second port on the TG node.
   """
@@ -13,24 +14,27 @@


   class TestOSUdp(TestSuite):
+"""IPv4 UDP OS routing test suite."""
+
   def set_up_suite(self) -> None:
-"""
+"""Set up the test suite.
+
   Setup:
-Configure SUT ports and SUT to route traffic from if1 to if2.
+Bind the SUT ports to the OS driver, configure the ports and configure the SUT
+to route traffic from if1 to if2.
   """

-# This test uses kernel drivers
   self.sut_node.bind_ports_to_driver(for_dpdk=False)
   self.configure_testbed_ipv4()

   def test_os_udp(self) -> None:
-"""
+"""Basic UDP IPv4 traffic test case.
+
   Steps:
   Send a UDP packet.
   Verify:
   The packet with proper addresses arrives at the other TG port.
   """
-
   packet = Ether() / IP() / UDP()

   received_packets = self.send_packet_and_capture(packet)
@@ -40,7 +44,8 @@ def test_os_udp(self) -> None:
   self.verify_packets(expected_packet, received_packets)

   def tear_down_suite(self) -> None:
-"""
+"""Tear down the test suite.
+
   Teardown:
   Remove the SUT port configuration configured in setup.
   """
diff --git a/dts/tests/TestSuite_smoke_tests.py b/dts/tests/TestSuite_smoke_tests.py
index e8016d1b54..6fae099a0e 100644
--- a/dts/tests/TestSuite_smoke_tests.py
+++ b/dts/tests/TestSuite_smoke_tests.py
@@ -1,6 +1,17 @@
   # SPDX-License-Identifier: BSD-3-Clause
   # Copyright(c) 2023 University of New Ha

Re: [PATCH v7 01/21] dts: code adjustments for doc generation

2023-11-20 Thread Yoan Picchi
 function_bytes: xmlrpc.client.Binary) -> None:
  """Add a function to the server.
  
  This is meant to be executed remotely.
@@ -191,15 +190,9 @@ class ScapyTrafficGenerator(CapturingTrafficGenerator):
  session: PythonShell
  rpc_server_proxy: xmlrpc.client.ServerProxy
  _config: ScapyTrafficGeneratorConfig
-_tg_node: TGNode
-_logger: DTSLOG
-
-def __init__(self, tg_node: TGNode, config: ScapyTrafficGeneratorConfig):
-self._config = config
-self._tg_node = tg_node
-self._logger = getLogger(
-f"{self._tg_node.name} {self._config.traffic_generator_type}"
-)
+
+def __init__(self, tg_node: Node, config: ScapyTrafficGeneratorConfig):
+super().__init__(tg_node, config)
  
  assert (
  self._tg_node.config.os == OS.linux
@@ -235,7 +228,7 @@ def __init__(self, tg_node: TGNode, config: ScapyTrafficGeneratorConfig):
  function_bytes = marshal.dumps(function.__code__)
  self.rpc_server_proxy.add_rpc_function(function.__name__, function_bytes)
  
-def _start_xmlrpc_server_in_remote_python(self, listen_port: int):
+def _start_xmlrpc_server_in_remote_python(self, listen_port: int) -> None:
  # load the source of the function
  src = inspect.getsource(QuittableXMLRPCServer)
  # Lines with only whitespace break the repl if in the middle of a function
@@ -280,7 +273,7 @@ def _send_packets_and_capture(
  scapy_packets = [Ether(packet.data) for packet in xmlrpc_packets]
  return scapy_packets
  
-def close(self):
+def close(self) -> None:
  try:
  self.rpc_server_proxy.quit()
  except ConnectionRefusedError:
diff --git a/dts/framework/testbed_model/traffic_generator.py b/dts/framework/testbed_model/traffic_generator/traffic_generator.py
similarity index 80%
rename from dts/framework/testbed_model/traffic_generator.py
rename to dts/framework/testbed_model/traffic_generator/traffic_generator.py
index 28c35d3ce4..ea7c3963da 100644
--- a/dts/framework/testbed_model/traffic_generator.py
+++ b/dts/framework/testbed_model/traffic_generator/traffic_generator.py
@@ -12,11 +12,12 @@
  
  from scapy.packet import Packet  # type: ignore[import]
  
-from framework.logger import DTSLOG
+from framework.config import TrafficGeneratorConfig
+from framework.logger import DTSLOG, getLogger
+from framework.testbed_model.node import Node
+from framework.testbed_model.port import Port
  from framework.utils import get_packet_summaries
  
-from .hw.port import Port
-
  
  class TrafficGenerator(ABC):
  """The base traffic generator.
@@ -24,8 +25,17 @@ class TrafficGenerator(ABC):
  Defines the few basic methods that each traffic generator must implement.
  """
  
+_config: TrafficGeneratorConfig
+_tg_node: Node
  _logger: DTSLOG
  
+def __init__(self, tg_node: Node, config: TrafficGeneratorConfig):
+self._config = config
+self._tg_node = tg_node
+self._logger = getLogger(
+f"{self._tg_node.name} {self._config.traffic_generator_type}"
+)
+
  def send_packet(self, packet: Packet, port: Port) -> None:
  """Send a packet and block until it is fully sent.
  
diff --git a/dts/framework/testbed_model/hw/virtual_device.py b/dts/framework/testbed_model/virtual_device.py
similarity index 100%
rename from dts/framework/testbed_model/hw/virtual_device.py
rename to dts/framework/testbed_model/virtual_device.py
diff --git a/dts/framework/utils.py b/dts/framework/utils.py
index d27c2c5b5f..f0c916471c 100644
--- a/dts/framework/utils.py
+++ b/dts/framework/utils.py
@@ -7,7 +7,6 @@
  import json
  import os
  import subprocess
-import sys
  from enum import Enum
  from pathlib import Path
  from subprocess import SubprocessError
@@ -16,35 +15,7 @@
  
  from .exception import ConfigurationError
  
-
-class StrEnum(Enum):
-@staticmethod
-def _generate_next_value_(
-name: str, start: int, count: int, last_values: object
-) -> str:
-return name
-
-def __str__(self) -> str:
-return self.name
-
-
-REGEX_FOR_PCI_ADDRESS = "/[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}.[0-9]{1}/"
-
-
-def check_dts_python_version() -> None:
-if sys.version_info.major < 3 or (
-sys.version_info.major == 3 and sys.version_info.minor < 10
-):
-print(
-RED(
-(
-"WARNING: DTS execution node's python version is lower than"
-"python 3.10, is deprecated and will not work in future releases."
-)
-),
-file=sys.stderr,
-)
-print(RED("Ple

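The `traffic_generator.py` hunk moves the shared constructor into the `TrafficGenerator` base class. As a result, `ScapyTrafficGenerator.__init__` only calls `super().__init__(tg_node, config)` and adds its own checks. Below is a minimal sketch of that pattern with stand-in classes. `Node`, `TrafficGeneratorConfig` and the `os` field are simplified placeholders for the framework types. The stdlib `logging.getLogger` stands in for DTS's `getLogger`, and the subclass raises a `ValueError` where the patch uses an `assert`:

```python
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Node:
    """Placeholder for the framework's Node; only the name and OS are used."""

    name: str
    os: str = "linux"


@dataclass
class TrafficGeneratorConfig:
    """Placeholder for the framework's TrafficGeneratorConfig."""

    traffic_generator_type: str


class TrafficGenerator(ABC):
    """Base class holding the state every traffic generator needs."""

    _config: TrafficGeneratorConfig
    _tg_node: Node
    _logger: logging.Logger

    def __init__(self, tg_node: Node, config: TrafficGeneratorConfig):
        self._config = config
        self._tg_node = tg_node
        # stdlib logger in place of the framework's getLogger()
        self._logger = logging.getLogger(
            f"{self._tg_node.name} {self._config.traffic_generator_type}"
        )

    @abstractmethod
    def close(self) -> None:
        """Free the resources used by the traffic generator."""


class ScapyTrafficGenerator(TrafficGenerator):
    """Only adds its own checks on top of the shared constructor."""

    def __init__(self, tg_node: Node, config: TrafficGeneratorConfig):
        super().__init__(tg_node, config)
        if self._tg_node.os != "linux":
            raise ValueError("The Scapy traffic generator only supports Linux.")

    def close(self) -> None:
        self._logger.debug("Closing the traffic generator.")
```

Keeping the bookkeeping in the base class means a new traffic generator only implements its own setup and the abstract methods.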
Re: [PATCH v7 02/21] dts: add docstring checker

2023-11-20 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Python docstrings are the in-code way to document the code. The
docstring checker of choice is pydocstyle which we're executing from
Pylama, but the current latest versions are not compatible due to [0],
so pin the pydocstyle version to the latest working version.

[0] https://github.com/klen/pylama/issues/232

Signed-off-by: Juraj Linkeš 
---
  dts/poetry.lock| 12 ++--
  dts/pyproject.toml |  6 +-
  2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/dts/poetry.lock b/dts/poetry.lock
index f7b3b6d602..a734fa71f0 100644
--- a/dts/poetry.lock
+++ b/dts/poetry.lock
@@ -489,20 +489,20 @@ files = [
  
  [[package]]
  name = "pydocstyle"
-version = "6.3.0"
+version = "6.1.1"
  description = "Python docstring style checker"
  optional = false
  python-versions = ">=3.6"
  files = [
-{file = "pydocstyle-6.3.0-py3-none-any.whl", hash = "sha256:118762d452a49d6b05e194ef344a55822987a462831ade91ec5c06fd2169d019"},
-{file = "pydocstyle-6.3.0.tar.gz", hash = "sha256:7ce43f0c0ac87b07494eb9c0b462c0b73e6ff276807f204d6b53edc72b7e44e1"},
+{file = "pydocstyle-6.1.1-py3-none-any.whl", hash = "sha256:6987826d6775056839940041beef5c08cc7e3d71d63149b48e36727f70144dc4"},
+{file = "pydocstyle-6.1.1.tar.gz", hash = "sha256:1d41b7c459ba0ee6c345f2eb9ae827cab14a7533a88c5c6f7e94923f72df92dc"},
  ]
  
  [package.dependencies]
-snowballstemmer = ">=2.2.0"
+snowballstemmer = "*"
  
  [package.extras]
-toml = ["tomli (>=1.2.3)"]
+toml = ["toml"]
  
  [[package]]
  name = "pyflakes"
@@ -837,4 +837,4 @@ jsonschema = ">=4,<5"
  [metadata]
  lock-version = "2.0"
  python-versions = "^3.10"
-content-hash = "0b1e4a1cb8323e17e5ee5951c97e74bde6e60d0413d7b25b1803d5b2bab39639"
+content-hash = "3501e97b3dadc19fe8ae179fe21b1edd2488001da9a8e86ff2bca0b86b99b89b"
diff --git a/dts/pyproject.toml b/dts/pyproject.toml
index 6762edfa6b..3943c87c87 100644
--- a/dts/pyproject.toml
+++ b/dts/pyproject.toml
@@ -25,6 +25,7 @@ PyYAML = "^6.0"
  types-PyYAML = "^6.0.8"
  fabric = "^2.7.1"
  scapy = "^2.5.0"
+pydocstyle = "6.1.1"
  
  [tool.poetry.group.dev.dependencies]
  mypy = "^0.961"
@@ -39,10 +40,13 @@ requires = ["poetry-core>=1.0.0"]
  build-backend = "poetry.core.masonry.api"
  
  [tool.pylama]
-linters = "mccabe,pycodestyle,pyflakes"
+linters = "mccabe,pycodestyle,pydocstyle,pyflakes"
  format = "pylint"
  max_line_length = 88 # https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length
  
+[tool.pylama.linter.pydocstyle]
+convention = "google"
+
  [tool.mypy]
  python_version = "3.10"
  enable_error_code = ["ignore-without-code"]

Reviewed-by: Yoan Picchi 


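pydocstyle's Google convention also checks section formatting such as `Args:`. Its most basic checks (D100 to D103) simply flag public modules, classes, methods and functions without a docstring. Below is a rough stdlib-only stand-in for that one check. It uses `ast.get_docstring` rather than pydocstyle itself and ignores the public/private distinction. `missing_docstrings` is a hypothetical helper:

```python
import ast


def missing_docstrings(source: str) -> list[str]:
    """Name the module, classes and functions in `source` that have no docstring."""
    tree = ast.parse(source)
    missing = [] if ast.get_docstring(tree) is not None else ["<module>"]
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing
```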
Re: [PATCH v7 03/21] dts: add basic developer docs

2023-11-20 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Expand the framework contribution guidelines and add how to document the
code with Python docstrings.

Signed-off-by: Juraj Linkeš 
---
  doc/guides/tools/dts.rst | 73 
  1 file changed, 73 insertions(+)

diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 32c18ee472..cd771a428c 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -264,6 +264,65 @@ which be changed with the ``--output-dir`` command line argument.
  The results contain basic statistics of passed/failed test cases and DPDK version.
  
  
+Contributing to DTS
+---
+
+There are two areas of contribution: The DTS framework and DTS test suites.
+
+The framework contains the logic needed to run test cases, such as connecting to nodes,
+running DPDK apps and collecting results.
+
+The test cases call APIs from the framework to test their scenarios. Adding test cases may
+require adding code to the framework as well.
+
+
+Framework Coding Guidelines
+~~~
+
+When adding code to the DTS framework, pay attention to the rest of the code
+and try not to deviate much from it. The :ref:`DTS developer tools ` will issue
+warnings when some of the basics are not met.
+
+The code must be properly documented with docstrings. The style must conform to
+the `Google style <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
+See an example of the style
+`here <https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html>`_.
+For cases which are not covered by the Google style, refer
+to `PEP 257 <https://peps.python.org/pep-0257/>`_. There are some cases which are not covered by
+the two style guides, where we deviate or where some additional clarification is helpful:
+
+   * The __init__() methods of classes are documented separately from the docstring of the class
+     itself.
+   * The docstrings of implemented abstract methods should refer to the superclass's definition
+     if there's no deviation.
+   * Instance variables/attributes should be documented in the docstring of the class
+     in the ``Attributes:`` section.
+   * The dataclass.dataclass decorator changes how the attributes are processed. The dataclass
+     attributes which result in instance variables/attributes should also be recorded
+     in the ``Attributes:`` section.
+   * Class variables/attributes, on the other hand, should be documented with ``#:`` above
+     the type annotated line. The description may be omitted if the meaning is obvious.
+   * The Enum and TypedDict also process the attributes in particular ways and should be documented
+     with ``#:`` as well. This is mainly so that the autogenerated docs contain the assigned value.
+   * When referencing a parameter of a function or a method in their docstring, don't use
+     any articles and put the parameter into single backticks. This mimics the style of
+     `Python's documentation <https://docs.python.org/3/index.html>`_.
+   * When specifying a value, use double backticks::
+
+        def foo(greet: bool) -> None:
+            """Demonstration of single and double backticks.
+
+            `greet` controls whether ``Hello World`` is printed.
+
+            Args:
+                greet: Whether to print the ``Hello World`` message.
+            """
+            if greet:
+                print("Hello World")
+
+   * The docstring maximum line length is the same as the code maximum line length.
+
+
  How To Write a Test Suite
  -
  
@@ -293,6 +352,18 @@ There are four types of methods that comprise a test suite:
 | These methods don't need to be implemented if there's no need for them in a test suite.
   In that case, nothing will happen when they're is executed.
  
+#. **Configuration, traffic and other logic**
+
+   The ``TestSuite`` class contains a variety of methods for anything that
+   a test suite setup, a teardown, or a test case may need to do.
+
+   The test suites also frequently use a DPDK app, such as testpmd, in interactive mode
+   and use the interactive shell instances directly.
+
+   These are the two main ways to call the framework logic in test suites. If there's any
+   functionality or logic missing from the framework, it should be implemented so that
+   the test suites can use one of these two ways.
+
  #. **Test case verification**
  
 Test case verification should be done with the ``verify`` method, which records the result.
@@ -308,6 +379,8 @@ There are four types of methods that comprise a test suite:
 and used by the test suite via the ``sut_node`` field.
  
  
+.. _dts_dev_tools:
+
  DTS Developer Tools
  ---
  

Reviewed-by: Yoan Picchi 


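Below is a short, hypothetical module that illustrates several of the guidelines above:

* instance attributes are listed under `Attributes:`;
* class variables and Enum members are documented with `#:`;
* `__init__` is documented separately from the class;
* parameters are in single backticks and values in double backticks.

All names are made up for illustration and are not part of the framework:

```python
from dataclasses import dataclass
from enum import Enum
from typing import ClassVar


class LinkState(Enum):
    """The state of a port's link."""

    #:
    DOWN = 0
    #: The link is up and can pass traffic.
    UP = 1


@dataclass
class NicPort:
    """A single NIC port.

    Attributes:
        port_id: The index of the port on the NIC.
        state: The current state of the port's link.
    """

    #: The maximum number of ports on one NIC.
    max_ports: ClassVar[int] = 4
    port_id: int
    state: LinkState = LinkState.DOWN


class Nic:
    """A NIC with several ports.

    Attributes:
        ports: The ports of the NIC.
    """

    def __init__(self, port_count: int):
        """Create the ports of the NIC.

        Args:
            port_count: The number of ports to create, at most ``NicPort.max_ports``.
        """
        if port_count > NicPort.max_ports:
            raise ValueError(f"A NIC has at most {NicPort.max_ports} ports.")
        self.ports = [NicPort(port_id) for port_id in range(port_count)]

    def set_link(self, port_id: int, up: bool) -> None:
        """Change the link state of one port.

        The state becomes ``LinkState.UP`` if `up` is :data:`True`.

        Args:
            port_id: The index of the port to change.
            up: Whether the link should be up.
        """
        self.ports[port_id].state = LinkState.UP if up else LinkState.DOWN
```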
Re: [PATCH v7 04/21] dts: exceptions docstring update

2023-11-20 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/__init__.py  |  12 -
  dts/framework/exception.py | 106 +
  2 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/dts/framework/__init__.py b/dts/framework/__init__.py
index d551ad4bf0..662e6ccad2 100644
--- a/dts/framework/__init__.py
+++ b/dts/framework/__init__.py
@@ -1,3 +1,13 @@
  # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2022 PANTHEON.tech s.r.o.
+# Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022 University of New Hampshire
+
+"""Libraries and utilities for running DPDK Test Suite (DTS).
+
+The various modules in the DTS framework offer:
+
+* Connections to nodes, both interactive and non-interactive,
+* A straightforward way to add support for different operating systems of remote nodes,
+* Test suite setup, execution and teardown, along with test case setup, execution and teardown,
+* Pre-test suite setup and post-test suite teardown.
+"""
diff --git a/dts/framework/exception.py b/dts/framework/exception.py
index 7489c03570..ee1562c672 100644
--- a/dts/framework/exception.py
+++ b/dts/framework/exception.py
@@ -3,8 +3,10 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
-"""
-User-defined exceptions used across the framework.
+"""DTS exceptions.
+
+The exceptions all have different severities expressed as an integer.
+The highest severity of all raised exception is used as the exit code of DTS.


all raised exception*s*


  """
  
  from enum import IntEnum, unique
@@ -13,59 +15,79 @@
  
  @unique
  class ErrorSeverity(IntEnum):
-"""
-The severity of errors that occur during DTS execution.
+"""The severity of errors that occur during DTS execution.
+
  All exceptions are caught and the most severe error is used as return code.
  """
  
+#:
  NO_ERR = 0
+#:
  GENERIC_ERR = 1
+#:
  CONFIG_ERR = 2
+#:
  REMOTE_CMD_EXEC_ERR = 3
+#:
  SSH_ERR = 4
+#:
  DPDK_BUILD_ERR = 10
+#:
  TESTCASE_VERIFY_ERR = 20
+#:
  BLOCKING_TESTSUITE_ERR = 25
  
  
  class DTSError(Exception):
-"""
-The base exception from which all DTS exceptions are derived.
-Stores error severity.
+"""The base exception from which all DTS exceptions are subclassed.
+
+Do not use this exception, only use subclassed exceptions.
  """
  
+#:
  severity: ClassVar[ErrorSeverity] = ErrorSeverity.GENERIC_ERR
  
  
  class SSHTimeoutError(DTSError):
-"""
-Command execution timeout.
-"""
+"""The SSH execution of a command timed out."""
  
+#:
  severity: ClassVar[ErrorSeverity] = ErrorSeverity.SSH_ERR
  _command: str
  
  def __init__(self, command: str):
+"""Define the meaning of the first argument.
+
+Args:
+command: The executed command.
+"""
  self._command = command
  
  def __str__(self) -> str:
-return f"TIMEOUT on {self._command}"
+"""Add some context to the string representation."""
+return f"{self._command} execution timed out."
  
  
  class SSHConnectionError(DTSError):
-"""
-SSH connection error.
-"""
+"""An unsuccessful SSH connection."""
  
+#:
  severity: ClassVar[ErrorSeverity] = ErrorSeverity.SSH_ERR
  _host: str
  _errors: list[str]
  
  def __init__(self, host: str, errors: list[str] | None = None):
+"""Define the meaning of the first two arguments.
+
+Args:
+host: The hostname to which we're trying to connect.
+errors: Any errors that occurred during the connection attempt.
+"""
  self._host = host
  self._errors = [] if errors is None else errors
  
  def __str__(self) -> str:
+"""Include the errors in the string representation."""
  message = f"Error trying to connect with {self._host}."
  if self._errors:
  message += f" Errors encountered while retrying: {', '.join(self._errors)}"
@@ -74,43 +96,53 @@ def __str__(self) -> str:
  
  
  class SSHSessionDeadError(DTSError):
-"""
-SSH session is not alive.
-It can no longer be used.
-"""
+"""The SSH session is no longer alive."""
  
+#:
  severity: ClassVar[ErrorSeverity] = ErrorSeverity.SSH_ERR
  _host: str
  
  def __init__(self, host: str):
+"""Define the meaning of the first argument.
+
+Args:
+host: The hostname of the disconnected node.
+"""
  self._host = host
  
  def __str__(self) -> str:
-return f"SSH session with {self._host} has died"
+"""Add some context to the string representation."""
+return f"SSH session with {self._host} has died."
  
  
  class ConfigurationError(DTSError):
-   

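The exception module docstring states that the highest severity of all raised exceptions becomes the DTS exit code. Below is a minimal sketch of that rule, reusing a subset of the `ErrorSeverity` values quoted above. `VerifyError` is a stand-in for a verification failure and `exit_code` is a hypothetical helper. Neither is the runner's actual code:

```python
from enum import IntEnum, unique
from typing import ClassVar


@unique
class ErrorSeverity(IntEnum):
    """A subset of the severities quoted above."""

    NO_ERR = 0
    GENERIC_ERR = 1
    CONFIG_ERR = 2
    SSH_ERR = 4
    TESTCASE_VERIFY_ERR = 20


class DTSError(Exception):
    """Base exception; subclasses override the class-level severity."""

    severity: ClassVar[ErrorSeverity] = ErrorSeverity.GENERIC_ERR


class SSHTimeoutError(DTSError):
    """The SSH execution of a command timed out."""

    severity: ClassVar[ErrorSeverity] = ErrorSeverity.SSH_ERR

    def __init__(self, command: str):
        super().__init__(command)
        self._command = command

    def __str__(self) -> str:
        return f"{self._command} execution timed out."


class VerifyError(DTSError):
    """Stand-in for a failed test case verification."""

    severity: ClassVar[ErrorSeverity] = ErrorSeverity.TESTCASE_VERIFY_ERR


def exit_code(errors: list[Exception]) -> int:
    """Return the highest severity of `errors`, or ``NO_ERR`` if there were none.

    Exceptions that aren't DTS errors are treated as generic errors.
    """
    return max(
        (getattr(error, "severity", ErrorSeverity.GENERIC_ERR) for error in errors),
        default=ErrorSeverity.NO_ERR,
    )
```

Because `IntEnum` members are integers, the result can be passed directly to `sys.exit()`.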
Re: [PATCH v7 06/21] dts: logger and utils docstring update

2023-11-20 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/logger.py | 72 ++---
  dts/framework/utils.py  | 88 +
  2 files changed, 113 insertions(+), 47 deletions(-)

diff --git a/dts/framework/logger.py b/dts/framework/logger.py
index bb2991e994..d3eb75a4e4 100644
--- a/dts/framework/logger.py
+++ b/dts/framework/logger.py
@@ -3,9 +3,9 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
-"""
-DTS logger module with several log level. DTS framework and TestSuite logs
-are saved in different log files.
+"""DTS logger module.
+
+DTS framework and TestSuite logs are saved in different log files.
  """
  
  import logging
@@ -18,19 +18,21 @@
  stream_fmt = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  
  
-class LoggerDictType(TypedDict):
-logger: "DTSLOG"
-name: str
-node: str
-
+class DTSLOG(logging.LoggerAdapter):
+"""DTS logger adapter class for framework and testsuites.
  
-# List for saving all using loggers
-Loggers: list[LoggerDictType] = []
+The :option:`--verbose` command line argument and the :envvar:`DTS_VERBOSE` environment
+variable control the verbosity of output. If enabled, all messages will be emitted to the
+console.
  
+The :option:`--output` command line argument and the :envvar:`DTS_OUTPUT_DIR` environment
+variable modify the directory where the logs will be stored.
  
-class DTSLOG(logging.LoggerAdapter):
-"""
-DTS log class for framework and testsuite.
+Attributes:
+node: The additional identifier. Currently unused.
+sh: The handler which emits logs to console.
+fh: The handler which emits logs to a file.
+verbose_fh: Just as fh, but logs with a different, more verbose, format.
  """
  
  _logger: logging.Logger
@@ -40,6 +42,15 @@ class DTSLOG(logging.LoggerAdapter):
  verbose_fh: logging.FileHandler
  
  def __init__(self, logger: logging.Logger, node: str = "suite"):
+"""Extend the constructor with additional handlers.
+
+One handler logs to the console, the other one to a file, with either a regular or verbose
+format.
+
+Args:
+logger: The logger from which to create the logger adapter.
+node: An additional identifier. Currently unused.
+"""
  self._logger = logger
  # 1 means log everything, this will be used by file handlers if their level
  # is not set
@@ -92,26 +103,43 @@ def __init__(self, logger: logging.Logger, node: str = "suite"):
  super(DTSLOG, self).__init__(self._logger, dict(node=self.node))
  
  def logger_exit(self) -> None:
-"""
-Remove stream handler and logfile handler.
-"""
+"""Remove the stream handler and the logfile handler."""
  for handler in (self.sh, self.fh, self.verbose_fh):
  handler.flush()
  self._logger.removeHandler(handler)
  
  
+class _LoggerDictType(TypedDict):
+logger: DTSLOG
+name: str
+node: str
+
+
+# List for saving all loggers in use
+_Loggers: list[_LoggerDictType] = []
+
+
  def getLogger(name: str, node: str = "suite") -> DTSLOG:
+"""Get DTS logger adapter identified by name and node.
+
+An existing logger will be return if one with the exact name and node already exists.


An existing logger will be return*ed*


+A new one will be created and stored otherwise.
+
+Args:
+name: The name of the logger.
+node: An additional identifier for the logger.
+
+Returns:
+A logger uniquely identified by both name and node.
  """
-Get logger handler and if there's no handler for specified Node will create one.
-"""
-global Loggers
+global _Loggers
  # return saved logger
-logger: LoggerDictType
-for logger in Loggers:
+logger: _LoggerDictType
+for logger in _Loggers:
  if logger["name"] == name and logger["node"] == node:
  return logger["logger"]
  
  # return new logger
  dts_logger: DTSLOG = DTSLOG(logging.getLogger(name), node)
-Loggers.append({"logger": dts_logger, "name": name, "node": node})
+_Loggers.append({"logger": dts_logger, "name": name, "node": node})
  return dts_logger
diff --git a/dts/framework/utils.py b/dts/framework/utils.py
index f0c916471c..5016e3be10 100644
--- a/dts/framework/utils.py
+++ b/dts/framework/utils.py
@@ -3,6 +3,16 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
+"""Various utility classes and functions.
+
+These are used in multiple modules across the framework. They're here because
+they provide some non-specific functionality, greatly simplify imports or just don't
+fit elsewhere.
+
+Attributes:
+

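`getLogger` returns the existing `DTSLOG` when one with the same name and node has already been created. The reason is presumably that `DTSLOG.__init__` sets up the console and file handlers, so that setup should run only once per pair. Below is a minimal sketch of the caching. It uses a dict keyed by `(name, node)` instead of the patch's list scan, and a plain `logging.LoggerAdapter` instead of `DTSLOG`. `get_logger` is a hypothetical name:

```python
import logging

# One adapter per (name, node) pair; a dict replaces the list scan in the patch.
_loggers: dict[tuple[str, str], logging.LoggerAdapter] = {}


def get_logger(name: str, node: str = "suite") -> logging.LoggerAdapter:
    """Return the adapter for `name` and `node`, creating it on first use."""
    key = (name, node)
    if key not in _loggers:
        # In DTSLOG the handler setup lives in the adapter's constructor,
        # so with this cache it runs once per key.
        _loggers[key] = logging.LoggerAdapter(logging.getLogger(name), {"node": node})
    return _loggers[key]
```

Adapters for different nodes with the same name still wrap the same `logging.Logger`, because `logging.getLogger` itself returns one object per name.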
Re: [PATCH v7 07/21] dts: dts runner and main docstring update

2023-11-20 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/dts.py | 128 ---
  dts/main.py  |   8 ++-
  2 files changed, 112 insertions(+), 24 deletions(-)

diff --git a/dts/framework/dts.py b/dts/framework/dts.py
index 4c7fb0c40a..331fed7dc4 100644
--- a/dts/framework/dts.py
+++ b/dts/framework/dts.py
@@ -3,6 +3,33 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
+r"""Test suite runner module.


Is the r before the docstring intended?


+
+A DTS run is split into stages:
+
+#. Execution stage,
+#. Build target stage,
+#. Test suite stage,
+#. Test case stage.
+
+The module is responsible for running tests on testbeds defined in the test run configuration.
+Each setup or teardown of each stage is recorded in a :class:`~framework.test_result.DTSResult` or
+one of its subclasses. The test case results are also recorded.
+
+If an error occurs, the current stage is aborted, the error is recorded and the run continues in
+the next iteration of the same stage. The return code is the highest `severity` of all
+:class:`~.framework.exception.DTSError`\s.


Is the . before the class name intended, considering the previous one doesn't have one? (I haven't built the docs yet to check whether it affects the rendered output.)



+
+Example:
+An error occurs in a build target setup. The current build target is aborted and the run
+continues with the next build target. If the errored build target was the last one in the given
+execution, the next execution begins.
+
+Attributes:
+dts_logger: The logger instance used in this module.
+result: The top level result used in the module.
+"""
+
  import sys
  
  from .config import (
@@ -23,9 +50,38 @@
  
  
  def run_all() -> None:
-"""
-The main process of DTS. Runs all build targets in all executions from the main
-config file.
+"""Run all build targets in all executions from the test run configuration.
+
+Before running test suites, executions and build targets are first set up.
+The executions and build targets defined in the test run configuration are iterated over.
+The executions define which tests to run and where to run them and build targets define
+the DPDK build setup.
+
+The test suites are set up for each execution/build target tuple and each scheduled
+test case within the test suite is set up, executed and torn down. After all test cases
+have been executed, the test suite is torn down and the next build target will be tested.
+
+All the nested steps look like this:
+
+#. Execution setup
+
+#. Build target setup
+
+#. Test suite setup
+
+#. Test case setup
+#. Test case logic
+#. Test case teardown
+
+#. Test suite teardown
+
+#. Build target teardown
+
+#. Execution teardown
+
+The test cases are filtered according to the specification in the test run configuration and
+the :option:`--test-cases` command line argument or
+the :envvar:`DTS_TESTCASES` environment variable.
  """
  global dts_logger
  global result
@@ -87,6 +143,8 @@ def run_all() -> None:
  
  
  def _check_dts_python_version() -> None:
+"""Check the required Python version - v3.10."""
+
  def RED(text: str) -> str:
  return f"\u001B[31;1m{str(text)}\u001B[0m"
  
@@ -111,9 +169,16 @@ def _run_execution(
  execution: ExecutionConfiguration,
  result: DTSResult,
  ) -> None:
-"""
-Run the given execution. This involves running the execution setup as well as
-running all build targets in the given execution.
+"""Run the given execution.
+
+This involves running the execution setup as well as running all build targets
+in the given execution. After that, execution teardown is run.
+
+Args:
+sut_node: The execution's SUT node.
+tg_node: The execution's TG node.
+execution: An execution's test run configuration.
+result: The top level result object.
  """
  dts_logger.info(
  f"Running execution with SUT '{execution.system_under_test_node.name}'."
@@ -150,8 +215,18 @@ def _run_build_target(
  execution: ExecutionConfiguration,
  execution_result: ExecutionResult,
  ) -> None:
-"""
-Run the given build target.
+"""Run the given build target.
+
+This involves running the build target setup as well as running all test suites
+in the given execution the build target is defined in.
+After that, build target teardown is run.
+
+Args:
+sut_node: The execution's SUT node.
+tg_node: The execution's TG node.
+build_target: A build target's test run configuration.
+execution: The build target'

Re: [PATCH v7 10/21] dts: config docstring update

2023-11-21 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/config/__init__.py | 371 ++-
  dts/framework/config/types.py| 132 +++
  2 files changed, 446 insertions(+), 57 deletions(-)
  create mode 100644 dts/framework/config/types.py

diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index 2044c82611..0aa149a53d 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -3,8 +3,34 @@
  # Copyright(c) 2022-2023 University of New Hampshire
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  
-"""
-Yaml config parsing methods
+"""Testbed configuration and test suite specification.
+
+This package offers classes that hold real-time information about the testbed, hold test run
+configuration describing the tested testbed and a loader function, :func:`load_config`, which loads
+the YAML test run configuration file
+and validates it according to :download:`the schema `.
+
+The YAML test run configuration file is parsed into a dictionary, parts of which are used throughout
+this package. The allowed keys and types inside this dictionary are defined in
+the :doc:`types ` module.
+
+The test run configuration has two main sections:
+
+* The :class:`ExecutionConfiguration` which defines what tests are going to be run
+  and how DPDK will be built. It also references the testbed where these tests and DPDK
+  are going to be run,
+* The nodes of the testbed are defined in the other section,
+  a :class:`list` of :class:`NodeConfiguration` objects.
+
+The real-time information about the testbed is supposed to be gathered at runtime.
+
+The classes defined in this package make heavy use of :mod:`dataclasses`.
+All of them use slots and are frozen:
+
+* Slots enables some optimizations, by pre-allocating space for the defined
+  attributes in the underlying data structure,
+* Frozen makes the object immutable. This enables further optimizations,
+  and makes it thread safe should we every want to move in that direction.


every -> ever ?


  """
  
  import json
@@ -12,11 +38,20 @@
  import pathlib
  from dataclasses import dataclass
  from enum import auto, unique
-from typing import Any, TypedDict, Union
+from typing import Union
  
  import warlock  # type: ignore[import]
  import yaml
  
+from framework.config.types import (
+BuildTargetConfigDict,
+ConfigurationDict,
+ExecutionConfigDict,
+NodeConfigDict,
+PortConfigDict,
+TestSuiteConfigDict,
+TrafficGeneratorConfigDict,
+)
  from framework.exception import ConfigurationError
  from framework.settings import SETTINGS
  from framework.utils import StrEnum
@@ -24,55 +59,97 @@
  
  @unique
  class Architecture(StrEnum):
+r"""The supported architectures of :class:`~framework.testbed_model.node.Node`\s."""
+
+#:
  i686 = auto()
+#:
  x86_64 = auto()
+#:
  x86_32 = auto()
+#:
  arm64 = auto()
+#:
  ppc64le = auto()
  
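`framework.utils.StrEnum` is not part of this patch. As a hedged illustration of what `auto()` can produce, here is a hypothetical `StrEnum` whose `_generate_next_value_` hook returns the member name unchanged, so each member's value is its own name. Python 3.11's built-in `enum.StrEnum` lowercases the name instead, which would matter for members such as `TrafficGeneratorType.SCAPY`.

```python
from enum import Enum, auto, unique


class StrEnum(str, Enum):
    """Hypothetical helper: auto() values are the member names, unchanged."""

    @staticmethod
    def _generate_next_value_(name: str, start: int, count: int, last_values: list) -> str:
        # Called by auto(); reuse the member name as the value.
        return name

    def __str__(self) -> str:
        return self.value


@unique
class Architecture(StrEnum):
    i686 = auto()
    x86_64 = auto()
    x86_32 = auto()
    arm64 = auto()
    ppc64le = auto()
```

Because of the `str` mixin, members compare equal to plain strings, and a configuration value such as `"arm64"` can be looked up with `Architecture("arm64")`.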
  
  @unique
  class OS(StrEnum):
+r"""The supported operating systems of :class:`~framework.testbed_model.node.Node`\s."""
+
+#:
  linux = auto()
+#:
  freebsd = auto()
+#:
  windows = auto()
  
  
  @unique
  class CPUType(StrEnum):
+r"""The supported CPUs of :class:`~framework.testbed_model.node.Node`\s."""
+
+#:
  native = auto()
+#:
  armv8a = auto()
+#:
  dpaa2 = auto()
+#:
  thunderx = auto()
+#:
  xgene1 = auto()
  
  
  @unique
  class Compiler(StrEnum):
+r"""The supported compilers of :class:`~framework.testbed_model.node.Node`\s."""
+
+#:
  gcc = auto()
+#:
  clang = auto()
+#:
  icc = auto()
+#:
  msvc = auto()
  
  
  @unique
  class TrafficGeneratorType(StrEnum):
+"""The supported traffic generators."""
+
+#:
  SCAPY = auto()
  
  
-# Slots enables some optimizations, by pre-allocating space for the defined
-# attributes in the underlying data structure.
-#
-# Frozen makes the object immutable. This enables further optimizations,
-# and makes it thread safe should we every want to move in that direction.
  @dataclass(slots=True, frozen=True)
  class HugepageConfiguration:
+r"""The hugepage configuration of :class:`~framework.testbed_model.node.Node`\s.
+
+Attributes:
+amount: The number of hugepages.
+force_first_numa: If :data:`True`, the hugepages will be configured on the first NUMA node.
+"""
+
  amount: int
  force_first_numa: bool
  
  
  @dataclass(slots=True, frozen=True)
  class PortConfig:
+r"""The port configuration of :class:`~framework.testbed_model.node.Node`\s.
+
+Attributes:
+node: The :class:`~framework.testbed_model.node.Node` where this port exists.
+pci: The PCI address of the port.
+os_driver_for_dpdk: The operating sys

Re: [PATCH v7 11/21] dts: remote session docstring update

2023-11-21 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/remote_session/__init__.py  |  39 +-
  .../remote_session/remote_session.py  | 128 +-
  dts/framework/remote_session/ssh_session.py   |  16 +--
  3 files changed, 135 insertions(+), 48 deletions(-)

diff --git a/dts/framework/remote_session/__init__.py b/dts/framework/remote_session/__init__.py
index 5e7ddb2b05..51a01d6b5e 100644
--- a/dts/framework/remote_session/__init__.py
+++ b/dts/framework/remote_session/__init__.py
@@ -2,12 +2,14 @@
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2023 University of New Hampshire
  
-"""
-The package provides modules for managing remote connections to a remote host (node),
-differentiated by OS.
-The package provides a factory function, create_session, that returns the appropriate
-remote connection based on the passed configuration. The differences are in the
-underlying transport protocol (e.g. SSH) and remote OS (e.g. Linux).
+"""Remote interactive and non-interactive sessions.
+
+This package provides modules for managing remote connections to a remote host (node).
+
+The non-interactive sessions send commands and return their output and exit code.
+
+The interactive sessions open an interactive shell which is continuously open,
+allowing it to send and receive data within that particular shell.
  """
  
  # pylama:ignore=W0611
@@ -26,10 +28,35 @@
  def create_remote_session(
  node_config: NodeConfiguration, name: str, logger: DTSLOG
  ) -> RemoteSession:
+"""Factory for non-interactive remote sessions.
+
+The function returns an SSH session, but will be extended if support
+for other protocols is added.
+
+Args:
+node_config: The test run configuration of the node to connect to.
+name: The name of the session.
+logger: The logger instance this session will use.
+
+Returns:
+The SSH remote session.
+"""
  return SSHSession(node_config, name, logger)
  
  
  def create_interactive_session(
  node_config: NodeConfiguration, logger: DTSLOG
  ) -> InteractiveRemoteSession:
+"""Factory for interactive remote sessions.
+
+The function returns an interactive SSH session, but will be extended if support
+for other protocols is added.
+
+Args:
+node_config: The test run configuration of the node to connect to.
+logger: The logger instance this session will use.
+
+Returns:
+The interactive SSH remote session.
+"""
  return InteractiveRemoteSession(node_config, logger)
diff --git a/dts/framework/remote_session/remote_session.py b/dts/framework/remote_session/remote_session.py
index 0647d93de4..629c2d7b9c 100644
--- a/dts/framework/remote_session/remote_session.py
+++ b/dts/framework/remote_session/remote_session.py
@@ -3,6 +3,13 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
+"""Base remote session.
+
+This module contains the abstract base class for remote sessions and defines
+the structure of the result of a command execution.
+"""
+
+
  import dataclasses
  from abc import ABC, abstractmethod
  from pathlib import PurePath
@@ -15,8 +22,14 @@
  
  @dataclasses.dataclass(slots=True, frozen=True)
  class CommandResult:
-"""
-The result of remote execution of a command.
+"""The result of remote execution of a command.
+
+Attributes:
+name: The name of the session that executed the command.
+command: The executed command.
+stdout: The standard output the command produced.
+stderr: The standard error output the command produced.
+return_code: The return code the command exited with.
  """
  
  name: str
@@ -26,6 +39,7 @@ class CommandResult:
  return_code: int
  
  def __str__(self) -> str:
+"""Format the command outputs."""
  return (
  f"stdout: '{self.stdout}'\n"
  f"stderr: '{self.stderr}'\n"
@@ -34,13 +48,24 @@ def __str__(self) -> str:
  
  
  class RemoteSession(ABC):
-"""
-The base class for defining which methods must be implemented in order to connect
-to a remote host (node) and maintain a remote session. The derived classes are
-supposed to implement/use some underlying transport protocol (e.g. SSH) to
-implement the methods. On top of that, it provides some basic services common to
-all derived classes, such as keeping history and logging what's being executed
-on the remote node.
+"""Non-interactive remote session.
+
+The abstract methods must be implemented in order to connect to a remote host (node)
+and maintain a remote session.
+The subclasses must use (or implement) some underlying transport protocol (e.g. SSH)
+to implement the methods. On top of that, it provides some basic services common

Re: [PATCH v7 19/21] dts: base traffic generators docstring update

2023-11-21 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  .../traffic_generator/__init__.py | 22 -
  .../capturing_traffic_generator.py| 46 +++
  .../traffic_generator/traffic_generator.py| 33 +++--
  3 files changed, 68 insertions(+), 33 deletions(-)

diff --git a/dts/framework/testbed_model/traffic_generator/__init__.py b/dts/framework/testbed_model/traffic_generator/__init__.py
index 11bfa1ee0f..51cca77da4 100644
--- a/dts/framework/testbed_model/traffic_generator/__init__.py
+++ b/dts/framework/testbed_model/traffic_generator/__init__.py
@@ -1,6 +1,19 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  
+"""DTS traffic generators.
+
+A traffic generator is capable of generating traffic and then monitoring returning traffic.
+A traffic generator may just count the number of received packets
+and it may additionally capture individual packets.


The sentence feels odd. Isn't it supposed to be "or" here? Also, there's no need
for such an early line break.



+
+A traffic generator may be software running on generic hardware or it could be specialized hardware.
+
+The traffic generators that only count the number of received packets are suitable only for
+performance testing. In functional testing, we need to be able to dissect each arrived packet
+and a capturing traffic generator is required.
+"""
+
  from framework.config import ScapyTrafficGeneratorConfig, TrafficGeneratorType
  from framework.exception import ConfigurationError
  from framework.testbed_model.node import Node
@@ -12,8 +25,15 @@
  def create_traffic_generator(
  tg_node: Node, traffic_generator_config: ScapyTrafficGeneratorConfig
  ) -> CapturingTrafficGenerator:
-"""A factory function for creating traffic generator object from user config."""
+"""The factory function for creating traffic generator objects from the test run configuration.
+
+Args:
+tg_node: The traffic generator node where the created traffic generator will be running.
+traffic_generator_config: The traffic generator config.
  
+Returns:
+A traffic generator capable of capturing received packets.
+"""
  match traffic_generator_config.traffic_generator_type:
  case TrafficGeneratorType.SCAPY:
  return ScapyTrafficGenerator(tg_node, traffic_generator_config)
diff --git a/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py b/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
index e521211ef0..b0a43ad003 100644
--- a/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
+++ b/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
@@ -23,19 +23,22 @@
  
  
  def _get_default_capture_name() -> str:
-"""
-This is the function used for the default implementation of capture names.
-"""
  return str(uuid.uuid4())
  
  
  class CapturingTrafficGenerator(TrafficGenerator):
  """Capture packets after sending traffic.
  
-A mixin interface which enables a packet generator to declare that it can capture
+The intermediary interface which enables a packet generator to declare that it can capture
  packets and return them to the user.
  
+Similarly to
+:class:`~framework.testbed_model.traffic_generator.traffic_generator.TrafficGenerator`,
+this class exposes the public methods specific to capturing traffic generators and defines
+a private method that must implement the traffic generation and capturing logic in subclasses.
+
  The methods of capturing traffic generators obey the following workflow:
+
  1. send packets
  2. capture packets
  3. write the capture to a .pcap file
@@ -44,6 +47,7 @@ class CapturingTrafficGenerator(TrafficGenerator):
  
  @property
  def is_capturing(self) -> bool:
+"""This traffic generator can capture traffic."""
  return True
  
  def send_packet_and_capture(
@@ -54,11 +58,12 @@ def send_packet_and_capture(
  duration: float,
  capture_name: str = _get_default_capture_name(),
  ) -> list[Packet]:
-"""Send a packet, return received traffic.
+"""Send `packet` and capture received traffic.
+
+Send `packet` on `send_port` and then return all traffic captured
+on `receive_port` for the given `duration`.
  
-Send a packet on the send_port and then return all traffic captured
-on the receive_port for the given duration. Also record the captured traffic
-in a pcap file.
+The captured traffic is recorded in the `capture_name`.pcap file.
  
  Args:
  packet: The packet to send.
@@ -68,7 +73,7 @@ def send_packet_and_capture(
  capture_name: The name of the .pcap file where to store th

Re: [PATCH v7 20/21] dts: scapy tg docstring update

2023-11-21 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  .../testbed_model/traffic_generator/scapy.py  | 91 +++
  1 file changed, 54 insertions(+), 37 deletions(-)

diff --git a/dts/framework/testbed_model/traffic_generator/scapy.py b/dts/framework/testbed_model/traffic_generator/scapy.py
index 51864b6e6b..ed4f879925 100644
--- a/dts/framework/testbed_model/traffic_generator/scapy.py
+++ b/dts/framework/testbed_model/traffic_generator/scapy.py
@@ -2,14 +2,15 @@
  # Copyright(c) 2022 University of New Hampshire
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  
-"""Scapy traffic generator.
+"""The Scapy traffic generator.
  
-Traffic generator used for functional testing, implemented using the Scapy library.
+A traffic generator used for functional testing, implemented with
+`the Scapy library `_.
  The traffic generator uses an XML-RPC server to run Scapy on the remote TG node.
  
-The XML-RPC server runs in an interactive remote SSH session running Python console,
-where we start the server. The communication with the server is facilitated with
-a local server proxy.
+The traffic generator uses the :mod:`xmlrpc.server` module to run an XML-RPC server
+in an interactive remote Python SSH session. The communication with the server is facilitated
+with a local server proxy from the :mod:`xmlrpc.client` module.
  """
  
  import inspect
@@ -69,20 +70,20 @@ def scapy_send_packets_and_capture(
  recv_iface: str,
  duration: float,
  ) -> list[bytes]:
-"""RPC function to send and capture packets.
+"""The RPC function to send and capture packets.
  
-The function is meant to be executed on the remote TG node.
+The function is meant to be executed on the remote TG node via the server proxy.
  
  Args:
  xmlrpc_packets: The packets to send. These need to be converted to
-xmlrpc.client.Binary before sending to the remote server.
+:class:`~xmlrpc.client.Binary` objects before sending to the remote server.


The string is not raw and has no \s. As per your explanation a few commits
earlier, might this cause an issue with the tilde?
Looking around, I see it also happens several times here and in the
previous commit.


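For context on the question above, a quick check of plain Python string semantics (independent of Sphinx): the `r` prefix only changes how backslashes are handled, so `~` and backticks behave the same in raw and non-raw strings. Only a backslash sequence such as the reST `\s` suffix needs `r` (or a doubled backslash); an unescaped `\s` is an invalid escape sequence that Python keeps literally but warns about.

```python
import warnings

# "~" and backticks are ordinary characters; the r prefix changes nothing here.
plain = ":class:`~xmlrpc.client.Binary` objects"
raw = r":class:`~xmlrpc.client.Binary` objects"

# A backslash needs either the r prefix or an explicit "\\".
escaped_suffix = ":class:`LogicalCore`\\s"
raw_suffix = r":class:`LogicalCore`\s"

# Compiling a non-raw literal containing \s keeps the backslash but warns
# (DeprecationWarning before Python 3.12, SyntaxWarning from 3.12 on).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(compile('"`LogicalCore`\\s"', "<docstring>", "eval"))
categories = [w.category for w in caught]
```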

  send_iface: The logical name of the egress interface.
  recv_iface: The logical name of the ingress interface.
  duration: Capture for this amount of time, in seconds.
  
  Returns:
  A list of bytes. Each item in the list represents one packet, which needs
-to be converted back upon transfer from the remote node.
+to be converted back upon transfer from the remote node.
  """
  scapy_packets = [scapy.all.Packet(packet.data) for packet in xmlrpc_packets]
  sniffer = scapy.all.AsyncSniffer(
@@ -98,19 +99,15 @@ def scapy_send_packets_and_capture(
  def scapy_send_packets(
  xmlrpc_packets: list[xmlrpc.client.Binary], send_iface: str
  ) -> None:
-"""RPC function to send packets.
+"""The RPC function to send packets.
  
-The function is meant to be executed on the remote TG node.
-It doesn't return anything, only sends packets.
+The function is meant to be executed on the remote TG node via the server proxy.
+It only sends `xmlrpc_packets`, without capturing them.
  
  Args:
  xmlrpc_packets: The packets to send. These need to be converted to
-xmlrpc.client.Binary before sending to the remote server.
+:class:`~xmlrpc.client.Binary` objects before sending to the remote server.
  send_iface: The logical name of the egress interface.
-
-Returns:
-A list of bytes. Each item in the list represents one packet, which needs
-to be converted back upon transfer from the remote node.
  """
  scapy_packets = [scapy.all.Packet(packet.data) for packet in xmlrpc_packets]
  scapy.all.sendp(scapy_packets, iface=send_iface, realtime=True, verbose=True)
@@ -130,11 +127,19 @@ def scapy_send_packets(
  
  
  class QuittableXMLRPCServer(SimpleXMLRPCServer):
-"""Basic XML-RPC server that may be extended
-by functions serializable by the marshal module.
+r"""Basic XML-RPC server.


But you have a raw string here, and I don't see why it's needed.


+
+The server may be augmented by functions serializable by the :mod:`marshal` module.
  """
  
  def __init__(self, *args, **kwargs):
+"""Extend the XML-RPC server initialization.
+
+Args:
+args: The positional arguments that will be passed to the superclass's constructor.
+kwargs: The keyword arguments that will be passed to the superclass's constructor.
+The `allow_none` argument will be set to :data:`True`.
+"""
  kwargs["allow_none"] = True
  super().__init__(*args, **kwargs)

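The scapy.py module docstring describes an XML-RPC server on the TG node driven through a local `xmlrpc.client` proxy, with packets passed as `xmlrpc.client.Binary`. A minimal local sketch of that round trip, assuming a server on 127.0.0.1 in a background thread instead of a remote Python shell; `echo_packets`, `start_server` and `send_via_proxy` are hypothetical, and `echo_packets` stands in for `scapy_send_packets_and_capture` without touching Scapy.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def echo_packets(xmlrpc_packets: list[xmlrpc.client.Binary]) -> list[bytes]:
    """Hypothetical RPC function: return the raw bytes of each received packet."""
    # Packets arrive as Binary objects; .data holds the raw bytes.
    return [packet.data for packet in xmlrpc_packets]


def start_server() -> SimpleXMLRPCServer:
    # Port 0 lets the OS choose a free port; allow_none mirrors QuittableXMLRPCServer.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True, logRequests=False)
    server.register_function(echo_packets)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_via_proxy(port: int, packets: list[bytes]) -> list[bytes]:
    # The local proxy turns method calls into XML-RPC requests.
    with xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/", allow_none=True) as proxy:
        result = proxy.echo_packets([xmlrpc.client.Binary(p) for p in packets])
    # Returned bytes are marshalled as base64 and come back as Binary objects.
    return [item.data for item in result]
```

When done, stop the server with `server.shutdown()` followed by `server.server_close()`.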
Re: [PATCH v7 14/21] dts: cpu docstring update

2023-11-21 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/testbed_model/cpu.py | 196 +
  1 file changed, 144 insertions(+), 52 deletions(-)

diff --git a/dts/framework/testbed_model/cpu.py b/dts/framework/testbed_model/cpu.py
index 8fe785dfe4..4edeb4a7c2 100644
--- a/dts/framework/testbed_model/cpu.py
+++ b/dts/framework/testbed_model/cpu.py
@@ -1,6 +1,22 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  
+"""CPU core representation and filtering.
+
+This module provides a unified representation of logical CPU cores along
+with filtering capabilities.
+
+When simultaneous multithreading (SMT) is enabled on a server,
+the physical CPU cores are split into logical CPU cores with different IDs.
+
+:class:`LogicalCoreCountFilter` filters by the number of logical cores. It's possible to specify
+the socket from which to filter the number of logical cores. It's also possible to not use all
+logical CPU cores from each physical core (e.g. only the first logical core of each physical core).
+
+:class:`LogicalCoreListFilter` filters by logical core IDs. This mostly checks that
+the logical cores are actually present on the server.
+"""
+
  import dataclasses
  from abc import ABC, abstractmethod
  from collections.abc import Iterable, ValuesView
@@ -11,9 +27,17 @@
  
  @dataclass(slots=True, frozen=True)
  class LogicalCore(object):
-"""
-Representation of a CPU core. A physical core is represented in OS
-by multiple logical cores (lcores) if CPU multithreading is enabled.
+"""Representation of a logical CPU core.
+
+A physical core is represented in OS by multiple logical cores (lcores)
+if CPU multithreading is enabled. When multithreading is disabled, their IDs are the same.
+
+Attributes:
+lcore: The logical core ID of a CPU core. It's the same as `core` with
+disabled multithreading.
+core: The physical core ID of a CPU core.
+socket: The physical socket ID where the CPU resides.
+node: The NUMA node ID where the CPU resides.
  """
  
  lcore: int
@@ -22,27 +46,36 @@ class LogicalCore(object):
  node: int
  
  def __int__(self) -> int:
+"""The CPU is best represented by the logical core, as that's what we configure in EAL."""
  return self.lcore
  
  
  class LogicalCoreList(object):
-"""
-Convert these options into a list of logical core ids.
-lcore_list=[LogicalCore1, LogicalCore2] - a list of LogicalCores
-lcore_list=[0,1,2,3] - a list of int indices
-lcore_list=['0','1','2-3'] - a list of str indices; ranges are supported
-lcore_list='0,1,2-3' - a comma delimited str of indices; ranges are supported
-
-The class creates a unified format used across the framework and allows
-the user to use either a str representation (using str(instance) or directly
-in f-strings) or a list representation (by accessing instance.lcore_list).
-Empty lcore_list is allowed.
+r"""A unified way to store :class:`LogicalCore`\s.
+
+Create a unified format used across the framework and allow the user to use
+either a :class:`str` representation (using ``str(instance)`` or directly in f-strings)
+or a :class:`list` representation (by accessing the `lcore_list` property,
+which stores logical core IDs).
  """
  
  _lcore_list: list[int]
  _lcore_str: str
  
  def __init__(self, lcore_list: list[int] | list[str] | list[LogicalCore] | str):
+"""Process `lcore_list`, then sort.
+
+There are four supported logical core list formats::
+
+lcore_list=[LogicalCore1, LogicalCore2]  # a list of LogicalCores
+lcore_list=[0,1,2,3]                     # a list of int indices
+lcore_list=['0','1','2-3']               # a list of str indices; ranges are supported
+lcore_list='0,1,2-3'                     # a comma delimited str of indices; ranges are supported
+
+Args:
+lcore_list: Various ways to represent multiple logical cores.
+Empty `lcore_list` is allowed.
+"""
  self._lcore_list = []
  if isinstance(lcore_list, str):
  lcore_list = lcore_list.split(",")
@@ -60,6 +93,7 @@ def __init__(self, lcore_list: list[int] | list[str] | list[LogicalCore] | str):
  
  @property
  def lcore_list(self) -> list[int]:
+"""The logical core IDs."""
  return self._lcore_list
  
  def _get_consecutive_lcores_range(self, lcore_ids_list: list[int]) -> list[str]:
@@ -89,28 +123,30 @@ def _get_consecutive_lcores_range(self, lcore_ids_list: list[int]) -> list[str]:
  return formatted_core_list
  
  def __str__(self) -> str:
+"""The consecutive ranges of logical core IDs."""
  return self._lcore_str
  
  
  @dataclasses.dataclass(

Re: [PATCH v7 11/21] dts: remote session docstring update

2023-11-22 Thread Yoan Picchi

On 11/22/23 11:13, Juraj Linkeš wrote:

On Tue, Nov 21, 2023 at 4:36 PM Yoan Picchi  wrote:


Re: [PATCH v7 15/21] dts: os session docstring update

2023-11-22 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/testbed_model/os_session.py | 275 --
  1 file changed, 208 insertions(+), 67 deletions(-)

diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
index 76e595a518..72b9193a61 100644
--- a/dts/framework/testbed_model/os_session.py
+++ b/dts/framework/testbed_model/os_session.py
@@ -2,6 +2,29 @@
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2023 University of New Hampshire
  
+"""OS-aware remote session.
+
+DPDK supports multiple different operating systems, meaning it can run on these different operating
+systems. This module defines the common API that OS-unaware layers use and translates the API into
+OS-aware calls/utility usage.
+
+Note:
+Running commands with administrative privileges requires OS awareness. This is the only layer
+that's aware of OS differences, so this is where non-privileged commands get converted
+to privileged commands.
+
+Example:
+A user wishes to remove a directory on
+a remote :class:`~framework.testbed_model.sut_node.SutNode`.
+The :class:`~framework.testbed_model.sut_node.SutNode` object isn't aware of what OS the node
+is running; it delegates the OS translation logic
+to :attr:`~framework.testbed_model.node.Node.main_session`. The SUT node calls
+:meth:`~OSSession.remove_remote_dir` with a generic, OS-unaware path and
+the :attr:`~framework.testbed_model.node.Node.main_session` translates that
+to ``rm -rf`` if the node's OS is Linux and other commands for other OSs.
+It also translates the path to match the underlying OS.
+"""
+
  from abc import ABC, abstractmethod
  from collections.abc import Iterable
  from ipaddress import IPv4Interface, IPv6Interface
@@ -28,10 +51,16 @@
  
  
  class OSSession(ABC):
-"""
-The OS classes create a DTS node remote session and implement OS specific
+"""OS-unaware to OS-aware translation API definition.
+
+The OSSession classes create a remote session to a DTS node and implement OS specific
  behavior. There are a few control methods implemented by the base class, the rest need
-to be implemented by derived classes.
+to be implemented by subclasses.
+
+Attributes:
+name: The name of the session.
+remote_session: The remote session maintaining the connection to the node.
+interactive_session: The interactive remote session maintaining the connection to the node.
  """
  
  _config: NodeConfiguration
@@ -46,6 +75,15 @@ def __init__(
  name: str,
  logger: DTSLOG,
  ):
+"""Initialize the OS-aware session.
+
+Connect to the node right away and also create an interactive remote session.
+
+Args:
+node_config: The test run configuration of the node to connect to.
+name: The name of the session.
+logger: The logger instance this session will use.
+"""
  self._config = node_config
  self.name = name
  self._logger = logger
@@ -53,15 +91,15 @@ def __init__(
  self.interactive_session = create_interactive_session(node_config, logger)
  
  def close(self, force: bool = False) -> None:
-"""
-Close the remote session.
+"""Close the underlying remote session.
+
+Args:
+force: Force the closure of the connection.
  """
  self.remote_session.close(force)
  
  def is_alive(self) -> bool:

-"""
-Check whether the remote session is still responding.
-"""
+"""Check whether the underlying remote session is still responding."""
  return self.remote_session.is_alive()
  
  def send_command(

@@ -72,10 +110,23 @@ def send_command(
  verify: bool = False,
  env: dict | None = None,
  ) -> CommandResult:
-"""
-An all-purpose API in case the command to be executed is already
-OS-agnostic, such as when the path to the executed command has been
-constructed beforehand.
+"""An all-purpose API for OS-agnostic commands.
+
+This can be used for an execution of a portable command that's executed the same way
+on all operating systems, such as Python.
+
+The :option:`--timeout` command line argument and the :envvar:`DTS_TIMEOUT`
+environment variable configure the timeout of command execution.
+
+Args:
+command: The command to execute.
+timeout: Wait at most this long in seconds to execute the command.


confusing start/end of execution


+privileged: Whether to run the command with administrative privileges.
+verify: If :data:`True`, will check the exit code of the command.
+env: A dictionary with environment variables to be u

Re: [PATCH v7 19/21] dts: base traffic generators docstring update

2023-11-22 Thread Yoan Picchi

On 11/22/23 11:38, Juraj Linkeš wrote:

On Tue, Nov 21, 2023 at 5:20 PM Yoan Picchi  wrote:


On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
   .../traffic_generator/__init__.py | 22 -
   .../capturing_traffic_generator.py| 46 +++
   .../traffic_generator/traffic_generator.py| 33 +++--
   3 files changed, 68 insertions(+), 33 deletions(-)

diff --git a/dts/framework/testbed_model/traffic_generator/__init__.py b/dts/framework/testbed_model/traffic_generator/__init__.py
index 11bfa1ee0f..51cca77da4 100644
--- a/dts/framework/testbed_model/traffic_generator/__init__.py
+++ b/dts/framework/testbed_model/traffic_generator/__init__.py
@@ -1,6 +1,19 @@
   # SPDX-License-Identifier: BSD-3-Clause
   # Copyright(c) 2023 PANTHEON.tech s.r.o.

+"""DTS traffic generators.
+
+A traffic generator is capable of generating traffic and then monitor returning traffic.
+A traffic generator may just count the number of received packets
+and it may additionally capture individual packets.


The sentence feels odd. Isn't it supposed to be "or" here? Also, there's no need
for such an early line break.



There are two mays, so there probably should be an or. But I'd like to
reword it to this:

All traffic generators count the number of received packets, and they
may additionally capture individual packets.

What do you think?


I think it's better with the new sentence. But I think it'd be even 
better to split into two sentences to highlight the must/may:

All traffic generators must count the number of received packets. Some
may additionally capture individual packets.




+
+A traffic generator may be software running on generic hardware or it could be specialized hardware.
+
+The traffic generators that only count the number of received packets are suitable only for
+performance testing. In functional testing, we need to be able to dissect each arrived packet
+and a capturing traffic generator is required.
+"""
+
   from framework.config import ScapyTrafficGeneratorConfig, TrafficGeneratorType
   from framework.exception import ConfigurationError
   from framework.testbed_model.node import Node
@@ -12,8 +25,15 @@
   def create_traffic_generator(
   tg_node: Node, traffic_generator_config: ScapyTrafficGeneratorConfig
   ) -> CapturingTrafficGenerator:
-"""A factory function for creating traffic generator object from user config."""
+"""The factory function for creating traffic generator objects from the test run configuration.
+
+Args:
+tg_node: The traffic generator node where the created traffic generator will be running.
+traffic_generator_config: The traffic generator config.

+Returns:
+A traffic generator capable of capturing received packets.
+"""
   match traffic_generator_config.traffic_generator_type:
   case TrafficGeneratorType.SCAPY:
   return ScapyTrafficGenerator(tg_node, traffic_generator_config)
diff --git a/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py b/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
index e521211ef0..b0a43ad003 100644
--- a/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
+++ b/dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
@@ -23,19 +23,22 @@


   def _get_default_capture_name() -> str:
-"""
-This is the function used for the default implementation of capture names.
-"""
   return str(uuid.uuid4())


   class CapturingTrafficGenerator(TrafficGenerator):
   """Capture packets after sending traffic.

-A mixin interface which enables a packet generator to declare that it can capture
+The intermediary interface which enables a packet generator to declare that it can capture
   packets and return them to the user.

+Similarly to
+:class:`~framework.testbed_model.traffic_generator.traffic_generator.TrafficGenerator`,
+this class exposes the public methods specific to capturing traffic generators and defines
+a private method that must implement the traffic generation and capturing logic in subclasses.
+
   The methods of capturing traffic generators obey the following workflow:
+
   1. send packets
   2. capture packets
   3. write the capture to a .pcap file
@@ -44,6 +47,7 @@ class CapturingTrafficGenerator(TrafficGenerator):

   @property
   def is_capturing(self) -> bool:
+"""This traffic generator can capture traffic."""
   return True

   def send_packet_and_capture(
@@ -54,11 +58,12 @@ def send_packet_and_capture(
   durat

Re: [PATCH v7 17/21] dts: node docstring update

2023-11-22 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/testbed_model/node.py | 191 +++-
  1 file changed, 131 insertions(+), 60 deletions(-)

diff --git a/dts/framework/testbed_model/node.py b/dts/framework/testbed_model/node.py
index fa5b143cdd..f93b4acecd 100644
--- a/dts/framework/testbed_model/node.py
+++ b/dts/framework/testbed_model/node.py
@@ -3,8 +3,13 @@
  # Copyright(c) 2022-2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2022-2023 University of New Hampshire
  
-"""

-A node is a generic host that DTS connects to and manages.
+"""Common functionality for node management.
+
+A node is any host/server DTS connects to.
+
+The base class, :class:`Node`, provides functionality common to all nodes and is supposed
+to be extended by subclasses with functionality specific to each node type.


functionality -> functionalities


+The decorator :func:`Node.skip_setup` can be used without subclassing.
  """
  
  from abc import ABC

@@ -35,10 +40,22 @@
  
  
  class Node(ABC):

-"""
-Basic class for node management. This class implements methods that
-manage a node, such as information gathering (of CPU/PCI/NIC) and
-environment setup.
+"""The base class for node management.
+
+It shouldn't be instantiated, but rather subclassed.
+It implements common methods to manage any node:
+
+* Connection to the node,
+* Hugepages setup.
+
+Attributes:
+main_session: The primary OS-aware remote session used to communicate with the node.
+config: The node configuration.
+name: The name of the node.
+lcores: The list of logical cores that DTS can use on the node.
+It's derived from logical cores present on the node and the test run configuration.
+ports: The ports of this node specified in the test run configuration.
+virtual_devices: The virtual devices used on the node.
  """
  
  main_session: OSSession

@@ -52,6 +69,17 @@ class Node(ABC):
  virtual_devices: list[VirtualDevice]
  
  def __init__(self, node_config: NodeConfiguration):

+"""Connect to the node and gather info during initialization.
+
+Extra gathered information:
+
+* The list of available logical CPUs. This is then filtered by
+  the ``lcores`` configuration in the YAML test run configuration file,
+* Information about ports from the YAML test run configuration file.
+
+Args:
+node_config: The node's test run configuration.
+"""
  self.config = node_config
  self.name = node_config.name
  self._logger = getLogger(self.name)
@@ -60,7 +88,7 @@ def __init__(self, node_config: NodeConfiguration):
  self._logger.info(f"Connected to node: {self.name}")
  
  self._get_remote_cpus()

-# filter the node lcores according to user config
+# filter the node lcores according to the test run configuration
  self.lcores = LogicalCoreListFilter(
  self.lcores, LogicalCoreList(self.config.lcores)
  ).filter()
@@ -76,9 +104,14 @@ def _init_ports(self) -> None:
  self.configure_port_state(port)
  
  def set_up_execution(self, execution_config: ExecutionConfiguration) -> None:

-"""
-Perform the execution setup that will be done for each execution
-this node is part of.
+"""Execution setup steps.
+
+Configure hugepages and call :meth:`_set_up_execution` where
+the rest of the configuration steps (if any) are implemented.
+
+Args:
+execution_config: The execution test run configuration according to which
+the setup steps will be taken.
  """
  self._setup_hugepages()
  self._set_up_execution(execution_config)
@@ -87,58 +120,74 @@ def set_up_execution(self, execution_config: ExecutionConfiguration) -> None:
  self.virtual_devices.append(VirtualDevice(vdev))
  
  def _set_up_execution(self, execution_config: ExecutionConfiguration) -> None:

-"""
-This method exists to be optionally overwritten by derived classes and
-is not decorated so that the derived class doesn't have to use the decorator.
+"""Optional additional execution setup steps for subclasses.
+
+Subclasses should override this if they need to add additional execution setup steps.
  """
  
  def tear_down_execution(self) -> None:

-"""
-Perform the execution teardown that will be done after each execution
-this node is part of concludes.
+"""Execution teardown steps.
+
+There are currently no common execution teardown steps common to all DTS node types.
  """
  self.virtual_devices = []
  self._tear_down_execution()
  
  def _tear_down_execution(self) -

Re: [PATCH v7 18/21] dts: sut and tg nodes docstring update

2023-11-22 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/testbed_model/sut_node.py | 224 
  dts/framework/testbed_model/tg_node.py  |  42 +++--
  2 files changed, 173 insertions(+), 93 deletions(-)

diff --git a/dts/framework/testbed_model/sut_node.py b/dts/framework/testbed_model/sut_node.py
index 17deea06e2..123b16fee0 100644
--- a/dts/framework/testbed_model/sut_node.py
+++ b/dts/framework/testbed_model/sut_node.py
@@ -3,6 +3,14 @@
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2023 University of New Hampshire
  
+"""System under test (DPDK + hardware) node.

+
+A system under test (SUT) is the combination of DPDK
+and the hardware we're testing with DPDK (NICs, crypto and other devices).
+An SUT node is where this SUT runs.
+"""
+
+
  import os
  import tarfile
  import time
@@ -26,6 +34,11 @@
  
  
  class EalParameters(object):

+"""The environment abstraction layer parameters.
+
+The string representation can be created by converting the instance to a string.
+"""
+
  def __init__(
  self,
  lcore_list: LogicalCoreList,
@@ -35,21 +48,23 @@ def __init__(
  vdevs: list[VirtualDevice],
  other_eal_param: str,
  ):
-"""
-Generate eal parameters character string;
-:param lcore_list: the list of logical cores to use.
-:param memory_channels: the number of memory channels to use.
-:param prefix: set file prefix string, eg:
-prefix='vf'
-:param no_pci: switch of disable PCI bus eg:
-no_pci=True
-:param vdevs: virtual device list, eg:
-vdevs=[
-VirtualDevice('net_ring0'),
-VirtualDevice('net_ring1')
-]
-:param other_eal_param: user defined DPDK eal parameters, eg:
-other_eal_param='--single-file-segments'
+"""Initialize the parameters according to inputs.
+
+Process the parameters into the format used on the command line.
+
+Args:
+lcore_list: The list of logical cores to use.
+memory_channels: The number of memory channels to use.
+prefix: Set the file prefix string with which to start DPDK, e.g.: ``prefix='vf'``.
+no_pci: Switch to disable PCI bus e.g.: ``no_pci=True``.
+vdevs: Virtual devices, e.g.::
+
+vdevs=[
+VirtualDevice('net_ring0'),
+VirtualDevice('net_ring1')
+]
+other_eal_param: user defined DPDK EAL parameters, e.g.:
+``other_eal_param='--single-file-segments'``
  """
  self._lcore_list = f"-l {lcore_list}"
  self._memory_channels = f"-n {memory_channels}"
@@ -61,6 +76,7 @@ def __init__(
  self._other_eal_param = other_eal_param
  
  def __str__(self) -> str:

+"""Create the EAL string."""
  return (
  f"{self._lcore_list} "
  f"{self._memory_channels} "
@@ -72,11 +88,21 @@ def __str__(self) -> str:
  
  
  class SutNode(Node):

-"""
-A class for managing connections to the System under Test, providing
-methods that retrieve the necessary information about the node (such as
-CPU, memory and NIC details) and configuration capabilities.
-Another key capability is building DPDK according to given build target.
+"""The system under test node.
+
+The SUT node extends :class:`Node` with DPDK specific features:
+
+* DPDK build,
+* Gathering of DPDK build info,
+* The running of DPDK apps, interactively or one-time execution,
+* DPDK apps cleanup.
+
+The :option:`--tarball` command line argument and the :envvar:`DTS_DPDK_TARBALL`
+environment variable configure the path to the DPDK tarball
+or the git commit ID, tag ID or tree ID to test.


I just want to make sure: we also use the --tarball option to set a git
commit ID instead of a tarball as the source?



+
+Attributes:
+config: The SUT node configuration
  """
  
  config: SutNodeConfiguration

@@ -94,6 +120,11 @@ class SutNode(Node):
  _path_to_devbind_script: PurePath | None
  
  def __init__(self, node_config: SutNodeConfiguration):

+"""Extend the constructor with SUT node specifics.
+
+Args:
+node_config: The SUT node's test run configuration.
+"""
  super(SutNode, self).__init__(node_config)
  self._dpdk_prefix_list = []
  self._build_target_config = None
@@ -113,6 +144,12 @@ def __init__(self, node_config: SutNodeConfiguration):
  
  @property

  def _remote_dpdk_dir(self) -> PurePath:
+"""The remote DPDK dir.
+
+This internal property should be set after extracting the DP

Re: [PATCH v7 16/21] dts: posix and linux sessions docstring update

2023-11-22 Thread Yoan Picchi

On 11/15/23 13:09, Juraj Linkeš wrote:

Format according to the Google format and PEP257, with slight
deviations.

Signed-off-by: Juraj Linkeš 
---
  dts/framework/testbed_model/linux_session.py | 63 ++-
  dts/framework/testbed_model/posix_session.py | 81 +---
  2 files changed, 113 insertions(+), 31 deletions(-)

diff --git a/dts/framework/testbed_model/linux_session.py b/dts/framework/testbed_model/linux_session.py
index f472bb8f0f..279954ff63 100644
--- a/dts/framework/testbed_model/linux_session.py
+++ b/dts/framework/testbed_model/linux_session.py
@@ -2,6 +2,13 @@
  # Copyright(c) 2023 PANTHEON.tech s.r.o.
  # Copyright(c) 2023 University of New Hampshire
  
+"""Linux OS translator.

+
+Translate OS-unaware calls into Linux calls/utilities. Most of Linux distributions are mostly
+compliant with POSIX standards, so this module only implements the parts that aren't.
+This intermediate module implements the common parts of mostly POSIX compliant distributions.
+"""
+
  import json
  from ipaddress import IPv4Interface, IPv6Interface
  from typing import TypedDict, Union
@@ -17,43 +24,51 @@
  
  
  class LshwConfigurationOutput(TypedDict):

+"""The relevant parts of ``lshw``'s ``configuration`` section."""
+
+#:
  link: str
  
  
  class LshwOutput(TypedDict):

-"""
-A model of the relevant information from json lshw output, e.g.:
-{
-...
-"businfo" : "pci@:08:00.0",
-"logicalname" : "enp8s0",
-"version" : "00",
-"serial" : "52:54:00:59:e1:ac",
-...
-"configuration" : {
-  ...
-  "link" : "yes",
-  ...
-},
-...
+"""A model of the relevant information from ``lshw``'s json output.
+
+e.g.::
+
+{
+...
+"businfo" : "pci@:08:00.0",
+"logicalname" : "enp8s0",
+"version" : "00",
+"serial" : "52:54:00:59:e1:ac",
+...
+"configuration" : {
+  ...
+  "link" : "yes",
+  ...
+},
+...
  """
  
+#:

  businfo: str
+#:
  logicalname: NotRequired[str]
+#:
  serial: NotRequired[str]
+#:
  configuration: LshwConfigurationOutput
  
  
  class LinuxSession(PosixSession):

-"""
-The implementation of non-Posix compliant parts of Linux remote sessions.
-"""
+"""The implementation of non-Posix compliant parts of Linux."""
  
  @staticmethod

  def _get_privileged_command(command: str) -> str:
  return f"sudo -- sh -c '{command}'"
  
  def get_remote_cpus(self, use_first_core: bool) -> list[LogicalCore]:

+"""Overrides :meth:`~.os_session.OSSession.get_remote_cpus`."""
  cpu_info = self.send_command("lscpu -p=CPU,CORE,SOCKET,NODE|grep -v \\#").stdout
  lcores = []
  for cpu_line in cpu_info.splitlines():
@@ -65,18 +80,20 @@ def get_remote_cpus(self, use_first_core: bool) -> list[LogicalCore]:
  return lcores
  
  def get_dpdk_file_prefix(self, dpdk_prefix: str) -> str:

+"""Overrides :meth:`~.os_session.OSSession.get_dpdk_file_prefix`."""
  return dpdk_prefix
  
-def setup_hugepages(self, hugepage_amount: int, force_first_numa: bool) -> None:

+def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
+"""Overrides :meth:`~.os_session.OSSession.setup_hugepages`."""
  self._logger.info("Getting Hugepage information.")
  hugepage_size = self._get_hugepage_size()
  hugepages_total = self._get_hugepages_total()
  self._numa_nodes = self._get_numa_nodes()
  
-if force_first_numa or hugepages_total != hugepage_amount:

+if force_first_numa or hugepages_total != hugepage_count:
  # when forcing numa, we need to clear existing hugepages regardless
  # of size, so they can be moved to the first numa node
-self._configure_huge_pages(hugepage_amount, hugepage_size, force_first_numa)
+self._configure_huge_pages(hugepage_count, hugepage_size, force_first_numa)
  else:
  self._logger.info("Hugepages already configured.")
  self._mount_huge_pages()
@@ -140,6 +157,7 @@ def _configure_huge_pages(
  )
  
  def update_ports(self, ports: list[Port]) -> None:

+"""Overrides :meth:`~.os_session.OSSession.update_ports`."""
  self._logger.debug("Gathering port info.")
  for port in ports:
  assert (
@@ -178,6 +196,7 @@ def _update_port_attr(
  )
  
  def configure_port_state(self, port: Port, enable: bool) -> None:

+"""Overrides :meth:`~.os_session.OSSession.configure_port_state`."""
  state = "up" if enable else "down"
  self.send_command(
  f"ip link set dev {port.logical_name} {state}", privileged=True
@@ -189,6 +208,7 @@ def configure_port_ip_address(
  port: Port,
  delete: bool,
  ) -> None:
+"""Overrides 

Re: [PATCH v8 00/21] dts: docstrings update

2023-12-01 Thread Yoan Picchi
  |  19 +-
  dts/poetry.lock   |  12 +-
  dts/pyproject.toml|   6 +-
  dts/tests/TestSuite_hello_world.py|  16 +-
  dts/tests/TestSuite_os_udp.py |  20 +-
  dts/tests/TestSuite_smoke_tests.py|  61 ++-
  48 files changed, 3506 insertions(+), 1683 deletions(-)
  create mode 100644 dts/framework/config/types.py
  rename dts/framework/remote_session/{remote => }/interactive_remote_session.py (76%)
  create mode 100644 dts/framework/remote_session/interactive_shell.py
  delete mode 100644 dts/framework/remote_session/os_session.py
  create mode 100644 dts/framework/remote_session/python_shell.py
  delete mode 100644 dts/framework/remote_session/remote/__init__.py
  delete mode 100644 dts/framework/remote_session/remote/interactive_shell.py
  delete mode 100644 dts/framework/remote_session/remote/python_shell.py
  delete mode 100644 dts/framework/remote_session/remote/remote_session.py
  delete mode 100644 dts/framework/remote_session/remote/testpmd_shell.py
  create mode 100644 dts/framework/remote_session/remote_session.py
  rename dts/framework/remote_session/{remote => }/ssh_session.py (82%)
  create mode 100644 dts/framework/remote_session/testpmd_shell.py
  rename dts/framework/testbed_model/{hw => }/cpu.py (50%)
  delete mode 100644 dts/framework/testbed_model/hw/__init__.py
  delete mode 100644 dts/framework/testbed_model/hw/port.py
  delete mode 100644 dts/framework/testbed_model/hw/virtual_device.py
  rename dts/framework/{remote_session => testbed_model}/linux_session.py (77%)
  create mode 100644 dts/framework/testbed_model/os_session.py
  create mode 100644 dts/framework/testbed_model/port.py
  rename dts/framework/{remote_session => testbed_model}/posix_session.py (73%)
  delete mode 100644 dts/framework/testbed_model/traffic_generator.py
  create mode 100644 dts/framework/testbed_model/traffic_generator/__init__.py
  rename dts/framework/testbed_model/{ => traffic_generator}/capturing_traffic_generator.py (68%)
  rename dts/framework/testbed_model/{ => traffic_generator}/scapy.py (71%)
  create mode 100644 dts/framework/testbed_model/traffic_generator/traffic_generator.py
  create mode 100644 dts/framework/testbed_model/virtual_device.py


Reviewed-by: Yoan Picchi 


Re: [PATCH v5 4/4] hash: add SVE support for bulk key lookup

2024-03-05 Thread Yoan Picchi

On 3/4/24 13:35, Konstantin Ananyev wrote:




- Implemented SVE code for comparing signatures in bulk lookup.
- Added defines in code for SVE support.
- Optimise NEON code
- New SVE code is ~5% slower than optimized NEON for N2 processor.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
   lib/hash/rte_cuckoo_hash.c | 196 -
   lib/hash/rte_cuckoo_hash.h |   1 +
   2 files changed, 151 insertions(+), 46 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index a07dd3a28d..231d6d6ded 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -442,8 +442,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
   #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
+   }
else
   #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
@@ -1860,37 +1863,103 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
   #if defined(__ARM_NEON)

   static inline void
-compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
uint16_t sig,
enum rte_hash_sig_compare_function sig_cmp_fn)
   {
unsigned int i;

+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
/* For match mask every bits indicates the match */
switch (sig_cmp_fn) {


Can I ask you to move the arch-specific comparison code into some arch-specific headers or so?
It is getting really hard to read and understand the generic code with all these ifdefs and arch-specific instructions...



Hi, apologies for long delay in response.

  

I can easily enough move the compare_signatures into an arm/x86
directory, and have a default version in the code.


Yes, that's what I thought about.
  

The problem would be bulk lookup. The function is already duplicated
2 times (the l and lf versions). If I remove the #ifdefs, I'll need to
duplicate them again into 4 nearly identical versions (dense and
sparse). The only third option I see would be some preprocessor macro
to patch the function, but that looks even dirtier to me.


Not sure I understood you here: from looking at the code I don't see any
arch-specific ifdefs in bulk_lookup() routines.
What am I missing here?
  


Most if not all of those #if are architecture specific. For instance:
#if defined(__ARM_NEON)
#if defined(RTE_HAS_SVE_ACLE)

The main reason there are some #ifs in bulk lookup is to handle whether the
function runs with a dense hitmask or a sparse hitmask.
x86 only supports the sparse hitmask version (1 bit data, 1 bit padding)
but Arm supports the dense hitmask (every bit counts). The latter ends up
being faster.
Splitting bulk_lookup into its sparse and dense variants would be a lot
of code duplication that I'd prefer to avoid.


What I might be able to do is move compare_signatures into some
arch-specific version. The functions are different enough that it
wouldn't be too much code duplication. I'd argue, though, that the
#ifdefs for NEON and SSE were already there and I only added the SVE variant.





I think duplicating the code would be bad, but I can do it if you want.
Unless you have a better solution?


+#if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, x;
+   uint16x8_t vmat, hit1, hit2;
const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);

/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-   x = vandq_u16(vmat, mask);
-   *prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-   x = vandq_u16

[PATCH v6 0/4] hash: add SVE support for bulk key lookup

2024-03-11 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and having the
primary and secondary hitmasks interleaved instead of being in their own
array each.

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (ran on Sapphire Rapids)
  NEON is up to 7-10% faster (ran on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (ran on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (ran on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 ---
 lib/hash/arch/arm/compare_signatures.h| 117 +
 lib/hash/arch/common/compare_signatures.h |  38 +
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 197 --
 lib/hash/rte_cuckoo_hash.h|   1 +
 7 files changed, 392 insertions(+), 115 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

-- 
2.25.1



[PATCH v6 1/4] hash: pack the hitmask for hash in bulk lookup

2024-03-11 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail
of Intel's SIMD code. This patch allows non-Intel SIMD
implementations to benefit from a dense hitmask.
In addition, the new dense hitmask interleaves the primary
and secondary matches, which allows better cache usage and
enables future improvements for the SIMD implementations.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap  |   2 +
 lib/hash/arch/arm/compare_signatures.h|  61 +++
 lib/hash/arch/common/compare_signatures.h |  38 +
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 195 --
 lib/hash/rte_cuckoo_hash.h|   1 +
 6 files changed, 258 insertions(+), 92 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

diff --git a/.mailmap b/.mailmap
index 66ebc20666..00b50414d3 100644
--- a/.mailmap
+++ b/.mailmap
@@ -494,6 +494,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1633,6 +1634,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
new file mode 100644
index 00..1af6ba8190
--- /dev/null
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
+   }
+   break;
+#endif
+   default:
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |=
+   ((sig == prim_bucket_sigs[i]) << i);
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
+   }
+   }
+}
diff --git a/lib/hash/arch/common/compare_signatures.h b/lib/hash/arch/common/compare_signatures.h
new file mode 100644
index 00..dcf9444032
--- /dev/null
+++ b/lib/hash/arch/common/compare_signatures.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   (void) sig_cmp_fn;
+
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   &quo

[PATCH v6 2/4] hash: optimize compare signature for NEON

2024-03-11 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by masking each lane with its own single-bit mask.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/arch/arm/compare_signatures.h | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index 1af6ba8190..b5a457f936 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -30,23 +30,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
 #if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
 
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
}
break;
 #endif
-- 
2.25.1
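To illustrate why the shift becomes unnecessary, here is a scalar sketch that emulates the eight 16-bit lanes with plain arrays. `lane_eq`, `hits_shift` and `hits_mask` are hypothetical helpers standing in for vceqq_u16 and for the before/after reductions; this is not DPDK code.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* 8 x 16-bit lanes, like a 128-bit NEON uint16x8_t */

/* Emulates vceqq_u16: a matching lane becomes all ones (0xFFFF), else 0. */
static void lane_eq(const uint16_t bucket[LANES], uint16_t sig,
		    uint16_t vmat[LANES])
{
	for (int i = 0; i < LANES; i++)
		vmat[i] = (bucket[i] == sig) ? 0xFFFF : 0x0000;
}

/* Previous approach: keep bit 0 of each lane, shift lane i left by i,
 * then add the lanes together. */
static uint16_t hits_shift(const uint16_t vmat[LANES])
{
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++)
		sum += (uint16_t)((vmat[i] & 0x0001) << i);
	return sum;
}

/* New approach: AND lane i with (1 << i). A matching lane already has
 * every bit set, so the AND alone leaves exactly bit i. */
static uint16_t hits_mask(const uint16_t vmat[LANES])
{
	static const uint16_t mask[LANES] = {0x1, 0x2, 0x4, 0x8,
					     0x10, 0x20, 0x40, 0x80};
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++) {
		assert(vmat[i] == 0x0000 || vmat[i] == 0xFFFF);
		sum += vmat[i] & mask[i];
	}
	return sum;
}
```

Both reductions yield the same hitmask; the mask version simply replaces a per-lane variable shift with an AND against a constant.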



[PATCH v6 3/4] test/hash: check bulk lookup of keys after collision

2024-03-11 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..4f871b3499 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | (3 << 16);
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i <
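The only change to pseudo_hash() is its return value (3 versus 3 | (3 << 16)). Assuming the library takes the short signature from the upper 16 bits of the hash and finds the alternative bucket by XOR-ing the primary bucket index with that signature (an assumption, not stated in this thread), the sketch below shows the effect: the old value gives every key a zero signature and makes both candidate buckets identical, while the new value gives a signature of 3 and distinct primary and alternative buckets. `short_sig`, `prim_bucket` and `alt_bucket` are hypothetical helpers, and the 256-bucket mask used in the usage is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; this split of a 32-bit hash is an assumption,
 * not code taken from the patch. */

/* Short signature: the upper 16 bits of the hash. */
static uint16_t short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16);
}

/* Primary bucket: the low bits of the hash, selected by a 2^n - 1 mask. */
static uint32_t prim_bucket(uint32_t hash, uint32_t bucket_bitmask)
{
	assert((bucket_bitmask & (bucket_bitmask + 1)) == 0);
	return hash & bucket_bitmask;
}

/* Alternative bucket: primary index XOR-ed with the short signature. */
static uint32_t alt_bucket(uint32_t prim, uint16_t sig,
			   uint32_t bucket_bitmask)
{
	return (prim ^ sig) & bucket_bitmask;
}
```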

[PATCH v6 4/4] hash: add SVE support for bulk key lookup

2024-03-11 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- Added defines in code for SVE code support.
- Optimised NEON code.
- New SVE code is ~5% slower than optimized NEON on the N2 processor.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/arch/arm/compare_signatures.h | 58 ++
 lib/hash/rte_cuckoo_hash.c |  2 +
 2 files changed, 60 insertions(+)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index b5a457f936..8a0627e119 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparison at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   } else {
+   do {
+   pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
+   uint16_t lower_half = 0;
+   uint16_t upper_half = 0;
+   /* Compare all signatures in the primary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &prim_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   lower_half = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   /* Compare all signatures in the secondary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &sec_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   upper_half = svorv_u16(svptrue_b16(), sv_matches)
+   << RTE_HASH_BUCKET_ENTRIES;
+   }
+   hitmask_buffer[i/8] = upper_half | lower_half;
+   i += vl;
+   } while (i < RTE_HASH_BUCKET_ENTRIES);
+   }
+   }
+   break;
 #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index e41f03270a..7a474267f0 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -452,6 +452,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 #elif defined(RTE_ARCH_ARM64)
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
}
else
 #endif
-- 
2.25.1
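The wide-vector path of the SVE case relies on a zeroing predicated shift followed by an OR reduction. Below is a scalar model of that path, assuming 8 entries per bucket and a vector of at least 16 halfword lanes (256-bit or wider). `sve_like_hitmask` is a hypothetical helper; each comment names the intrinsic that the loop step stands in for.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* stands in for RTE_HASH_BUCKET_ENTRIES */

/* Scalar model of the merged SVE path; vl is the number of 16-bit lanes
 * (what svcnth() would return). */
static uint16_t sve_like_hitmask(const uint16_t prim[BUCKET_ENTRIES],
				 const uint16_t sec[BUCKET_ENTRIES],
				 uint16_t sig, unsigned int vl)
{
	uint16_t lanes[2 * BUCKET_ENTRIES];
	uint16_t result = 0;

	assert(vl >= 2 * BUCKET_ENTRIES);

	/* svsplice: primary signatures first, secondary right after. */
	for (int i = 0; i < BUCKET_ENTRIES; i++) {
		lanes[i] = prim[i];
		lanes[BUCKET_ENTRIES + i] = sec[i];
	}

	for (int i = 0; i < 2 * BUCKET_ENTRIES; i++) {
		/* svcmpeq_u16 under pred = svwhilelt_b16(0, 2 * ENTRIES) */
		int match = (lanes[i] == sig);
		/* svlsl_u16_z(match, 1, index): inactive lanes become 0 */
		uint16_t v = match ? (uint16_t)(1u << i) : 0;
		/* svorv_u16: OR-reduce across all lanes */
		result |= v;
	}
	return result;
}
```

Because the secondary signatures land in lanes 8 to 15, the lane index directly gives the dense, interleaved bit position.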



Re: [PATCH v6 4/4] hash: add SVE support for bulk key lookup

2024-03-12 Thread Yoan Picchi

On 3/12/24 03:57, fengchengwen wrote:

Hi Yoan,

On 2024/3/12 7:21, Yoan Picchi wrote:

- Implemented SVE code for comparing signatures in bulk lookup.
- Added Defines in code for SVE code support.
- Optimise NEON code


This commit does not include this part. Please only describe the content of this commit.


Thank you. I forgot to edit that out after moving commits around.




- New SVE code is ~5% slower than optimized NEON for N2 processor.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
  lib/hash/arch/arm/compare_signatures.h | 58 ++
  lib/hash/rte_cuckoo_hash.c |  2 +
  2 files changed, 60 insertions(+)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index b5a457f936..8a0627e119 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {


...


  #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index e41f03270a..7a474267f0 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -452,6 +452,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
  #elif defined(RTE_ARCH_ARM64)
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;


RTE_HASH_COMPARE_SVE was defined in "[PATCH v6 1/4] hash: pack the hitmask for hash in bulk lookup",
but its first use is in this commit, so I think it should be defined in this commit.

If both RTE_CPUFLAG_SVE and RTE_HAS_SVE_ACLE are set, the SVE implementation will be chosen.
If RTE_CPUFLAG_SVE is set but RTE_HAS_SVE_ACLE is not, the scalar implementation will be
chosen, whereas in this case we could fall back to the NEON implementation.
So I suggest using "#if defined(RTE_HAS_SVE_ACLE)" directly here.


Sounds fair. I'll do it.
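A minimal sketch of the selection logic with the compile-time guard discussed above, so that a build without the SVE case stays on NEON even on an SVE-capable CPU. The enum and `select_cmp_fn` are hypothetical stand-ins for `enum rte_hash_sig_compare_function` and the `rte_cpu_get_flag_enabled()` checks in rte_hash_create(); the two `int` parameters model those runtime flag results.

```c
/* Hypothetical stand-in for enum rte_hash_sig_compare_function. */
enum cmp_fn { CMP_SCALAR, CMP_NEON, CMP_SVE };

/* has_neon / has_sve model the results of the runtime CPU flag checks. */
static enum cmp_fn select_cmp_fn(int has_neon, int has_sve)
{
	enum cmp_fn fn = CMP_SCALAR;

	if (has_neon) {
		fn = CMP_NEON;
#if defined(RTE_HAS_SVE_ACLE)
		/* Only pick SVE when the SVE case was actually compiled in. */
		if (has_sve)
			fn = CMP_SVE;
#else
		(void)has_sve; /* no SVE code built: keep the NEON path */
#endif
	}
	return fn;
}
```

Built without RTE_HAS_SVE_ACLE, an SVE-capable CPU gets the NEON path instead of falling through to the scalar default.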




}
else
  #endif



Plus:
I notice the commit log says the SVE implementation is slower than NEON.

I also notice that SVE is slower than NEON on other platforms too:
1. b4ee9c07bd config/arm: disable SVE ACLE for CN10K
2. 4eea7c6461 config/arm: add SVE ACLE control flag

So maybe we should disable RTE_HAS_SVE_ACLE by default:
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 9d6fb87d7f..a5b890d100 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -875,7 +875,7 @@ endif

  if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
  compile_time_cpuflags += ['RTE_CPUFLAG_SVE']
-if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', true))
+if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', false))
  dpdk_conf.set('RTE_HAS_SVE_ACLE', 1)
  endif
  endif

If a platform is verified to perform better with SVE, it could enable SVE by adding
"sve_acle: true" to its soc_xxx config.

Thanks


Here I kinda disagree. In this particular instance, SVE is a bit slower
with narrow vectors (128b), but could be faster with some wider vector
sizes.
Even in general, 128b SVE is not simply slower than NEON. It varies case
by case: sometimes it's slower, sometimes it's faster, so I don't think
we should just disable it by default. In any case, disabling it should
be its own patch with proper discussion, not an offhand change we
include in the middle of this patch.
This SVE version is still faster than the upstream NEON version. I just
happened to improve the NEON version even more.
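To make the vector-length argument concrete, the sketch below counts the vector compares that the patch's SVE case issues per lookup for a given hardware vector length. It only mirrors the branch condition `vl >= 2 * RTE_HASH_BUCKET_ENTRIES` and the do/while loop in the patch; `lanes16` and `sve_compares` are hypothetical helpers, 8 entries per bucket is assumed, and compare counts are not performance measurements.

```c
#include <assert.h>

#define BUCKET_ENTRIES 8 /* stands in for RTE_HASH_BUCKET_ENTRIES */

/* Number of 16-bit lanes for an SVE vector length in bits, i.e. what
 * svcnth() would return on that hardware. */
static unsigned int lanes16(unsigned int vl_bits)
{
	/* SVE vector lengths are multiples of 128 bits, up to 2048. */
	assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
	return vl_bits / 16;
}

/* One merged compare when both buckets fit in one vector; otherwise the
 * do/while loop runs ceil(ENTRIES / vl) times with two compares each. */
static unsigned int sve_compares(unsigned int vl_bits)
{
	unsigned int vl = lanes16(vl_bits);

	if (vl >= 2 * BUCKET_ENTRIES)
		return 1;
	return 2 * ((BUCKET_ENTRIES + vl - 1) / vl);
}
```

With 128b vectors the SVE case needs two compares, like NEON; from 256b upward it needs only one.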


[PATCH v7 0/4] hash: add SVE support for bulk key lookup

2024-03-12 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to
interleave the primary and secondary hitmasks instead of keeping each
in its own array.
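As an illustration of the dense, interleaved format, the sketch below packs both buckets into one 16-bit hitmask, bit i for primary slot i and bit 8 + i for secondary slot i (the same layout as the scalar default loop in the patch), then walks the hits lowest bit first. `pack_dense` and `list_hits` are hypothetical helpers, `__builtin_ctz` is a GCC/Clang builtin, and 8 entries per bucket is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* stands in for RTE_HASH_BUCKET_ENTRIES */

/* Dense, interleaved layout: bit i = primary slot i matched,
 * bit (BUCKET_ENTRIES + i) = secondary slot i matched. */
static uint16_t pack_dense(const uint16_t prim[BUCKET_ENTRIES],
			   const uint16_t sec[BUCKET_ENTRIES], uint16_t sig)
{
	uint16_t hitmask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		hitmask |= (uint16_t)((sig == prim[i]) << i);
		hitmask |= (uint16_t)((sig == sec[i]) << (BUCKET_ENTRIES + i));
	}
	return hitmask;
}

/* Walk the set bits, lowest first: primary hits come out before secondary
 * ones. Returns the number of hits written to slots[] and is_sec[]. */
static unsigned int list_hits(uint16_t hitmask, unsigned int slots[],
			      int is_sec[])
{
	unsigned int n = 0;

	while (hitmask) {
		unsigned int bit = (unsigned int)__builtin_ctz(hitmask);

		assert(bit < 2 * BUCKET_ENTRIES);
		slots[n] = bit % BUCKET_ENTRIES;
		is_sec[n] = bit >= BUCKET_ENTRIES;
		n++;
		hitmask &= (uint16_t)(hitmask - 1); /* clear lowest set bit */
	}
	return n;
}
```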

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (ran on Sapphire Rapids)
  NEON is up to 7-10% faster (ran on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (ran on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (ran on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 ---
 lib/hash/arch/arm/compare_signatures.h| 117 +
 lib/hash/arch/common/compare_signatures.h |  38 +
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 199 --
 lib/hash/rte_cuckoo_hash.h|   1 +
 7 files changed, 394 insertions(+), 115 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

-- 
2.25.1



[PATCH v7 1/4] hash: pack the hitmask for hash in bulk lookup

2024-03-12 Thread Yoan Picchi
The current hitmask includes padding due to an implementation detail of
Intel's SIMD code. This patch allows non-Intel SIMD implementations to
benefit from a dense hitmask. In addition, the new dense hitmask
interleaves the primary and secondary matches, which allows better cache
usage and enables future improvements to the SIMD implementations.
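For comparison with the padded format, the sketch below assumes the old layout reserves two bits per slot (what a byte-wise movemask over 16-bit compares produces), so a bit position has to be halved to get a slot index, while in the dense format the bit position is the slot index. All helpers are hypothetical, `__builtin_ctz` is a GCC/Clang builtin, and 8 entries per bucket is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* stands in for RTE_HASH_BUCKET_ENTRIES */

/* Padded layout (assumed model): each slot owns two adjacent bits, so one
 * bucket already fills 16 bits and the secondary bucket needs its own word. */
static uint32_t pack_sparse(const uint16_t bucket[BUCKET_ENTRIES],
			    uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		if (bucket[i] == sig)
			mask |= 0x3u << (2 * i);
	return mask;
}

/* Padded form: halve the bit position to get the slot index. */
static unsigned int first_slot_sparse(uint32_t mask)
{
	assert(mask != 0);
	return (unsigned int)__builtin_ctz(mask) >> 1;
}

/* Dense form: one bit per slot, so the bit position is the slot index. */
static unsigned int first_slot_dense(uint16_t mask)
{
	assert(mask != 0);
	return (unsigned int)__builtin_ctz(mask);
}
```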

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 .mailmap  |   2 +
 lib/hash/arch/arm/compare_signatures.h|  61 +++
 lib/hash/arch/common/compare_signatures.h |  38 +
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 192 --
 5 files changed, 255 insertions(+), 91 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

diff --git a/.mailmap b/.mailmap
index 66ebc20666..00b50414d3 100644
--- a/.mailmap
+++ b/.mailmap
@@ -494,6 +494,7 @@ Hari Kumar Vemula 
 Harini Ramakrishnan 
 Hariprasad Govindharajan 
 Harish Patil  
+Harjot Singh 
 Harman Kalra 
 Harneet Singh 
 Harold Huang 
@@ -1633,6 +1634,7 @@ Yixue Wang 
 Yi Yang  
 Yi Zhang 
 Yoann Desmouceaux 
+Yoan Picchi 
 Yogesh Jangra 
 Yogev Chaimovich 
 Yongjie Gu 
diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
new file mode 100644
index 00..1af6ba8190
--- /dev/null
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
+   /* For match mask every bits indicates the match */
+   switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
+   }
+   break;
+#endif
+   default:
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |=
+   ((sig == prim_bucket_sigs[i]) << i);
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
+   }
+   }
+}
diff --git a/lib/hash/arch/common/compare_signatures.h b/lib/hash/arch/common/compare_signatures.h
new file mode 100644
index 00..dcf9444032
--- /dev/null
+++ b/lib/hash/arch/common/compare_signatures.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   (void) sig_cmp_fn;
+
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept 

[PATCH v7 2/4] hash: optimize compare signature for NEON

2024-03-12 Thread Yoan Picchi
Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can therefore skip the shift by masking each lane with its own single-bit mask.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 lib/hash/arch/arm/compare_signatures.h | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index 1af6ba8190..b5a457f936 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -30,23 +30,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
 #if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);
 
-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);
 
+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);
}
break;
 #endif
-- 
2.25.1
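The last three steps of the new NEON case (shift the secondary hits, OR the two vectors, then add across lanes) rely on each lane carrying a distinct bit, so the horizontal add behaves like an OR. A scalar sketch, assuming 8 entries per bucket and inputs already masked as in the patch; `combine_hits` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* RTE_HASH_BUCKET_ENTRIES <= 8 on this path */

/* Models hit2 << 8 (vshlq_n_u16), hit1 | hit2 (vorrq_u16) and the final
 * horizontal add (vaddvq_u16). Lane i of each input is 0 or (1 << i). */
static uint16_t combine_hits(const uint16_t hit1[LANES],
			     const uint16_t hit2[LANES])
{
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++) {
		assert(hit1[i] == 0 || hit1[i] == (1u << i));
		assert(hit2[i] == 0 || hit2[i] == (1u << i));
		uint16_t lane = (uint16_t)(hit1[i] | (hit2[i] << LANES));
		/* Every lane owns distinct bits, so adding equals ORing and
		 * no carries can occur. */
		sum = (uint16_t)(sum + lane);
	}
	return sum;
}
```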



[PATCH v7 3/4] test/hash: check bulk lookup of keys after collision

2024-03-12 Thread Yoan Picchi
This patch adds a unit test for rte_hash_lookup_bulk().
It also updates the test_full_bucket test to the current number of entries
in a hash bucket.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
 app/test/test_hash.c | 99 ++--
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index d586878a22..4f871b3499 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -95,7 +95,7 @@ static uint32_t pseudo_hash(__rte_unused const void *keys,
__rte_unused uint32_t key_len,
__rte_unused uint32_t init_val)
 {
-   return 3;
+   return 3 | (3 << 16);
 }
 
 RTE_LOG_REGISTER(hash_logtype_test, test.hash, INFO);
@@ -115,8 +115,10 @@ static void print_key_info(const char *msg, const struct flow_key *key,
rte_log(RTE_LOG_DEBUG, hash_logtype_test, " @ pos %d\n", pos);
 }
 
+#define KEY_PER_BUCKET 8
+
 /* Keys used by unit test functions */
-static struct flow_key keys[5] = { {
+static struct flow_key keys[KEY_PER_BUCKET+1] = { {
.ip_src = RTE_IPV4(0x03, 0x02, 0x01, 0x00),
.ip_dst = RTE_IPV4(0x07, 0x06, 0x05, 0x04),
.port_src = 0x0908,
@@ -146,6 +148,30 @@ static struct flow_key keys[5] = { {
.port_src = 0x4948,
.port_dst = 0x4b4a,
.proto = 0x4c,
+}, {
+   .ip_src = RTE_IPV4(0x53, 0x52, 0x51, 0x50),
+   .ip_dst = RTE_IPV4(0x57, 0x56, 0x55, 0x54),
+   .port_src = 0x5958,
+   .port_dst = 0x5b5a,
+   .proto = 0x5c,
+}, {
+   .ip_src = RTE_IPV4(0x63, 0x62, 0x61, 0x60),
+   .ip_dst = RTE_IPV4(0x67, 0x66, 0x65, 0x64),
+   .port_src = 0x6968,
+   .port_dst = 0x6b6a,
+   .proto = 0x6c,
+}, {
+   .ip_src = RTE_IPV4(0x73, 0x72, 0x71, 0x70),
+   .ip_dst = RTE_IPV4(0x77, 0x76, 0x75, 0x74),
+   .port_src = 0x7978,
+   .port_dst = 0x7b7a,
+   .proto = 0x7c,
+}, {
+   .ip_src = RTE_IPV4(0x83, 0x82, 0x81, 0x80),
+   .ip_dst = RTE_IPV4(0x87, 0x86, 0x85, 0x84),
+   .port_src = 0x8988,
+   .port_dst = 0x8b8a,
+   .proto = 0x8c,
 } };
 
 /* Parameters used for hash table in unit test functions. Name set later. */
@@ -783,13 +809,15 @@ static int test_five_keys(void)
 
 /*
  * Add keys to the same bucket until bucket full.
- * - add 5 keys to the same bucket (hash created with 4 keys per bucket):
- *   first 4 successful, 5th successful, pushing existing item in bucket
- * - lookup the 5 keys: 5 hits
- * - add the 5 keys again: 5 OK
- * - lookup the 5 keys: 5 hits (updated data)
- * - delete the 5 keys: 5 OK
- * - lookup the 5 keys: 5 misses
+ * - add 9 keys to the same bucket (hash created with 8 keys per bucket):
+ *   first 8 successful, 9th successful, pushing existing item in bucket
+ * - lookup the 9 keys: 9 hits
+ * - bulk lookup for all the 9 keys: 9 hits
+ * - add the 9 keys again: 9 OK
+ * - lookup the 9 keys: 9 hits (updated data)
+ * - delete the 9 keys: 9 OK
+ * - lookup the 9 keys: 9 misses
+ * - bulk lookup for all the 9 keys: 9 misses
  */
 static int test_full_bucket(void)
 {
@@ -801,16 +829,17 @@ static int test_full_bucket(void)
.hash_func_init_val = 0,
.socket_id = 0,
};
+   const void *key_array[KEY_PER_BUCKET+1] = {0};
struct rte_hash *handle;
-   int pos[5];
-   int expected_pos[5];
+   int pos[KEY_PER_BUCKET+1];
+   int expected_pos[KEY_PER_BUCKET+1];
unsigned i;
-
+   int ret;
handle = rte_hash_create(&params_pseudo_hash);
RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
/* Fill bucket */
-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < KEY_PER_BUCKET; i++) {
pos[i] = rte_hash_add_key(handle, &keys[i]);
print_key_info("Add", &keys[i], pos[i]);
RETURN_IF_ERROR(pos[i] < 0,
@@ -821,22 +850,36 @@ static int test_full_bucket(void)
 * This should work and will push one of the items
 * in the bucket because it is full
 */
-   pos[4] = rte_hash_add_key(handle, &keys[4]);
-   print_key_info("Add", &keys[4], pos[4]);
-   RETURN_IF_ERROR(pos[4] < 0,
-   "failed to add key (pos[4]=%d)", pos[4]);
-   expected_pos[4] = pos[4];
+   pos[KEY_PER_BUCKET] = rte_hash_add_key(handle, &keys[KEY_PER_BUCKET]);
+   print_key_info("Add", &keys[KEY_PER_BUCKET], pos[KEY_PER_BUCKET]);
+   RETURN_IF_ERROR(pos[KEY_PER_BUCKET] < 0,
+   "failed to add key (pos[%d]=%d)", KEY_PER_BUCKET, pos[KEY_PER_BUCKET]);
+   expected_pos[KEY_PER_BUCKET] = pos[KEY_PER_BUCKET];
 
/* Lookup */
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i <

[PATCH v7 4/4] hash: add SVE support for bulk key lookup

2024-03-12 Thread Yoan Picchi
- Implemented SVE code for comparing signatures in bulk lookup.
- New SVE code is ~5% slower than optimized NEON on the N2 processor
with 128b vectors.

Signed-off-by: Yoan Picchi 
Signed-off-by: Harjot Singh 
Reviewed-by: Nathan Brown 
Reviewed-by: Ruifeng Wang 
---
 lib/hash/arch/arm/compare_signatures.h | 58 ++
 lib/hash/rte_cuckoo_hash.c |  7 +++-
 lib/hash/rte_cuckoo_hash.h |  1 +
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index b5a457f936..8a0627e119 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+   case RTE_HASH_COMPARE_SVE: {
+   svuint16_t vsign, shift, sv_matches;
+   svbool_t pred, match, bucket_wide_pred;
+   int i = 0;
+   uint64_t vl = svcnth();
+
+   vsign = svdup_u16(sig);
+   shift = svindex_u16(0, 1);
+
+   if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
+   svuint16_t primary_array_vect, secondary_array_vect;
+   bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
+   primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
+   secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
+
+   /* We merged the two vectors so we can do both comparison at once */
+   primary_array_vect = svsplice_u16(bucket_wide_pred,
+   primary_array_vect,
+   secondary_array_vect);
+   pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
+
+   /* Compare all signatures in the buckets */
+   match = svcmpeq_u16(pred, vsign, primary_array_vect);
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   } else {
+   do {
+   pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
+   uint16_t lower_half = 0;
+   uint16_t upper_half = 0;
+   /* Compare all signatures in the primary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &prim_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   lower_half = svorv_u16(svptrue_b16(), sv_matches);
+   }
+   /* Compare all signatures in the secondary bucket */
+   match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
+   &sec_bucket_sigs[i]));
+   if (svptest_any(svptrue_b16(), match)) {
+   sv_matches = svdup_u16(1);
+   sv_matches = svlsl_u16_z(match, sv_matches, shift);
+   upper_half = svorv_u16(svptrue_b16(), sv_matches)
+   << RTE_HASH_BUCKET_ENTRIES;
+   }
+   hitmask_buffer[i/8] = upper_half | lower_half;
+   i += vl;
+   } while (i < RTE_HASH_BUCKET_ENTRIES);
+   }
+   }
+   break;
 #endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 0697743cdf..75f555ba2c 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -450,8 +450,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #elif defined(RTE_ARCH_ARM64)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+#if defined(RTE_HAS_SVE_ACLE)
+   if (rte_cpu_get_flag_enabled(RTE

Re: [PATCH v7 1/4] hash: pack the hitmask for hash in bulk lookup

2024-03-19 Thread Yoan Picchi

On 3/19/24 10:41, Konstantin Ananyev wrote:


Hi,


Current hitmask includes padding due to Intel's SIMD
implementation detail. This patch allows non Intel SIMD
implementations to benefit from a dense hitmask.
In addition, the new dense hitmask interweave the primary
and secondary matches which allow a better cache usage and
enable future improvements for the SIMD implementations

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
  .mailmap  |   2 +
  lib/hash/arch/arm/compare_signatures.h|  61 +++
  lib/hash/arch/common/compare_signatures.h |  38 +
  lib/hash/arch/x86/compare_signatures.h|  53 ++
  lib/hash/rte_cuckoo_hash.c| 192 --
  5 files changed, 255 insertions(+), 91 deletions(-)
  create mode 100644 lib/hash/arch/arm/compare_signatures.h
  create mode 100644 lib/hash/arch/common/compare_signatures.h
  create mode 100644 lib/hash/arch/x86/compare_signatures.h

diff --git a/.mailmap b/.mailmap
index 66ebc20666..00b50414d3 100644
--- a/.mailmap
+++ b/.mailmap
@@ -494,6 +494,7 @@ Hari Kumar Vemula 
  Harini Ramakrishnan 
  Hariprasad Govindharajan 
  Harish Patil  
+Harjot Singh 
  Harman Kalra 
  Harneet Singh 
  Harold Huang 
@@ -1633,6 +1634,7 @@ Yixue Wang 
  Yi Yang  
  Yi Zhang 
  Yoann Desmouceaux 
+Yoan Picchi 
  Yogesh Jangra 
  Yogev Chaimovich 
  Yongjie Gu 
diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
new file mode 100644
index 00..1af6ba8190
--- /dev/null
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * Arm's version uses a densely packed hitmask buffer:
+ * Every bit is in use.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
+   "The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
+
+   /* For the match mask, every bit indicates a match */
+   switch (sig_cmp_fn) {
+#if RTE_HASH_BUCKET_ENTRIES <= 8
+   case RTE_HASH_COMPARE_NEON: {
+   uint16x8_t vmat, vsig, x;
+   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+   uint16_t low, high;
+
+   vsig = vld1q_dup_u16((uint16_t const *)&sig);
+   /* Compare all signatures in the primary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)prim_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   low = (uint16_t)(vaddvq_u16(x));
+   /* Compare all signatures in the secondary bucket */
+   vmat = vceqq_u16(vsig,
+   vld1q_u16((uint16_t const *)sec_bucket_sigs));
+   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+   high = (uint16_t)(vaddvq_u16(x));
+   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+
+   }
+   break;
+#endif
+   default:
+   for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *hitmask_buffer |=
+   ((sig == prim_bucket_sigs[i]) << i);
+   *hitmask_buffer |=
+   ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
+   }
+   }
+}
diff --git a/lib/hash/arch/common/compare_signatures.h b/lib/hash/arch/common/compare_signatures.h
new file mode 100644
index 00..dcf9444032
--- /dev/null
+++ b/lib/hash/arch/common/compare_signatures.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2018-2024 Arm Limited
+ */
+
+/*
+ * The generic version could use either a dense or sparsely packed hitmask buffer,
+ * but the dense one is slightly faster.
+ */
+
+#include 
+#include 
+#include 
+#include "rte_cuckoo_hash.h"
+
+#define DENSE_HASH_BULK_LOOKUP 1
+
+static inline void
+compare_signatures_dense(uint16_t *hitmask_buffer,
+   const uint16_t *prim_bucket_sigs,
+   const uint16_t *sec_bucket_sigs,
+   uint16_t sig,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   (void) sig_cmp_fn;
+
+   static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_

Re: [EXTERNAL] [PATCH v7 2/4] hash: optimize compare signature for NEON

2024-04-11 Thread Yoan Picchi

On 3/20/24 07:37, Pavan Nikhilesh Bhagavatula wrote:

Upon a successful comparison, NEON sets all the bits in the lane to 1.
We can skip shifting by simply masking each lane with a lane-specific bit.

Signed-off-by: Yoan Picchi 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Nathan Brown 
---
  lib/hash/arch/arm/compare_signatures.h | 24 +++-
  1 file changed, 11 insertions(+), 13 deletions(-)
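The masking trick can be modelled lane by lane in scalar C. Assumptions: 8 lanes of 16 bits, as in a 128-bit NEON register, and RTE_HASH_BUCKET_ENTRIES equal to 8; lane_compare, reduce_shift, reduce_mask and dense_from_compares are hypothetical names whose loops mimic vceqq_u16, the old vandq_u16/vshlq_u16 sequence, the new vandq_u16 with a constant mask, and the vshlq_n_u16/vorrq_u16/vaddvq_u16 tail respectively. Since a matching lane is all ones, ANDing it with 1 << i already yields the bit at its final position, so the per-lane shift of the old code adds nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 8 x 16-bit lanes (128-bit register), 8 entries per bucket. */
#define LANES 8

/* Lane-specific bits, as in the patch's "mask" constant. */
static const uint16_t lane_bit[LANES] = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};

/* Scalar model of vceqq_u16: a matching lane becomes all ones. */
static void
lane_compare(const uint16_t *sigs, uint16_t sig, uint16_t *out)
{
	for (int i = 0; i < LANES; i++)
		out[i] = (sigs[i] == sig) ? 0xFFFF : 0x0000;
}

/* Old approach: keep bit 0 of each lane, shift lane i left by i, add across lanes. */
static uint16_t
reduce_shift(const uint16_t *cmp)
{
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++)
		sum = (uint16_t)(sum + ((cmp[i] & 0x1) << i));
	return sum;
}

/* New approach: AND lane i with (1 << i) directly, then add across lanes. */
static uint16_t
reduce_mask(const uint16_t *cmp)
{
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++)
		sum = (uint16_t)(sum + (cmp[i] & lane_bit[i]));
	return sum;
}

/* Primary bits stay in 0..7, secondary bits move to 8..15, then one reduction. */
static uint16_t
dense_from_compares(const uint16_t *prim_cmp, const uint16_t *sec_cmp)
{
	uint16_t sum = 0;

	for (int i = 0; i < LANES; i++) {
		uint16_t hit1 = (uint16_t)(prim_cmp[i] & lane_bit[i]);
		uint16_t hit2 = (uint16_t)((sec_cmp[i] & lane_bit[i]) << LANES);

		sum = (uint16_t)(sum + (hit1 | hit2));
	}
	return sum;
}
```

Because every lane contributes a distinct bit, the horizontal add in this model behaves like an OR, which is why a single vaddvq_u16 over the combined primary and secondary lanes is enough.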

diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index 1af6ba8190..b5a457f936 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -30,23 +30,21 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
switch (sig_cmp_fn) {
  #if RTE_HASH_BUCKET_ENTRIES <= 8
case RTE_HASH_COMPARE_NEON: {
-   uint16x8_t vmat, vsig, x;
-   int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
-   uint16_t low, high;
+   uint16x8_t vmat, hit1, hit2;
+   const uint16x8_t mask = {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+   const uint16x8_t vsig = vld1q_dup_u16((uint16_t const *)&sig);

-   vsig = vld1q_dup_u16((uint16_t const *)&sig);
/* Compare all signatures in the primary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)prim_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   low = (uint16_t)(vaddvq_u16(x));
+   vmat = vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs));
+   hit1 = vandq_u16(vmat, mask);
+
/* Compare all signatures in the secondary bucket */
-   vmat = vceqq_u16(vsig,
-   vld1q_u16((uint16_t const *)sec_bucket_sigs));
-   x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
-   high = (uint16_t)(vaddvq_u16(x));
-   *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
+   vmat = vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs));
+   hit2 = vandq_u16(vmat, mask);

+   hit2 = vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES);
+   hit2 = vorrq_u16(hit1, hit2);
+   *hitmask_buffer = vaddvq_u16(hit2);


Since vaddv is expensive, could you convert it to vshrn?

https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon

https://github.com/DPDK/dpdk/blob/main/examples/l3fwd/l3fwd_neon.h#L226


Thank you for those links; they were a good read.
Unfortunately, I don't think it is a good use case here. A decent part of
the speedup I get comes from using a dense hitmask, i.e. every bit counts and
there is no padding. Using vshrn would leave 4 bits of padding, and stripping
them would be more expensive than using a regular reduce.
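To make the trade-off concrete, the sketch below shows in portable scalar C what stripping the padding would involve. Assumptions: padded_to_dense is a hypothetical helper, and each entry occupies bits_per_entry bits that are all ones on a match. As we understand the linked blog post, its byte-lane vshrn variant yields 4 bits per entry; the exact width here depends on how vshrn would be applied to the 16-bit lanes. The loop only illustrates the extra per-entry work relative to a direct dense reduction; it is not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compact a padded mask, where every entry occupies
 * bits_per_entry bits (all ones on a match, all zeros otherwise), into a
 * dense mask with exactly one bit per entry.
 */
static uint16_t
padded_to_dense(uint64_t padded, unsigned int bits_per_entry,
		unsigned int entries)
{
	uint16_t dense = 0;

	/* The result must fit in 16 bits and every shift must stay below 64. */
	assert(entries <= 16);
	assert(bits_per_entry * entries <= 64);

	/* Keep the lowest bit of each entry and move it to position i:
	 * an extra shift, AND, shift and OR per entry on top of the compare. */
	for (unsigned int i = 0; i < entries; i++)
		dense |= (uint16_t)(((padded >> (bits_per_entry * i)) & 0x1) << i);
	return dense;
}
```

For example, a 4-bit-per-entry mask of 0x0F0F (entries 0 and 2 matched) compacts to 0x5.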





}
break;
  #endif
--
2.25.1






[PATCH v8 0/4] hash: add SVE support for bulk key lookup

2024-04-17 Thread Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.
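With the interleaved format, each key's primary and secondary hits share one uint16_t, so a lookup loop has to split them again before walking the matching entries. A minimal sketch, assuming 8 entries per bucket and the GCC/Clang __builtin_ctz builtin; prim_hits, sec_hits and hit_indices are hypothetical helpers for illustration, not DPDK functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for RTE_HASH_BUCKET_ENTRIES. */
#define BUCKET_ENTRIES 8

/* Low half of a key's dense hitmask: primary-bucket matches. */
static inline uint16_t
prim_hits(uint16_t hitmask)
{
	return (uint16_t)(hitmask & ((1u << BUCKET_ENTRIES) - 1));
}

/* High half of a key's dense hitmask: secondary-bucket matches. */
static inline uint16_t
sec_hits(uint16_t hitmask)
{
	return (uint16_t)(hitmask >> BUCKET_ENTRIES);
}

/*
 * Write the indices of the set bits of one bucket's hits to out[],
 * lowest index first, and return how many there are.
 */
static unsigned int
hit_indices(uint16_t bucket_hits, unsigned int *out)
{
	unsigned int n = 0;

	while (bucket_hits != 0) {
		out[n++] = (unsigned int)__builtin_ctz(bucket_hits);
		bucket_hits = (uint16_t)(bucket_hits & (bucket_hits - 1)); /* clear lowest set bit */
	}
	return n;
}
```

For a hitmask of 0x800A, this yields primary entries 1 and 3 and secondary entry 7.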

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (ran on Sapphire Rapids)
  NEON is up to 7-10% faster (ran on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (ran on a Graviton 3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (ran on a Graviton 3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple of int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

Yoan Picchi (4):
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap  |   2 +
 app/test/test_hash.c  |  99 ---
 lib/hash/arch/arm/compare_signatures.h| 117 +
 lib/hash/arch/common/compare_signatures.h |  38 +
 lib/hash/arch/x86/compare_signatures.h|  53 ++
 lib/hash/rte_cuckoo_hash.c| 199 --
 lib/hash/rte_cuckoo_hash.h|   1 +
 7 files changed, 394 insertions(+), 115 deletions(-)
 create mode 100644 lib/hash/arch/arm/compare_signatures.h
 create mode 100644 lib/hash/arch/common/compare_signatures.h
 create mode 100644 lib/hash/arch/x86/compare_signatures.h

-- 
2.25.1


