[dpdk-dev] [PATCH v7 18/18] acl: add checks for max SIMD bitwidth

Ciara Power Thu, 15 Oct 2020 08:31:31 -0700

When choosing a vector path, an extra condition must be satisfied:
the max SIMD bitwidth must allow the path that the CPU flags enable.
These checks are added to the acl_check_alg_*() helper functions.


Cc: Konstantin Ananyev <konstantin.anan...@intel.com>

Signed-off-by: Ciara Power <ciara.po...@intel.com>

---
v7:
  - Removed global variable for max SIMD bitwidth.
  - Added helper function for checking AVX512 cpu flags.
  - Separated condition checking for the AVX512 algorithms to allow for
    checking 256/512 max SIMD bitwidth, respectively.
  - Added to docs to reflect the added changes in algorithm selection.
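
The selection rule described in the commit message can be sketched outside
of DPDK as follows. This is only an illustrative model, not the patch code:
struct platform, check_alg(), and the SIMD_* and ALG_* names are hypothetical
stand-ins for the values that rte_cpu_get_flag_enabled() and
rte_get_max_simd_bitwidth() report and for the RTE_SIMD_* and
RTE_ACL_CLASSIFY_* constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTE_SIMD_* and RTE_ACL_CLASSIFY_* values. */
enum simd_width { SIMD_128 = 128, SIMD_256 = 256, SIMD_512 = 512 };
enum classify_alg { ALG_SSE, ALG_AVX2, ALG_AVX512X16, ALG_AVX512X32 };

/* Hypothetical snapshot of what the CPU flag query and the EAL max SIMD
 * bitwidth query would report on a real system. */
struct platform {
	bool sse4_1;
	bool avx2;
	bool avx512;            /* assumed: F, VL, CD and BW all present */
	int max_simd_bitwidth;  /* cap configured by the user */
};

/* Return 0 if alg may be used, -ENOTSUP if the CPU flags or the bitwidth
 * cap rule it out, -EINVAL for an unknown algorithm. */
static int
check_alg(const struct platform *p, enum classify_alg alg)
{
	bool ok;

	assert(p != NULL);

	switch (alg) {
	case ALG_SSE:
		ok = p->sse4_1 && p->max_simd_bitwidth >= SIMD_128;
		break;
	case ALG_AVX2:
		ok = p->avx2 && p->max_simd_bitwidth >= SIMD_256;
		break;
	case ALG_AVX512X16:
		/* Uses 256-bit registers, so a 256-bit cap is enough. */
		ok = p->avx512 && p->max_simd_bitwidth >= SIMD_256;
		break;
	case ALG_AVX512X32:
		/* Uses 512-bit registers, so the cap must be 512. */
		ok = p->avx512 && p->max_simd_bitwidth >= SIMD_512;
		break;
	default:
		return -EINVAL;
	}

	return ok ? 0 : -ENOTSUP;
}
```

In this model, a platform with every flag set and the cap at 256 accepts the
X16 path and rejects the X32 path with -ENOTSUP, which is why the two AVX512
algorithms need separate conditions.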
---
 .../prog_guide/packet_classif_access_ctrl.rst | 14 ++++--
 lib/librte_acl/rte_acl.c                      | 48 ++++++++++++++-----
 lib/librte_acl/rte_acl.h                      |  1 +
 3 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/doc/guides/prog_guide/packet_classif_access_ctrl.rst b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
index 7659af8eb5..72c193b17f 100644
--- a/doc/guides/prog_guide/packet_classif_access_ctrl.rst
+++ b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
@@ -368,34 +368,40 @@ After rte_acl_build() over given AC context has finished successfully, it can be
 There are several implementations of classify algorithm:
 
 *   **RTE_ACL_CLASSIFY_SCALAR**: generic implementation, doesn't require any specific HW support.
+    Requires max SIMD bitwidth to be at least 64.
 
 *   **RTE_ACL_CLASSIFY_SSE**: vector implementation, can process up to 8 flows in parallel. Requires SSE 4.1 support.
+    Requires max SIMD bitwidth to be at least 128.
 
 *   **RTE_ACL_CLASSIFY_AVX2**: vector implementation, can process up to 16 flows in parallel. Requires AVX2 support.
+    Requires max SIMD bitwidth to be at least 256.
 
 *   **RTE_ACL_CLASSIFY_NEON**: vector implementation, can process up to 8 flows
-    in parallel. Requires NEON support.
+    in parallel. Requires NEON support. Requires max SIMD bitwidth to be at least 128.
 
 *   **RTE_ACL_CLASSIFY_ALTIVEC**: vector implementation, can process up to 8
-    flows in parallel. Requires ALTIVEC support.
+    flows in parallel. Requires ALTIVEC support. Requires max SIMD bitwidth to be at least 128.
 
 *   **RTE_ACL_CLASSIFY_AVX512X16**: vector implementation, can process up to 16
     flows in parallel. Uses 256-bit width SIMD registers.
-    Requires AVX512 support.
+    Requires AVX512 support. Requires max SIMD bitwidth to be at least 256.
 
 *   **RTE_ACL_CLASSIFY_AVX512X32**: vector implementation, can process up to 32
     flows in parallel. Uses 512-bit width SIMD registers.
-    Requires AVX512 support.
+    Requires AVX512 support. Requires max SIMD bitwidth to be at least 512.
 
 It is purely a runtime decision which method to choose, there is no build-time difference.
 All implementations operates over the same internal RT structures and use similar principles. The main difference is that vector implementations can manually exploit IA SIMD instructions and process several input data flows in parallel.
 At startup ACL library determines the highest available classify method for the given platform and sets it as default one. Though the user has an ability to override the default classifier function for a given ACL context or perform particular search using non-default classify method. In that case it is user responsibility to make sure that given platform supports selected classify implementation.
+The max SIMD bitwidth value set in EAL is also taken into consideration when determining if a classify method is supported, see :ref:`max_simd_bitwidth` for more information.
 
 .. note::
 
      Right now ``RTE_ACL_CLASSIFY_AVX512X32`` is not selected by default
      (due to possible frequency level change), but it can be selected at
      runtime by apps through the use of ACL API: ``rte_acl_set_ctx_classify``.
+     The max SIMD bitwidth value will also need to be set to 512 to enable this classify method.
+     See :doc:`../howto/avx512` for more information about setting this value.
 
 Application Programming Interface (API) Usage
 ---------------------------------------------
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 7c2f60b2d6..026d2e7430 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -114,9 +114,13 @@ acl_check_alg_arm(enum rte_acl_classify_alg alg)
 {
        if (alg == RTE_ACL_CLASSIFY_NEON) {
 #if defined(RTE_ARCH_ARM64)
-               return 0;
+               if (rte_get_max_simd_bitwidth() >= RTE_SIMD_128)
+                       return 0;
+               else
+                       return -ENOTSUP;
 #elif defined(RTE_ARCH_ARM)
-               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON) &&
+                               rte_get_max_simd_bitwidth() >= RTE_SIMD_128)
                        return 0;
                return -ENOTSUP;
 #else
@@ -136,7 +140,10 @@ acl_check_alg_ppc(enum rte_acl_classify_alg alg)
 {
        if (alg == RTE_ACL_CLASSIFY_ALTIVEC) {
 #if defined(RTE_ARCH_PPC_64)
-               return 0;
+               if (rte_get_max_simd_bitwidth() >= RTE_SIMD_128)
+                       return 0;
+               else
+                       return -ENOTSUP;
 #else
                return -ENOTSUP;
 #endif
@@ -145,6 +152,17 @@ acl_check_alg_ppc(enum rte_acl_classify_alg alg)
        return -EINVAL;
 }
 
+#ifdef CC_AVX512_SUPPORT
+static int
+acl_check_avx512_cpu_flags(void)
+{
+       return (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
+                       rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
+                       rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) &&
+                       rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW));
+}
+#endif
+
 /*
  * Helper function for acl_check_alg.
  * Check support for x86 specific classify methods.
@@ -152,13 +170,19 @@ acl_check_alg_ppc(enum rte_acl_classify_alg alg)
 static int
 acl_check_alg_x86(enum rte_acl_classify_alg alg)
 {
-       if (alg == RTE_ACL_CLASSIFY_AVX512X16 ||
-                       alg == RTE_ACL_CLASSIFY_AVX512X32) {
+       if (alg == RTE_ACL_CLASSIFY_AVX512X32) {
 #ifdef CC_AVX512_SUPPORT
-               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
-                       rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
-                       rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) &&
-                       rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW))
+               if (acl_check_avx512_cpu_flags() != 0 &&
+                       rte_get_max_simd_bitwidth() >= RTE_SIMD_512)
+                       return 0;
+#endif
+               return -ENOTSUP;
+       }
+
+       if (alg == RTE_ACL_CLASSIFY_AVX512X16) {
+#ifdef CC_AVX512_SUPPORT
+               if (acl_check_avx512_cpu_flags() != 0 &&
+                       rte_get_max_simd_bitwidth() >= RTE_SIMD_256)
                        return 0;
 #endif
                return -ENOTSUP;
@@ -166,7 +190,8 @@ acl_check_alg_x86(enum rte_acl_classify_alg alg)
 
        if (alg == RTE_ACL_CLASSIFY_AVX2) {
 #ifdef CC_AVX2_SUPPORT
-               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
+               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) &&
+                               rte_get_max_simd_bitwidth() >= RTE_SIMD_256)
                        return 0;
 #endif
                return -ENOTSUP;
@@ -174,7 +199,8 @@ acl_check_alg_x86(enum rte_acl_classify_alg alg)
 
        if (alg == RTE_ACL_CLASSIFY_SSE) {
 #ifdef RTE_ARCH_X86
-               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
+               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1) &&
+                               rte_get_max_simd_bitwidth() >= RTE_SIMD_128)
                        return 0;
 #endif
                return -ENOTSUP;
diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index 1bfed00743..f7f5f08701 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -329,6 +329,7 @@ rte_acl_classify_alg(const struct rte_acl_ctx *ctx,
  *   New default classify algorithm for given ACL context.
  *   It is the caller responsibility to ensure that the value refers to the
  *   existing algorithm, and that it could be run on the given CPU.
+ *   The max SIMD bitwidth value in EAL must also allow for the chosen algorithm.
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOTSUP requested algorithm is not supported by given platform.
-- 
2.22.0
