On 10/9/2024 8:48 AM, Kusztal, ArkadiuszX wrote: > > >> -----Original Message----- >> From: Ferruh Yigit <ferruh.yi...@amd.com> >> Sent: Wednesday, October 9, 2024 3:03 AM >> To: Kusztal, ArkadiuszX <arkadiuszx.kusz...@intel.com>; Marchand, David >> <david.march...@redhat.com> >> Cc: dev@dpdk.org; Ji, Kai <kai...@intel.com>; Dooley, Brian >> <brian.doo...@intel.com> >> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api >> >> On 10/8/2024 9:51 PM, Kusztal, ArkadiuszX wrote: >>> Hi Ferruh, >>> Thanks for the review, comments inline, >>> >>>> -----Original Message----- >>>> From: Ferruh Yigit <ferruh.yi...@amd.com> >>>> Sent: Tuesday, October 8, 2024 5:43 AM >>>> To: Kusztal, ArkadiuszX <arkadiuszx.kusz...@intel.com>; Marchand, >>>> David <david.march...@redhat.com> >>>> Cc: dev@dpdk.org; Ji, Kai <kai...@intel.com>; Dooley, Brian >>>> <brian.doo...@intel.com> >>>> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api >>>> >>>> On 10/1/2024 7:11 PM, Arkadiusz Kusztal wrote: >>>>> The current net CRC API is not thread-safe, this patch solves this >>>>> by adding another, thread-safe API functions. >>>>> This API is also safe to use across multiple processes, yet with >>>>> limitations on max-simd-bitwidth, which will be checked only by the >>>>> process that created the CRC context; all other processes will use >>>>> the same CRC function when used with the same CRC context. >>>>> It is an undefined behavior when process binaries are compiled with >>>>> different SIMD capabilities when the same CRC context is used. >>>>> >>>>> Signed-off-by: Arkadiusz Kusztal <arkadiuszx.kusz...@intel.com> >>>> >>>> <...> >>>> >>>>> +static struct >>>>> +{ >>>>> + uint32_t (*f[RTE_NET_CRC_REQS]) >>>>> + (const uint8_t *data, uint32_t data_len); >>>>> >>>> >>>> It increases readability to typedef function pointers. >>> >>> Agree, though this typedef would be used here only, that’s why I left it >>> out. >>> But I can add it then. >>> >>>> >>>> >>>>> +} handlers[RTE_NET_CRC_AVX512 + 1]; >>>>> >>>>> -/** >>>>> - * Reflect the bits about the middle >>>>> - * >>>>> - * @param val >>>>> - * value to be reflected >>>>> - * >>>>> - * @return >>>>> - * reflected value >>>>> - */ >>>>> -static uint32_t >>>>> +static inline uint32_t >>>>> >>>> >>>> Does changing to 'inline' required, as function is static compiler >>>> can do the same. >>> >>> True, though it may be more readable sometimes. >>> Of course there is no way that in O3 these functions would not be inlined by >> the compiler, regardless if the inline hint is present or not. >>> >>>> >>>>> reflect_32bits(uint32_t val) >>>>> { >>>>> uint32_t i, res = 0; >>>>> @@ -99,26 +43,7 @@ reflect_32bits(uint32_t val) >>>>> return res; >>>>> } >>>>> >>>>> -static void >>>>> -crc32_eth_init_lut(uint32_t poly, >>>>> - uint32_t *lut) >>>>> -{ >>>>> - uint32_t i, j; >>>>> - >>>>> - for (i = 0; i < CRC_LUT_SIZE; i++) { >>>>> - uint32_t crc = reflect_32bits(i); >>>>> - >>>>> - for (j = 0; j < 8; j++) { >>>>> - if (crc & 0x80000000L) >>>>> - crc = (crc << 1) ^ poly; >>>>> - else >>>>> - crc <<= 1; >>>>> - } >>>>> - lut[i] = reflect_32bits(crc); >>>>> - } >>>>> -} >>>>> - >>>>> -static __rte_always_inline uint32_t >>>>> +static inline uint32_t >>>>> >>>> >>>> Why not forcing inline anymore? >>>> Are these inline changes related to the thread-safety? >>> >>> O3 will inline it anyway, and with always_inline it will be inline even in >>> debug >> mode. I just see no reason forcing it upon the compiler. >>> >>>> >>>>> crc32_eth_calc_lut(const uint8_t *data, >>>>> uint32_t data_len, >>>>> uint32_t crc, >>>>> @@ -130,20 +55,9 @@ crc32_eth_calc_lut(const uint8_t *data, >>>>> return crc; >>>>> } >>>>> >>>>> -static void >>>>> -rte_net_crc_scalar_init(void) >>>>> -{ >>>>> - /* 32-bit crc init */ >>>>> - crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut); >>>>> - >>>>> - /* 16-bit CRC init */ >>>>> - crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut); >>>>> -} >>>>> - >>>>> static inline uint32_t >>>>> -rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len) >>>>> +crc16_ccitt(const uint8_t *data, uint32_t data_len) >>>>> { >>>>> - /* return 16-bit CRC value */ >>>>> >>>> >>>> Why not keep comments? Are they wrong? >>> >>> Functions names are very self-explanatory, that’s why I dropped comments. I >> can add comments if needed. >>> >> >> I am for restricting changes to the target of the patch which is making CRC >> calculation thread safe, unless code changes invalidates the comments, lets >> keep them. Same goes with inline related modifications. >> >>>> >>>> <...> >>>> >>>>> +static void >>>>> +crc_scalar_init(void) >>>>> +{ >>>>> + crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut); >>>>> + crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut); >>>>> + >>>>> + handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC16_CCITT] = >>>> crc16_ccitt; >>>>> + handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC32_ETH] = crc32_eth; >>>>> >>>> >>>> +1 to remove global handlers pointer and add context, >>>> >>>> But current handlers array content is static, it can be set when >>>> defined, instead of functions. >>> >>> Can do it for scalars, but for SIMD there is this runtime check like this: >>> if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) { So compiled binary on >> AVX512 >>> machine could filter it out on the machine which does not support it. >>> There is no NULL check in crc function so it would not change much when >> called -> Invalid address vs Invalid instruction. >>> There are not many checks there, as this is CRC after all, it should be a >>> small as >> possible, yet probably NULL check could be advisable in crc function then. >>> >> >> There is already AVX512_VPCLMULQDQ_CPU_SUPPORTED etc checks in >> 'rte_net_crc_set()'. >> So does it work to update 'handlers' statically, without condition, but have >> conditions when use them. >> >>> >>>> >>>> <...> >>>> >>>>> -static uint32_t >>>>> -rte_crc32_eth_default_handler(const uint8_t *data, uint32_t >>>>> data_len) >>>>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg, >>>>> + enum rte_net_crc_type type) >>>>> { >>>>> - handlers = NULL; >>>>> - if (max_simd_bitwidth == 0) >>>>> - max_simd_bitwidth = rte_vect_get_max_simd_bitwidth(); >>>>> - >>>>> - handlers = avx512_vpclmulqdq_get_handlers(); >>>>> - if (handlers != NULL) >>>>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); >>>>> - handlers = sse42_pclmulqdq_get_handlers(); >>>>> - if (handlers != NULL) >>>>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); >>>>> - handlers = neon_pmull_get_handlers(); >>>>> - if (handlers != NULL) >>>>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); >>>>> - handlers = handlers_scalar; >>>>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); >>>>> -} >>>>> + uint16_t max_simd_bitwidth; >>>>> >>>>> -/* Public API */ >>>>> - >>>>> -void >>>>> -rte_net_crc_set_alg(enum rte_net_crc_alg alg) -{ >>>>> - handlers = NULL; >>>>> - if (max_simd_bitwidth == 0) >>>>> - max_simd_bitwidth = rte_vect_get_max_simd_bitwidth(); >>>>> + max_simd_bitwidth = rte_vect_get_max_simd_bitwidth(); >>>>> >>>>> switch (alg) { >>>>> case RTE_NET_CRC_AVX512: >>>>> - handlers = avx512_vpclmulqdq_get_handlers(); >>>>> - if (handlers != NULL) >>>>> - break; >>>>> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT >>>>> + if (AVX512_VPCLMULQDQ_CPU_SUPPORTED && >>>>> + max_simd_bitwidth >= RTE_VECT_SIMD_512) { >>>>> + return (struct rte_net_crc){ RTE_NET_CRC_AVX512, >>>> type }; >>>>> + } >>>>> +#endif >>>>> /* fall-through */ >>>>> case RTE_NET_CRC_SSE42: >>>>> - handlers = sse42_pclmulqdq_get_handlers(); >>>>> - break; /* for x86, always break here */ >>>>> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT >>>>> + if (SSE42_PCLMULQDQ_CPU_SUPPORTED && >>>>> + max_simd_bitwidth >= RTE_VECT_SIMD_128) { >>>>> + return (struct rte_net_crc){ RTE_NET_CRC_SSE42, type >>>> }; >>>>> + } >>>>> +#endif >>>>> + break; >>>>> case RTE_NET_CRC_NEON: >>>>> - handlers = neon_pmull_get_handlers(); >>>>> - /* fall-through */ >>>>> - case RTE_NET_CRC_SCALAR: >>>>> - /* fall-through */ >>>>> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT >>>>> + if (NEON_PMULL_CPU_SUPPORTED && >>>>> + max_simd_bitwidth >= RTE_VECT_SIMD_128) { >>>>> + return (struct rte_net_crc){ RTE_NET_CRC_NEON, type >>>> }; >>>>> + } >>>>> +#endif >>>>> >>>> >>>> Is it more readable as following, up to you: >>> >>> Agree, I will change it. >>> >>>> >>>> ``` >>>> rte_net_crc_set(alg, type) { >>>> enum rte_net_crc_alg new_alg = RTE_NET_CRC_SCALAR; >>>> switch (alg) { >>>> case AVX512: >>>> new_alg = .. >>>> case NEON: >>>> new_alg = .. >>>> } >>>> return struct rte_net_crc){ new_alg, type }; ``` >>>> >>>> >>>> >>>> >>>>> + break; >>>>> default: >>>>> break; >>>>> } >>>>> - >>>>> - if (handlers == NULL) >>>>> - handlers = handlers_scalar; >>>>> + return (struct rte_net_crc){ RTE_NET_CRC_SCALAR, type }; >>>>> } >>>>> >>>>> -uint32_t >>>>> -rte_net_crc_calc(const void *data, >>>>> - uint32_t data_len, >>>>> - enum rte_net_crc_type type) >>>>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx, >>>>> + const void *data, const uint32_t data_len) >>>>> { >>>>> - uint32_t ret; >>>>> - rte_net_crc_handler f_handle; >>>>> - >>>>> - f_handle = handlers[type]; >>>>> - ret = f_handle(data, data_len); >>>>> - >>>>> - return ret; >>>>> + return handlers[ctx->alg].f[ctx->type](data, data_len); >>>>> >>>> >>>> 'rte_net_crc()' gets input from user and "struct rte_net_crc" is not >>>> opaque, so user can provide invalid input, ctx->alg & ctx->type. >>>> To protect against it input values should be checked before using. >>>> >>>> Or I think user not need to know the details of the "struct >>>> rte_net_crc", so it can be an opaque variable for user. >>> >>> I would love it to be opaque, but then I would have to return a pointer, >>> which >> then would involve some allocations/deallocations and I wanted to keep is as >> simple as possible. >>> So probably the checks would be a way to go. >>> >> >> True, +1 for simplicity. >> >>>> >>>> <...> >>>> >>>>> -/** >>>>> - * CRC compute API >>>>> - * >>>>> - * @param data >>>>> - * Pointer to the packet data for CRC computation >>>>> - * @param data_len >>>>> - * Data length for CRC computation >>>>> - * @param type >>>>> - * CRC type (enum rte_net_crc_type) >>>>> - * >>>>> - * @return >>>>> - * CRC value >>>>> - */ >>>>> -uint32_t >>>>> -rte_net_crc_calc(const void *data, >>>>> - uint32_t data_len, >>>>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg, >>>>> enum rte_net_crc_type type); >>>>> >>>>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx, >>>>> + const void *data, const uint32_t data_len); >>>>> + >>>>> >>>> >>>> As these are APIs, can you please add doxygen comments to them? >>> +1 >>> I think this change could be deferred to 25.03. >>> Adding this API without removing the old one should be possible without any >> unwanted consequences? >>> >> >> This is not a new functionality but replacement of existing one, so it will >> be >> confusing to have two set of APIs for same functionality with similar names: >> rte_net_crc_calc() and rte_net_crc() >> rte_net_crc_set_alg() and rte_net_crc_set() >> >> Also there are some internal functions used by these APIs and supporting both >> new and old may force to have two version of these internal functions and it >> will >> create complexity/noise. >> >> As this release is ABI break release, it is easier to update APIs (although >> deprecation notice is missing for this change). >> >> >> As an alternative option, do you think applying ABI versioning in 25.03 >> works for >> these APIs? >> If so, old version can be cleaned in v25.11. > > I would go with versioning. I can preserve old function names as it's > basically the same functionality. >
ack