> -----Original Message----- > From: Ferruh Yigit <ferruh.yi...@amd.com> > Sent: Wednesday, October 9, 2024 3:03 AM > To: Kusztal, ArkadiuszX <arkadiuszx.kusz...@intel.com>; Marchand, David > <david.march...@redhat.com> > Cc: dev@dpdk.org; Ji, Kai <kai...@intel.com>; Dooley, Brian > <brian.doo...@intel.com> > Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api > > On 10/8/2024 9:51 PM, Kusztal, ArkadiuszX wrote: > > Hi Ferruh, > > Thanks for the review, comments inline, > > > >> -----Original Message----- > >> From: Ferruh Yigit <ferruh.yi...@amd.com> > >> Sent: Tuesday, October 8, 2024 5:43 AM > >> To: Kusztal, ArkadiuszX <arkadiuszx.kusz...@intel.com>; Marchand, > >> David <david.march...@redhat.com> > >> Cc: dev@dpdk.org; Ji, Kai <kai...@intel.com>; Dooley, Brian > >> <brian.doo...@intel.com> > >> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api > >> > >> On 10/1/2024 7:11 PM, Arkadiusz Kusztal wrote: > >>> The current net CRC API is not thread-safe, this patch solves this > >>> by adding another, thread-safe API functions. > >>> This API is also safe to use across multiple processes, yet with > >>> limitations on max-simd-bitwidth, which will be checked only by the > >>> process that created the CRC context; all other processes will use > >>> the same CRC function when used with the same CRC context. > >>> It is an undefined behavior when process binaries are compiled with > >>> different SIMD capabilities when the same CRC context is used. > >>> > >>> Signed-off-by: Arkadiusz Kusztal <arkadiuszx.kusz...@intel.com> > >> > >> <...> > >> > >>> +static struct > >>> +{ > >>> + uint32_t (*f[RTE_NET_CRC_REQS]) > >>> + (const uint8_t *data, uint32_t data_len); > >>> > >> > >> It increases readability to typedef function pointers. > > > > Agree, though this typedef would be used here only, that’s why I left it > > out. > > But I can add it then. > > > >> > >> > >>> +} handlers[RTE_NET_CRC_AVX512 + 1]; > >>> > >>> -/** > >>> - * Reflect the bits about the middle > >>> - * > >>> - * @param val > >>> - * value to be reflected > >>> - * > >>> - * @return > >>> - * reflected value > >>> - */ > >>> -static uint32_t > >>> +static inline uint32_t > >>> > >> > >> Does changing to 'inline' required, as function is static compiler > >> can do the same. > > > > True, though it may be more readable sometimes. > > Of course there is no way that in O3 these functions would not be inlined by > the compiler, regardless if the inline hint is present or not. > > > >> > >>> reflect_32bits(uint32_t val) > >>> { > >>> uint32_t i, res = 0; > >>> @@ -99,26 +43,7 @@ reflect_32bits(uint32_t val) > >>> return res; > >>> } > >>> > >>> -static void > >>> -crc32_eth_init_lut(uint32_t poly, > >>> - uint32_t *lut) > >>> -{ > >>> - uint32_t i, j; > >>> - > >>> - for (i = 0; i < CRC_LUT_SIZE; i++) { > >>> - uint32_t crc = reflect_32bits(i); > >>> - > >>> - for (j = 0; j < 8; j++) { > >>> - if (crc & 0x80000000L) > >>> - crc = (crc << 1) ^ poly; > >>> - else > >>> - crc <<= 1; > >>> - } > >>> - lut[i] = reflect_32bits(crc); > >>> - } > >>> -} > >>> - > >>> -static __rte_always_inline uint32_t > >>> +static inline uint32_t > >>> > >> > >> Why not forcing inline anymore? > >> Are these inline changes related to the thread-safety? > > > > O3 will inline it anyway, and with always_inline it will be inline even in > > debug > mode. I just see no reason forcing it upon the compiler. > > > >> > >>> crc32_eth_calc_lut(const uint8_t *data, > >>> uint32_t data_len, > >>> uint32_t crc, > >>> @@ -130,20 +55,9 @@ crc32_eth_calc_lut(const uint8_t *data, > >>> return crc; > >>> } > >>> > >>> -static void > >>> -rte_net_crc_scalar_init(void) > >>> -{ > >>> - /* 32-bit crc init */ > >>> - crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut); > >>> - > >>> - /* 16-bit CRC init */ > >>> - crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut); > >>> -} > >>> - > >>> static inline uint32_t > >>> -rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len) > >>> +crc16_ccitt(const uint8_t *data, uint32_t data_len) > >>> { > >>> - /* return 16-bit CRC value */ > >>> > >> > >> Why not keep comments? Are they wrong? > > > > Functions names are very self-explanatory, that’s why I dropped comments. I > can add comments if needed. > > > > I am for restricting changes to the target of the patch which is making CRC > calculation thread safe, unless code changes invalidates the comments, lets > keep them. Same goes with inline related modifications. > > >> > >> <...> > >> > >>> +static void > >>> +crc_scalar_init(void) > >>> +{ > >>> + crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut); > >>> + crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut); > >>> + > >>> + handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC16_CCITT] = > >> crc16_ccitt; > >>> + handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC32_ETH] = crc32_eth; > >>> > >> > >> +1 to remove global handlers pointer and add context, > >> > >> But current handlers array content is static, it can be set when > >> defined, instead of functions. > > > > Can do it for scalars, but for SIMD there is this runtime check like this: > > if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) { So compiled binary on > AVX512 > > machine could filter it out on the machine which does not support it. > > There is no NULL check in crc function so it would not change much when > called -> Invalid address vs Invalid instruction. > > There are not many checks there, as this is CRC after all, it should be a > > small as > possible, yet probably NULL check could be advisable in crc function then. > > > > There is already AVX512_VPCLMULQDQ_CPU_SUPPORTED etc checks in > 'rte_net_crc_set()'. > So does it work to update 'handlers' statically, without condition, but have > conditions when use them. > > > > >> > >> <...> > >> > >>> -static uint32_t > >>> -rte_crc32_eth_default_handler(const uint8_t *data, uint32_t > >>> data_len) > >>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg, > >>> + enum rte_net_crc_type type) > >>> { > >>> - handlers = NULL; > >>> - if (max_simd_bitwidth == 0) > >>> - max_simd_bitwidth = rte_vect_get_max_simd_bitwidth(); > >>> - > >>> - handlers = avx512_vpclmulqdq_get_handlers(); > >>> - if (handlers != NULL) > >>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); > >>> - handlers = sse42_pclmulqdq_get_handlers(); > >>> - if (handlers != NULL) > >>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); > >>> - handlers = neon_pmull_get_handlers(); > >>> - if (handlers != NULL) > >>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); > >>> - handlers = handlers_scalar; > >>> - return handlers[RTE_NET_CRC32_ETH](data, data_len); > >>> -} > >>> + uint16_t max_simd_bitwidth; > >>> > >>> -/* Public API */ > >>> - > >>> -void > >>> -rte_net_crc_set_alg(enum rte_net_crc_alg alg) -{ > >>> - handlers = NULL; > >>> - if (max_simd_bitwidth == 0) > >>> - max_simd_bitwidth = rte_vect_get_max_simd_bitwidth(); > >>> + max_simd_bitwidth = rte_vect_get_max_simd_bitwidth(); > >>> > >>> switch (alg) { > >>> case RTE_NET_CRC_AVX512: > >>> - handlers = avx512_vpclmulqdq_get_handlers(); > >>> - if (handlers != NULL) > >>> - break; > >>> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT > >>> + if (AVX512_VPCLMULQDQ_CPU_SUPPORTED && > >>> + max_simd_bitwidth >= RTE_VECT_SIMD_512) { > >>> + return (struct rte_net_crc){ RTE_NET_CRC_AVX512, > >> type }; > >>> + } > >>> +#endif > >>> /* fall-through */ > >>> case RTE_NET_CRC_SSE42: > >>> - handlers = sse42_pclmulqdq_get_handlers(); > >>> - break; /* for x86, always break here */ > >>> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > >>> + if (SSE42_PCLMULQDQ_CPU_SUPPORTED && > >>> + max_simd_bitwidth >= RTE_VECT_SIMD_128) { > >>> + return (struct rte_net_crc){ RTE_NET_CRC_SSE42, type > >> }; > >>> + } > >>> +#endif > >>> + break; > >>> case RTE_NET_CRC_NEON: > >>> - handlers = neon_pmull_get_handlers(); > >>> - /* fall-through */ > >>> - case RTE_NET_CRC_SCALAR: > >>> - /* fall-through */ > >>> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > >>> + if (NEON_PMULL_CPU_SUPPORTED && > >>> + max_simd_bitwidth >= RTE_VECT_SIMD_128) { > >>> + return (struct rte_net_crc){ RTE_NET_CRC_NEON, type > >> }; > >>> + } > >>> +#endif > >>> > >> > >> Is it more readable as following, up to you: > > > > Agree, I will change it. > > > >> > >> ``` > >> rte_net_crc_set(alg, type) { > >> enum rte_net_crc_alg new_alg = RTE_NET_CRC_SCALAR; > >> switch (alg) { > >> case AVX512: > >> new_alg = .. > >> case NEON: > >> new_alg = .. > >> } > >> return struct rte_net_crc){ new_alg, type }; ``` > >> > >> > >> > >> > >>> + break; > >>> default: > >>> break; > >>> } > >>> - > >>> - if (handlers == NULL) > >>> - handlers = handlers_scalar; > >>> + return (struct rte_net_crc){ RTE_NET_CRC_SCALAR, type }; > >>> } > >>> > >>> -uint32_t > >>> -rte_net_crc_calc(const void *data, > >>> - uint32_t data_len, > >>> - enum rte_net_crc_type type) > >>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx, > >>> + const void *data, const uint32_t data_len) > >>> { > >>> - uint32_t ret; > >>> - rte_net_crc_handler f_handle; > >>> - > >>> - f_handle = handlers[type]; > >>> - ret = f_handle(data, data_len); > >>> - > >>> - return ret; > >>> + return handlers[ctx->alg].f[ctx->type](data, data_len); > >>> > >> > >> 'rte_net_crc()' gets input from user and "struct rte_net_crc" is not > >> opaque, so user can provide invalid input, ctx->alg & ctx->type. > >> To protect against it input values should be checked before using. > >> > >> Or I think user not need to know the details of the "struct > >> rte_net_crc", so it can be an opaque variable for user. > > > > I would love it to be opaque, but then I would have to return a pointer, > > which > then would involve some allocations/deallocations and I wanted to keep is as > simple as possible. > > So probably the checks would be a way to go. > > > > True, +1 for simplicity. > > >> > >> <...> > >> > >>> -/** > >>> - * CRC compute API > >>> - * > >>> - * @param data > >>> - * Pointer to the packet data for CRC computation > >>> - * @param data_len > >>> - * Data length for CRC computation > >>> - * @param type > >>> - * CRC type (enum rte_net_crc_type) > >>> - * > >>> - * @return > >>> - * CRC value > >>> - */ > >>> -uint32_t > >>> -rte_net_crc_calc(const void *data, > >>> - uint32_t data_len, > >>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg, > >>> enum rte_net_crc_type type); > >>> > >>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx, > >>> + const void *data, const uint32_t data_len); > >>> + > >>> > >> > >> As these are APIs, can you please add doxygen comments to them? > > +1 > > I think this change could be deferred to 25.03. > > Adding this API without removing the old one should be possible without any > unwanted consequences? > > > > This is not a new functionality but replacement of existing one, so it will be > confusing to have two set of APIs for same functionality with similar names: > rte_net_crc_calc() and rte_net_crc() > rte_net_crc_set_alg() and rte_net_crc_set() > > Also there are some internal functions used by these APIs and supporting both > new and old may force to have two version of these internal functions and it > will > create complexity/noise. > > As this release is ABI break release, it is easier to update APIs (although > deprecation notice is missing for this change). > > > As an alternative option, do you think applying ABI versioning in 25.03 works > for > these APIs? > If so, old version can be cleaned in v25.11.
I would go with versioning. I can preserve old function names as it's basically the same functionality. > > > > I still have some second thoughts about this max-simd-width. DPDK does > > not impose any restrictions on this parameter in the multi-process usage, > > there > may be some room to alter some things there. > > > >> > >>> #ifdef __cplusplus > >>> } > >>> #endif > >>> diff --git a/lib/net/version.map b/lib/net/version.map index > >>> bec4ce23ea..47daf1464a 100644 > >>> --- a/lib/net/version.map > >>> +++ b/lib/net/version.map > >>> @@ -4,11 +4,25 @@ DPDK_25 { > >>> rte_eth_random_addr; > >>> rte_ether_format_addr; > >>> rte_ether_unformat_addr; > >>> - rte_net_crc_calc; > >>> - rte_net_crc_set_alg; > >>> rte_net_get_ptype; > >>> rte_net_make_rarp_packet; > >>> rte_net_skip_ip6_ext; > >>> + rte_net_crc; > >>> + rte_net_crc_set; > >>> > >>> local: *; > >>> }; > >>> + > >>> +INTERNAL { > >>> + global: > >>> + > >>> + rte_net_crc_sse42_init; > >>> + rte_crc16_ccitt_sse42_handler; > >>> + rte_crc32_eth_sse42_handler; > >>> + rte_net_crc_avx512_init; > >>> + rte_crc16_ccitt_avx512_handler; > >>> + rte_crc32_eth_avx512_handler; > >>> + rte_net_crc_neon_init; > >>> + rte_crc16_ccitt_neon_handler; > >>> + rte_crc32_eth_neon_handler; > >>> +}; > >>> > >> > >> +1 to David's comment, these are used only within component, no need > >> +to > >> export. > >> > >