On 10/9/2024 8:48 AM, Kusztal, ArkadiuszX wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Wednesday, October 9, 2024 3:03 AM
>> To: Kusztal, ArkadiuszX <arkadiuszx.kusz...@intel.com>; Marchand, David
>> <david.march...@redhat.com>
>> Cc: dev@dpdk.org; Ji, Kai <kai...@intel.com>; Dooley, Brian
>> <brian.doo...@intel.com>
>> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api
>>
>> On 10/8/2024 9:51 PM, Kusztal, ArkadiuszX wrote:
>>> Hi Ferruh,
>>> Thanks for the review, comments inline,
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> Sent: Tuesday, October 8, 2024 5:43 AM
>>>> To: Kusztal, ArkadiuszX <arkadiuszx.kusz...@intel.com>; Marchand,
>>>> David <david.march...@redhat.com>
>>>> Cc: dev@dpdk.org; Ji, Kai <kai...@intel.com>; Dooley, Brian
>>>> <brian.doo...@intel.com>
>>>> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api
>>>>
>>>> On 10/1/2024 7:11 PM, Arkadiusz Kusztal wrote:
>>>>> The current net CRC API is not thread-safe, this patch solves this
>>>>> by adding another, thread-safe API functions.
>>>>> This API is also safe to use across multiple processes, yet with
>>>>> limitations on max-simd-bitwidth, which will be checked only by the
>>>>> process that created the CRC context; all other processes will use
>>>>> the same CRC function when used with the same CRC context.
>>>>> It is an undefined behavior when process binaries are compiled with
>>>>> different SIMD capabilities when the same CRC context is used.
>>>>>
>>>>> Signed-off-by: Arkadiusz Kusztal <arkadiuszx.kusz...@intel.com>
>>>>
>>>> <...>
>>>>
>>>>> +static struct
>>>>> +{
>>>>> + uint32_t (*f[RTE_NET_CRC_REQS])
>>>>> +         (const uint8_t *data, uint32_t data_len);
>>>>>
>>>>
>>>> It increases readability to typedef function pointers.
>>>
>>> Agree, though this typedef would be used here only, that’s why I left it 
>>> out.
>>> But I can add it then.
>>>
>>>>
>>>>
>>>>> +} handlers[RTE_NET_CRC_AVX512 + 1];
>>>>>
>>>>> -/**
>>>>> - * Reflect the bits about the middle
>>>>> - *
>>>>> - * @param val
>>>>> - *   value to be reflected
>>>>> - *
>>>>> - * @return
>>>>> - *   reflected value
>>>>> - */
>>>>> -static uint32_t
>>>>> +static inline uint32_t
>>>>>
>>>>
>>>> Does changing to 'inline' required, as function is static compiler
>>>> can do the same.
>>>
>>> True, though it may be more readable sometimes.
>>> Of course there is no way that in O3 these functions would not be inlined by
>> the compiler, regardless if the inline hint is present or not.
>>>
>>>>
>>>>>  reflect_32bits(uint32_t val)
>>>>>  {
>>>>>   uint32_t i, res = 0;
>>>>> @@ -99,26 +43,7 @@ reflect_32bits(uint32_t val)
>>>>>   return res;
>>>>>  }
>>>>>
>>>>> -static void
>>>>> -crc32_eth_init_lut(uint32_t poly,
>>>>> - uint32_t *lut)
>>>>> -{
>>>>> - uint32_t i, j;
>>>>> -
>>>>> - for (i = 0; i < CRC_LUT_SIZE; i++) {
>>>>> -         uint32_t crc = reflect_32bits(i);
>>>>> -
>>>>> -         for (j = 0; j < 8; j++) {
>>>>> -                 if (crc & 0x80000000L)
>>>>> -                         crc = (crc << 1) ^ poly;
>>>>> -                 else
>>>>> -                         crc <<= 1;
>>>>> -         }
>>>>> -         lut[i] = reflect_32bits(crc);
>>>>> - }
>>>>> -}
>>>>> -
>>>>> -static __rte_always_inline uint32_t
>>>>> +static inline uint32_t
>>>>>
>>>>
>>>> Why not forcing inline anymore?
>>>> Are these inline changes related to the thread-safety?
>>>
>>> O3 will inline it anyway, and with always_inline it will be inline even in 
>>> debug
>> mode. I just see no reason forcing it upon the compiler.
>>>
>>>>
>>>>>  crc32_eth_calc_lut(const uint8_t *data,
>>>>>   uint32_t data_len,
>>>>>   uint32_t crc,
>>>>> @@ -130,20 +55,9 @@ crc32_eth_calc_lut(const uint8_t *data,
>>>>>   return crc;
>>>>>  }
>>>>>
>>>>> -static void
>>>>> -rte_net_crc_scalar_init(void)
>>>>> -{
>>>>> - /* 32-bit crc init */
>>>>> - crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
>>>>> -
>>>>> - /* 16-bit CRC init */
>>>>> - crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
>>>>> -}
>>>>> -
>>>>>  static inline uint32_t
>>>>> -rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
>>>>> +crc16_ccitt(const uint8_t *data, uint32_t data_len)
>>>>>  {
>>>>> - /* return 16-bit CRC value */
>>>>>
>>>>
>>>> Why not keep comments? Are they wrong?
>>>
>>> Functions names are very self-explanatory, that’s why I dropped comments. I
>> can add comments if needed.
>>>
>>
>> I am for restricting changes to the target of the patch which is making CRC
>> calculation thread safe, unless code changes invalidates the comments, lets
>> keep them. Same goes with inline related modifications.
>>
>>>>
>>>> <...>
>>>>
>>>>> +static void
>>>>> +crc_scalar_init(void)
>>>>> +{
>>>>> + crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
>>>>> + crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
>>>>> +
>>>>> + handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC16_CCITT] =
>>>> crc16_ccitt;
>>>>> + handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC32_ETH] = crc32_eth;
>>>>>
>>>>
>>>> +1 to remove global handlers pointer and add context,
>>>>
>>>> But current handlers array content is static, it can be set when
>>>> defined, instead of functions.
>>>
>>> Can do it for scalars, but for SIMD there is this runtime check like this:
>>>     if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) { So compiled binary on
>> AVX512
>>> machine could filter it out on the machine which does not support it.
>>> There is no NULL check in crc function so it would not change much when
>> called -> Invalid address vs Invalid instruction.
>>> There are not many checks there, as this is CRC after all, it should be a 
>>> small as
>> possible, yet probably NULL check could be advisable in crc function then.
>>>
>>
>> There is already AVX512_VPCLMULQDQ_CPU_SUPPORTED etc checks in
>> 'rte_net_crc_set()'.
>> So does it work to update 'handlers' statically, without condition, but have
>> conditions when use them.
>>
>>>
>>>>
>>>> <...>
>>>>
>>>>> -static uint32_t
>>>>> -rte_crc32_eth_default_handler(const uint8_t *data, uint32_t
>>>>> data_len)
>>>>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg,
>>>>> + enum rte_net_crc_type type)
>>>>>  {
>>>>> - handlers = NULL;
>>>>> - if (max_simd_bitwidth == 0)
>>>>> -         max_simd_bitwidth = rte_vect_get_max_simd_bitwidth();
>>>>> -
>>>>> - handlers = avx512_vpclmulqdq_get_handlers();
>>>>> - if (handlers != NULL)
>>>>> -         return handlers[RTE_NET_CRC32_ETH](data, data_len);
>>>>> - handlers = sse42_pclmulqdq_get_handlers();
>>>>> - if (handlers != NULL)
>>>>> -         return handlers[RTE_NET_CRC32_ETH](data, data_len);
>>>>> - handlers = neon_pmull_get_handlers();
>>>>> - if (handlers != NULL)
>>>>> -         return handlers[RTE_NET_CRC32_ETH](data, data_len);
>>>>> - handlers = handlers_scalar;
>>>>> - return handlers[RTE_NET_CRC32_ETH](data, data_len);
>>>>> -}
>>>>> + uint16_t max_simd_bitwidth;
>>>>>
>>>>> -/* Public API */
>>>>> -
>>>>> -void
>>>>> -rte_net_crc_set_alg(enum rte_net_crc_alg alg) -{
>>>>> - handlers = NULL;
>>>>> - if (max_simd_bitwidth == 0)
>>>>> -         max_simd_bitwidth = rte_vect_get_max_simd_bitwidth();
>>>>> + max_simd_bitwidth = rte_vect_get_max_simd_bitwidth();
>>>>>
>>>>>   switch (alg) {
>>>>>   case RTE_NET_CRC_AVX512:
>>>>> -         handlers = avx512_vpclmulqdq_get_handlers();
>>>>> -         if (handlers != NULL)
>>>>> -                 break;
>>>>> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
>>>>> +         if (AVX512_VPCLMULQDQ_CPU_SUPPORTED &&
>>>>> +                         max_simd_bitwidth >= RTE_VECT_SIMD_512) {
>>>>> +                 return (struct rte_net_crc){ RTE_NET_CRC_AVX512,
>>>> type };
>>>>> +         }
>>>>> +#endif
>>>>>           /* fall-through */
>>>>>   case RTE_NET_CRC_SSE42:
>>>>> -         handlers = sse42_pclmulqdq_get_handlers();
>>>>> -         break; /* for x86, always break here */
>>>>> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
>>>>> +         if (SSE42_PCLMULQDQ_CPU_SUPPORTED &&
>>>>> +                         max_simd_bitwidth >= RTE_VECT_SIMD_128) {
>>>>> +                 return (struct rte_net_crc){ RTE_NET_CRC_SSE42, type
>>>> };
>>>>> +         }
>>>>> +#endif
>>>>> +         break;
>>>>>   case RTE_NET_CRC_NEON:
>>>>> -         handlers = neon_pmull_get_handlers();
>>>>> -         /* fall-through */
>>>>> - case RTE_NET_CRC_SCALAR:
>>>>> -         /* fall-through */
>>>>> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
>>>>> +         if (NEON_PMULL_CPU_SUPPORTED &&
>>>>> +                         max_simd_bitwidth >= RTE_VECT_SIMD_128) {
>>>>> +                 return (struct rte_net_crc){ RTE_NET_CRC_NEON, type
>>>> };
>>>>> +         }
>>>>> +#endif
>>>>>
>>>>
>>>> Is it more readable as following, up to you:
>>>
>>> Agree, I will change it.
>>>
>>>>
>>>> ```
>>>> rte_net_crc_set(alg, type) {
>>>>   enum rte_net_crc_alg new_alg = RTE_NET_CRC_SCALAR;
>>>>   switch (alg) {
>>>>   case AVX512:
>>>>     new_alg = ..
>>>>   case NEON:
>>>>     new_alg = ..
>>>>   }
>>>>   return struct rte_net_crc){ new_alg, type }; ```
>>>>
>>>>
>>>>
>>>>
>>>>> +         break;
>>>>>   default:
>>>>>           break;
>>>>>   }
>>>>> -
>>>>> - if (handlers == NULL)
>>>>> -         handlers = handlers_scalar;
>>>>> + return (struct rte_net_crc){ RTE_NET_CRC_SCALAR, type };
>>>>>  }
>>>>>
>>>>> -uint32_t
>>>>> -rte_net_crc_calc(const void *data,
>>>>> - uint32_t data_len,
>>>>> - enum rte_net_crc_type type)
>>>>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx,
>>>>> + const void *data, const uint32_t data_len)
>>>>>  {
>>>>> - uint32_t ret;
>>>>> - rte_net_crc_handler f_handle;
>>>>> -
>>>>> - f_handle = handlers[type];
>>>>> - ret = f_handle(data, data_len);
>>>>> -
>>>>> - return ret;
>>>>> + return handlers[ctx->alg].f[ctx->type](data, data_len);
>>>>>
>>>>
>>>> 'rte_net_crc()' gets input from user and "struct rte_net_crc" is not
>>>> opaque, so user can provide invalid input, ctx->alg & ctx->type.
>>>> To protect against it input values should be checked before using.
>>>>
>>>> Or I think user not need to know the details of the "struct
>>>> rte_net_crc", so it can be an opaque variable for user.
>>>
>>> I would love it to be opaque, but then I would have to return a pointer, 
>>> which
>> then would involve some allocations/deallocations and I wanted to keep is as
>> simple as possible.
>>> So probably the checks would be a way to go.
>>>
>>
>> True, +1 for simplicity.
>>
>>>>
>>>> <...>
>>>>
>>>>> -/**
>>>>> - * CRC compute API
>>>>> - *
>>>>> - * @param data
>>>>> - *   Pointer to the packet data for CRC computation
>>>>> - * @param data_len
>>>>> - *   Data length for CRC computation
>>>>> - * @param type
>>>>> - *   CRC type (enum rte_net_crc_type)
>>>>> - *
>>>>> - * @return
>>>>> - *   CRC value
>>>>> - */
>>>>> -uint32_t
>>>>> -rte_net_crc_calc(const void *data,
>>>>> - uint32_t data_len,
>>>>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg,
>>>>>   enum rte_net_crc_type type);
>>>>>
>>>>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx,
>>>>> + const void *data, const uint32_t data_len);
>>>>> +
>>>>>
>>>>
>>>> As these are APIs, can you please add doxygen comments to them?
>>> +1
>>> I think this change could be deferred to 25.03.
>>> Adding this API without removing the old one should be possible without any
>> unwanted consequences?
>>>
>>
>> This is not a new functionality but replacement of existing one, so it will 
>> be
>> confusing to have two set of APIs for same functionality with similar names:
>> rte_net_crc_calc()    and   rte_net_crc()
>> rte_net_crc_set_alg() and   rte_net_crc_set()
>>
>> Also there are some internal functions used by these APIs and supporting both
>> new and old may force to have two version of these internal functions and it 
>> will
>> create complexity/noise.
>>
>> As this release is ABI break release, it is easier to update APIs (although
>> deprecation notice is missing for this change).
>>
>>
>> As an alternative option, do you think applying ABI versioning in 25.03 
>> works for
>> these APIs?
>> If so, old version can be cleaned in v25.11.
> 
> I would go with versioning. I can preserve old function names as it's 
> basically the same functionality. 
> 

ack

Reply via email to