Hi Stephen, Thanks for reviewing
On 15/10/2021 18:58, Stephen Hemminger wrote:
On Fri, 15 Oct 2021 10:30:02 +0100 Vladimir Medvedkin <vladimir.medved...@intel.com> wrote:+ m[i * 8 + j] = (rss_key[i] << j)| + (uint8_t)((uint16_t)(rss_key[i + 1]) >> + (8 - j)); + }This ends up being harder than necessary to read. Maybe split into multiple statements and/or use temporary variable.+RTE_INIT(rte_thash_gfni_init) +{ + rte_thash_gfni_supported = 0;Not necessary in C globals are initialized to zero by default. By removing that the constructor can be totally behind #ifdef+__rte_internal +static inline __m512i +__rte_thash_gfni(const uint64_t *mtrx, const uint8_t *tuple, + const uint8_t *secondary_tuple, int len) +{ + __m512i permute_idx = _mm512_set_epi8(7, 6, 5, 4, 7, 6, 5, 4, + 6, 5, 4, 3, 6, 5, 4, 3, + 5, 4, 3, 2, 5, 4, 3, 2, + 4, 3, 2, 1, 4, 3, 2, 1, + 3, 2, 1, 0, 3, 2, 1, 0, + 2, 1, 0, -1, 2, 1, 0, -1, + 1, 0, -1, -2, 1, 0, -1, -2, + 0, -1, -2, -3, 0, -1, -2, -3);NAK Please don't put the implementation in an inline. This makes it harder to support (API/ABI) and blocks other architectures from implementing same thing with different instructions.
By making this function not inline, its performance drops by about 2 times. Compiler optimization (at least with respect to the len argument) helps a lot in the implementation.
-- Regards, Vladimir