> On Fri, 15 Oct 2021 10:30:02 +0100 > Vladimir Medvedkin <vladimir.medved...@intel.com> wrote: > > > + m[i * 8 + j] = (rss_key[i] << j)| > > + (uint8_t)((uint16_t)(rss_key[i + 1]) >> > > + (8 - j)); > > + } > > This ends up being harder than necessary to read. Maybe split into > multiple statements and/or use temporary variable. > > > +RTE_INIT(rte_thash_gfni_init) > > +{ > > + rte_thash_gfni_supported = 0; > > Not necessary in C globals are initialized to zero by default. > > By removing that the constructor can be totally behind #ifdef > > > +__rte_internal > > +static inline __m512i > > +__rte_thash_gfni(const uint64_t *mtrx, const uint8_t *tuple, > > + const uint8_t *secondary_tuple, int len) > > +{ > > + __m512i permute_idx = _mm512_set_epi8(7, 6, 5, 4, 7, 6, 5, 4, > > + 6, 5, 4, 3, 6, 5, 4, 3, > > + 5, 4, 3, 2, 5, 4, 3, 2, > > + 4, 3, 2, 1, 4, 3, 2, 1, > > + 3, 2, 1, 0, 3, 2, 1, 0, > > + 2, 1, 0, -1, 2, 1, 0, -1, > > + 1, 0, -1, -2, 1, 0, -1, -2, > > + 0, -1, -2, -3, 0, -1, -2, -3); > > NAK > > Please don't put the implementation in an inline. This makes it harder > to support (API/ABI) and blocks other architectures from implementing > same thing with different instructions.
I don't really understand your reasoning here. rte_thash_gfni.h is an arch-specific header, which provides arch-specific optimizations for RSS hash calculation (Vladimir pls correct me if I am wrong here). We do have dozens of inline functions that do use arch-specific instructions (both x86 and arm) for different purposes: sync primitives, memory-ordering, cache manipulations, LPM lookup, TSX, power-saving, etc. That's a usual trade-off taken for performance reasons, when extra function call costs too much comparing to the operation itself. Why it suddenly became a problem for that particular case and how exactly it blocks other architectures? Also I don't understand how it makes things harder in terms of API/ABI stability. As I can see this patch doesn't introduce any public structs/unions. All functions take as arguments just raw data buffers and length. To summarize - in general, I don't see any good reason why this patch shouldn't be allowed. Konstantin