> >> +/**
> >> + * Calculate Toeplitz hash.
> >> + *
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice.
> >> + *
> >> + * @param m
> >> + *  Pointer to the matrices generated from the corresponding
> >> + *  RSS hash key using rte_thash_complete_matrix().
> >> + * @param tuple
> >> + *  Pointer to the data to be hashed. Data must be in network byte order.
> >> + * @param len
> >> + *  Length of the data to be hashed.
> >> + * @return
> >> + *  Calculated Toeplitz hash value.
> >> + */
> >> +__rte_experimental
> >> +static inline uint32_t
> >> +rte_thash_gfni(uint64_t *m, uint8_t *tuple, int len)
> >> +{
> >> +	uint32_t val, val_zero;
> >> +
> >> +	__m512i xor_acc = __rte_thash_gfni(m, tuple, NULL, len);
> >> +	__rte_thash_xor_reduce(xor_acc, &val, &val_zero);
> >> +
> >> +	return val;
> >> +}
> >> +
> >> +/**
> >> + * Calculate Toeplitz hash for two independent data buffers.
> >> + *
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice.
> >> + *
> >> + * @param m
> >> + *  Pointer to the matrices generated from the corresponding
> >> + *  RSS hash key using rte_thash_complete_matrix().
> >> + * @param tuple_1
> >> + *  Pointer to the data to be hashed. Data must be in network byte order.
> >> + * @param tuple_2
> >> + *  Pointer to the data to be hashed. Data must be in network byte order.
> >> + * @param len
> >> + *  Length of the largest data buffer to be hashed.
> >> + * @param val_1
> >> + *  Pointer to uint32_t where to put calculated Toeplitz hash value for
> >> + *  the first tuple.
> >> + * @param val_2
> >> + *  Pointer to uint32_t where to put calculated Toeplitz hash value for
> >> + *  the second tuple.
> >> + */
> >> +__rte_experimental
> >> +static inline void
> >> +rte_thash_gfni_x2(uint64_t *mtrx, uint8_t *tuple_1, uint8_t *tuple_2, int len,
> >> +	uint32_t *val_1, uint32_t *val_2)
> >
> > Why just two?
> > Why not uint8_t *tuple[]
> > ?
> >
>
> x2 version was added because there was unused space inside the ZMM which
> holds input key (input tuple) bytes for a second input key, so it helps
> to improve performance in some cases.
> Bulk version wasn't added because for the vast majority of cases it will
> be used with a single input key.
> Hiding this function inside .c will greatly affect performance, because
> it takes just a few cycles to calculate the hash for the most popular
> key sizes.

OK, but it is still unclear to me why for 2 only.
What stops you from doing:

static inline void
rte_thash_gfni_bulk(const uint64_t *mtrx, uint32_t len, uint8_t *tuple[],
	uint32_t val[], uint32_t num)
{
	uint32_t i, val_zero;
	__m512i xor_acc;

	/* hash tuples in pairs, two per ZMM */
	for (i = 0; i != (num & ~1); i += 2) {
		xor_acc = __rte_thash_gfni(mtrx, tuple[i], tuple[i + 1], len);
		__rte_thash_xor_reduce(xor_acc, val + i, val + i + 1);
	}

	/* odd number of tuples: hash the last one on its own */
	if (num & 1) {
		xor_acc = __rte_thash_gfni(mtrx, tuple[i], NULL, len);
		__rte_thash_xor_reduce(xor_acc, val + i, &val_zero);
	}
}

?
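
For anyone who wants to try the pairing loop above without GFNI hardware, here is a rough portable sketch. It is not the patch's implementation: a plain scalar Toeplitz hash stands in for __rte_thash_gfni()/__rte_thash_xor_reduce(), it takes the raw RSS key rather than the precomputed matrices, and the names toeplitz_scalar(), hash_x2() and hash_bulk() are made up for illustration. It only shows the control flow (pairs first, then an odd tail with a NULL second tuple).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar Toeplitz hash (stand-in for the GFNI path, an assumption).
 * For every set input bit, XOR in the 32-bit key window that starts at
 * that bit position. The key must hold at least len + 4 bytes.
 */
static uint32_t
toeplitz_scalar(const uint8_t *key, const uint8_t *tuple, uint32_t len)
{
	uint32_t hash = 0;
	/* window over key bits 0..31 */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	uint32_t i;
	int b;

	for (i = 0; i < len; i++) {
		for (b = 7; b >= 0; b--) {
			if (tuple[i] & (1u << b))
				hash ^= window;
			/* slide the window one bit, pulling in the next key bit */
			window = (window << 1) | ((key[i + 4] >> b) & 1u);
		}
	}
	return hash;
}

/*
 * Hypothetical two-at-a-time primitive. The second tuple may be NULL,
 * in which case *v2 is just set to 0.
 */
static void
hash_x2(const uint8_t *key, const uint8_t *t1, const uint8_t *t2,
	uint32_t len, uint32_t *v1, uint32_t *v2)
{
	*v1 = toeplitz_scalar(key, t1, len);
	*v2 = (t2 != NULL) ? toeplitz_scalar(key, t2, len) : 0;
}

/* Bulk wrapper: same loop shape as the proposal above. */
static void
hash_bulk(const uint8_t *key, uint32_t len, const uint8_t *tuple[],
	uint32_t val[], uint32_t num)
{
	uint32_t i, val_zero;

	/* full pairs */
	for (i = 0; i != (num & ~1u); i += 2)
		hash_x2(key, tuple[i], tuple[i + 1], len, &val[i], &val[i + 1]);

	/* odd tail: second lane unused, its result is discarded */
	if (num & 1)
		hash_x2(key, tuple[i], NULL, len, &val[i], &val_zero);
}
```

Calling hash_bulk() with num = 3 runs one pair plus one tail call; each val[i] should match what toeplitz_scalar() returns for tuple[i] alone.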