[PATCH] net/vmxnet3: add check for TCP in RSS configuration
Added a check for TCP in vmxnet3_rss_configure(). The check ensures that
the RSS hashType is not NONE when RSS is enabled only for UDP.

Signed-off-by: Raghav Roy
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index a875ffec07..e8407c9b2e 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -1412,6 +1412,13 @@ vmxnet3_rss_configure(struct rte_eth_dev *dev)
 	dev_rss_conf = hw->rss_conf;
 	port_rss_conf = &dev->data->dev_conf.rx_adv_conf.rss_conf;
 
+	if ((port_rss_conf->rss_hf & VMXNET3_MANDATORY_V4_RSS) !=
+	    VMXNET3_MANDATORY_V4_RSS) {
+		PMD_INIT_LOG(WARNING, "RSS: IPv4/6 TCP is required for vmxnet3 RSS, "
+			     "automatically setting it");
+		port_rss_conf->rss_hf |= VMXNET3_MANDATORY_V4_RSS;
+	}
+
 	/* loading hashFunc */
 	dev_rss_conf->hashFunc = VMXNET3_RSS_HASH_FUNC_TOEPLITZ;
 	/* loading hashKeySize */
@@ -1419,6 +1426,7 @@ vmxnet3_rss_configure(struct rte_eth_dev *dev)
 	/* loading indTableSize: Must not exceed VMXNET3_RSS_MAX_IND_TABLE_SIZE (128)*/
 	dev_rss_conf->indTableSize = (uint16_t)((MAX_RX_QUEUES(hw)) * 4);
+
 	if (port_rss_conf->rss_key == NULL) {
 		/* Default hash key */
 		port_rss_conf->rss_key = rss_intel_key;
@@ -1446,6 +1454,5 @@ vmxnet3_rss_configure(struct rte_eth_dev *dev)
 		dev_rss_conf->hashType |= VMXNET3_RSS_HASH_TYPE_IPV6;
 	if (rss_hf & RTE_ETH_RSS_NONFRAG_IPV6_TCP)
 		dev_rss_conf->hashType |= VMXNET3_RSS_HASH_TYPE_TCP_IPV6;
-
 	return VMXNET3_SUCCESS;
 }
-- 
2.17.1
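For context, a minimal sketch (not part of the patch) of the application-side
configuration this check targets: RSS requested with UDP-only hash types,
which the PMD now extends with the mandatory IPv4/IPv6 TCP types. The port
and queue parameters are placeholders.

#include <rte_ethdev.h>

/* Sketch only: request RSS with UDP-only hash types. With the check
 * above, vmxnet3_rss_configure() also sets the mandatory IPv4/IPv6 TCP
 * hash types, so the device hashType is never NONE.
 */
static int
configure_udp_rss(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_conf port_conf = {
		.rxmode = {
			.mq_mode = RTE_ETH_MQ_RX_RSS,
		},
		.rx_adv_conf = {
			.rss_conf = {
				.rss_key = NULL, /* use the PMD default key */
				.rss_hf = RTE_ETH_RSS_NONFRAG_IPV4_UDP |
					  RTE_ETH_RSS_NONFRAG_IPV6_UDP,
			},
		},
	};

	/* The PMD may log a warning and silently extend rss_hf. */
	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);
}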
[PATCH v3] mempool: micro-optimize put function
Micro-optimization: Shortened the most likely code path in the generic put
function by moving an unlikely check out of it and further down.

Also updated the comments in the function.

v3 (feedback from Konstantin Ananyev):
* Removed assertion and comment about the invariant preventing overflow
  in the comparison. They were more confusing than enlightening.
v2 (feedback from Andrew Rybchenko):
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Added assertion about the invariant preventing overflow in the
  comparison.
* Crossing the threshold is not extremely unlikely, so removed likely()
  from that comparison. The compiler will generate code with optimal
  static branch prediction here anyway.

Signed-off-by: Morten Brørup
Acked-by: Konstantin Ananyev
---
 lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..61ca0c6b65 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
+	/* No cache provided? */
 	if (unlikely(cache == NULL))
 		goto driver_enqueue;
 
-	/* increment stat now, adding in mempool always success */
+	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
-
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
-
-	if (cache->len + n <= cache->flushthresh) {
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
 		cache_objs = &cache->objs[cache->len];
 		cache->len += n;
-	} else {
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
 		cache_objs = &cache->objs[0];
 		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
 		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		goto driver_enqueue_stats_incremented;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1400,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
+	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
 	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
 driver_enqueue_stats_incremented:
 
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
-- 
2.17.1
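A standalone sketch of the resulting branch order, with illustrative numbers
(a cache size of 512 gives a flush threshold of 768 with the current 1.5x
multiplier; treat the values as examples only, this is not DPDK code):

#include <stdio.h>
#include <stdint.h>

/* Illustration of the branch order in rte_mempool_do_generic_put()
 * after this patch.
 */
static const char *
put_path(uint32_t len, uint32_t flushthresh, uint32_t n)
{
	if (n <= flushthresh - len)
		return "append to cache";               /* most likely path */
	else if (n <= flushthresh)
		return "flush cache, then append";      /* threshold crossed */
	else
		return "bypass cache, enqueue to backend"; /* huge request */
}

int main(void)
{
	/* Example: cache size 512 -> flush threshold 768. */
	printf("%s\n", put_path(700, 768, 32));  /* append to cache */
	printf("%s\n", put_path(700, 768, 100)); /* flush, then append */
	printf("%s\n", put_path(0, 768, 1000));  /* bypass cache */
	return 0;
}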
mailmap causing checkpatch warnings
Bruce, David,

I just submitted a patch with an Acked-by from Konstantin, using his
current (Huawei) email address, and it triggered the following checkpatch
warning:

	Konstantin Ananyev mail differs from primary mail, please fix the
	commit message or update .mailmap.

Some people have more than one active email address, so this checkpatch
behavior is wrong.

Med venlig hilsen / Kind regards,
-Morten Brørup
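For reference, git's .mailmap format can map any number of secondary
addresses to one primary identity; an entry along these lines (the
addresses below are placeholders, not the real ones) is the usual fix when
both addresses remain in use:

# .mailmap: Proper Name <primary-address> <other-address-used-in-commits>
Konstantin Ananyev <primary@example.com> <secondary@example.com>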
[PATCH v3] mempool cache: add zero-copy get and put functions
Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 165 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool_trace_fp.h |  23 ++++
 lib/mempool/version.map            |   5 +
 4 files changed, 202 insertions(+)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..17a90b3ba1 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,170 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+ __rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a user-owned mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The numb
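A minimal usage sketch of rte_mempool_cache_zc_put_bulk() as defined above;
the surrounding helper and its name are illustrative, not from the patch:

#include <rte_mempool.h>
#include <rte_memcpy.h>

/* Sketch only: copy a burst of objects directly into the mempool cache;
 * fall back to the regular bulk put if the request exceeds the flush
 * threshold.
 */
static inline void
free_burst(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		void * const *objs, unsigned int n)
{
	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);

	if (cache_objs != NULL)
		rte_memcpy(cache_objs, objs, sizeof(objs[0]) * n);
	else
		rte_mempool_generic_put(mp, objs, n, cache);
}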
[PATCH v4] mempool cache: add zero-copy get and put functions
Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 165 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool_trace_fp.h |  23 ++++
 lib/mempool/version.map            |   5 +
 4 files changed, 202 insertions(+)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..00387e7543 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,170 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a user-owned mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cach
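A small standalone illustration of the overflow-safe comparison noted in the
changelog ("Modified comparison to prevent overflow if n is really huge and
len is non-zero"); the numbers are examples only, and this is not DPDK code:

#include <stdint.h>
#include <stdio.h>

/* With unsigned arithmetic, "len + n <= flushthresh" can wrap around for
 * a huge n and wrongly accept the request, while "n <= flushthresh - len"
 * cannot, because len <= flushthresh always holds.
 */
int main(void)
{
	uint32_t flushthresh = 768, len = 100;
	uint32_t n = UINT32_MAX - 50;	/* absurdly large request */

	printf("len + n <= flushthresh : %d\n", len + n <= flushthresh); /* 1: wrongly accepted */
	printf("n <= flushthresh - len : %d\n", n <= flushthresh - len); /* 0: correctly rejected */
	return 0;
}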
RE: [PATCH v2] mempool cache: add zero-copy get and put functions
> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
> Sent: Friday, 23 December 2022 17.58
>
> > > From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
> > > Sent: Thursday, 22 December 2022 16.57
> > >
> > > > Zero-copy access to mempool caches is beneficial for PMD
> > > > performance, and must be provided by the mempool library to fix
> > > > [Bug 1052] without a performance regression.
> > >
> > > LGTM in general, thank you for working on it.
> > > Few comments below.

[...]

> > > 	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > >
> > > I think it is too excessive.
> > > Just:
> > > if (n <= RTE_MEMPOOL_CACHE_MAX_SIZE) return NULL;
> > > seems much more convenient for the users here and
> > > more close to other mempool/ring API behavior.
> > > In terms of performance - I don't think one extra comparison here
> > > would really count.
> >
> > The insignificant performance degradation seems like a good tradeoff
> > for making the function more generic.
> > I will update the function documentation and place the run-time check
> > here, so both trace and stats reflect what happened:
> >
> > 	RTE_ASSERT(cache != NULL);
> > 	RTE_ASSERT(mp != NULL);
> > -	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> >
> > 	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > +
> > +	if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> > +		rte_errno = -ENOSPC; // Or EINVAL?
> > +		return NULL;
> > +	}
> >
> > 	/* Increment stats now, adding in mempool always succeeds. */
> >
> > I will probably also be able to come up with a solution for
> > zc_get_bulk(), so both trace and stats make sense if called with
> > n > RTE_MEMPOOL_CACHE_MAX_SIZE.

I have sent a new patch, where I switched to the same code flow as in the
micro-optimization patch, so this run-time check doesn't affect the most
common case.

Also, I realized that I need to compare to the cache flush threshold
instead of RTE_MEMPOOL_CACHE_MAX_SIZE, to respect the cache size.
Otherwise, a zc_cache_get() operation could deplete a small mempool; and
zc_cache_put() could leave the cache with too many objects, thus violating
the invariant that cache->len <= cache->flushthresh.

> > >
> > > I also think it would be really good to add:
> > > zc_(get|put)_bulk_start(), zc_(get|put)_bulk_finish().
> > > Where _start would check/fill the cache and return the pointer,
> > > while _finish would update the cache->len.
> > > Similar to what we have for rte_ring _peek_ API.
> > > That would allow to extend this API usage - let say inside PMDs
> > > it could be used not only for MBUF_FAST_FREE case, but for generic
> > > TX code path (one that has to call rte_mbuf_prefree()) also.
> >
> > I don't see a use case for zc_get_start()/_finish().
> >
> > And since the mempool cache is a stack, it would *require* that the
> > application reads the array in reverse order. In such case, the
> > function should not return a pointer to the array of objects, but a
> > pointer to the top of the stack.
> >
> > So I prefer to stick with the single-function zero-copy get, i.e.
> > without start/finish.
>
> Yes, it would be more complicated than just update cache->len.
> I don't have any real use-case for _get_ too - mostly just for symmetry
> with put.
>
> >
> > I do agree with you about the use case for zc_put_start()/_finish().
> >
> > Unlike the ring, there is no need for locking with the mempool cache,
> > so we can implement something much simpler:
> >
> > Instead of requiring calling both zc_put_start() and _finish() for
> > every zero-copy burst, we could add a zc_put_rewind() function, only
> > to be called if some number of objects were not added anyway:
> >
> > /* FIXME: Function documentation here. */
> > __rte_experimental
> > static __rte_always_inline void
> > rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > 		unsigned int n)
> > {
> > 	RTE_ASSERT(cache != NULL);
> > 	RTE_ASSERT(n <= cache->len);
> >
> > 	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> >
> > 	/* Rewind stats. */
> > 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, -n);
> >
> > 	cache->len -= n;
> > }
> >
> > I have a strong preference for _rewind() over _start() and _finish(),
> > because in the full burst case, it only touches the rte_mempool_cache
> > structure once, whereas splitting it up into _start() and _finish()
> > touches the rte_mempool_cache structure both before and after copying
> > the array of objects.
> >
> > What do you think?
>
> And your concern is that between _get_start(_C_) and get_finish(_C_)
> the _C_ cache line can be bumped out of CPU Dcache, right?
> I don't think such situation would be a common one.

Yes, that is the essence of my concern. And I agree that it is probably
uncommon.

There might also be some performance benefits by having the
load/store/modify of _C_ closely together; but I don't know enough about
CPU internals to determine if significant or not.

> But, if
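A sketch of how the put and rewind functions discussed above could be
combined in a TX free path; the helper, its name, and the assumption that
all mbufs belong to the same mempool are illustrative, not from the patch:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch only: reserve cache space for a full burst, copy only the mbufs
 * that can actually be freed, then rewind the unused part of the
 * reservation. Assumes every mbuf in pkts[] belongs to mp.
 */
static inline void
fast_free_burst(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		struct rte_mbuf **pkts, unsigned int n)
{
	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	unsigned int copied = 0, i;

	if (cache_objs == NULL)
		return;	/* request too big for the cache; handle elsewhere */

	for (i = 0; i < n; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(pkts[i]);

		if (m != NULL)
			cache_objs[copied++] = m;
	}

	/* Give back the reservation for the mbufs that were not freed. */
	if (copied != n)
		rte_mempool_cache_zc_put_rewind(cache, n - copied);
}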