Re: rte_atomic_*_explicit
On 2024-01-25 23:10, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Thursday, 25 January 2024 19.54

Why do rte_stdatomic.h functions have the suffix "_explicit"? Especially since there aren't any wrappers for the implicit variants. More to type, more to read.

They have the "_explicit" suffix to make their names similar to those in stdatomic.h.

OK, so to avoid a situation where someone accidentally misinterprets rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed); as what, exactly?

If you have a wrapper, why not take the opportunity to reap some benefits from it and fix/extend the standard API, making it a better fit for your application. Like removing the redundant "_explicit", "fetch"-less add/sub, and maybe a function for single-writer atomic add (non-atomic read + non-atomic add + atomic store), etc.

Prohibiting implicit operations was done already, so it's already now not a straight-off copy of the standard API.

You might consider their existence somewhat temporary until C11 stdatomics can be fully phased in, so there's another argument for similar names. (This probably does not happen as long as compilers generate slower code for C11 stdatomics than with their atomic built-ins.)

To me, it seems reasonable a wrapper API should stay as long as it provides a benefit over the standard API. One could imagine that being a long time.

I imagine some DPDK developers being tired of migrating from one atomics API to another: <rte_atomic.h> -> GCC built-ins (-> attempted C11 stdatomics?) -> <rte_stdatomic.h>. Now you are adding a future "-> CXY atomics" move as well, and with that it also seems natural to add a change back to a wrapper or complementary API, when CXY doesn't turn out good enough for some particular platform, or when some non-compliant compiler comes along.

I suggested fixing the original API, or at least having a wrapper API, already at the point DPDK moved to direct GCC built-in calls. Then we wouldn't have had this atomics API ping-pong.
When was this API introduced? Shouldn't it say "experimental" somewhere?

They were introduced as part of the migration to C11. I suppose they were not marked experimental because they replaced something we didn't want anymore (the compiler built-ins for atomics, e.g. __atomic_load_n()). I don't recall if we discussed experimental marking or not.

Reverse paper trail:
https://git.dpdk.org/dpdk/log/lib/eal/include/rte_stdatomic.h
https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-send-email-roret...@linux.microsoft.com/
Re: rte_atomic_*_explicit
On 2024-01-26 02:37, Honnappa Nagarahalli wrote:

On Thu, Jan 25, 2024 at 11:10:47PM +0100, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Thursday, 25 January 2024 19.54

Why do rte_stdatomic.h functions have the suffix "_explicit"? Especially since there aren't any wrappers for the implicit variants. More to type, more to read.

They have the "_explicit" suffix to make their names similar to those in stdatomic.h. You might consider their existence somewhat temporary until C11 stdatomics can be fully phased in, so there's another argument for similar names. (This probably does not happen as long as compilers generate slower code for C11 stdatomics than with their atomic built-ins.)

yes, there was feedback at the time:
* we should *not* have non-explicit versions of the macros
* the atomic generic functions should be named to match the C11 standard, with an rte_ prefix

This was mainly done to ensure that users think through the memory ordering they want to use. This also matches the compiler atomic built-ins. Without explicit, it is sequentially consistent memory order.

"This" is referring to the first bullet only, correct? You don't have to distinguish between implicit and explicit if you only have explicit.

When was this API introduced? Shouldn't it say "experimental" somewhere?

They were introduced as part of the migration to C11. I suppose they were not marked experimental because they replaced something we didn't want anymore (the compiler built-ins for atomics, e.g. __atomic_load_n()). I don't recall if we discussed experimental marking or not.

i don't think we discussed it since they're wrapper macros.

Reverse paper trail:
https://git.dpdk.org/dpdk/log/lib/eal/include/rte_stdatomic.h
https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-send-email-roret...@linux.microsoft.com/
RE: [PATCH 0/8] optimize the firmware loading process
> On 1/25/2024 2:06 AM, Chaoyong He wrote:
> >> On 1/15/2024 2:54 AM, Chaoyong He wrote:
> >>> This patch series aims to speed up the DPDK application start by
> >>> optimizing the firmware loading process in several places.
> >>> We also simplify the port name in the multiple PF firmware case to
> >>> make the customer happy.
> >>
> >> <...>
> >>
> >>> net/nfp: add the elf module
> >>> net/nfp: reload the firmware only when firmware changed
> >>
> >> The above commits add elf parser capability, and the second one loads
> >> firmware when the build time is different.
> >>
> >> I can see this is an optimization effort, to understand FW status
> >> before loading FW, but relying on build time seems fragile. Does it
> >> help to add a new section to store version information and evaluate
> >> based on this information?
> >
> > We have a branch of firmware (several app types combined with
> > NFD3/NFDk) with the same version information (monthly publish), so the
> > version information can't help us, because we can't distinguish among
> > them.
> >
> > But the build time is different for every firmware, and that's the
> > reason we chose it.
>
> If the version is the same although the FW itself is different, isn't
> this a problem on its own? Perhaps an additional field is required in
> the version syntax.
>
> >>> Actually, it's just the role the build time embedded in the firmware
> >>> plays: it is unique for every firmware, and it can't be modified once
> >>> the firmware is published.
> >>
> >> I see it is already in the elf binary, but relying on the build time
> >> of a FW to decide whether to load it or not looks like a weak method
> >> to me, and fragile as stated before.
> >>
> >>> What you said also has merit, but in practice we can't accept it (at
> >>> least not immediately); this needs to be discussed with the firmware
> >>> team.
> >> As it is an optimization I assume it is not urgent, so would you mind
> >> discussing the issue with the FW team, perhaps it can lead to a better
> >> solution; we can proceed after that.
> >
> > After discussing with the FW team, it seems we still can't just use the
> > version, because our firmware is published monthly.
> >
> > Between two public versions, we also have internal versions like an
> > 'rc0, rc1' scheme, just as DPDK does.
> > But the problem is, we may compile and share many firmware builds in
> > daily development.
> > They may be valid only one time and will never be published, so we
> > won't assign a version to them.
>
> So is my understanding correct that this effort is not for customers and
> not for published FWs, but mostly for developers with development FWs?
>
> If so perhaps it can be simplified even more: there can be a devarg that
> force-loads the FW, for the development scenario.
> By default, the driver checks the FW version and loads accordingly, but
> if the flag is set it loads the FW even with the same version. When a
> new development FW is out, the developer can run the DPDK app with this
> parameter and it loads the FW to test. Does this make sense?

Yeah, it's a perfect method, thanks.

> I am not trying to be difficult, just the proposed approach didn't
> satisfy me and I am trying to understand it and help if I can. And if
> you please continue the discussion in the mailing list, it enables
> others to comment and perhaps provide a better option.
>
> > So the build time is the only information we can use to distinguish
> > them; if just using the version, the PMD will not reload the firmware
> > which needs to be tested and debugged.
> >
> > How about we first use the version to check, and then use the build
> > time to check for the same version?
>
> Nope, I thought the elf parsing and relying on build time could be
> removed; if you have them anyway, no need to add the version check, you
> can use just the build time to check.
> And another related question: if developers are using daily FWs with the
> exact same version, when a developer hits unexpected behavior, how will
> she know exactly which version of the FW is in the device? This may be
> required to track down the problem. Shouldn't each FW have a unique
> identifier, other than the "build time" info which is stored in the elf
> binary, not the FW itself?

We will use the devarg and version as you advise, and remove the build time logic.

> >> Meanwhile I will continue with the remaining patches, excluding these
> >> two patches.
> >>
> >>> If you insist that we should change the design, maybe we can just
> >>> kick off two commits, and upstream the other commits?
> >>> - net/nfp: add the elf module
> >>> - net/nfp: reload the firmware only when firmware changed
Re: [RFC] service: extend service function call statistics
On 2024-01-26 00:19, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
Sent: Thursday, 25 January 2024 20.15

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remain the same (i.e., counting all invocations, regardless of return value).

The new statistics may be useful for both debugging and profiling (e.g., calculating the average per-call processing latency for non-idle service calls).

Service core tests are extended to cover the new counters, and coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.

OK to all of the above. Good stuff.

The documentation for the CYCLES attributes is updated to reflect their actual semantics.

If this is intended behavior, then updating the documentation seems appropriate - I would even go so far as considering it a bug fix.

However, quite a few cycles may be consumed by a service before it can conclude that it had no work to do. Shouldn't that be considered time spent by the service? I.e. should the code be fixed instead of the documentation?

Generally, polling for new work in a service is cheap, in my experience. But there's nothing in principle that prevents the situation you describe from occurring. You could add an "IDLE_CYCLES" counter in case that would ever be a real-world problem.

That wouldn't be a fix, but rather just a return to the old, subtly broken, (pre-22.11?) semantics.

Have a look at 809bd24 to see the rationale for the change. There's an example in 4689c57.
The cause of this ambiguity is the fact that the platform/OS (i.e., DPDK) doesn't know which service to "wake up" (which in turn is the result of the platform not being able to track input sources, like NIC RX queues or timer wheels) and thus must ask every service to check if it has something to do.

Alternatively, keep the behavior (for backwards compatibility) and fix the documentation, as this patch does, and add an IDLE_CYCLES counter for time spent in idle calls.

PS: We're not using DPDK service cores in our applications, so I'm not familiar with the details. We are using something somewhat similar (but homegrown), also for profiling and power management purposes, and my feedback is based on my experience with our own variant of service cores.

When are you making the switch to service cores? :)

Either way:
Acked-by: Morten Brørup

Thanks for the review.
[PATCH 0/2] dma/skeleton: add support for SG copy and fill ops
This patchset adds support for SG copy and fill ops. Tests pass using [1].

[1] https://patchwork.dpdk.org/project/dpdk/cover/cover.1700156485.git.gmuthukri...@marvell.com/

Chengwen Feng (2):
  dma/skeleton: support SG copy ops
  dma/skeleton: support fill ops

 drivers/dma/skeleton/skeleton_dmadev.c | 147 +++--
 drivers/dma/skeleton/skeleton_dmadev.h |  44 ++--
 2 files changed, 171 insertions(+), 20 deletions(-)

--
2.17.1
[PATCH 2/2] dma/skeleton: support fill ops
Add support for fill operation.

Signed-off-by: Chengwen Feng
---
 drivers/dma/skeleton/skeleton_dmadev.c | 53 +++---
 drivers/dma/skeleton/skeleton_dmadev.h | 16 +---
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/skeleton/skeleton_dmadev.c b/drivers/dma/skeleton/skeleton_dmadev.c
index d1d257a064..48f88f9fc1 100644
--- a/drivers/dma/skeleton/skeleton_dmadev.c
+++ b/drivers/dma/skeleton/skeleton_dmadev.c
@@ -38,7 +38,8 @@ skeldma_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *dev_info,
 	dev_info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM |
 			     RTE_DMA_CAPA_SVA |
 			     RTE_DMA_CAPA_OPS_COPY |
-			     RTE_DMA_CAPA_OPS_COPY_SG;
+			     RTE_DMA_CAPA_OPS_COPY_SG |
+			     RTE_DMA_CAPA_OPS_FILL;
 	dev_info->max_vchans = 1;
 	dev_info->max_desc = SKELDMA_MAX_DESC;
 	dev_info->min_desc = SKELDMA_MIN_DESC;
@@ -100,8 +101,19 @@ do_copy_sg(struct skeldma_desc *desc)
 	}
 }
 
+static inline void
+do_fill(struct skeldma_desc *desc)
+{
+	uint8_t *fills = (uint8_t *)&desc->fill.pattern;
+	uint8_t *dst = (uint8_t *)desc->fill.dst;
+	uint32_t i;
+
+	for (i = 0; i < desc->fill.len; i++)
+		dst[i] = fills[i % 8];
+}
+
 static uint32_t
-cpucopy_thread(void *param)
+cpuwork_thread(void *param)
 {
 #define SLEEP_THRESHOLD		10000
 #define SLEEP_US_VAL		10
@@ -127,6 +139,8 @@ cpucopy_thread(void *param)
 			rte_memcpy(desc->copy.dst, desc->copy.src, desc->copy.len);
 		else if (desc->op == SKELDMA_OP_COPY_SG)
 			do_copy_sg(desc);
+		else if (desc->op == SKELDMA_OP_FILL)
+			do_fill(desc);
 
 		__atomic_fetch_add(&hw->completed_count, 1, __ATOMIC_RELEASE);
 		(void)rte_ring_enqueue(hw->desc_completed, (void *)desc);
@@ -162,7 +176,7 @@ skeldma_start(struct rte_dma_dev *dev)
 	 * 1) fflush pending/running/completed ring to empty ring.
 	 * 2) init ring idx to zero.
 	 * 3) init running statistics.
-	 * 4) mark cpucopy task exit_flag to false.
+	 * 4) mark cpuwork task exit_flag to false.
 	 */
 	fflush_ring(hw, hw->desc_pending);
 	fflush_ring(hw, hw->desc_running);
@@ -178,9 +192,9 @@ skeldma_start(struct rte_dma_dev *dev)
 	snprintf(name, sizeof(name), "dma-skel%d", dev->data->dev_id);
 	ret = rte_thread_create_internal_control(&hw->thread, name,
-						 cpucopy_thread, dev);
+						 cpuwork_thread, dev);
 	if (ret) {
-		SKELDMA_LOG(ERR, "Start cpucopy thread fail!");
+		SKELDMA_LOG(ERR, "Start cpuwork thread fail!");
 		return -EINVAL;
 	}
 
@@ -462,6 +476,34 @@ skeldma_copy_sg(void *dev_private, uint16_t vchan,
 	return hw->ridx++;
 }
 
+static int
+skeldma_fill(void *dev_private, uint16_t vchan,
+	     uint64_t pattern, rte_iova_t dst,
+	     uint32_t length, uint64_t flags)
+{
+	struct skeldma_hw *hw = dev_private;
+	struct skeldma_desc *desc;
+	int ret;
+
+	RTE_SET_USED(vchan);
+
+	ret = rte_ring_dequeue(hw->desc_empty, (void **)&desc);
+	if (ret)
+		return -ENOSPC;
+	desc->op = SKELDMA_OP_FILL;
+	desc->ridx = hw->ridx;
+	desc->fill.dst = (void *)(uintptr_t)dst;
+	desc->fill.len = length;
+	desc->fill.pattern = pattern;
+	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+		submit(hw, desc);
+	else
+		(void)rte_ring_enqueue(hw->desc_pending, (void *)desc);
+	hw->submitted_count++;
+
+	return hw->ridx++;
+}
+
 static int
 skeldma_submit(void *dev_private, uint16_t vchan)
 {
@@ -573,6 +615,7 @@ skeldma_create(const char *name, struct rte_vdev_device *vdev, int lcore_id)
 	dev->fp_obj->dev_private = dev->data->dev_private;
 	dev->fp_obj->copy = skeldma_copy;
 	dev->fp_obj->copy_sg = skeldma_copy_sg;
+	dev->fp_obj->fill = skeldma_fill;
 	dev->fp_obj->submit = skeldma_submit;
 	dev->fp_obj->completed = skeldma_completed;
 	dev->fp_obj->completed_status = skeldma_completed_status;
diff --git a/drivers/dma/skeleton/skeleton_dmadev.h b/drivers/dma/skeleton/skeleton_dmadev.h
index 7d32dd5095..c9bf3153ba 100644
--- a/drivers/dma/skeleton/skeleton_dmadev.h
+++ b/drivers/dma/skeleton/skeleton_dmadev.h
@@ -16,6 +16,7 @@
 enum skeldma_op {
 	SKELDMA_OP_COPY,
 	SKELDMA_OP_COPY_SG,
+	SKELDMA_OP_FILL,
 };
 
 struct skeldma_desc {
@@ -34,14 +35,19 @@ struct skeldma_desc {
 		uint16_t nb_src;
 		uint16_t nb_dst;
 	} copy_sg;
+	struct {
+		void *dst;
+		uint32_t len;
+		uint64_t pattern;
+	}
[PATCH 1/2] dma/skeleton: support SG copy ops
Add support for scatter-gather copy.

Signed-off-by: Chengwen Feng
---
 drivers/dma/skeleton/skeleton_dmadev.c | 96 --
 drivers/dma/skeleton/skeleton_dmadev.h | 28 ++--
 2 files changed, 113 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/skeleton/skeleton_dmadev.c b/drivers/dma/skeleton/skeleton_dmadev.c
index eab03852dd..d1d257a064 100644
--- a/drivers/dma/skeleton/skeleton_dmadev.c
+++ b/drivers/dma/skeleton/skeleton_dmadev.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2021 HiSilicon Limited
+ * Copyright(c) 2021-2024 HiSilicon Limited
  */
 
 #include
@@ -37,10 +37,12 @@ skeldma_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *dev_info,
 	dev_info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM |
 			     RTE_DMA_CAPA_SVA |
-			     RTE_DMA_CAPA_OPS_COPY;
+			     RTE_DMA_CAPA_OPS_COPY |
+			     RTE_DMA_CAPA_OPS_COPY_SG;
 	dev_info->max_vchans = 1;
 	dev_info->max_desc = SKELDMA_MAX_DESC;
 	dev_info->min_desc = SKELDMA_MIN_DESC;
+	dev_info->max_sges = SKELDMA_MAX_SGES;
 
 	return 0;
 }
@@ -55,6 +57,49 @@ skeldma_configure(struct rte_dma_dev *dev, const struct rte_dma_conf *conf,
 	return 0;
 }
 
+static inline void
+do_copy_sg_one(struct rte_dma_sge *src, struct rte_dma_sge *dst, uint16_t nb_dst, uint64_t offset)
+{
+	uint32_t src_off = 0, dst_off = 0;
+	uint32_t copy_len = 0;
+	uint64_t tmp = 0;
+	uint16_t i;
+
+	/* Locate the segment from which the copy is started.
+	 */
+	for (i = 0; i < nb_dst; i++) {
+		tmp += dst[i].length;
+		if (offset < tmp) {
+			copy_len = tmp - offset;
+			dst_off = dst[i].length - copy_len;
+			break;
+		}
+	}
+
+	for (/* Use the above index */; i < nb_dst; i++, copy_len = dst[i].length) {
+		copy_len = RTE_MIN(copy_len, src->length - src_off);
+		rte_memcpy((uint8_t *)(uintptr_t)dst[i].addr + dst_off,
+			   (uint8_t *)(uintptr_t)src->addr + src_off,
+			   copy_len);
+		src_off += copy_len;
+		if (src_off >= src->length)
+			break;
+		dst_off = 0;
+	}
+}
+
+static inline void
+do_copy_sg(struct skeldma_desc *desc)
+{
+	uint64_t offset = 0;
+	uint16_t i;
+
+	for (i = 0; i < desc->copy_sg.nb_src; i++) {
+		do_copy_sg_one(&desc->copy_sg.src[i], desc->copy_sg.dst,
+			       desc->copy_sg.nb_dst, offset);
+		offset += desc->copy_sg.src[i].length;
+	}
+}
+
 static uint32_t
 cpucopy_thread(void *param)
 {
@@ -76,9 +121,13 @@ cpucopy_thread(void *param)
 			rte_delay_us_sleep(SLEEP_US_VAL);
 			continue;
 		}
-		hw->zero_req_count = 0;
-		rte_memcpy(desc->dst, desc->src, desc->len);
+
+		if (desc->op == SKELDMA_OP_COPY)
+			rte_memcpy(desc->copy.dst, desc->copy.src, desc->copy.len);
+		else if (desc->op == SKELDMA_OP_COPY_SG)
+			do_copy_sg(desc);
+
 		__atomic_fetch_add(&hw->completed_count, 1, __ATOMIC_RELEASE);
 		(void)rte_ring_enqueue(hw->desc_completed, (void *)desc);
 	}
@@ -368,10 +417,42 @@ skeldma_copy(void *dev_private, uint16_t vchan,
 	ret = rte_ring_dequeue(hw->desc_empty, (void **)&desc);
 	if (ret)
 		return -ENOSPC;
-	desc->src = (void *)(uintptr_t)src;
-	desc->dst = (void *)(uintptr_t)dst;
-	desc->len = length;
+	desc->op = SKELDMA_OP_COPY;
+	desc->ridx = hw->ridx;
+	desc->copy.src = (void *)(uintptr_t)src;
+	desc->copy.dst = (void *)(uintptr_t)dst;
+	desc->copy.len = length;
 	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
 		submit(hw, desc);
 	else
 		(void)rte_ring_enqueue(hw->desc_pending, (void *)desc);
 	hw->submitted_count++;
 
 	return hw->ridx++;
 }
 
+static int
+skeldma_copy_sg(void *dev_private, uint16_t vchan,
+		const struct rte_dma_sge *src,
+		const struct rte_dma_sge *dst,
+		uint16_t nb_src, uint16_t nb_dst,
+		uint64_t flags)
+{
+	struct skeldma_hw *hw = dev_private;
+	struct skeldma_desc *desc;
+	int ret;
+
+	RTE_SET_USED(vchan);
+
+	ret = rte_ring_dequeue(hw->desc_empty, (void **)&desc);
+	if (ret)
+		return -ENOSPC;
+	desc->op = SKELDMA_OP_COPY_SG;
+	desc->ridx = hw->ridx;
+	memcpy(desc->copy_sg.src, src, sizeof(*src) * nb_src);
+	memcpy(desc->copy_sg.dst, dst, sizeof(*dst) * nb_dst);
+	desc->copy_sg.nb_src = nb_src;
+	desc->copy_sg.nb_dst = nb_dst;
 	if (flags & RTE_DMA_O
Re: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned
On 2024-01-25 23:53, Morten Brørup wrote:

From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
Sent: Thursday, 25 January 2024 19.37

ping.

Please review this thread if you have time; the main points of discussion I would like to receive consensus on are the following questions.

1. Should we continue to expand common alignments behind an __rte_macro

   i.e. what do we prefer to appear in code

   alignas(RTE_CACHE_LINE_MIN_SIZE)

   -- or --

   __rte_cache_aligned

One of the benefits of dropping the macro is it provides a clear visual indicator that it is not placed in the same location or applied to types as is done with __attribute__((__aligned__(n))).

We don't want our own proprietary variant of something that already exists in the C standard. Now that we have moved to C11, the __rte alignment macros should be considered obsolete.

Making something cache-line aligned is not in C11.

__rte_cache_aligned is shorter, provides a tiny bit of abstraction, and is already an established DPDK standard. So just keep the macro. If it would change, I would argue for it to be changed to rte_cache_aligned (i.e., just moving it out of the __ namespace, and maybe making it all-uppercase).

Non-trivial C programs wrap things all the time, standard or not. It's not something to be overly concerned about, imo.

Note: I don't mind convenience macros for common use cases, so we could also introduce the macro suggested by Mattias [1]:

#define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)

[1]: https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e-b31ec10c0...@lysator.liu.se/

2. where should we place alignas(n) or __rte_macro (if we use a macro)

Should it be on the same line as the variable or field or on the preceding line?

   /* same line example struct */
   struct T {
       /* alignas(64) applies to field0 *not* struct T type declaration */
       alignas(64) void *field0;
       void *field1;

       ... other fields ...

       alignas(64) uint64_t field5;
       uint32_t field6;

       ... more fields ...

   };

   /* same line example array */
   alignas(64) static const uint32_t array[4] = { ... };

   -- or --

   /* preceding line example struct */
   struct T {
       /* alignas(64) applies to field0 *not* struct T type declaration */
       alignas(64)
       void *field0;
       void *field1;

       ... other fields ...

       alignas(64)
       uint64_t field5;
       uint32_t field6;

       ... more fields ...

   };

   /* preceding line example array */
   alignas(64)
   static const uint32_t array[4] = { ... };

Searching the net for what other projects do, I came across this required placement [2]:

   uint64_t alignas(64) field5;

[2]: https://lore.kernel.org/buildroot/2023073851.6faa3391@windsurf/T/

So let's follow the standard's intention and put them on the same line. On a case-by-case basis, we can wrap lines if it improves readability, like we do with function headers that have a lot of attributes.

I'll submit patches for lib/* once the discussion is concluded.

thanks folks
RE: [RFC] service: extend service function call statistics
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 09.28
>
> On 2024-01-26 00:19, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> >> Sent: Thursday, 25 January 2024 20.15
> >>
> >> Add two new per-service counters.
> >>
> >> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
> >> function invocations where no work was performed.
> >>
> >> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
> >> resulting in an error.
> >>
> >> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remain the same (i.e.,
> >> counting all invocations, regardless of return value).
> >>
> >> The new statistics may be useful for both debugging and profiling
> >> (e.g., calculating the average per-call processing latency for
> >> non-idle service calls).
> >>
> >> Service core tests are extended to cover the new counters, and
> >> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> >
> > OK to all of the above. Good stuff.
> >
> >> The documentation for the CYCLES attributes is updated to reflect
> >> their actual semantics.
> >
> > If this is intended behavior, then updating the documentation seems
> > appropriate - I would even go so far as considering it a bug fix.
> >
> > However, quite a few cycles may be consumed by a service before it
> > can conclude that it had no work to do. Shouldn't that be considered
> > time spent by the service? I.e. should the code be fixed instead of
> > the documentation?
>
> Generally, polling for new work in a service is cheap, in my
> experience. But there's nothing in principle that prevents the
> situation you describe from occurring. You could add an "IDLE_CYCLES"
> counter in case that would ever be a real-world problem.
>
> That wouldn't be a fix, but rather just a return to the old, subtly
> broken, (pre-22.11?) semantics.
>
> Have a look at 809bd24 to see the rationale for the change. There's an
> example in 4689c57.
> The cause of this ambiguity is the fact that the platform/OS
> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
> the result of the platform not being able to track input sources,
> like NIC RX queues or timer wheels) and thus must ask every service to
> check if it has something to do.

OK. Makes good sense. So definitely fix the documentation, not the code. :-)

> > Alternatively, keep the behavior (for backwards compatibility) and
> > fix the documentation, as this patch does, and add an IDLE_CYCLES
> > counter for time spent in idle calls.
> >
> > PS: We're not using DPDK service cores in our applications, so I'm
> > not familiar with the details. We are using something somewhat
> > similar (but homegrown), also for profiling and power management
> > purposes, and my feedback is based on my experience with our own
> > variant of service cores.
>
> When are you making the switch to service cores? :)

Our own "service cores" implementation has some slightly different properties, which we are still experimenting with.

E.g. in addition to the special return value "idle (no work to do)", we also have a special return value for "incomplete (more work urgently pending)", used when a service has processed a full burst and still has more work pending in its input queue.

We are also considering returning a value indicating when the service needs to be called again. This concept is only an idea, and we haven't started experimenting with it yet.

From a high-level perspective, the service cores library is much like an operating system's CPU scheduler, although based on voluntary time sharing. Many algorithms and many parameters can be considered. It can also tie into power management and prioritization of different tasks.

> > Either way:
> >
> > Acked-by: Morten Brørup
>
> Thanks for the review.
RE: [PATCH v2 6/8] common/mlx5: fix calloc parameters
> -Original Message- > From: Morten Brørup > Sent: Wednesday, January 24, 2024 20:01 > To: Ferruh Yigit ; Dariusz Sosnowski > ; Slava Ovsiienko ; Ori > Kam ; Suanming Mou ; Matan > Azrad ; Dmitry Kozlyuk > Cc: dev@dpdk.org; sta...@dpdk.org; dkozl...@nvidia.com > Subject: RE: [PATCH v2 6/8] common/mlx5: fix calloc parameters > > > From: Ferruh Yigit [mailto:ferruh.yi...@amd.com] > > Sent: Wednesday, 24 January 2024 19.54 > > > > gcc [1] generates warning [2] about calloc usage, because calloc > > parameter order is wrong, fixing it by replacing parameters. > > > > [1] > > gcc (GCC) 14.0.1 20240124 (experimental) > > > > [2] > > Compiling C object .../common_mlx5_mlx5_common_mr.c.o > > .../mlx5/mlx5_common_mr.c: In function ‘mlx5_mempool_get_chunks’: > > .../common/mlx5/mlx5_common_mr.c:1384:29: > > warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier > > argument and not in the later argument [-Wcalloc-transposed-args] > > 1384 | *out = calloc(sizeof(**out), n); > > | ^ > > > > Fixes: 7297d2cdecce ("common/mlx5: fix external memory pool > > registration") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Ferruh Yigit > > --- > > Acked-by: Morten Brørup Acked-by: Dariusz Sosnowski
RE: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 11.05
>
> On 2024-01-25 23:53, Morten Brørup wrote:
> >> From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
> >> Sent: Thursday, 25 January 2024 19.37
> >>
> >> ping.
> >>
> >> Please review this thread if you have time; the main points of
> >> discussion I would like to receive consensus on are the following
> >> questions.
> >>
> >> 1. Should we continue to expand common alignments behind an
> >> __rte_macro
> >>
> >>    i.e. what do we prefer to appear in code
> >>
> >>    alignas(RTE_CACHE_LINE_MIN_SIZE)
> >>
> >>    -- or --
> >>
> >>    __rte_cache_aligned
> >>
> >> One of the benefits of dropping the macro is it provides a clear
> >> visual indicator that it is not placed in the same location or
> >> applied to types as is done with __attribute__((__aligned__(n))).
> >
> > We don't want our own proprietary variant of something that already
> > exists in the C standard. Now that we have moved to C11, the __rte
> > alignment macros should be considered obsolete.
>
> Making something cache-line aligned is not in C11.

We are talking about the __rte_aligned() macro, not the cache alignment macro.

> __rte_cache_aligned is shorter, provides a tiny bit of abstraction,
> and is already an established DPDK standard. So just keep the macro.
> If it would change, I would argue for it to be changed to
> rte_cache_aligned (i.e., just moving it out of the __ namespace, and
> maybe making it all-uppercase).
>
> Non-trivial C programs wrap things all the time, standard or not. It's
> not something to be overly concerned about, imo.

Using the cache alignment macro was obviously a bad example for discussing the __rte_aligned() macro.

FYI, Tyler later agreed to introducing the RTE_CACHE_ALIGNAS you had proposed in an earlier correspondence.
> > Note: I don't mind convenience macros for common use cases, so we
> > could also introduce the macro suggested by Mattias [1]:
> >
> > #define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)
> >
> > [1]: https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e-b31ec10c0...@lysator.liu.se/
> >
> >> 2. where should we place alignas(n) or __rte_macro (if we use a
> >> macro)
> >>
> >> Should it be on the same line as the variable or field or on the
> >> preceding line?
> >>
> >>    /* same line example struct */
> >>    struct T {
> >>        /* alignas(64) applies to field0 *not* struct T type declaration */
> >>        alignas(64) void *field0;
> >>        void *field1;
> >>
> >>        ... other fields ...
> >>
> >>        alignas(64) uint64_t field5;
> >>        uint32_t field6;
> >>
> >>        ... more fields ...
> >>
> >>    };
> >>
> >>    /* same line example array */
> >>    alignas(64) static const uint32_t array[4] = { ... };
> >>
> >>    -- or --
> >>
> >>    /* preceding line example struct */
> >>    struct T {
> >>        /* alignas(64) applies to field0 *not* struct T type declaration */
> >>        alignas(64)
> >>        void *field0;
> >>        void *field1;
> >>
> >>        ... other fields ...
> >>
> >>        alignas(64)
> >>        uint64_t field5;
> >>        uint32_t field6;
> >>
> >>        ... more fields ...
> >>
> >>    };
> >>
> >>    /* preceding line example array */
> >>    alignas(64)
> >>    static const uint32_t array[4] = { ... };
> >
> > Searching the net for what other projects do, I came across this
> > required placement [2]:
> >
> > uint64_t alignas(64) field5;
> >
> > [2]: https://lore.kernel.org/buildroot/2023073851.6faa3391@windsurf/T/
> >
> > So let's follow the standard's intention and put them on the same
> > line. On a case-by-case basis, we can wrap lines if it improves
> > readability, like we do with function headers that have a lot of
> > attributes.
> >
> >> I'll submit patches for lib/* once the discussion is concluded.
> >>
> >> thanks folks
RE: rte_atomic_*_explicit
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 09.07
>
> On 2024-01-25 23:10, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >> Sent: Thursday, 25 January 2024 19.54
> >>
> >> Why do rte_stdatomic.h functions have the suffix "_explicit"?
> >> Especially since there aren't any wrappers for the implicit variants.
> >>
> >> More to type, more to read.
> >
> > They have the "_explicit" suffix to make their names similar to those
> > in stdatomic.h.
>
> OK, so to avoid a situation where someone accidentally misinterprets
> rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed);
> as what, exactly?
>
> If you have a wrapper, why not take the opportunity and reap some
> benefits from this and fix/extend the standard API, making it a better
> fit for your application. Like removing the redundant "_explicit",
> "fetch"-less add/sub, and maybe a function for single-writer atomic add
> (non-atomic read + non-atomic add + atomic store), etc.
>
> Prohibiting implicit operations was done already, so it's already now
> not a straight-off copy of the standard API.
>
> > You might consider their existence somewhat temporary until C11
> > stdatomics can be fully phased in, so there's another argument for
> > similar names. (This probably does not happen as long as compilers
> > generate slower code for C11 stdatomics than with their atomic
> > built-ins.)
>
> To me, it seems reasonable a wrapper API should stay as long as it
> provides a benefit over the standard API. One could imagine that being
> a long time.
>
> I imagine some DPDK developers being tired of migrating from one
> atomics API to another: <rte_atomic.h> -> GCC built-ins (-> attempted
> C11 stdatomics?) -> <rte_stdatomic.h>.
>
> Now you are adding a future "-> CXY atomics" move as well, and with
> that it also seems natural to add a change back to a wrapper or
> complementary API, when CXY didn't turn out good enough for some
> particular platform, or when some non-compliant compiler comes along.

Yes, more migrations seem to be on the roadmap.

We can take the opportunity to change direction now, and decide to keep the API long term. Then it would need more documentation (basically copying the function descriptions from <stdatomic.h>), the functions could have the "_explicit" suffix removed (macros with the suffix could be added for backwards compatibility), and more functions - like the ones you suggested above - could be added.

What do people think?
1. Keep considering <rte_stdatomic.h> a temporary wrapper for <stdatomic.h> until compilers reach some undefined level of maturity, or
2. Consider <rte_stdatomic.h> stable, clean it up (remove the "_explicit" suffix), add documentation to the macros, and extend it.

Living in a DPDK-only environment, I would prefer option 2; but if mixing DPDK code with non-DPDK code (that uses <stdatomic.h>), it might be weird.

> I suggested fixing the original API, or at least having a wrapper API,
> already at the point DPDK moved to direct GCC built-in calls. Then we
> wouldn't have had this atomics API ping-pong.

The decision back then might have been too hasty, and based on incomplete assumptions. Please shout louder next time you think a mistake is in the making.

> >> When was this API introduced? Shouldn't it say "experimental"
> >> somewhere?
> >
> > They were introduced as part of the migration to C11.
> > I suppose they were not marked experimental because they replaced
> > something we didn't want anymore (the compiler built-ins for atomics,
> > e.g. __atomic_load_n()). I don't recall if we discussed experimental
> > marking or not.

In hindsight, they should probably have been marked "experimental".
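The "fetch"-less, single-writer add mentioned earlier in the thread can be sketched directly with C11 <stdatomic.h>; the function names here are invented, not part of any DPDK API:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-writer counter add: when only one thread ever
 * updates the counter, the read-modify half needs no atomicity - a
 * relaxed atomic store publishes the new value, and readers use a
 * relaxed atomic load. */
static inline void
counter_add(_Atomic uint64_t *counter, uint64_t val)
{
	uint64_t cur = atomic_load_explicit(counter, memory_order_relaxed);

	atomic_store_explicit(counter, cur + val, memory_order_relaxed);
}

static inline uint64_t
counter_read(_Atomic uint64_t *counter)
{
	return atomic_load_explicit(counter, memory_order_relaxed);
}
```

On most architectures this avoids the locked read-modify-write instruction that a true atomic fetch-add costs, which is the benefit being discussed.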
[PATCH] app/testpmd: support updating flow rule actions
"flow actions_update" updates a flow rule specified by a rule ID with a new action list by making a call to "rte_flow_actions_update()": flow actions_update {port_id} {rule_id} actions {action} [/ {action} [...]] / end [user_id] Creating, updating and destroying a flow rule: testpmd> flow create 0 group 1 pattern eth / end actions drop / end Flow rule #0 created testpmd> flow actions_update 0 0 actions queue index 1 / end Flow rule #0 updated with new actions testpmd> flow destroy 0 rule 0 Flow rule #0 destroyed Signed-off-by: Oleksandr Kolomeiets Reviewed-by: Mykola Kostenok Reviewed-by: Christian Koue Muf --- .mailmap| 3 ++ app/test-pmd/cmdline.c | 4 ++ app/test-pmd/cmdline_flow.c | 34 +++- app/test-pmd/config.c | 43 app/test-pmd/testpmd.h | 3 ++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 44 + 6 files changed, 130 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index 6011526cae..44da31c1c2 100644 --- a/.mailmap +++ b/.mailmap @@ -232,6 +232,7 @@ Chintu Hetam Choonho Son Chris Metcalf Christian Ehrhardt +Christian Koue Muf Christian Maciocco Christophe Fontaine Christophe Grosse @@ -986,6 +987,7 @@ Mukesh Dua Murphy Yang Murthy NSSR Muthurajan Jayakumar +Mykola Kostenok Nachiketa Prachanda Nagadheeraj Rottela Naga Harish K S V @@ -1040,6 +1042,7 @@ Odi Assli Ognjen Joldzic Ola Liljedahl Oleg Polyakov +Oleksandr Kolomeiets Olga Shern Olivier Gournet Olivier Matz diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index f704319771..8249e4eb92 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -852,6 +852,10 @@ static void cmd_help_long_parsed(void *parsed_result, "flow destroy {port_id} rule {rule_id} [...]\n" "Destroy specific flow rules.\n\n" + "flow actions_update {port_id} {rule_id}" + " actions {action} [/ {action} [...]] / end [user_id]\n" + "Update a flow rule with new actions.\n\n" + "flow flush {port_id}\n" "Destroy all flow rules.\n\n" diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c index 
359c187b3c..0af8d13121 100644 --- a/app/test-pmd/cmdline_flow.c +++ b/app/test-pmd/cmdline_flow.c @@ -91,6 +91,7 @@ enum index { VALIDATE, CREATE, DESTROY, + ACTIONS_UPDATE, FLUSH, DUMP, QUERY, @@ -250,6 +251,7 @@ enum index { VC_TUNNEL_SET, VC_TUNNEL_MATCH, VC_USER_ID, + VC_IS_USER_ID, /* Dump arguments */ DUMP_ALL, @@ -3084,6 +3086,7 @@ static const struct token token_list[] = { VALIDATE, CREATE, DESTROY, + ACTIONS_UPDATE, FLUSH, DUMP, LIST, @@ -3888,6 +3891,17 @@ static const struct token token_list[] = { .args = ARGS(ARGS_ENTRY(struct buffer, port)), .call = parse_destroy, }, + [ACTIONS_UPDATE] = { + .name = "actions_update", + .help = "update a flow rule with new actions", + .next = NEXT(NEXT_ENTRY(VC_IS_USER_ID, END), +NEXT_ENTRY(ACTIONS), +NEXT_ENTRY(COMMON_RULE_ID), +NEXT_ENTRY(COMMON_PORT_ID)), + .args = ARGS(ARGS_ENTRY(struct buffer, args.vc.rule_id), +ARGS_ENTRY(struct buffer, port)), + .call = parse_vc, + }, [FLUSH] = { .name = "flush", .help = "destroy all flow rules", @@ -4134,6 +4148,11 @@ static const struct token token_list[] = { .args = ARGS(ARGS_ENTRY(struct buffer, args.vc.user_id)), .call = parse_vc, }, + [VC_IS_USER_ID] = { + .name = "user_id", + .help = "rule identifier is user-id", + .call = parse_vc, + }, /* Validate/create pattern. */ [ITEM_PATTERN] = { .name = "pattern", @@ -8083,8 +8102,13 @@ parse_vc(struct context *ctx, const struct token *token, if (!out->command) { if (ctx->curr != VALIDATE && ctx->curr != CREATE && ctx->curr != PATTERN_TEMPLATE_CREATE && - ctx->curr != ACTIONS_TEMPLATE_CREATE) + ctx->curr != ACTIONS_TEMPLATE_CREATE && + ctx->curr != ACTIONS_UPDATE) return -1; + if (ctx->curr == ACTIONS_UPDATE) + out->args.vc.pat
Re: [PATCH v4 1/1] ethdev: parsing multiple representor devargs string
On 1/21/2024 7:19 PM, Harman Kalra wrote:
> Adding support for parsing multiple representor devargs strings
> passed to a PCI BDF. There may be scenarios where port representors
> for various PFs or VFs under PFs are required, and all these
> representor ports shall be backed by a single PCI device. In such
> a case, port representors can be created using the devargs string:
> ,representor=[pf[0-1],pf2vf[1,2-3],[4-5]]
>

This patch is to be able to parse multiple representor device arguments, but I am concerned about how they are used. When there are multiple representor ports backed by the same device, can't it cause synchronization issues, like if two representor interfaces are used for conflicting configurations? Or if the datapath will be supported, what if two representors are used simultaneously?

Meanwhile please check some comments below related to the parser code.

<...>

> @@ -459,9 +460,26 @@ eth_dev_devargs_tokenise(struct rte_kvargs *arglist, const char *str_in)
> 			break;
>
> 		case 3: /* Parsing list */
> -			if (*letter == ']')
> -				state = 2;
> -			else if (*letter == '\0')
> +			if (*letter == ']') {
> +				/* For devargs having singles lists move to state 2 once letter
> +				 * becomes ']' so each can be considered as different pair key
> +				 * value. But in nested lists case e.g. multiple representors
> +				 * case i.e. [pf[0-3],pfvf[3,4-6]], complete nested list should
> +				 * be considered as one pair value, hence checking if end of outer
> +				 * list ']' is reached else stay on state 3.
> +				 */
> +				if ((strcmp("representor", pair->key) == 0) &&
> +				    (*(letter + 1) != '\0' && *(letter + 2) != '\0' &&
> +				    *(letter + 3) != '\0') &&
> +				    ((*(letter + 2) == 'p' && *(letter + 3) == 'f') ||
> +				    (*(letter + 2) == 'v' && *(letter + 3) == 'f') ||
> +				    (*(letter + 2) == 's' && *(letter + 3) == 'f') ||
> +				    (*(letter + 2) == 'c' && isdigit(*(letter + 3))) ||
> +				    (*(letter + 2) == '[' && isdigit(*(letter + 3)

The above is hard to understand, and there are some assumptions about the input; can we list the supported syntax in a comment to make it more clear?

For example the following seem not to be supported; can you please check if that is intentional:
[vf0,vf1]  // I am aware this can be put as vf[0,1] too
[vf[0,1],3]
[vf[0],vf[1]]

I am not saying the above should be supported, but the syntax should be clear about what is supported and what is not.

Also, I can't check it here, but is redundant input verified, like:
[vf[0-3],vf[3,4]]
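One way to make the intent explicit - instead of peeking at individual characters - is to track bracket depth while tokenising, so a comma only separates key/value pairs at depth zero. This is only an illustrative sketch, not the DPDK implementation:

```c
#include <string.h>

/* Illustrative sketch: split a devargs string into key=value pairs on
 * commas at bracket depth 0, so a nested representor list such as
 * representor=[pf[0-1],pf2vf[1,2-3],[4-5]] stays a single value.
 * Separators are overwritten with '\0' in place; returns the number of
 * pairs, or -1 on unbalanced brackets or overflow. */
static int
tokenize_devargs(char *str, char *pairs[], int max_pairs)
{
	int depth = 0, n = 0;
	char *p;

	if (*str == '\0')
		return 0;
	if (max_pairs == 0)
		return -1;
	pairs[n++] = str;
	for (p = str; *p != '\0'; p++) {
		if (*p == '[') {
			depth++;
		} else if (*p == ']') {
			if (--depth < 0)
				return -1;	/* stray ']' */
		} else if (*p == ',' && depth == 0) {
			*p = '\0';	/* terminate the current pair */
			if (n == max_pairs)
				return -1;
			pairs[n++] = p + 1;
		}
	}
	return depth == 0 ? n : -1;	/* unterminated '[' */
}
```

Depth counting also gives balanced-bracket validation for free, which addresses the malformed-input concern above.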
Re: Testing scatter support for PMDs using testpmd
On Jan 24, 2024, at 12:16 PM, Jeremy Spewock wrote:

Hello maintainers,

In porting over the first ethdev suite to the new DTS framework, there was an inconsistency that we found, and we were wondering if anyone would be able to shed some light on this. In general the inconsistency pertains to Intel and Mellanox NICs, where one will throw an error and not start testpmd while the other will work as expected.

In the original DTS suite for testing scattered packets, testpmd is started with the flags --max-packet-len=9000 and --mbuf-size=2048. This starts and works fine on Intel NICs, but when you use the same flags on a Mellanox NIC, it will produce the error seen below. There is a flag that exists for testpmd called --enable-scatter, and when this flag is provided, the Mellanox NIC seems to accept the flags and start without error.

Our assumption is that this should be consistent between NICs. Is there a reason that one NIC would allow starting testpmd without explicitly enabling scatter and the other wouldn't? Should this flag always be present, and is it an error that testpmd can start without it in the first place?

Here is the error provided when attempting to run on a Mellanox NIC:

mlx5_net: port 0 Rx queue 0: Scatter offload is not configured and no enough mbuf space(2048) to contain the maximum RX packet length(9000) with head-room(128)
mlx5_net: port 0 unable to allocate rx queue index 0
Fail to configure port 0 rx queues
Start ports failed

Thank you for any insight,
Jeremy

Hello Jeremy,

I can share a little bit of what I've seen while working on our devices. The client can specify the max packet size, MTU, mbuf size, and whether to enable Rx or Tx s/g (separately). For performance reasons we don't want to enable s/g if it's not needed. Now, the client can easily set things up with a small MTU, start processing, stop the port, and increase the MTU - beyond what a single mbuf would hold.

To avoid having to tear down and rebuild the queues on an MTU change to enable s/g support, we automatically enable Rx s/g if the client presents mbufs which are too small to hold the max MTU.

Unfortunately the API to configure the Tx queues doesn't tell us anything about the mbuf size, and there's nothing stopping the client from configuring Tx before Rx. So we can't reliably auto-enable Tx s/g, and it is possible to get into a config where the Rx side produces chained mbufs which the Tx side can't handle. To avoid this misconfig we have some versions of our PMD set to fail to start if Rx s/g is enabled but Tx s/g isn't.

Hope this helps,
Andrew
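The mlx5 error quoted above boils down to a simple capacity check. A sketch, where the headroom is assumed to be 128 bytes to match the "head-room(128)" in the log (real code would use RTE_PKTMBUF_HEADROOM), and the helper name is invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed headroom for illustration, matching the mlx5 log above. */
#define PKTMBUF_HEADROOM 128

/* A single mbuf holds a full packet only when its data room covers the
 * headroom plus the maximum Rx packet length; otherwise the PMD must
 * either enable scatter (chained mbufs) or refuse the configuration,
 * which is exactly what the mlx5 error above reports. */
static bool
rx_needs_scatter(uint32_t max_rx_pktlen, uint32_t mbuf_data_room)
{
	return (uint64_t)max_rx_pktlen + PKTMBUF_HEADROOM > mbuf_data_room;
}
```

With --max-packet-len=9000 and --mbuf-size=2048 the check fails, so the Mellanox PMD insists on --enable-scatter, while other PMDs silently enable scatter themselves.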
AGX100 and ACC100 series for 24.03
Hi Maxime, Kind reminder to review these 2 series for 24.03: 1. https://patches.dpdk.org/project/dpdk/patch/20240123165454.104465-2-hernan.var...@intel.com/ 2. https://patches.dpdk.org/project/dpdk/patch/20240112203618.27094-2-hernan.var...@intel.com/ Thanks, Hernan
Re: [24.03 RFC] argparse: add argparse library
On Thu, 25 Jan 2024 14:31:03 +0800 fengchengwen wrote:
> Hi Stephen,
>
> On 2024/1/24 23:54, Stephen Hemminger wrote:
> > On Tue, 21 Nov 2023 12:26:51 +
> > Chengwen Feng wrote:
> >
> >> Introduce argparse library (which was inspired by the thread [1]);
> >> compared with getopt, argparse has the following advantages:
> >> 1) Set the help information when defining parameters.
> >> 2) Support positional parameters.
> >>
> >> The parameters are parsed according to the following:
> >> 1) positional: use a callback to parse (the long-name is passed as the
> >>    key for the callback).
> >> 2) optional:
> >>    In addition to callback parsing, also support:
> >>    2.1) no-val: support setting a default value to the saver.
> >>    2.2) has-val: support setting a value to the saver; the value must
> >>         conform to RTE_ARGPARSE_ARG_VAL_xxx.
> >>    2.3) opt-val: if currently without a value then treat as no-val,
> >>         else treat as has-val.
> >
> > How compatible is this with Python or other implementations of argparse in C?
>
> This library is a subset of Python argparse, and it doesn't support some
> advanced features, such as subcommands and exclusive groups. We could
> extend it to support them if needed.
>
> As for other implementations of argparse in C, I found an interesting
> site [1]; this library supports all features except compact/order/wchar.
>
> [1] https://attractivechaos.wordpress.com/2018/08/31/a-survey-of-argument-parsing-libraries-in-c-c/

I was looking at https://github.com/cofyc/argparse?tab=readme-ov-file

One good thing there is that argparse options can be defined in an array, like:

struct argparse_option options[] = {
    OPT_HELP(),
    OPT_GROUP("Basic options"),
    OPT_BOOLEAN('f', "force", &force, "force to do", NULL, 0, 0),
    OPT_BOOLEAN('t', "test", &test, "test only", NULL, 0, 0),
    OPT_STRING('p', "path", &path, "path to read", NULL, 0, 0),
    OPT_INTEGER('i', "int", &int_num, "selected integer", NULL, 0, 0),
    OPT_FLOAT('s', "float", &flt_num, "selected float", NULL, 0, 0),
    OPT_GROUP("Bits options"),
    OPT_BIT(0, "read", &perms, "read perm", NULL, PERM_READ, OPT_NONEG),
    OPT_BIT(0, "write", &perms, "write perm", NULL, PERM_WRITE, 0),
    OPT_BIT(0, "exec", &perms, "exec perm", NULL, PERM_EXEC, 0),
    OPT_END(),
};
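As a rough illustration of that array-defined style mapped onto the concepts above (a "saver" pointer, plus a default value for "no-val" options) - all names here are invented; this is neither the rte_argparse nor the cofyc API:

```c
#include <stdlib.h>
#include <string.h>

/* Invented option descriptor for the sketch. */
struct opt {
	const char *name;	/* long name, e.g. "--force" */
	int *saver;		/* where the parsed value is stored */
	int no_val;		/* option takes no value; store def_val instead */
	int def_val;
};

/* Minimal parser: "--name" for no-val options, "--name=N" for has-val. */
static int
parse_args(const struct opt *opts, int n_opts, int argc, char **argv)
{
	for (int i = 1; i < argc; i++) {
		int matched = 0;

		for (int j = 0; j < n_opts; j++) {
			size_t len = strlen(opts[j].name);

			if (opts[j].no_val &&
			    strcmp(argv[i], opts[j].name) == 0) {
				*opts[j].saver = opts[j].def_val;
				matched = 1;
			} else if (!opts[j].no_val &&
				   strncmp(argv[i], opts[j].name, len) == 0 &&
				   argv[i][len] == '=') {
				*opts[j].saver = atoi(argv[i] + len + 1);
				matched = 1;
			}
		}
		if (!matched)
			return -1;	/* unknown option */
	}
	return 0;
}
```

The table keeps each option's name, storage, and default in one place, which is the property being praised above.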
Re: [PATCH v3 0/2] ethdev: add the check for PTP capability
On 1/11/2024 6:25 AM, lihuisong (C) wrote:
> Hi Ferruh,
>
> On 2023/11/23 19:56, lihuisong (C) wrote:
>>
>> On 2023/11/2 7:39, Ferruh Yigit wrote:
>>> timesync_read_rx_timestamp
>>> On 9/21/2023 12:59 PM, lihuisong (C) wrote:
add ice & igc maintainers
On 2023/9/21 19:06, Ferruh Yigit wrote:
> On 9/21/2023 11:02 AM, lihuisong (C) wrote:
>> Hi Ferruh,
>>
>> Sorry for my delayed reply; I was taking a look at all the PMDs' implementations.
>>
>> On 2023/9/16 1:46, Ferruh Yigit wrote:
>>> On 8/17/2023 9:42 AM, Huisong Li wrote:
From the first version of ptpclient, it seems that this example assumes that the PMDs support the PTP feature and enable PTP by default. Please see commit ab129e9065a5 ("examples/ptpclient: add minimal PTP client"), which was introduced in 2015. And two years later, Rx HW timestamp offload was introduced to enable or disable the PTP feature in HW via rte_eth_rxmode. Please see commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability").
>>> Hi Huisong,
>>>
>>> As far as I know this offload is not for PTP.
>>> PTP and TIMESTAMP are different.
>> If TIMESTAMP offload cannot stand for PTP, we may need to add one new
>> offload for PTP.
>>
> Can you please detail what is "PTP offload"?
It indicates whether the device supports PTP or enables the PTP feature.
>>> We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()'
>>> APIs to control PTP support.
>> No, this is just to control it.
>> We still need something like a device capability to report to the
>> application whether the port supports calling this API, right?
>>>
>>> But when mentioned as an "offload", it is something the device itself does.
>>>
>>> PTP is a protocol (IEEE 1588), and is used to synchronize clocks.
>>> What I get is that the protocol can be parsed by the networking stack
>>> and used by the application to synchronize clocks.
>>>
>>> When you refer to "PTP offload", does it mean the device (NIC)
>>> understands the protocol and parses it to synchronize the device clock
>>> with other devices?
>> Good point.
PTP offload is unreasonable.
>> But the capability is required indeed.
>> What do you think of introducing a RTE_ETH_DEV_PTP in
>> dev->data->dev_flags for the PTP feature?
>
> Can you take a look at this discussion line again?
>
> Let's introduce a RTE_ETH_DEV_PTP in dev->data->dev_flags to reveal
> whether the device supports the PTP feature.

Hi Huisong,

First, let me try to summarize the discussion, since it has been some time.

The HW timer block can be used for Rx timestamp offload (RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or for PTP support, but they are two different things. This patch uses the 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP support, which is wrong.

After we agreed on the above, your next question was about using 'dev_flags' to report PTP capability.

First, can you please describe the motivation for learning whether the HW supports PTP, and what the benefit of knowing this is?

If we agree that there is a need to know the PTP capability, the question is where to report it. The suggested 'dev_flags' is used for various things - some are internal flags and some are status - so I don't think overloading this variable is a good idea.

Another option is to update 'rte_eth_dev_info_get()' for it, but it has the same problem: this function is already overloaded with many different things.

We can have an API just to get the PTP capability, but this will require a new dev_ops. This can be an option, but my concern is that having a capability dev_op for each feature creates a mess in dev_ops.

Perhaps we can have a single get_capability() API, which takes features as flags and returns whether each feature is supported or not, but this requires a wider discussion.

Instead, can we deduce the capability from the PTP-relevant dev_ops: if they are implemented, we can say PTP is supported? This wouldn't require new support.
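The last suggestion - deducing the capability from whether the PTP-relevant callbacks are implemented - could be sketched like this. The ops struct below is a simplified stand-in, not the real struct eth_dev_ops, and the helper names are invented:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the ethdev ops table; the real struct
 * eth_dev_ops has many more callbacks and different signatures. */
struct eth_dev_ops {
	int (*timesync_enable)(void *dev);
	int (*timesync_disable)(void *dev);
	int (*timesync_read_time)(void *dev, void *ts);
};

/* The suggested deduction: no new flag or dev_op - a port is considered
 * PTP-capable iff the PTP-relevant callbacks are implemented. */
static bool
dev_supports_ptp(const struct eth_dev_ops *ops)
{
	return ops->timesync_enable != NULL &&
	       ops->timesync_disable != NULL &&
	       ops->timesync_read_time != NULL;
}

/* Stubs so the check can be demonstrated. */
static int ts_stub(void *dev) { (void)dev; return 0; }
static int ts_read_stub(void *dev, void *ts) { (void)dev; (void)ts; return 0; }
```

The appeal is that PMDs already declare these callbacks, so no new ABI surface is needed to expose the capability.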
RE: rte_atomic_*_explicit
> >>
> >> On Thu, Jan 25, 2024 at 11:10:47PM +0100, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >>>> Sent: Thursday, 25 January 2024 19.54
> >>>>
> >>>> Why do rte_stdatomic.h functions have the suffix "_explicit"?
> >>>> Especially since there aren't any wrappers for the implicit variants.
> >>>>
> >>>> More to type, more to read.
> >>>
> >>> They have the "_explicit" suffix to make their names similar to those
> >>> in stdatomic.h.
> >>>
> >>> You might consider their existence somewhat temporary until C11
> >>> stdatomics can be fully phased in, so there's another argument for
> >>> similar names. (This probably does not happen as long as compilers
> >>> generate slower code for C11 stdatomics than with their atomic
> >>> built-ins.)
> >>
> >> yes, there was feedback at the time it was.
> >>
> >> * we should *not* have non-explicit versions of the macros
> >> * the atomic generic functions should be named to match the C11 standard
> >>   with a rte_ prefix.
>
> This was mainly done to ensure that users think through the memory
> ordering they want to use. This also matches with the compiler atomic
> built-ins. Without explicit, it is sequentially consistent memory order.
>
> "This" is referring to the first bullet only, correct?
>
> You don't have to distinguish between implicit and explicit if you only
> have explicit.

Agree on your thought. The '_explicit' was added to be aligned with the standard atomic API naming. The thought was - if we are aligned on the names, it needs less explanation for users.
> >>
> >> i don't think we discussed it since they're wrapper macros.
> >>
> >>> Reverse paper trail:
> >>> https://git.dpdk.org/dpdk/log/lib/eal/include/rte_stdatomic.h
> >>> https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-send-email-roret...@linux.microsoft.com/
[PATCH v4 1/7] net/gve: fully expose RSS offload support in dev_info
This patch communicates that the GVE driver supports RSS, along with the RSS offloads supported by the driver. Signed-off-by: Joshua Washington Reviewed-by: Rushil Gupta Reviewed-by: Jeroen de Borst --- drivers/net/gve/gve_ethdev.c | 4 +++- drivers/net/gve/gve_ethdev.h | 8 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c index d162fd3864..6acdb4e13b 100644 --- a/drivers/net/gve/gve_ethdev.c +++ b/drivers/net/gve/gve_ethdev.c @@ -405,7 +405,7 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) dev_info->max_mtu = priv->max_mtu; dev_info->min_mtu = RTE_ETHER_MIN_MTU; - dev_info->rx_offload_capa = 0; + dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_RSS_HASH; dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS | RTE_ETH_TX_OFFLOAD_UDP_CKSUM| @@ -442,6 +442,8 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) .nb_align = 1, }; + dev_info->flow_type_rss_offloads = GVE_RSS_OFFLOAD_ALL; + return 0; } diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h index 9893fcfee6..14c72ec91a 100644 --- a/drivers/net/gve/gve_ethdev.h +++ b/drivers/net/gve/gve_ethdev.h @@ -33,6 +33,14 @@ RTE_MBUF_F_TX_L4_MASK |\ RTE_MBUF_F_TX_TCP_SEG) +#define GVE_RSS_OFFLOAD_ALL ( \ + RTE_ETH_RSS_IPV4 | \ + RTE_ETH_RSS_NONFRAG_IPV4_TCP | \ + RTE_ETH_RSS_IPV6 | \ + RTE_ETH_RSS_IPV6_EX | \ + RTE_ETH_RSS_NONFRAG_IPV6_TCP | \ + RTE_ETH_RSS_IPV6_TCP_EX) + /* A list of pages registered with the device during setup and used by a queue * as buffers */ -- 2.43.0.429.g432eaa2c6b-goog
[PATCH v4 2/7] net/gve: RSS adminq command changes
This change introduces admin queue changes that enable the configuration of RSS parameters for the GVE driver. Signed-off-by: Joshua Washington Reviewed-by: Rushil Gupta Reviewed-by: Jeroen de Borst --- drivers/net/gve/base/gve.h| 15 drivers/net/gve/base/gve_adminq.c | 58 +++ drivers/net/gve/base/gve_adminq.h | 29 drivers/net/gve/gve_ethdev.h | 6 +++- 4 files changed, 107 insertions(+), 1 deletion(-) diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h index f7b297e759..9c58fc4238 100644 --- a/drivers/net/gve/base/gve.h +++ b/drivers/net/gve/base/gve.h @@ -51,4 +51,19 @@ enum gve_state_flags_bit { GVE_PRIV_FLAGS_NAPI_ENABLED = 4, }; +enum gve_rss_hash_algorithm { + GVE_RSS_HASH_UNDEFINED = 0, + GVE_RSS_HASH_TOEPLITZ = 1, +}; + +struct gve_rss_config { + uint16_t hash_types; + enum gve_rss_hash_algorithm alg; + uint16_t key_size; + uint16_t indir_size; + uint8_t *key; + uint32_t *indir; +}; + + #endif /* _GVE_H_ */ diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c index 343bd13d67..629d15cfbe 100644 --- a/drivers/net/gve/base/gve_adminq.c +++ b/drivers/net/gve/base/gve_adminq.c @@ -389,6 +389,9 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv, case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES: priv->adminq_dcfg_device_resources_cnt++; break; + case GVE_ADMINQ_CONFIGURE_RSS: + priv->adminq_cfg_rss_cnt++; + break; case GVE_ADMINQ_SET_DRIVER_PARAMETER: priv->adminq_set_driver_parameter_cnt++; break; @@ -938,3 +941,58 @@ int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv, gve_free_dma_mem(&ptype_map_dma_mem); return err; } + +int gve_adminq_configure_rss(struct gve_priv *priv, +struct gve_rss_config *rss_config) +{ + struct gve_dma_mem indirection_table_dma_mem; + struct gve_dma_mem rss_key_dma_mem; + union gve_adminq_command cmd; + __be32 *indir = NULL; + u8 *key = NULL; + int err = 0; + int i; + + if (!rss_config->indir_size || !rss_config->key_size) + return -EINVAL; + + indir = 
gve_alloc_dma_mem(&indirection_table_dma_mem, + rss_config->indir_size * + sizeof(*rss_config->indir)); + if (!indir) { + err = -ENOMEM; + goto out; + } + for (i = 0; i < rss_config->indir_size; i++) + indir[i] = cpu_to_be32(rss_config->indir[i]); + + key = gve_alloc_dma_mem(&rss_key_dma_mem, + rss_config->key_size * + sizeof(*rss_config->key)); + if (!key) { + err = -ENOMEM; + goto out; + } + memcpy(key, rss_config->key, rss_config->key_size); + + memset(&cmd, 0, sizeof(cmd)); + cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_RSS); + cmd.configure_rss = (struct gve_adminq_configure_rss) { + .hash_types = cpu_to_be16(rss_config->hash_types), + .halg = rss_config->alg, + .hkey_len = cpu_to_be16(rss_config->key_size), + .indir_len = cpu_to_be16(rss_config->indir_size), + .hkey_addr = cpu_to_be64(rss_key_dma_mem.pa), + .indir_addr = cpu_to_be64(indirection_table_dma_mem.pa), + }; + + err = gve_adminq_execute_cmd(priv, &cmd); + +out: + if (indir) + gve_free_dma_mem(&indirection_table_dma_mem); + if (key) + gve_free_dma_mem(&rss_key_dma_mem); + return err; +} + diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h index f05362f85f..95f4960561 100644 --- a/drivers/net/gve/base/gve_adminq.h +++ b/drivers/net/gve/base/gve_adminq.h @@ -19,6 +19,7 @@ enum gve_adminq_opcodes { GVE_ADMINQ_DESTROY_TX_QUEUE = 0x7, GVE_ADMINQ_DESTROY_RX_QUEUE = 0x8, GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES = 0x9, + GVE_ADMINQ_CONFIGURE_RSS= 0xA, GVE_ADMINQ_SET_DRIVER_PARAMETER = 0xB, GVE_ADMINQ_REPORT_STATS = 0xC, GVE_ADMINQ_REPORT_LINK_SPEED= 0xD, @@ -377,6 +378,27 @@ struct gve_adminq_get_ptype_map { __be64 ptype_map_addr; }; +#define GVE_RSS_HASH_IPV4 BIT(0) +#define GVE_RSS_HASH_TCPV4 BIT(1) +#define GVE_RSS_HASH_IPV6 BIT(2) +#define GVE_RSS_HASH_IPV6_EX BIT(3) +#define GVE_RSS_HASH_TCPV6 BIT(4) +#define GVE_RSS_HASH_TCPV6_EX BIT(5) +#define GVE_RSS_HASH_UDPV4 BIT(6) +#define GVE_RSS_HASH_UDPV6 BIT(7) +#define GVE_RSS_HASH_UDPV6_EX BIT(8) + +/* RSS configuration 
comman
[PATCH v4 3/7] net/gve: add gve_rss library for handling RSS-related behaviors
This change includes a number of helper functions to facilitate RSS configuration on the GVE DPDK driver. These methods are declared in gve_rss.h. Signed-off-by: Joshua Washington Reviewed-by: Rushil Gupta Reviewed-by: Jeroen de Borst --- drivers/net/gve/base/gve_adminq.h | 10 +- drivers/net/gve/gve_ethdev.c | 2 +- drivers/net/gve/gve_ethdev.h | 4 +- drivers/net/gve/gve_rss.c | 206 ++ drivers/net/gve/gve_rss.h | 107 drivers/net/gve/meson.build | 1 + 6 files changed, 319 insertions(+), 11 deletions(-) create mode 100644 drivers/net/gve/gve_rss.c create mode 100644 drivers/net/gve/gve_rss.h diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h index 95f4960561..24abd945cc 100644 --- a/drivers/net/gve/base/gve_adminq.h +++ b/drivers/net/gve/base/gve_adminq.h @@ -378,15 +378,6 @@ struct gve_adminq_get_ptype_map { __be64 ptype_map_addr; }; -#define GVE_RSS_HASH_IPV4 BIT(0) -#define GVE_RSS_HASH_TCPV4 BIT(1) -#define GVE_RSS_HASH_IPV6 BIT(2) -#define GVE_RSS_HASH_IPV6_EX BIT(3) -#define GVE_RSS_HASH_TCPV6 BIT(4) -#define GVE_RSS_HASH_TCPV6_EX BIT(5) -#define GVE_RSS_HASH_UDPV4 BIT(6) -#define GVE_RSS_HASH_UDPV6 BIT(7) -#define GVE_RSS_HASH_UDPV6_EX BIT(8) /* RSS configuration command */ struct gve_adminq_configure_rss { @@ -428,6 +419,7 @@ union gve_adminq_command { GVE_CHECK_UNION_LEN(64, gve_adminq_command); struct gve_priv; +struct gve_rss_config; struct gve_queue_page_list; int gve_adminq_alloc(struct gve_priv *priv); void gve_adminq_free(struct gve_priv *priv); diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c index 6acdb4e13b..936ca22cb9 100644 --- a/drivers/net/gve/gve_ethdev.c +++ b/drivers/net/gve/gve_ethdev.c @@ -442,7 +442,7 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) .nb_align = 1, }; - dev_info->flow_type_rss_offloads = GVE_RSS_OFFLOAD_ALL; + dev_info->flow_type_rss_offloads = GVE_RTE_RSS_OFFLOAD_ALL; return 0; } diff --git a/drivers/net/gve/gve_ethdev.h 
b/drivers/net/gve/gve_ethdev.h index f7635e829c..bc486cb941 100644 --- a/drivers/net/gve/gve_ethdev.h +++ b/drivers/net/gve/gve_ethdev.h @@ -33,7 +33,7 @@ RTE_MBUF_F_TX_L4_MASK |\ RTE_MBUF_F_TX_TCP_SEG) -#define GVE_RSS_OFFLOAD_ALL ( \ +#define GVE_RTE_RSS_OFFLOAD_ALL ( \ RTE_ETH_RSS_IPV4 | \ RTE_ETH_RSS_NONFRAG_IPV4_TCP | \ RTE_ETH_RSS_IPV6 | \ @@ -290,6 +290,8 @@ struct gve_priv { const struct rte_memzone *stats_report_mem; uint16_t stats_start_idx; /* start index of array of stats written by NIC */ uint16_t stats_end_idx; /* end index of array of stats written by NIC */ + + struct gve_rss_config rss_config; }; static inline bool diff --git a/drivers/net/gve/gve_rss.c b/drivers/net/gve/gve_rss.c new file mode 100644 index 00..931180f8f2 --- /dev/null +++ b/drivers/net/gve/gve_rss.c @@ -0,0 +1,206 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2023 Google LLC + */ + +#include "gve_rss.h" + +int +gve_generate_rss_reta(struct rte_eth_dev *dev, struct gve_rss_config *config) +{ + int i; + if (!config || !config->indir) + return -EINVAL; + + for (i = 0; i < config->indir_size; i++) + config->indir[i] = i % dev->data->nb_rx_queues; + + return 0; +} + + +int +gve_init_rss_config(struct gve_rss_config *gve_rss_conf, + uint16_t key_size, uint16_t indir_size) +{ + int err; + + gve_rss_conf->alg = GVE_RSS_HASH_TOEPLITZ; + + gve_rss_conf->key_size = key_size; + gve_rss_conf->key = rte_zmalloc("rss key", + key_size * sizeof(*gve_rss_conf->key), + RTE_CACHE_LINE_SIZE); + if (!gve_rss_conf->key) + return -ENOMEM; + + gve_rss_conf->indir_size = indir_size; + gve_rss_conf->indir = rte_zmalloc("rss reta", + indir_size * sizeof(*gve_rss_conf->indir), + RTE_CACHE_LINE_SIZE); + if (!gve_rss_conf->indir) { + err = -ENOMEM; + goto err_with_key; + } + + return 0; +err_with_key: + rte_free(gve_rss_conf->key); + return err; +} + +int +gve_init_rss_config_from_priv(struct gve_priv *priv, + struct gve_rss_config *gve_rss_conf) +{ + int err = gve_init_rss_config(gve_rss_conf, 
priv->rss_config.key_size, + priv->rss_config.indir_size); + if (err) + return err; + + gve_rss_conf->hash_types = priv->rss_config.hash_types; + gve_rss_conf->alg = priv->rss_config.alg; + memcpy(gve_rss_conf->key, priv->rss_config.key, + gve_rss_conf->key_size * sizeof(*gve_rs
[PATCH v4 4/7] net/gve: RSS configuration update support
This patch adds support for updating the RSS hash key and hash fields in the GVE PMD through the implementation of rss_hash_update and rss_hash_conf_get. The RSS hash key for gVNIC is required to be 40 bytes. On initial configuration of the RSS hash key, the RSS redirection table will be set to a static default, using a round-robin approach for all queues. Note, however, that this patch does not include support for setting the redirection table explicitly. In dev_configure, if the static redirection table has been set, it will be updated to reflect the new queue count, if it has changed. The RSS key must be set before any other RSS configuration can happen. As such, an attempt to set the hash types before the key is configured will fail. Signed-off-by: Joshua Washington Reviewed-by: Rushil Gupta Reviewed-by: Jeroen de Borst --- drivers/net/gve/gve_ethdev.c | 132 ++- drivers/net/gve/gve_ethdev.h | 3 + 2 files changed, 133 insertions(+), 2 deletions(-) diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c index 936ca22cb9..2a68d31808 100644 --- a/drivers/net/gve/gve_ethdev.c +++ b/drivers/net/gve/gve_ethdev.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(C) 2022 Intel Corporation + * Copyright(C) 2022-2023 Intel Corporation + * Copyright(C) 2023 Google LLC */ #include "gve_ethdev.h" @@ -8,6 +9,7 @@ #include "base/gve_osdep.h" #include "gve_version.h" #include "rte_ether.h" +#include "gve_rss.h" static void gve_write_version(uint8_t *driver_version_register) @@ -88,12 +90,31 @@ gve_dev_configure(struct rte_eth_dev *dev) { struct gve_priv *priv = dev->data->dev_private; - if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) + if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) { dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH; + priv->rss_config.alg = GVE_RSS_HASH_TOEPLITZ; + } if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO) priv->enable_rsc = 1; + /* Reset RSS 
RETA in case number of queues changed. */ + if (priv->rss_config.indir) { + struct gve_rss_config update_reta_config; + gve_init_rss_config_from_priv(priv, &update_reta_config); + gve_generate_rss_reta(dev, &update_reta_config); + + int err = gve_adminq_configure_rss(priv, &update_reta_config); + if (err) + PMD_DRV_LOG(ERR, + "Could not reconfigure RSS redirection table."); + else + gve_update_priv_rss_config(priv, &update_reta_config); + + gve_free_rss_config(&update_reta_config); + return err; + } + return 0; } @@ -443,6 +464,8 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) }; dev_info->flow_type_rss_offloads = GVE_RTE_RSS_OFFLOAD_ALL; + dev_info->hash_key_size = GVE_RSS_HASH_KEY_SIZE; + dev_info->reta_size = GVE_RSS_INDIR_SIZE; return 0; } @@ -646,6 +669,107 @@ gve_xstats_get_names(struct rte_eth_dev *dev, return count; } + +static int +gve_rss_hash_update(struct rte_eth_dev *dev, + struct rte_eth_rss_conf *rss_conf) +{ + struct gve_priv *priv = dev->data->dev_private; + struct gve_rss_config gve_rss_conf; + int rss_reta_size; + int err; + + if (gve_validate_rss_hf(rss_conf->rss_hf)) { + PMD_DRV_LOG(ERR, "Unsupported hash function."); + return -EINVAL; + } + + if (rss_conf->algorithm != RTE_ETH_HASH_FUNCTION_TOEPLITZ && + rss_conf->algorithm != RTE_ETH_HASH_FUNCTION_DEFAULT) { + PMD_DRV_LOG(ERR, "Device only supports Toeplitz algorithm."); + return -EINVAL; + } + + if (rss_conf->rss_key_len) { + if (rss_conf->rss_key_len != GVE_RSS_HASH_KEY_SIZE) { + PMD_DRV_LOG(ERR, + "Invalid hash key size. 
Only RSS hash key size " + "of %u supported", GVE_RSS_HASH_KEY_SIZE); + return -EINVAL; + } + + if (!rss_conf->rss_key) { + PMD_DRV_LOG(ERR, "RSS key must be non-null."); + return -EINVAL; + } + } else { + if (!priv->rss_config.key_size) { + PMD_DRV_LOG(ERR, "RSS key must be initialized before " + "any other configuration."); + return -EINVAL; + } + rss_conf->rss_key_len = priv->rss_config.key_size; + } + + rss_reta_size = priv->rss_config.indir ? + priv->rss_config.indir_size : + GVE_RSS_INDIR_SIZE
[PATCH v4 5/7] net/gve: RSS redirection table update support
This patch introduces support for updating the RSS redirection table in the GVE PMD through the implementation of rss_reta_update and rss_reta_query. Due to an infrastructure limitation, the RSS hash key must be manually configured before the redirection table can be updated or queried. The redirection table is expected to be exactly 128 bytes. Signed-off-by: Joshua Washington Reviewed-by: Rushil Gupta Reviewed-by: Jeroen de Borst --- drivers/net/gve/gve_ethdev.c | 95 1 file changed, 95 insertions(+) diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c index 2a68d31808..3b8ec5872d 100644 --- a/drivers/net/gve/gve_ethdev.c +++ b/drivers/net/gve/gve_ethdev.c @@ -770,6 +770,97 @@ gve_rss_hash_conf_get(struct rte_eth_dev *dev, return 0; } +static int +gve_rss_reta_update(struct rte_eth_dev *dev, + struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size) +{ + struct gve_priv *priv = dev->data->dev_private; + struct gve_rss_config gve_rss_conf; + int table_id; + int err; + int i; + + /* RSS key must be set before the redirection table can be set. 
*/ + if (!priv->rss_config.key || priv->rss_config.key_size == 0) { + PMD_DRV_LOG(ERR, "RSS hash key must be set before the " + "redirection table can be updated."); + return -ENOTSUP; + } + + if (reta_size != GVE_RSS_INDIR_SIZE) { + PMD_DRV_LOG(ERR, "Redirection table must have %hu elements", + (uint16_t)GVE_RSS_INDIR_SIZE); + return -EINVAL; + } + + err = gve_init_rss_config_from_priv(priv, &gve_rss_conf); + if (err) { + PMD_DRV_LOG(ERR, "Error allocating new RSS config."); + return err; + } + + table_id = 0; + for (i = 0; i < priv->rss_config.indir_size; i++) { + int table_entry = i % RTE_ETH_RETA_GROUP_SIZE; + if (reta_conf[table_id].mask & (1ULL << table_entry)) + gve_rss_conf.indir[i] = + reta_conf[table_id].reta[table_entry]; + + if (table_entry == RTE_ETH_RETA_GROUP_SIZE - 1) + table_id++; + } + + err = gve_adminq_configure_rss(priv, &gve_rss_conf); + if (err) + PMD_DRV_LOG(ERR, "Problem configuring RSS with device."); + else + gve_update_priv_rss_config(priv, &gve_rss_conf); + + gve_free_rss_config(&gve_rss_conf); + return err; +} + +static int +gve_rss_reta_query(struct rte_eth_dev *dev, + struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size) +{ + struct gve_priv *priv = dev->data->dev_private; + int table_id; + int i; + + if (!(dev->data->dev_conf.rxmode.offloads & + RTE_ETH_RX_OFFLOAD_RSS_HASH)) { + PMD_DRV_LOG(ERR, "RSS not configured."); + return -ENOTSUP; + } + + /* RSS key must be set before the redirection table can be queried. 
*/ + if (!priv->rss_config.key) { + PMD_DRV_LOG(ERR, "RSS hash key must be set before the " + "redirection table can be initialized."); + return -ENOTSUP; + } + + if (reta_size != priv->rss_config.indir_size) { + PMD_DRV_LOG(ERR, "RSS redirection table must have %d entries.", + priv->rss_config.indir_size); + return -EINVAL; + } + + table_id = 0; + for (i = 0; i < priv->rss_config.indir_size; i++) { + int table_entry = i % RTE_ETH_RETA_GROUP_SIZE; + if (reta_conf[table_id].mask & (1ULL << table_entry)) + reta_conf[table_id].reta[table_entry] = + priv->rss_config.indir[i]; + + if (table_entry == RTE_ETH_RETA_GROUP_SIZE - 1) + table_id++; + } + + return 0; +} + static const struct eth_dev_ops gve_eth_dev_ops = { .dev_configure= gve_dev_configure, .dev_start= gve_dev_start, @@ -792,6 +883,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = { .xstats_get_names = gve_xstats_get_names, .rss_hash_update = gve_rss_hash_update, .rss_hash_conf_get= gve_rss_hash_conf_get, + .reta_update = gve_rss_reta_update, + .reta_query = gve_rss_reta_query, }; static const struct eth_dev_ops gve_eth_dev_ops_dqo = { @@ -816,6 +909,8 @@ static const struct eth_dev_ops gve_eth_dev_ops_dqo = { .xstats_get_names = gve_xstats_get_names, .rss_hash_update = gve_rss_hash_update, .rss_hash_conf_get= gve_rss_hash_conf_get, + .reta_update = gve_rss_reta_update, + .reta_query = gve_rss_reta_query, }; static void -- 2.43.0.429.g432eaa2c6b-goog
[PATCH v4 6/7] net/gve: update gve.ini with RSS capabilities
This patch updates the DPDK feature matrix to expose that the GVE driver supports RSS hash, RSS key update, and RSS reta update. Signed-off-by: Joshua Washington --- doc/guides/nics/features/gve.ini | 3 +++ 1 file changed, 3 insertions(+) diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini index 838edd456a..bc08648dbc 100644 --- a/doc/guides/nics/features/gve.ini +++ b/doc/guides/nics/features/gve.ini @@ -15,3 +15,6 @@ Linux= Y x86-32 = Y x86-64 = Y Usage doc= Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y -- 2.43.0.429.g432eaa2c6b-goog
[PATCH v4 7/7] net/gve: update GVE documentation with RSS support
This patch updates the GVE doc page to communicate that GVE now supports RSS configuration and explains the limitations. Signed-off-by: Joshua Washington --- doc/guides/nics/gve.rst | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst index 1c3eaf03ef..80991e70cf 100644 --- a/doc/guides/nics/gve.rst +++ b/doc/guides/nics/gve.rst @@ -70,6 +70,8 @@ Supported features of the GVE PMD are: - Link state information - Tx multi-segments (Scatter Tx) - Tx UDP/TCP/SCTP Checksum +- RSS hash configuration +- RSS redirection table update and query Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD. Jumbo Frame is not supported in PMD for now. @@ -77,10 +79,12 @@ It'll be added in a future DPDK release. Also, only GQI_QPL queue format is in use on GCP since GQI_RDA hasn't been released in production. -Currently, setting MTU with value larger than 1460 is not supported. +RSS +^^^ -Currently, only "RSS hash" is force enabled -so that the backend hardware device calculated hash values -could be shared with applications. -But for RSS, there is no such API to config RSS hash function or RETA table. -So, limited RSS is supported only with default config/setting. +GVE RSS can be enabled and configured using the standard interfaces. The driver +does not support querying the initial RSS configuration. + +The RSS hash key must be exactly 40 bytes, and the redirection table must have +128 entries. The RSS hash key must be configured before the redirection table +can be set up. -- 2.43.0.429.g432eaa2c6b-goog
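Putting the documented constraints together, an application-side configuration sketch might look like the following. This is a hypothetical example (function name, hash-type choice, and two-queue mapping are assumptions), not part of the patches; it only relies on standard ethdev calls, and the ordering reflects the documented limitation that the key must be set before the RETA:

```c
/* Hypothetical application-side sketch; assumes a started DPDK EAL and a
 * configured gVNIC port `port_id`. Not runnable outside a DPDK build. */
#include <string.h>
#include <rte_ethdev.h>

static int setup_gve_rss(uint16_t port_id)
{
    static uint8_t key[40] = {0};        /* fill with a real 40-byte key */
    struct rte_eth_rss_conf rss_conf = {
        .rss_key = key,
        .rss_key_len = sizeof(key),      /* must be exactly 40 for gVNIC */
        .rss_hf = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP,
    };
    struct rte_eth_rss_reta_entry64 reta_conf[2]; /* 128 entries / 64 */
    int ret, i;

    /* Key first: the driver rejects RETA updates until a key is set. */
    ret = rte_eth_dev_rss_hash_update(port_id, &rss_conf);
    if (ret != 0)
        return ret;

    memset(reta_conf, 0, sizeof(reta_conf));
    for (i = 0; i < 128; i++) {
        reta_conf[i / 64].mask |= 1ULL << (i % 64);
        reta_conf[i / 64].reta[i % 64] = i % 2; /* e.g. two Rx queues */
    }
    return rte_eth_dev_rss_reta_update(port_id, reta_conf, 128);
}
```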
Re: rte_atomic_*_explicit
On Fri, Jan 26, 2024 at 04:58:54PM +, Honnappa Nagarahalli wrote: > > > > > > > >> > > >> On Thu, Jan 25, 2024 at 11:10:47PM +0100, Morten Brørup wrote: > > From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > > Sent: Thursday, 25 January 2024 19.54 > > > > Why do rte_stdatomic.h functions have the suffix "_explicit"? > > Especially > > since there aren't any wrappers for the implicit variants. > > > > More to type, more to read. > > >>> > > >>> They have the "_explicit" suffix to make their names similar to > > >>> those in > > >> stdatomic.h. > > >>> > > >>> You might consider their existence somewhat temporary until C11 > > >>> stdatomics > > >> can be fully phased in, so there's another argument for similar > > >> names. (This probably does not happen as long as compilers generate > > >> slower code for C11 stdatomics than with their atomic built-ins.) > > >> > > >> yes, there was feedback at the time it was. > > >> > > >> * we should *not* have non-explicit versions of the macros > > >> * the atomic generic functions should be named to match C11 standard > > >>with a rte_ prefix. > > > This was mainly done to ensure that users think through the memory > > ordering they want to use. This also matches with the compiler atomic built- > > ins. Without explicit, it is sequentially consistent memory order. > > > > > > > "This" is referring to the first bullet only, correct? > > > > You don't have to distinguish between implicit and explicit if you only have > > explicit. > Agree on your thought. > > The '_explicit' was added to be aligned with the standard atomic API naming. > The thought was - if we are aligned on the names, it needs less explanation > for users. well another benefit is that in the future if/when a decision is made to directly use the names from the standard it is most trivial to just strip the rte_ prefix tree wide.
Re: rte_atomic_*_explicit
On Fri, Jan 26, 2024 at 11:52:11AM +0100, Morten Brørup wrote: > > From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > > Sent: Friday, 26 January 2024 09.07 > > > > On 2024-01-25 23:10, Morten Brørup wrote: > > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > > >> Sent: Thursday, 25 January 2024 19.54 > > >> > > >> Why do rte_stdatomic.h functions have the suffix "_explicit"? > > >> Especially > > >> since there aren't any wrappers for the implicit variants. > > >> > > >> More to type, more to read. > > > > > > They have the "_explicit" suffix to make their names similar to those > > in stdatomic.h. > > > > > > > OK, so to avoid a situation where someone accidentally misinterpret > > rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed); > > as what, exactly? > > > > If you have a wrapper, why not take the opportunity and reap some > > benefits from this and fix/extend the standard API, making it a better > > fit for your application. Like removing the redundant "_explicit", > > "fetch"-less add/sub, maybe and a function for single-writer atomic add > > (non-atomic read + non-atomic add + atomic store), etc. > > > > Prohibiting implicit operations was done already, so it's already now > > not a straight-off copy of the standard API. > > > > > You might consider their existence somewhat temporary until C11 > > stdatomics can be fully phased in, so there's another argument for > > similar names. (This probably does not happen as long as compilers > > generate slower code for C11 stdatomics than with their atomic built- > > ins.) > > > > > > > To me, it seems reasonable a wrapper API should stay as long as it > > provide a benefit over the standard API. One could imagine that being a > > long time. > > > > I imagine some DPDK developers being tired of migrating from one > > atomics > > API to another. -> GCC built-ins (-> attempted C11 > > stdatomics?) -> . 
Now you are adding a future "-> CXY > > atomics" move as well, and with that it also seems natural to add a > > change back to a wrapper or complementary API, when CXY didn't turn > > out good enough for some particular platform, or when some non- > > compliant > > compiler comes along. > > Yes, more migrations seem to be on the roadmap. > > We can take the opportunity to change direction now, and decide to keep the > API long term. > Then it would need more documentation (basically copying function > descriptions from ), and the functions could have the > "_explicit" suffix removed (macros with the suffix could be added for > backwards compatibility), and more functions - like the ones you suggested > above - could be added. > > What do people think? > 1. Keep considering a temporary wrapper for > until compilers reach some undefined level of maturity, or > 2. Consider stable, clean it up (remove "_explicit" > suffix), add documentation to the macros, and extend it. > > Living in a DPDK-only environment, I would prefer option 2; but if mixing > DPDK code with non-DPDK code (that uses ) it might be weird. rte_stdatomic.h should be considered temporary, but how long temporary is depends on when we can deprecate support for distributions and the older toolchains they are tied to. the macros were introduced to allow a path to gradually moving to standard c11 atomics. gcc versions available on distributions we promise support for is currently the biggest barrier to direct adoption. * older versions of gcc generate sub-optimal code when using c11 atomic generics in some instances. * gcc c++23 support is incomplete, in particular at the time we evaluated using the c11 atomics directly but gcc c++23 held us back because the stdatomic.h it shipped didn't interoperate.
* the macros allow developers to easily evaluate/compare with and without standard C atomics with a single build-time option -Denable_stdatomic=true * eventually we will want to move directly to the names from the standard, arguably just search and replace of rte_atomic with atomic is the most mechanically trivial - hence there is some value in keeping _explicit suffix. > > > > I suggested fixing the original API, or at least have a > > wrapper API, already at the point DPDK moved to direct GCC built-in > > calls. Then we wouldn't have had this atomics API ping-pong. > > The decision back then might have been too hasty, and based on incomplete > assumptions. > Please shout louder next time you think a mistake is in the making. > > > > > >> > > >> When was this API introduced? Shouldn't it say "experimental" > > >> somewhere? > > > > > > They were introduced as part of the migration to C11. > > > I suppose they were not marked experimental because they replaced > > something we didn't want anymore (the compiler built-ins for atomics, > > e.g. __atomic_load_n()). I don't recall if we discussed experimental > > marking or not. > > In hindsight, they should probably have been marked
Re: [PATCH v3 0/2] ethdev: add the check for PTP capability
On 2024/1/27 0:54, Ferruh Yigit wrote: On 1/11/2024 6:25 AM, lihuisong (C) wrote: Hi Ferruh, On 2023/11/23 19:56, lihuisong (C) wrote: On 2023/11/2 7:39, Ferruh Yigit wrote: timesync_read_rx_timestamp On 9/21/2023 12:59 PM, lihuisong (C) wrote: add ice & igc maintainers On 2023/9/21 19:06, Ferruh Yigit wrote: On 9/21/2023 11:02 AM, lihuisong (C) wrote: Hi Ferruh, Sorry for my delayed reply; I was taking a look at all the PMD implementations. On 2023/9/16 1:46, Ferruh Yigit wrote: On 8/17/2023 9:42 AM, Huisong Li wrote: From the first version of ptpclient, it seems that this example assumes that the PMDs support the PTP feature and enable PTP by default. Please see commit ab129e9065a5 ("examples/ptpclient: add minimal PTP client"), which was introduced in 2015. Two years later, Rx HW timestamp offload was introduced to enable or disable the PTP feature in HW via rte_eth_rxmode. Please see commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability"). Hi Huisong, As far as I know this offload is not for PTP. PTP and TIMESTAMP are different. If TIMESTAMP offload cannot stand for PTP, we may need to add one new offload for PTP. Can you please detail what "PTP offload" is? It indicates whether the device supports PTP or enables the PTP feature. We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()' APIs to control PTP support. No, this is just to control it. We still need something like a device capability to tell the application whether the port supports calling this API, right? But when we say "offload", it is something the device itself does. PTP is a protocol (IEEE 1588) used to synchronize clocks. What I get is that the protocol can be parsed by the networking stack and used by the application to synchronize clocks. When you refer to "PTP offload", does it mean the device (NIC) understands the protocol and parses it to synchronize the device clock with other devices? Good point. PTP offload is unreasonable. But the capability is required indeed. 
What do you think of introducing a RTE_ETH_DEV_PTP in dev->data->dev_flags for the PTP feature? Can you take a look at this discussion line again? Let's introduce a RTE_ETH_DEV_PTP in dev->data->dev_flags to reveal whether the device supports the PTP feature. Hi Ferruh, Thanks for taking the time to reply. Hi Huisong, First let me try to summarize the discussion since it has been some time. The HW timer block can be used for Rx timestamp offload (RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or PTP support, but they are two different things. This patch uses the 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP support, which is wrong. Correct. After we agreed on the above, your next question is whether to use 'dev_flags' to report the PTP capability. First, can you please describe the motivation for learning whether HW supports PTP, and what the benefit of knowing this is? There are a couple of devices which have the same driver on a platform or OS, but only one device is allowed to support or be responsible for the PTP feature. The firmware will report a capability to tell the device whether it supports PTP. But, currently, the driver doesn't know how to report to the user which device supports the PTP feature. In addition, many drivers use RTE_LIBRTE_IEEE1588 to control the PTP code flow. Whether the device supports PTP is irrelevant to this macro. If we agree that there is a need to know the PTP capability, the question is where to report this capability. The suggested 'dev_flags' is used for various things, some internal flags and some status; I don't think overloading this variable is a good idea. Yes, we need to consider this carefully. Another option is to update 'rte_eth_dev_info_get()' for it, but it has the same problem: this function is already overloaded with many different things. We can have an API just to get the PTP capability, but this will require a new dev_ops; this can be an option, but my concern is that having a capability dev_ops for each feature creates a mess in dev_ops. 
Perhaps we can have a single get_capability() API that takes features as flags and returns whether each feature is supported, but this requires a wider discussion. Instead, can we deduce the capability from the PTP-relevant dev_ops: if they are implemented, we can say it is supported? This doesn't require new support. Thank you for suggesting so many options. Regarding the last suggestion, I don't think it is appropriate, because in that case we would have to dynamically modify the pointers of all the PTP APIs in dev_ops. How about we use rte_eth_dev_info.dev_capa to report PTP offload? This is specifically used to report "Non-offload capabilities" according to its documentation.