RE: Inquiry Regarding Sending Patches to DPDK
Dear Morten,

I have successfully resolved this problem. Prior to the tests, I had overlooked
the necessity of disabling smtpencryption. After modifying my .gitconfig file as
shown below, everything now works flawlessly. If IT allows anonymous email
sending, the 2FA problem is solved.

[user]
	name = howard_wang
	email = howard_w...@realsil.com.cn
[core]
	editor = vim
[sendemail]
	smtpserverport = 25
	smtpserver = smtpsrv.realsil.com.cn
	smtpdomain = 172.29.32.27

Thanks!
Howard Wang

-----Original Message-----
From: Morten Brørup
Sent: Wednesday, August 14, 2024 18:15
To: 王颢; Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: Inquiry Regarding Sending Patches to DPDK

External mail.

Howard,

I'm using .gitconfig to configure my git send-email options. Try this in your
.gitconfig:

[user]
	name = Howard Wang
	email = howard_w...@realsil.com.cn
[sendemail]
	from = Howard Wang
	envelopeSender = howard_w...@realsil.com.cn
	smtpServer = smtpsrv.realsil.com.cn

Med venlig hilsen / Kind regards,
-Morten Brørup

> -----Original Message-----
> From: 王颢 [mailto:howard_w...@realsil.com.cn]
> Sent: Wednesday, 14 August 2024 11.52
> To: Stephen Hemminger
> Cc: dev@dpdk.org
> Subject: RE: Inquiry Regarding Sending Patches to DPDK
>
> Dear Stephen,
>
> Now I have a better understanding of the anonymous sending suggested
> by the company's IT department. Since the second-factor authentication
> for the email account is Microsoft's Okta, which seems not
> straightforward to configure with an account and password, they have
> enabled anonymous sending for me. Here is how it works, approximately:
> when I send emails, I don't need to input an account or password.
> Instead, I just need to configure the server and port number, and I can
> send emails. Attached below is the script I've written.
> However, it seems there are some issues, and perhaps I need to conduct
> further research.
>
> test result:
> https://mails.dpdk.org/archives/dev/2024-August/299466.html
> python:
> #!/usr/bin/env python3
> import smtplib
> from email.mime.multipart import MIMEMultipart
> from email.mime.text import MIMEText
> from email.mime.base import MIMEBase
> from email import encoders
>
> smtp_server = 'smtpsrv.realsil.com.cn'
> smtp_port = 25
>
> from_addr = 'howard_w...@realsil.com.cn'
> to_addr = 'dev@dpdk.org'
>
> msg = MIMEMultipart()
> msg['From'] = from_addr
> msg['To'] = to_addr
> #msg['Subject'] = 'test anonymous send mail'
>
> filename = '0001-net-r8169-add-PMD-driver-skeleton.patch'
> with open(filename, 'rb') as attachment:
>     part = MIMEBase('application', 'octet-stream')
>     part.set_payload(attachment.read())
>     encoders.encode_base64(part)
>     part.add_header('Content-Disposition', f"attachment; filename={filename}")
>     msg.attach(part)
>
> try:
>     server = smtplib.SMTP(smtp_server, smtp_port)
>     server.sendmail(from_addr, [to_addr], msg.as_string())
>     server.quit()
>     print('Mail sent successfully!')
> except Exception as e:
>     print(f'Failed to send mail: {e}')
>
> Thanks!
> Howard Wang
>
> -----Original Message-----
> From: Stephen Hemminger
> Sent: Monday, August 12, 2024 22:56
> To: 王颢
> Cc: dev@dpdk.org
> Subject: Re: Inquiry Regarding Sending Patches to DPDK
>
>
> External mail.
>
>
>
> On Mon, 12 Aug 2024 07:52:39 +
> 王颢 wrote:
>
> > Dear all,
> >
> > I hope this message finds you well.
> >
> > I would like to seek your advice on an issue I've encountered. Our
> > company has recently enabled two-factor authentication (2FA) for our
> > email accounts. The IT department has suggested that I abandon using
> > the "git send-email" method, as configured through git config, to send
> > patches to DPDK. Instead, they have recommended using "Exchange
> > anonymous send mail." However, I believe this approach might not be
> > feasible.
> >
> > I wanted to confirm this with you and see if you could provide any
> > guidance on the matter. I look forward to your response.
> >
> > Thank you very much for your time and assistance.
> >
> > Best regards,
> > Howard Wang
>
> There are two issues here:
> Using git send-email is not required. You can generate patch files and
> put them in your email.
> BUT Microsoft Exchange does not preserve text formatting in messages.
> Any patches sent that way are usually corrupted.
>
> At Microsoft, we ended up using a special server (not Exchange) to
> send Linux and DPDK patches. Or using non-corporate accounts.
[PATCH v2 0/2] examples/l3fwd fixes for ACL mode
From: Konstantin Ananyev

As Song Jiale pointed out, the previous fix is not enough to fix
the problem he is observing with l3fwd in ACL mode:
https://bugs.dpdk.org/show_bug.cgi?id=1502
This is a second attempt to fix it.

Konstantin Ananyev (2):
  examples/l3fwd: fix read beyond array boundaries
  examples/l3fwd: fix read beyond array boundaries in ACL mode

 examples/l3fwd/l3fwd_acl.c           | 37
 examples/l3fwd/l3fwd_altivec.h       |  6 -
 examples/l3fwd/l3fwd_common.h        |  7 ++
 examples/l3fwd/l3fwd_em_hlm.h        |  2 +-
 examples/l3fwd/l3fwd_em_sequential.h |  2 +-
 examples/l3fwd/l3fwd_fib.c           |  2 +-
 examples/l3fwd/l3fwd_lpm_altivec.h   |  2 +-
 examples/l3fwd/l3fwd_lpm_neon.h      |  2 +-
 examples/l3fwd/l3fwd_lpm_sse.h       |  2 +-
 examples/l3fwd/l3fwd_neon.h          |  6 -
 examples/l3fwd/l3fwd_sse.h           |  6 -
 11 files changed, 55 insertions(+), 19 deletions(-)

--
2.35.3
[RFC 0/6] Stage-Ordered API and other extensions for ring library
From: Konstantin Ananyev

Konstantin Ananyev (6):
  ring: common functions for 'move head' ops
  ring: make copying functions generic
  ring/soring: introduce Staged Ordered Ring
  app/test: add unit tests for soring API
  examples/l3fwd: make ACL work in pipeline and eventdev modes
  ring: minimize reads of the counterpart cache-line

The main aim of this series is to extend the ring library with a new API
that allows the user to create/use a Staged-Ordered-Ring (SORING) abstraction.
In addition to that, there are a few other patches that serve different
purposes:
- The first two patches are just code reordering to de-duplicate and
  generalize existing rte_ring code.
- The next two patches introduce the SORING API into the ring library and
  provide UT for it.
- Patch #5 extends the l3fwd sample app to work in pipeline (worker-pool)
  mode. Right now it is done for demonstration and performance comparison
  purposes: it makes it possible to run l3fwd in different modes
  (run-to-completion, eventdev, pipeline) and perform sort-of
  'apple-to-apple' performance comparisons.
  I am aware that in general community consensus on l3fwd is to keep its
  functionality simple and limited. On the other side, we already do have
  an eventdev mode for it, so why should pipeline be prohibited?
  Though if l3fwd is not an option, then we need to select some other
  existing sample app to integrate with. Probably ipsec-secgw would be the
  second best choice from my perspective, though it would require much more
  effort. Have to say that the current l3fwd patch is way too big and
  unfinished, so if we decide to go forward with it, it has to be split
  and reworked.
- Patch #6 - an attempt to optimize (by caching the counterpart tail value)
  enqueue/dequeue operations for vanilla rte_ring. Logically it is not
  linked with patches 3-5 and probably should be in a separate series.
  I put it here for now just to minimize 'Depends-on' hassle, so everyone
  can build/try everything in one go.

Seeking community help/feedback (apart from usual patch review activity):
==========================================================================
- While we tested these changes quite extensively, our platform coverage is
  limited to x86 right now. So we would appreciate feedback on how it behaves
  on other architectures DPDK supports (ARM, PPC, etc.).
  Especially for patch #6: so far we didn't observe noticeable performance
  improvement with it on x86_64, so if there would be no real gain on other
  platforms (or scenarios) - I am ok to drop that patch.
- Adding a new (pipeline) mode for the l3fwd sample app. Is it worth it?
  If not, what other sample app should be used to demonstrate the new
  functionality we worked on? ipsec-secgw? Something else?

SORING overview
===============
Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
with multiple processing 'stages'. It is based on conventional DPDK rte_ring,
re-uses many of its concepts, and even a substantial part of its code.
It can be viewed as an 'extension' of rte_ring functionality.
In particular, main SORING properties:
- circular ring buffer with fixed size objects
- producer, consumer plus multiple processing stages in between.
- allows to split objects processing into multiple stages.
- objects remain in the same ring while moving from one stage to the other,
  initial order is preserved, no extra copying needed.
- preserves the ingress order of objects within the queue across multiple
  stages
- each stage (and producer/consumer) can be served by single and/or
  multiple threads.
- number of stages, size and number of objects in the ring are
  configurable at ring initialization time.
Data-path API provides four main operations:
- enqueue/dequeue works in the same manner as for conventional rte_ring,
  all rte_ring synchronization types are supported.
- acquire/release - for each stage there is an acquire (start) and release
  (finish) operation. After some objects are 'acquired' - the given thread
  can safely assume that it has exclusive ownership of these objects until
  it invokes 'release' for them. After 'release', objects can be 'acquired'
  by the next stage and/or dequeued by the consumer (in case of the last
  stage).

Expected use-case: applications that use a pipeline model (probably with
multiple stages) for packet processing, when preserving incoming packet
order is important.

The concept of 'ring with stages' is similar to DPDK OPDL eventdev PMD [1],
but the internals are different. In particular, SORING maintains an internal
array of 'states' for each element in the ring that is shared by all
threads/processes that access the ring. That allows 'release' to avoid
excessive waits on the tail value and helps to improve performance and
scalability.
In terms of performance, with our measurements rte_soring and conventional
rte_ring provide nearly identical numbers. As an example, on our SUT:
Intel ICX CPU @ 2.00GHz, l3fwd (--lookup=acl) in pipeline mode
(see patch #5 for details)
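To sketch how these four operations are meant to compose, here is a minimal
single-threaded illustration. The acquire/release signatures follow the unit
tests later in this series; the rte_soring_enqueue/rte_soring_dequeue names
and their parameters are assumptions, not the final API:

	#include <rte_soring.h>   /* header introduced by patch #3 */

	/* Hypothetical single-stage round trip (names/signatures partly assumed). */
	static void
	soring_roundtrip_sketch(struct rte_soring *sor, void **objs, uint32_t num)
	{
		uint32_t n, ftoken;
		void *acq[32];

		/* producer: same semantics as rte_ring enqueue (assumed name) */
		n = rte_soring_enqueue(sor, objs, NULL, num, RTE_RING_QUEUE_FIXED);

		/* stage 0: take exclusive ownership of up to 32 objects */
		n = rte_soring_acquire(sor, acq, NULL, 0, RTE_DIM(acq),
				RTE_RING_QUEUE_VARIABLE, &ftoken, NULL);

		/* ... process the n acquired objects in place ... */

		/* finish stage 0; the same number that was acquired must be released */
		rte_soring_release(sor, NULL, NULL, 0, n, ftoken);

		/* consumer: objects released by the last stage come out in their
		 * original ingress order (assumed name)
		 */
		n = rte_soring_dequeue(sor, objs, NULL, num, RTE_RING_QUEUE_VARIABLE);
	}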
[RFC 1/6] ring: common functions for 'move head' ops
From: Konstantin Ananyev

Note upfront: that change doesn't introduce any functional
or performance changes.
It is just a code-reordering for:
- code deduplication
- ability in future to re-use the same code to introduce new functionality

For each sync mode the corresponding move_prod_head() and move_cons_head()
are nearly identical to each other, the only differences are:
- do we need to use a @capacity to calculate number of entries or not.
- what we need to update (prod/cons) and what is used as the read-only
  counterpart.
So instead of having 2 copies of nearly identical functions, introduce a new
common one that could be used by both functions:
move_prod_head() and move_cons_head().

As another positive thing - we can get rid of referencing the whole rte_ring
structure in that new common sub-function.

Signed-off-by: Konstantin Ananyev
---
 lib/ring/rte_ring_c11_pvt.h      | 134 +--
 lib/ring/rte_ring_elem_pvt.h     |  66 +++
 lib/ring/rte_ring_generic_pvt.h  | 121
 lib/ring/rte_ring_hts_elem_pvt.h |  85 ++--
 lib/ring/rte_ring_rts_elem_pvt.h |  85 ++--
 5 files changed, 149 insertions(+), 342 deletions(-)

diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index 629b2d9288..048933ddc6 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -28,41 +28,19 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }
 
-/**
- * @internal This function updates the producer head for enqueue
- *
- * @param r
- *   A pointer to the ring structure
- * @param is_sp
- *   Indicates whether multi-producer path is needed or not
- * @param n
- *   The number of elements we will want to enqueue, i.e. how far should the
- *   head be moved
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
- * @param old_head
- *   Returns head value as it was before the move, i.e. where enqueue starts
- * @param new_head
- *   Returns the current/new head value i.e. where enqueue finishes
- * @param free_entries
- *   Returns the amount of free space in the ring BEFORE head was moved
- * @return
- *   Actual number of objects enqueued.
- *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
- */
 static __rte_always_inline unsigned int
-__rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
-		unsigned int n, enum rte_ring_queue_behavior behavior,
-		uint32_t *old_head, uint32_t *new_head,
-		uint32_t *free_entries)
+__rte_ring_headtail_move_head(struct rte_ring_headtail *d,
+		const struct rte_ring_headtail *s, uint32_t capacity,
+		unsigned int is_st, unsigned int n,
+		enum rte_ring_queue_behavior behavior,
+		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
 {
-	const uint32_t capacity = r->capacity;
-	uint32_t cons_tail;
-	unsigned int max = n;
+	uint32_t stail;
 	int success;
+	unsigned int max = n;
 
-	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
+	*old_head = rte_atomic_load_explicit(&d->head,
+			rte_memory_order_relaxed);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
@@ -73,112 +51,36 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
+		stail = rte_atomic_load_explicit(&s->tail,
 					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
-		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * *old_head > s->tail). So 'free_entries' is always between 0
 		 * and capacity (which is < size).
 		 */
-		*free_entries = (capacity + cons_tail - *old_head);
+		*entries = (capacity + stail - *old_head);
 
 		/* check that we have enough room in ring */
-		if (unlikely(n > *free_entries))
+		if (unlikely(n > *entries))
 			n = (behavior == RTE_RING_QUEUE_FIXED) ?
-					0 : *free_entries;
+					0 : *entries;
 
 		if (n == 0)
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sp) {
-			r->prod.head = *new_head;
+		if
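To see how the common helper is meant to be consumed, here is a plausible
caller-side sketch reconstructed from the commit message (the wrapper bodies
are an illustration, not the patch's exact code):

	/* Producer head move: prod is updated, cons is the read-only
	 * counterpart, and capacity is needed to compute free entries.
	 */
	static __rte_always_inline unsigned int
	__rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
			unsigned int n, enum rte_ring_queue_behavior behavior,
			uint32_t *old_head, uint32_t *new_head, uint32_t *free_entries)
	{
		return __rte_ring_headtail_move_head(&r->prod, &r->cons,
				r->capacity, is_sp, n, behavior,
				old_head, new_head, free_entries);
	}

	/* Consumer head move: cons is updated, prod is read-only; capacity is
	 * passed as 0, so *entries becomes prod.tail - cons.head, i.e. the
	 * number of ready objects.
	 */
	static __rte_always_inline unsigned int
	__rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
			unsigned int n, enum rte_ring_queue_behavior behavior,
			uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
	{
		return __rte_ring_headtail_move_head(&r->cons, &r->prod, 0,
				is_sc, n, behavior, old_head, new_head, entries);
	}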
[RFC 2/6] ring: make copying functions generic
From: Konstantin Ananyev

Note upfront: that change doesn't introduce any functional or
performance changes.
It is just a code-reordering for:
- improve code modularity and re-usability
- ability in future to re-use the same code to introduce new functionality

There is no real need for enqueue_elems()/dequeue_elems()
to get a pointer to the actual rte_ring structure, instead it is enough to
pass a pointer to the actual elements buffer inside the ring.
In return, we'll get copying functions that could be used for other
queueing abstractions that do have a circular ring buffer inside.

Signed-off-by: Konstantin Ananyev
---
 lib/ring/rte_ring_elem_pvt.h | 117 ---
 1 file changed, 68 insertions(+), 49 deletions(-)

diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 3a83668a08..216cb6089f 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -17,12 +17,14 @@
 #endif
 
 static __rte_always_inline void
-__rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
-		uint32_t idx, const void *obj_table, uint32_t n)
+__rte_ring_enqueue_elems_32(void *ring_table, const void *obj_table,
+	uint32_t size, uint32_t idx, uint32_t n)
 {
 	unsigned int i;
-	uint32_t *ring = (uint32_t *)&r[1];
+
+	uint32_t *ring = ring_table;
 	const uint32_t *obj = (const uint32_t *)obj_table;
+
 	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
 			ring[idx] = obj[i];
@@ -60,14 +62,14 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size,
 }
 
 static __rte_always_inline void
-__rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
-		const void *obj_table, uint32_t n)
+__rte_ring_enqueue_elems_64(void *ring_table, const void *obj_table,
+	uint32_t size, uint32_t idx, uint32_t n)
 {
 	unsigned int i;
-	const uint32_t size = r->size;
-	uint32_t idx = prod_head & r->mask;
-	uint64_t *ring = (uint64_t *)&r[1];
+
+	uint64_t *ring = ring_table;
 	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
+
 	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			ring[idx] = obj[i];
@@ -93,14 +95,14 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
 }
 
 static __rte_always_inline void
-__rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
-		const void *obj_table, uint32_t n)
+__rte_ring_enqueue_elems_128(void *ring_table, const void *obj_table,
+	uint32_t size, uint32_t idx, uint32_t n)
 {
 	unsigned int i;
-	const uint32_t size = r->size;
-	uint32_t idx = prod_head & r->mask;
-	rte_int128_t *ring = (rte_int128_t *)&r[1];
+
+	rte_int128_t *ring = ring_table;
 	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+
 	if (likely(idx + n <= size)) {
 		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
 			memcpy((void *)(ring + idx),
@@ -126,37 +128,47 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
  * single and multi producer enqueue functions.
  */
 static __rte_always_inline void
-__rte_ring_enqueue_elems(struct rte_ring *r, uint32_t prod_head,
-		const void *obj_table, uint32_t esize, uint32_t num)
+__rte_ring_do_enqueue_elems(void *ring_table, const void *obj_table,
+	uint32_t size, uint32_t idx, uint32_t esize, uint32_t num)
 {
 	/* 8B and 16B copies implemented individually to retain
 	 * the current performance.
 	 */
 	if (esize == 8)
-		__rte_ring_enqueue_elems_64(r, prod_head, obj_table, num);
+		__rte_ring_enqueue_elems_64(ring_table, obj_table, size,
+				idx, num);
 	else if (esize == 16)
-		__rte_ring_enqueue_elems_128(r, prod_head, obj_table, num);
+		__rte_ring_enqueue_elems_128(ring_table, obj_table, size,
+				idx, num);
 	else {
-		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+		uint32_t scale, nr_idx, nr_num, nr_size;
 
 		/* Normalize to uint32_t */
 		scale = esize / sizeof(uint32_t);
 		nr_num = num * scale;
-		idx = prod_head & r->mask;
 		nr_idx = idx * scale;
-		nr_size = r->size * scale;
-		__rte_ring_enqueue_elems_32(r, nr_size, nr_idx,
-				obj_table, nr_num);
+		nr_size = size * scale;
+		__rte_ring_enqueue_elems_32(ring_table, obj_table, nr_size,
+				nr_idx, nr_num);
 	}
 }
 
 static __rte_always_inline void
-__rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size,
-		uint32_t idx, void *obj_table, uint32_t n)
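A short sketch of the expected call-site change, inferred from the commit
message (the actual rte_ring-facing wrapper in the full patch may differ
slightly):

	/* Sketch (assumption): the rte_ring wrapper derives the element buffer
	 * address and the masked index, then calls the generic copy routine,
	 * which no longer needs the rte_ring structure itself.
	 */
	static __rte_always_inline void
	__rte_ring_enqueue_elems(struct rte_ring *r, uint32_t prod_head,
			const void *obj_table, uint32_t esize, uint32_t num)
	{
		/* the elements buffer starts right after the rte_ring header */
		__rte_ring_do_enqueue_elems(&r[1], obj_table, r->size,
				prod_head & r->mask, esize, num);
	}

Any other queueing abstraction with a circular buffer inside (such as the
soring introduced later in this series) can call the same generic routine
with its own buffer pointer.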
[RFC 3/6] ring/soring: introduce Staged Ordered Ring
From: Konstantin Ananyev

Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
with multiple processing 'stages'.
It is based on conventional DPDK rte_ring, re-uses many of its concepts,
and even a substantial part of its code.
It can be viewed as an 'extension' of rte_ring functionality.
In particular, main SORING properties:
- circular ring buffer with fixed size objects
- producer, consumer plus multiple processing stages in the middle.
- allows to split objects processing into multiple stages.
- objects remain in the same ring while moving from one stage to the other,
  initial order is preserved, no extra copying needed.
- preserves the ingress order of objects within the queue across multiple
  stages, i.e.:
  at the same stage multiple threads can process objects from the ring in
  any order, but for the next stage objects will always appear in the
  original order.
- each stage (and producer/consumer) can be served by single and/or
  multiple threads.
- number of stages, size and number of objects in the ring are
  configurable at ring initialization time.

Data-path API provides four main operations:
- enqueue/dequeue works in the same manner as for conventional rte_ring,
  all rte_ring synchronization types are supported.
- acquire/release - for each stage there is an acquire (start) and
  release (finish) operation.
  After some objects are 'acquired' - the given thread can safely assume
  that it has exclusive possession of these objects till 'release' for them
  is invoked.
  Note that right now the user has to release exactly the same number of
  objects that was acquired before.
  After 'release', objects can be 'acquired' by the next stage and/or
  dequeued by the consumer (in case of the last stage).

Expected use-case: applications that use a pipeline model
(probably with multiple stages) for packet processing, when preserving
incoming packet order is important. I.E.: IPsec processing, etc.

Signed-off-by: Konstantin Ananyev
---
 lib/ring/meson.build  |   4 +-
 lib/ring/rte_soring.c | 144 ++
 lib/ring/rte_soring.h | 270 ++
 lib/ring/soring.c     | 431 ++
 lib/ring/soring.h     | 124
 lib/ring/version.map  |  13 ++
 6 files changed, 984 insertions(+), 2 deletions(-)
 create mode 100644 lib/ring/rte_soring.c
 create mode 100644 lib/ring/rte_soring.h
 create mode 100644 lib/ring/soring.c
 create mode 100644 lib/ring/soring.h

diff --git a/lib/ring/meson.build b/lib/ring/meson.build
index 7fca958ed7..21f2c12989 100644
--- a/lib/ring/meson.build
+++ b/lib/ring/meson.build
@@ -1,8 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-sources = files('rte_ring.c')
-headers = files('rte_ring.h')
+sources = files('rte_ring.c', 'rte_soring.c', 'soring.c')
+headers = files('rte_ring.h', 'rte_soring.h')
 # most sub-headers are not for direct inclusion
 indirect_headers += files (
         'rte_ring_core.h',
diff --git a/lib/ring/rte_soring.c b/lib/ring/rte_soring.c
new file mode 100644
index 00..17b1b73a42
--- /dev/null
+++ b/lib/ring/rte_soring.c
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Huawei Technologies Co., Ltd
+ */
+
+#include "soring.h"
+#include
+
+RTE_LOG_REGISTER_DEFAULT(soring_logtype, INFO);
+#define RTE_LOGTYPE_SORING soring_logtype
+#define SORING_LOG(level, ...) \
+	RTE_LOG_LINE(level, SORING, "" __VA_ARGS__)
+
+static uint32_t
+soring_calc_elem_num(uint32_t count)
+{
+	return rte_align32pow2(count + 1);
+}
+
+static int
+soring_check_param(uint32_t esize, uint32_t stsize, uint32_t count,
+	uint32_t stages)
+{
+	if (stages == 0) {
+		SORING_LOG(ERR, "invalid number of stages: %u", stages);
+		return -EINVAL;
+	}
+
+	/* Check if element size is a multiple of 4B */
+	if (esize == 0 || esize % 4 != 0) {
+		SORING_LOG(ERR, "invalid element size: %u", esize);
+		return -EINVAL;
+	}
+
+	/* Check if ret-code size is a multiple of 4B */
+	if (stsize % 4 != 0) {
+		SORING_LOG(ERR, "invalid retcode size: %u", stsize);
+		return -EINVAL;
+	}
+
+	/* count must be a power of 2 */
+	if (rte_is_power_of_2(count) == 0 ||
+			(count > RTE_SORING_ELEM_MAX + 1)) {
+		SORING_LOG(ERR, "invalid number of elements: %u", count);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Calculate size offsets for SORING internal data layout.
+ */
+static size_t
+soring_get_szofs(uint32_t esize, uint32_t stsize, uint32_t count,
+	uint32_t stages, size_t *elst_ofs, size_t *state_ofs,
+	size_t *stage_ofs)
+{
+	size_t sz;
+	const struct rte_soring * const r = NULL;
+
+	sz = sizeof(r[0]) + (size_t)count * esize;
+	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+
+	if (elst_ofs != NULL)
+
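For orientation, a minimal creation sequence based on the functions visible in
this patch and the unit tests later in the series (field names of
rte_soring_param follow the test code; the concrete values are illustrative):

	/* Hypothetical creation sketch: size query + user-provided memory + init. */
	struct rte_soring_param prm = {
		.name = "pkt_pipeline",       /* illustrative name */
		.esize = sizeof(uintptr_t),   /* object size, multiple of 4B */
		.elems = 1024,                /* number of objects */
		.stages = 2,                  /* stages between producer/consumer */
		.stsize = 4,                  /* optional per-object metadata size */
		.prod_synt = RTE_RING_SYNC_MT,
		.cons_synt = RTE_RING_SYNC_MT,
	};

	size_t sz = rte_soring_get_memsize(&prm);
	struct rte_soring *sor = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
	if (sor == NULL || rte_soring_init(sor, &prm) != 0)
		rte_panic("cannot create soring\n");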
[RFC 4/6] app/test: add unit tests for soring API
From: Konstantin Ananyev

Add both functional and stress test-cases for the soring API.
The stress test serves as both a functional and performance test of soring
enqueue/dequeue/acquire/release operations under high contention
(for both over-committed and non-over-committed scenarios).

Signed-off-by: Eimear Morrissey
Signed-off-by: Konstantin Ananyev
---
 app/test/meson.build               |   3 +
 app/test/test_soring.c             | 452
 app/test/test_soring_mt_stress.c   |  45 ++
 app/test/test_soring_stress.c      |  48 ++
 app/test/test_soring_stress.h      |  35 ++
 app/test/test_soring_stress_impl.h | 832 +
 6 files changed, 1415 insertions(+)
 create mode 100644 app/test/test_soring.c
 create mode 100644 app/test/test_soring_mt_stress.c
 create mode 100644 app/test/test_soring_stress.c
 create mode 100644 app/test/test_soring_stress.h
 create mode 100644 app/test/test_soring_stress_impl.h

diff --git a/app/test/meson.build b/app/test/meson.build
index e29258e6ec..c290162e43 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -175,6 +175,9 @@ source_file_deps = {
     'test_security_proto.c' : ['cryptodev', 'security'],
     'test_seqlock.c': [],
     'test_service_cores.c': [],
+    'test_soring.c': [],
+    'test_soring_mt_stress.c': [],
+    'test_soring_stress.c': [],
     'test_spinlock.c': [],
     'test_stack.c': ['stack'],
     'test_stack_perf.c': ['stack'],
diff --git a/app/test/test_soring.c b/app/test/test_soring.c
new file mode 100644
index 00..381979bc6f
--- /dev/null
+++ b/app/test/test_soring.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Huawei Technologies Co., Ltd
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include "test.h"
+
+#define MAX_ACQUIRED 20
+
+#define SORING_TEST_ASSERT(val, expected) do { \
+	RTE_TEST_ASSERT(expected == val, \
+			"%s: expected %u got %u\n", #val, expected, val); \
+} while (0)
+
+static void
+set_soring_init_param(struct rte_soring_param *prm,
+		const char *name, uint32_t esize, uint32_t elems,
+		uint32_t stages, uint32_t stsize,
+		enum rte_ring_sync_type rst_prod,
+		enum rte_ring_sync_type rst_cons)
+{
+	prm->name = name;
+	prm->esize = esize;
+	prm->elems = elems;
+	prm->stages = stages;
+	prm->stsize = stsize;
+	prm->prod_synt = rst_prod;
+	prm->cons_synt = rst_cons;
+}
+
+static int
+move_forward_stage(struct rte_soring *sor,
+		uint32_t num_packets, uint32_t stage)
+{
+	uint32_t acquired;
+	uint32_t ftoken;
+	uint32_t *acquired_objs[MAX_ACQUIRED];
+
+	acquired = rte_soring_acquire(sor, acquired_objs, NULL, stage,
+			num_packets, RTE_RING_QUEUE_FIXED, &ftoken, NULL);
+	SORING_TEST_ASSERT(acquired, num_packets);
+	rte_soring_release(sor, NULL, NULL, stage, num_packets,
+			ftoken);
+
+	return 0;
+}
+
+/*
+ * struct rte_soring_param param checking.
+ */
+static int
+test_soring_init(void)
+{
+	struct rte_soring *sor = NULL;
+	struct rte_soring_param prm;
+	int rc;
+	size_t sz;
+
+	memset(&prm, 0, sizeof(prm));
+
+	/* init memory */
+	set_soring_init_param(&prm, "alloc_memory", sizeof(uintptr_t),
+			4, 1, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+	sz = rte_soring_get_memsize(&prm);
+	sor = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
+	RTE_TEST_ASSERT_NOT_NULL(sor, "could not allocate memory for soring");
+
+	set_soring_init_param(&prm, "test_invalid_stages", sizeof(uintptr_t),
+			4, 0, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+	rc = rte_soring_init(sor, &prm);
+	RTE_TEST_ASSERT_FAIL(rc, "initted soring with invalid num stages");
+
+	set_soring_init_param(&prm, "test_invalid_esize", 0,
+			4, 1, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+	rc = rte_soring_init(sor, &prm);
+	RTE_TEST_ASSERT_FAIL(rc, "initted soring with 0 esize");
+
+	set_soring_init_param(&prm, "test_invalid_esize", 9,
+			4, 1, 4, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+	rc = rte_soring_init(sor, &prm);
+	RTE_TEST_ASSERT_FAIL(rc, "initted soring with esize not multiple of 4");
+
+	set_soring_init_param(&prm, "test_invalid_rsize", sizeof(uintptr_t),
+			4, 1, 3, RTE_RING_SYNC_MT, RTE_RING_SYNC_MT);
+	rc = rte_soring_init(sor, &prm);
+	RTE_TEST_ASSERT_FAIL(rc, "initted soring with rcsize not multiple of 4");
+
+	set_soring_init_param(&prm, "test_invalid_elems", sizeof(uintptr_t),
+			RTE_SORING_ELEM_MAX + 1, 1, 4, RTE_RING_SYNC_M
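A condensed happy-path check in the same style as the tests above could look
like this (a sketch: the rte_soring_enqueue/rte_soring_dequeue names and
signatures are assumptions; move_forward_stage() and the assert macros are
from the test file):

	/* Hypothetical order-preservation check: push a burst through both
	 * stages of a 2-stage soring, then verify ingress order on dequeue.
	 */
	static int
	test_soring_order_sketch(struct rte_soring *sor, uint32_t num)
	{
		uint32_t i, n;
		uintptr_t in[MAX_ACQUIRED], out[MAX_ACQUIRED];

		for (i = 0; i != num; i++)
			in[i] = i;

		n = rte_soring_enqueue(sor, in, NULL, num, RTE_RING_QUEUE_FIXED);
		SORING_TEST_ASSERT(n, num);

		/* stage 0 and stage 1 each acquire and release the whole burst */
		move_forward_stage(sor, num, 0);
		move_forward_stage(sor, num, 1);

		n = rte_soring_dequeue(sor, out, NULL, num, RTE_RING_QUEUE_FIXED);
		SORING_TEST_ASSERT(n, num);

		for (i = 0; i != num; i++)
			RTE_TEST_ASSERT_EQUAL(in[i], out[i],
					"ingress order not preserved");
		return 0;
	}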
[RFC 5/6] examples/l3fwd: make ACL work in pipeline and eventdev modes
From: Konstantin Ananyev

Note upfront:
This is a huge commit that is combined from several ones.
For now, I submit it just for reference and demonstration purposes and
will probably remove it in future versions.
If we decide to go ahead with it, then it needs to be reworked and split
into several proper commits.

It adds for l3fwd:
- eventdev mode for ACL lookup-mode
- Introduce a worker-pool-mode (right now implemented for ACL lookup-mode
  only). Worker-Pool mode is a simple pipeline model, with the following
  stages:
  1) I/O thread receives packets from NIC RX HW queues and enqueues them
     into the work queue
  2) Worker thread reads packets from the work queue(s), processes them
     and then puts processed packets back into the work queue along with
     the processing status (routing info/error code).
  3) I/O thread dequeues packets and their status from the work queue,
     and based on it either TXes the packet or drops it.
  Very similar to the l3fwd-eventdev working model; a sketch of the
  I/O-thread loop follows the patch below.
  Note that there could be several I/O threads, each can serve one or
  multiple HW RX queues. Also there could be several Worker threads, each
  of them can process packets from multiple work queues in round-robin
  fashion.

  Work queue can be one of the following types:
  - wqorder: allows Worker threads to process packets in any order, but
    guarantees that on the dequeue stage the ingress order of packets will
    be preserved. I.E. at stage #3, the I/O thread will get packets exactly
    in the same order as they were enqueued at stage #1.
  - wqunorder: doesn't provide any ordering guarantees.

  'wqunorder' mode is implemented using 2 rte_ring structures per queue.
  'wqorder' mode is implemented using an rte_soring structure per queue.

To facilitate this new functionality, command line parameters were extended:
--mode:
  Possible values one of: poll/eventdev/wqorder/wqorderS/wqunorder/wqunorderS
  Default value: poll
  - wqorder: Worker-Pool ordered mode with a separate work queue for each
    HW RX queue.
  - wqorderS: Worker-Pool ordered mode with one work queue per I/O thread.
  - wqunorder: Worker-Pool un-ordered mode with a separate work queue for
    each HW RX queue.
  - wqunorderS: Worker-Pool un-ordered mode with one work queue per I/O
    thread.
--wqsize: number of elements for each worker queue.
--lookup-iter: forces to perform ACL lookup several times over the same
  packet. This is an artificial parameter and is added temporarily for
  benchmarking purposes. It will be removed in later versions (if any).

Note that in Worker-Pool mode all free lcores that were not assigned as
I/O threads will be used as Worker threads.
As an example:
dpdk-l3fwd --lcores=53,55,57,59,61 ... -- \
	-P -p f --config '(0,0,53)(1,0,53)(2,0,53)(3,0,53)' --lookup acl \
	--parse-ptype --mode=wqorder ...
In that case lcore 53 will be used as I/O thread (stages #1,3)
to serve 4 HW RX queues,
while lcores 55,57,59,61 will serve as Worker threads (stage #2).
Signed-off-by: Konstantin Ananyev
---
 examples/l3fwd/l3fwd.h           |  55 +++
 examples/l3fwd/l3fwd_acl.c       | 125 +++---
 examples/l3fwd/l3fwd_acl_event.h | 258 +
 examples/l3fwd/l3fwd_event.c     |  14 ++
 examples/l3fwd/l3fwd_event.h     |   1 +
 examples/l3fwd/l3fwd_sse.h       |  49 +-
 examples/l3fwd/l3fwd_wqp.c       | 274 +++
 examples/l3fwd/l3fwd_wqp.h       | 132 +++
 examples/l3fwd/main.c            |  75 -
 examples/l3fwd/meson.build       |   1 +
 10 files changed, 956 insertions(+), 28 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_acl_event.h
 create mode 100644 examples/l3fwd/l3fwd_wqp.c
 create mode 100644 examples/l3fwd/l3fwd_wqp.h

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 93ce652d02..218f363764 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -77,6 +77,42 @@ struct __rte_cache_aligned lcore_rx_queue {
 	uint16_t queue_id;
 };
 
+enum L3FWD_WORKER_MODE {
+	L3FWD_WORKER_POLL,
+	L3FWD_WORKER_UNQUE,
+	L3FWD_WORKER_ORQUE,
+};
+
+struct l3fwd_wqp_param {
+	enum L3FWD_WORKER_MODE mode;
+	uint32_t qsize;    /**< Number of elems in worker queue */
+	int32_t single;    /**< use single queue per I/O (poll) thread */
+};
+
+extern struct l3fwd_wqp_param l3fwd_wqp_param;
+
+enum {
+	LCORE_WQ_IN,
+	LCORE_WQ_OUT,
+	LCORE_WQ_NUM,
+};
+
+union lcore_wq {
+	struct rte_ring *r[LCORE_WQ_NUM];
+	struct {
+		struct rte_soring *sor;
+		/* used by WQ, sort of thread-local var */
+		uint32_t ftoken;
+	};
+};
+
+struct lcore_wq_pool {
+	uint32_t nb_queue;
+	uint32_t qmask;
+	union lcore_wq queue[MAX_RX_QUEUE_PER_LCORE];
+	struct l3fwd_wqp_param prm;
+};
+
 struct __rte_cache_aligned lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
@@ -86,6 +122,7 @@ struct __rte_cache_aligned l
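As referenced in the description above, a simplified sketch of one I/O-thread
iteration (stages #1 and #3) in wqorder mode. The rte_soring call signatures,
the send_packet() helper and the io_thread_iter() framing are illustrative
assumptions; MAX_PKT_BURST and BAD_PORT are existing l3fwd definitions:

	/* Hypothetical I/O-thread iteration for wqorder mode. */
	static void
	io_thread_iter(uint16_t port, uint16_t queue, struct rte_soring *wq)
	{
		struct rte_mbuf *pkts[MAX_PKT_BURST];
		uint32_t meta[MAX_PKT_BURST];  /* routing info / error code */
		uint16_t n, i;

		/* stage #1: RX and enqueue into the work queue */
		n = rte_eth_rx_burst(port, queue, pkts, MAX_PKT_BURST);
		if (n != 0)
			rte_soring_enqueue(wq, (void **)pkts, NULL, n,
					RTE_RING_QUEUE_VARIABLE);

		/* stage #3: dequeue processed packets with their status;
		 * ingress order is preserved by the soring
		 */
		n = rte_soring_dequeue(wq, (void **)pkts, meta, MAX_PKT_BURST,
				RTE_RING_QUEUE_VARIABLE);
		for (i = 0; i != n; i++) {
			if (meta[i] != BAD_PORT)
				/* TX via the usual l3fwd path (simplified) */
				send_packet(pkts[i], (uint16_t)meta[i]);
			else
				rte_pktmbuf_free(pkts[i]);  /* lookup failed: drop */
		}
	}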
[RFC 6/6] ring: minimize reads of the counterpart cache-line
From: Konstantin Ananyev

Note upfront: this change shouldn't affect the rte_ring public API.
Though as the layout of public structures has changed - it is an ABI breakage.

This is an attempt to implement an rte_ring optimization that was suggested
by Morten and discussed on this mailing list a while ago.
The idea is to optimize MP/SP & MC/SC ring enqueue/dequeue ops by storing
along with the head its Cached Foreign Tail (CFT) value.
I.E.: for the producer we cache the consumer tail value and vice versa.
To avoid races, head and CFT values are read/written using atomic 64-bit ops.
In theory that might help by reducing the number of times the producer needs
to access the consumer's cache-line and vice versa.
In practice, I didn't see any impressive boost so far:
- ring_per_autotest micro-bench - results are a mixed bag, some are a bit
  better, some are worse.
- [so]ring_stress_autotest micro-benchmarks: ~10-15% improvement
- l3fwd in wqorder/wqunorder mode (see previous patch for details):
  no real difference.

Though so far my testing scope was quite limited, I tried it only on x86
machines. So can I ask all interested parties: different platform vendors
(ARM, PPC, etc.) and people who do use rte_ring extensively to give it a try
and come up with the feedback.
If there would be no real performance improvements on any platform we
support, or some problems will be encountered - I am ok to drop that patch.

Signed-off-by: Konstantin Ananyev
---
 drivers/net/mlx5/mlx5_hws_cnt.h   |  5 ++--
 drivers/net/ring/rte_eth_ring.c   |  2 +-
 lib/ring/rte_ring.c               |  6 ++--
 lib/ring/rte_ring_core.h          | 12 +++-
 lib/ring/rte_ring_generic_pvt.h   | 46 +--
 lib/ring/rte_ring_peek_elem_pvt.h |  4 +--
 lib/ring/soring.c                 | 31 +++--
 lib/ring/soring.h                 |  4 +--
 8 files changed, 77 insertions(+), 33 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 996ac8dd9a..663146563c 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -388,11 +388,12 @@ __mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
 
 	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
 	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
 
-	current_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
+	current_head = rte_atomic_load_explicit(&r->prod.head.val.pos,
+			rte_memory_order_relaxed);
 	MLX5_ASSERT(n <= r->capacity);
 	MLX5_ASSERT(n <= rte_ring_count(r));
 	revert2head = current_head - n;
-	r->prod.head = revert2head; /* This ring should be SP. */
+	r->prod.head.val.pos = revert2head; /* This ring should be SP. */
 	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
 			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
 	/* Update tail */
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 1346a0dba3..31009e90d2 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -325,7 +325,7 @@ eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	 */
 	pmc->addr = &rng->prod.head;
 	pmc->size = sizeof(rng->prod.head);
-	pmc->opaque[0] = rng->prod.head;
+	pmc->opaque[0] = rng->prod.head.val.pos;
 	pmc->fn = ring_monitor_callback;
 	return 0;
 }
diff --git a/lib/ring/rte_ring.c b/lib/ring/rte_ring.c
index aebb6d6728..cb2c39c7ad 100644
--- a/lib/ring/rte_ring.c
+++ b/lib/ring/rte_ring.c
@@ -102,7 +102,7 @@ reset_headtail(void *p)
 	switch (ht->sync_type) {
 	case RTE_RING_SYNC_MT:
 	case RTE_RING_SYNC_ST:
-		ht->head = 0;
+		ht->head.raw = 0;
 		ht->tail = 0;
 		break;
 	case RTE_RING_SYNC_MT_RTS:
@@ -373,9 +373,9 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  capacity=%"PRIu32"\n", r->capacity);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
-	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
+	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head.val.pos);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
-	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
+	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head.val.pos);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
 }
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 270869d214..b88a1bc352 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -66,8 +66,17 @@ enum rte_ring_sync_type {
  * Depending on sync_type format of that structure might be different,
  * but offset for *sync_type* and *tail* values should remain the same.
  */
+union __rte_ring_head_cft {
+	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
+
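To visualize the layout change, a rough sketch of the head-with-CFT idea,
consistent with the fragment above (the *cft* field name and the reload logic
are assumptions; the actual union definition is truncated in this digest):

	/* Hypothetical completion of the union shown above (illustration only). */
	union head_cft_sketch {
		uint64_t raw;          /* pos + cft read/written as one 64-bit atomic */
		struct {
			uint32_t pos;  /* head position */
			uint32_t cft;  /* cached foreign tail: cons tail for prod */
		} val;
	};

	/* Producer-side idea: first check free space against the cached consumer
	 * tail; only on apparent shortage re-read the real consumer tail from the
	 * counterpart cache-line and refresh the cache.
	 */
	static inline uint32_t
	free_entries_sketch(union head_cft_sketch *ph, const uint32_t *cons_tail,
			uint32_t capacity, uint32_t n)
	{
		uint32_t free_ent = capacity + ph->val.cft - ph->val.pos;

		if (n > free_ent) {
			/* cache miss: touch the consumer's cache-line, update CFT */
			ph->val.cft = __atomic_load_n(cons_tail, __ATOMIC_ACQUIRE);
			free_ent = capacity + ph->val.cft - ph->val.pos;
		}
		return free_ent;
	}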
RE: [PATCH v2 0/2] examples/l3fwd fixes for ACL mode
Sorry, that's a dup, sent by mistake this time. Please disregard.
Konstantin

> -----Original Message-----
> From: Konstantin Ananyev
> Sent: Thursday, August 15, 2024 9:53 AM
> To: dev@dpdk.org
> Cc: honnappa.nagaraha...@arm.com; jer...@marvell.com;
> hemant.agra...@nxp.com; bruce.richard...@intel.com;
> d...@linux.vnet.ibm.com; ruifeng.w...@arm.com; m...@smartsharesystems.com;
> Konstantin Ananyev
> Subject: [PATCH v2 0/2] examples/l3fwd fixes for ACL mode
>
> From: Konstantin Ananyev
>
> As Song Jiale pointed out, the previous fix is not enough to fix
> the problem he is observing with l3fwd in ACL mode:
> https://bugs.dpdk.org/show_bug.cgi?id=1502
> This is a second attempt to fix it.
>
> Konstantin Ananyev (2):
>   examples/l3fwd: fix read beyond array boundaries
>   examples/l3fwd: fix read beyond array boundaries in ACL mode
>
>  examples/l3fwd/l3fwd_acl.c           | 37
>  examples/l3fwd/l3fwd_altivec.h       |  6 -
>  examples/l3fwd/l3fwd_common.h        |  7 ++
>  examples/l3fwd/l3fwd_em_hlm.h        |  2 +-
>  examples/l3fwd/l3fwd_em_sequential.h |  2 +-
>  examples/l3fwd/l3fwd_fib.c           |  2 +-
>  examples/l3fwd/l3fwd_lpm_altivec.h   |  2 +-
>  examples/l3fwd/l3fwd_lpm_neon.h      |  2 +-
>  examples/l3fwd/l3fwd_lpm_sse.h       |  2 +-
>  examples/l3fwd/l3fwd_neon.h          |  6 -
>  examples/l3fwd/l3fwd_sse.h           |  6 -
>  11 files changed, 55 insertions(+), 19 deletions(-)
>
> --
> 2.35.3
crc stripping for vf on same pf
I have 2 pods running on the same worker.
Pod1 sends to pod2.
Pod2 receives with 4 bytes less at the end of the packet.

This problem happens only if the 2 NICs are on the same PF;
if they are on different PFs, the problem doesn't occur.
I tried with dpdk21 and dpdk22.
The code is using driver net_iavf.

NIC: e810c
driver: ice
firmware-version: 4.00 0x800139bc 21.5.9

Who does the stripping? The DPDK code or the card?
Why is the behavior different for same PF and different PF?
What should I change or check?

port_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_KEEP_CRC; // Don't strip CRC
port_conf.rxmode.offloads &= pi_devInfo.rx_offload_capa;
int ret = rte_eth_dev_configure(pi_nPort, nRxQueues, nTxQueues, &port_conf);

struct rte_eth_rxconf rx_conf;
rx_conf.offloads = RTE_ETH_RX_OFFLOAD_KEEP_CRC;
int ret = rte_eth_rx_queue_setup(pi_nPort, nQueue, nRxRingSize, socket,
		performanceMode ? NULL : &rx_conf, pool);
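One thing worth noting in the snippet above: rx_conf is used without being
zero-initialized, so its other fields are indeterminate. A safer pattern (a
sketch using standard ethdev calls; variable names follow the question)
starts from the device's default rxconf and only keeps CRC if the port
actually advertises the capability:

	/* Sketch: derive queue config from device defaults before overriding. */
	struct rte_eth_dev_info dev_info;
	rte_eth_dev_info_get(pi_nPort, &dev_info);

	struct rte_eth_rxconf rx_conf = dev_info.default_rxconf; /* known state */
	if (dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_KEEP_CRC)
		rx_conf.offloads = RTE_ETH_RX_OFFLOAD_KEEP_CRC;

	int ret = rte_eth_rx_queue_setup(pi_nPort, nQueue, nRxRingSize, socket,
			&rx_conf, pool);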
Re: 22.11.6 patches review and test
On Wed, 31 Jul 2024 at 20:37, wrote:
>
> Hi all,
>
> Here is a list of patches targeted for stable release 22.11.6.
>
> The planned date for the final release is August 20th.
>
> Please help with testing and validation of your use cases and report
> any issues/results with reply-all to this mail. For the final release
> the fixes and reported validations will be added to the release notes.
>
> A release candidate tarball can be found at:
>
> https://dpdk.org/browse/dpdk-stable/tag/?id=v22.11.6-rc1
>
> These patches are located at branch 22.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
>
> Thanks.
>
> Luca Boccassi

Hi Ali,

As the deadline is approaching, I wanted to double check whether NVIDIA is
planning to run regression tests for 22.11.6? If you need more time it's
fine to extend the deadline, but if you do not have the bandwidth for this
cycle that's ok too, just let me know and I'll go ahead with the release
without waiting.

Thanks!
RE: [RFC 3/6] ring/soring: introduce Staged Ordered Ring
> From: Konstantin Ananyev
>
> Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
> with multiple processing 'stages'.
> It is based on conventional DPDK rte_ring, re-uses many of its concepts,
> and even substantial part of its code.
> It can be viewed as an 'extension' of rte_ring functionality.
> In particular, main SORING properties:
> - circular ring buffer with fixed size objects
> - producer, consumer plus multiple processing stages in the middle.
> - allows to split objects processing into multiple stages.
> - objects remain in the same ring while moving from one stage to the other,
>   initial order is preserved, no extra copying needed.
> - preserves the ingress order of objects within the queue across multiple
>   stages, i.e.:
>   at the same stage multiple threads can process objects from the ring in
>   any order, but for the next stage objects will always appear in the
>   original order.
> - each stage (and producer/consumer) can be served by single and/or
>   multiple threads.
> - number of stages, size and number of objects in the ring are
>   configurable at ring initialization time.
>
> Data-path API provides four main operations:
> - enqueue/dequeue works in the same manner as for conventional rte_ring,
>   all rte_ring synchronization types are supported.
> - acquire/release - for each stage there is an acquire (start) and
>   release (finish) operation.
>   after some objects are 'acquired' - given thread can safely assume that
>   it has exclusive possession of these objects till 'release' for them is
>   invoked.
>   Note that right now user has to release exactly the same number of
>   objects that was acquired before.
>   After 'release', objects can be 'acquired' by next stage and/or dequeued
>   by the consumer (in case of last stage).
>
> Expected use-case: applications that use a pipeline model
> (probably with multiple stages) for packet processing, when preserving
> incoming packet order is important. I.E.: IPsec processing, etc.
>
> Signed-off-by: Konstantin Ananyev
> ---

The existing RING library is for a ring of objects.

It is very confusing that the new SORING library is for a ring of object
pairs (obj, objst).

The new SORING library should be for a ring of objects, like the existing
RING library. Please get rid of all the objst stuff.

This might also improve performance when not using the optional secondary
object.

With that in place, you can extend the SORING library with additional APIs
for object pairs.

I suggest calling the secondary object "metadata" instead of "status" or
"state" or "ret-value".
I agree that data passed as {obj[num], meta[num]} is more efficient than
{obj, meta}[num] in some use cases, which is why your API uses two vector
pointers instead of one.

Furthermore, you should consider semi-zero-copy APIs for the
"acquire"/"release" functions:

The "acquire" function can use a concept similar to rte_pktmbuf_read(),
where a vector is provided for copying (if the ring wraps), and the return
value either points directly to the objects in the ring (zero-copy), or to
the vector where the objects were copied to.

And the "release" function does not need to copy the object vector back if
the "acquire" function returned a zero-copy pointer.
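To make the layout argument concrete, the two alternatives look roughly like
this (illustrative declarations only; the names are not from the patch):

	#include <stdint.h>

	#define BURST 32

	/* Two parallel vectors - {obj[num], meta[num]} - as the proposed API
	 * uses: each array stays dense, and meta[] can simply be omitted
	 * (passed as NULL) when unused.
	 */
	void     *objs[BURST];  /* primary objects, e.g. mbuf pointers */
	uint32_t  meta[BURST];  /* optional per-object metadata */

	/* One vector of pairs - {obj, meta}[num] - the alternative layout:
	 * metadata is interleaved, so even metadata-free users pay for the
	 * extra footprint.
	 */
	struct obj_meta_pair {
		void     *obj;
		uint32_t  meta;
	};
	struct obj_meta_pair pairs[BURST];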
[PATCH] net/af_packet: add explicit flush for Tx
From: Vignesh PS

af_packet PMD uses system calls to transmit packets. Separate the
transmit function into two different calls so it's possible to avoid
syscalls during transmit.

Signed-off-by: Vignesh PS
---
 .mailmap                                  |  1 +
 doc/guides/nics/af_packet.rst             | 26 ++-
 drivers/net/af_packet/rte_eth_af_packet.c | 90 ++-
 3 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4a508bafad..5e9462b7cd 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1548,6 +1548,7 @@ Viacheslav Ovsiienko
 Victor Kaplansky
 Victor Raj
 Vidya Sagar Velumuri
+Vignesh PS
 Vignesh Sridhar
 Vijayakumar Muthuvel Manickam
 Vijaya Mohan Guvva
diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst
index 66b977e1a2..fe92ef231f 100644
--- a/doc/guides/nics/af_packet.rst
+++ b/doc/guides/nics/af_packet.rst
@@ -29,6 +29,7 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings.
 *  ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple of 16B);
 *  ``framecnt`` - PACKET_MMAP frame count (optional, default 512).
+*  ``explicit_flush`` - enable two stage packet transmit.
 
 Because this implementation is based on PACKET_MMAP, and PACKET_MMAP has its
 own pre-requisites, it should be noted that the inner workings of PACKET_MMAP
@@ -39,6 +40,9 @@ As an example, if one changes ``framesz`` to be 1024B, it is expected that
 ``blocksz`` is set to at least 1024B as well (although 2048B in this case
 would allow two "frames" per "block").
 
+When ``explicit_flush`` is enabled, the PMD will temporarily buffer mbufs in a
+ring buffer in the PMD until ``rte_eth_tx_done_cleanup`` is called on the TX queue.
+
 This restriction happens because PACKET_MMAP expects each single "frame"
 to fit inside of a "block". And although multiple "frames" can fit inside
 of a single "block", a "frame" may not span across two "blocks".
@@ -64,11 +68,25 @@ framecnt=512):
 
 .. code-block:: console
 
-    --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0
+    --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,explicit_flush=1
 
 Features and Limitations
 ------------------------
 
-The PMD will re-insert the VLAN tag transparently to the packet if the kernel
-strips it, as long as the ``RTE_ETH_RX_OFFLOAD_VLAN_STRIP`` is not enabled by the
-application.
+* The PMD will re-insert the VLAN tag transparently to the packet if the kernel
+  strips it, as long as the ``RTE_ETH_RX_OFFLOAD_VLAN_STRIP`` is not enabled by the
+  application.
+* The PMD relies on the sendto() system call to transmit packets from the
+  PACKET_MMAP socket. This system call can cause head-of-line blocking. Hence,
+  it's advantageous to buffer the packets in the driver instead of immediately
+  triggering packet transmits on calling ``rte_eth_tx_burst()``. Therefore, the
+  PMD splits the functionality of ``rte_eth_tx_burst()`` into two functional
+  stages, where ``rte_eth_tx_burst()`` causes packets to be buffered in the
+  driver, and a subsequent call to ``rte_eth_tx_done_cleanup()`` triggers the
+  actual packet transmits. With such a disaggregated PMD design, it is possible
+  to call ``rte_eth_tx_burst()`` on workers and trigger transmits (by calling
+  ``rte_eth_tx_done_cleanup()``) from a control plane worker and eliminate
+  head-of-line blocking.
+* To enable the two stage packet transmit, the PMD should be started with
+  explicit_flush=1 (default explicit_flush=0).
+* When calling ``rte_eth_tx_done_cleanup()`` the free_cnt parameter has no
+  effect on how many packets are flushed. The PMD will flush all the packets
+  present in the buffer.

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 6b7b16f348..cdbe43313a 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -36,9 +36,11 @@
 #define ETH_AF_PACKET_FRAMESIZE_ARG	"framesz"
 #define ETH_AF_PACKET_FRAMECOUNT_ARG	"framecnt"
 #define ETH_AF_PACKET_QDISC_BYPASS_ARG	"qdisc_bypass"
+#define ETH_AF_PACKET_EXPLICIT_FLUSH_ARG "explicit_flush"
 
 #define DFLT_FRAME_SIZE		(1 << 11)
 #define DFLT_FRAME_COUNT	(1 << 9)
+#define DFLT_FRAME_BURST	(32)
 
 struct __rte_cache_aligned pkt_rx_queue {
 	int sockfd;
@@ -62,8 +64,10 @@ struct __rte_cache_aligned pkt_tx_queue {
 	struct iovec *rd;
 	uint8_t *map;
+	struct rte_ring *buf;
 	unsigned int framecount;
 	unsigned int framenum;
+	unsigned int explicit_flush;
 
 	volatile unsigned long tx_pkts;
 	volatile unsigned long err_pkts;
@@ -91,6 +95,7 @@ static const char *valid_arguments[] = {
 	ETH_AF_PACKET_FRAMESIZE_ARG,
 	ETH_AF_PACKET_FRAMECOUNT_ARG,
 	ETH_AF_PACKET_QDISC_BYPASS_ARG,
+	ETH_AF_PACKET_EXPLICIT
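A minimal usage sketch of the proposed two-stage transmit path (assuming the
vdev is started with explicit_flush=1 as documented above;
rte_eth_tx_done_cleanup() is the standard ethdev call):

	/* Worker lcore: stage 1 - packets are only buffered inside the PMD's
	 * ring, no syscall is issued here.
	 */
	static void
	worker_tx(uint16_t port_id, uint16_t queue_id,
			struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);
	}

	/* Control-plane lcore: stage 2 - triggers the actual sendto();
	 * per the doc above, free_cnt (0 here) is ignored and all buffered
	 * packets are flushed.
	 */
	static void
	control_plane_flush(uint16_t port_id, uint16_t queue_id)
	{
		rte_eth_tx_done_cleanup(port_id, queue_id, 0);
	}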
[PATCH] net/bonding: add user callback for bond xmit policy
From: Vignesh PS

Add support to bonding PMD to allow user callback function registration
for TX transmit policy.

Signed-off-by: Vignesh PS
---
 .mailmap                                |  1 +
 drivers/net/bonding/eth_bond_private.h  |  6 ++
 drivers/net/bonding/rte_eth_bond.h      | 17 +
 drivers/net/bonding/rte_eth_bond_api.c  | 15 +++
 drivers/net/bonding/rte_eth_bond_args.c |  2 ++
 drivers/net/bonding/rte_eth_bond_pmd.c  |  2 +-
 drivers/net/bonding/version.map         |  1 +
 7 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4a508bafad..69b229a5b7 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1548,6 +1548,7 @@ Viacheslav Ovsiienko
 Victor Kaplansky
 Victor Raj
 Vidya Sagar Velumuri
+Vignesh PS
 Vignesh Sridhar
 Vijayakumar Muthuvel Manickam
 Vijaya Mohan Guvva
diff --git a/drivers/net/bonding/eth_bond_private.h b/drivers/net/bonding/eth_bond_private.h
index e688894210..4141b6e09f 100644
--- a/drivers/net/bonding/eth_bond_private.h
+++ b/drivers/net/bonding/eth_bond_private.h
@@ -32,6 +32,7 @@
 #define PMD_BOND_XMIT_POLICY_LAYER2_KVARG	("l2")
 #define PMD_BOND_XMIT_POLICY_LAYER23_KVARG	("l23")
 #define PMD_BOND_XMIT_POLICY_LAYER34_KVARG	("l34")
+#define PMD_BOND_XMIT_POLICY_USER_KVARG		("user")
 
 extern int bond_logtype;
 
@@ -101,9 +102,6 @@ struct rte_flow {
 	uint8_t rule_data[];
 };
 
-typedef void (*burst_xmit_hash_t)(struct rte_mbuf **buf, uint16_t nb_pkts,
-		uint16_t member_count, uint16_t *members);
-
 /** Link Bonding PMD device private configuration Structure */
 struct bond_dev_private {
 	uint16_t port_id;	/**< Port Id of Bonding Port */
@@ -118,7 +116,7 @@ struct bond_dev_private {
 	/**< Flag for whether primary port is user defined or not */
 
 	uint8_t balance_xmit_policy;
-	/**< Transmit policy - l2 / l23 / l34 for operation in balance mode */
+	/**< Transmit policy - l2 / l23 / l34 / user for operation in balance mode */
 	burst_xmit_hash_t burst_xmit_hash;
 	/**< Transmit policy hash function */
 
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index f10165f2c6..66bc41097a 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -91,6 +91,11 @@ extern "C" {
 /**< Layer 2+3 (Ethernet MAC + IP Addresses) transmit load balancing */
 #define BALANCE_XMIT_POLICY_LAYER34	(2)
 /**< Layer 3+4 (IP Addresses + UDP Ports) transmit load balancing */
+#define BALANCE_XMIT_POLICY_USER	(3)
+/**< User callback function to transmit load balancing */
+
+typedef void (*burst_xmit_hash_t)(struct rte_mbuf **buf, uint16_t nb_pkts,
+		uint16_t slave_count, uint16_t *slaves);
 
 /**
  * Create a bonding rte_eth_dev device
@@ -351,6 +356,18 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonding_port_id,
 int
 rte_eth_bond_link_up_prop_delay_get(uint16_t bonding_port_id);
 
+/**
+ * Register transmit callback function for bonded device to use when it is
+ * operating in balance mode. The callback is ignored in other modes of
+ * operation.
+ *
+ * @param cb_fn	User defined callback function to determine the xmit slave
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_xmit_policy_cb_register(burst_xmit_hash_t cb_fn);
 
 #ifdef __cplusplus
 }
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 99e496556a..b53038eeda 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -15,6 +15,8 @@
 #include "eth_bond_private.h"
 #include "eth_bond_8023ad_private.h"
 
+static burst_xmit_hash_t burst_xmit_user_hash;
+
 int
 check_for_bonding_ethdev(const struct rte_eth_dev *eth_dev)
 {
@@ -972,6 +974,13 @@ rte_eth_bond_mac_address_reset(uint16_t bonding_port_id)
 	return 0;
 }
 
+int
+rte_eth_bond_xmit_policy_cb_register(burst_xmit_hash_t cb_fn)
+{
+	burst_xmit_user_hash = cb_fn;
+	return 0;
+}
+
 int
 rte_eth_bond_xmit_policy_set(uint16_t bonding_port_id, uint8_t policy)
 {
@@ -995,6 +1004,12 @@ rte_eth_bond_xmit_policy_set(uint16_t bonding_port_id, uint8_t policy)
 		internals->balance_xmit_policy = policy;
 		internals->burst_xmit_hash = burst_xmit_l34_hash;
 		break;
+	case BALANCE_XMIT_POLICY_USER:
+		if (burst_xmit_user_hash == NULL)
+			return -1;
+		internals->balance_xmit_policy = policy;
+		internals->burst_xmit_hash = burst_xmit_user_hash;
+		break;
 	default:
 		return -1;
 
diff --git a/drivers/net/bonding/rte_eth_bond_args.c b/drivers/net/bonding/rte_eth_bond_args.c
index bdec5d61d4..eaa313bf73 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -261,6 +261,8 @@ bond
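For illustration, this is how an application might plug in its own hash under
the proposed API (a sketch; the round-robin callback body is a placeholder
assumption, while the register/set calls follow the patch above):

	/* Hypothetical user hash: pick the TX member by trivial round-robin.
	 * The callback writes one member index per packet into slaves[].
	 */
	static void
	my_xmit_hash(struct rte_mbuf **buf, uint16_t nb_pkts,
			uint16_t slave_count, uint16_t *slaves)
	{
		static uint16_t next;
		uint16_t i;

		RTE_SET_USED(buf);
		for (i = 0; i != nb_pkts; i++)
			slaves[i] = (next++) % slave_count;
	}

	static int
	bond_setup_user_policy(uint16_t bond_port_id)
	{
		/* register the callback first, then select the user policy */
		rte_eth_bond_xmit_policy_cb_register(my_xmit_hash);
		return rte_eth_bond_xmit_policy_set(bond_port_id,
				BALANCE_XMIT_POLICY_USER);
	}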
RE: [RFC 3/6] ring/soring: introduce Staged Ordered Ring
> > From: Konstantin Ananyev
> >
> > Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
> > with multiple processing 'stages'.
> > It is based on conventional DPDK rte_ring, re-uses many of its concepts,
> > and even substantial part of its code.
> > It can be viewed as an 'extension' of rte_ring functionality.
> > In particular, main SORING properties:
> > - circular ring buffer with fixed size objects
> > - producer, consumer plus multiple processing stages in the middle.
> > - allows to split objects processing into multiple stages.
> > - objects remain in the same ring while moving from one stage to the other,
> >   initial order is preserved, no extra copying needed.
> > - preserves the ingress order of objects within the queue across multiple
> >   stages, i.e.:
> >   at the same stage multiple threads can process objects from the ring in
> >   any order, but for the next stage objects will always appear in the
> >   original order.
> > - each stage (and producer/consumer) can be served by single and/or
> >   multiple threads.
> > - number of stages, size and number of objects in the ring are
> >   configurable at ring initialization time.
> >
> > Data-path API provides four main operations:
> > - enqueue/dequeue works in the same manner as for conventional rte_ring,
> >   all rte_ring synchronization types are supported.
> > - acquire/release - for each stage there is an acquire (start) and
> >   release (finish) operation.
> >   after some objects are 'acquired' - given thread can safely assume that
> >   it has exclusive possession of these objects till 'release' for them is
> >   invoked.
> >   Note that right now user has to release exactly the same number of
> >   objects that was acquired before.
> >   After 'release', objects can be 'acquired' by next stage and/or dequeued
> >   by the consumer (in case of last stage).
> >
> > Expected use-case: applications that use a pipeline model
> > (probably with multiple stages) for packet processing, when preserving
> > incoming packet order is important. I.E.: IPsec processing, etc.
> >
> > Signed-off-by: Konstantin Ananyev
> > ---
>
> The existing RING library is for a ring of objects.
>
> It is very confusing that the new SORING library is for a ring of object
> pairs (obj, objst).
>
> The new SORING library should be for a ring of objects, like the existing
> RING library. Please get rid of all the objst stuff.
>
> This might also improve performance when not using the optional secondary
> object.
>
> With that in place, you can extend the SORING library with additional APIs
> for object pairs.
>
> I suggest calling the secondary object "metadata" instead of "status" or
> "state" or "ret-value".
> I agree that data passed as {obj[num], meta[num]} is more efficient than
> {obj, meta}[num] in some use cases, which is why your API uses two vector
> pointers instead of one.

I suppose what you suggest is to have 2 sets of functions: one that takes
both objs[] and meta[] and a second that takes just objs[]?
If so, yes I can do that - in fact I was thinking about the same thing.
BTW, right now meta[] is an optional one anyway.
Also will probably get rid of explicit 'behavior' and will have '_burst_'
and '_bulk_' versions instead, same as rte_ring.

> Furthermore, you should consider semi-zero-copy APIs for the
> "acquire"/"release" functions:
>
> The "acquire" function can use a concept similar to rte_pktmbuf_read(),
> where a vector is provided for copying (if the ring wraps), and
> the return value either points directly to the objects in the ring
> (zero-copy), or to the vector where the objects were copied to.

You mean to introduce an analog of the rte_ring '_zc_' functions?
Yes, I considered that, but decided to leave it for the future.
First, because we do need a generic and simple function with copying things
anyway.
Second, I am not so convinced that this _zc_ will give much performance gain,
while it definitely makes the API not that straightforward.

> And the "release" function does not need to copy the object vector back if
> the "acquire" function returned a zero-copy pointer.

For "release" you don't need to *always* copy objs[] and meta[].
It is optional and is left for the user to decide based on the use-case.
If he doesn't need to update objs[] or meta[] he can just pass a NULL ptr
here.
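For concreteness, the two function sets under discussion could look roughly
like this (hypothetical prototypes only, consistent with the split being
proposed; none of these signatures are final):

	/* Basic set: objects only, mirroring the rte_ring API shape. */
	uint32_t
	rte_soring_enqueue_burst(struct rte_soring *r, const void *objs,
			uint32_t n, uint32_t *free_space);

	/* Extended set: objects plus optional per-object metadata. */
	uint32_t
	rte_soring_enqueuex_burst(struct rte_soring *r, const void *objs,
			const void *meta, uint32_t n, uint32_t *free_space);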
RE: [RFC 3/6] ring/soring: introduce Staged Ordered Ring
> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com]
>
> > > From: Konstantin Ananyev
> > >
> > > Staged-Ordered-Ring (SORING) provides a SW abstraction for
> > > 'ordered' queues with multiple processing 'stages'.
> > > It is based on the conventional DPDK rte_ring, re-uses many of
> > > its concepts, and even a substantial part of its code.
> > > It can be viewed as an 'extension' of rte_ring functionality.
> > > In particular, the main SORING properties are:
> > > - circular ring buffer with fixed-size objects
> > > - producer and consumer, plus multiple processing stages in the
> > >   middle
> > > - allows splitting object processing into multiple stages
> > > - objects remain in the same ring while moving from one stage to
> > >   the other; initial order is preserved, no extra copying needed
> > > - preserves the ingress order of objects within the queue across
> > >   multiple stages, i.e. at the same stage multiple threads can
> > >   process objects from the ring in any order, but for the next
> > >   stage the objects will always appear in the original order
> > > - each stage (and producer/consumer) can be served by single
> > >   and/or multiple threads
> > > - the number of stages, size and number of objects in the ring
> > >   are configurable at ring initialization time
> > >
> > > The data-path API provides four main operations:
> > > - enqueue/dequeue work in the same manner as for the conventional
> > >   rte_ring; all rte_ring synchronization types are supported
> > > - acquire/release - for each stage there is an acquire (start)
> > >   and release (finish) operation; after some objects are
> > >   'acquired', the given thread can safely assume that it has
> > >   exclusive possession of these objects until 'release' is
> > >   invoked for them.
> > >   Note that right now the user has to release exactly the same
> > >   number of objects that was acquired before.
> > >   After 'release', objects can be 'acquired' by the next stage
> > >   and/or dequeued by the consumer (in the case of the last stage).
> > >
> > > Expected use case: applications that use a pipeline model
> > > (probably with multiple stages) for packet processing, when
> > > preserving the incoming packet order is important, e.g. IPsec
> > > processing.
> > >
> > > Signed-off-by: Konstantin Ananyev
> > > ---
> >
> > The existing RING library is for a ring of objects.
> >
> > It is very confusing that the new SORING library is for a ring of
> > object pairs (obj, objst).
> >
> > The new SORING library should be for a ring of objects, like the
> > existing RING library. Please get rid of all the objst stuff.
> >
> > This might also improve performance when not using the optional
> > secondary object.
> >
> >
> > With that in place, you can extend the SORING library with
> > additional APIs for object pairs.
> >
> > I suggest calling the secondary object "metadata" instead of
> > "status" or "state" or "ret-value".
> > I agree that data passed as {obj[num], meta[num]} is more efficient
> > than {obj, meta}[num] in some use cases, which is why your API uses
> > two vector pointers instead of one.
>
> I suppose what you suggest is to have two sets of functions: one that
> takes both objs[] and meta[], and a second that takes just objs[]?
> If so, yes, I can do that - in fact I was thinking about the same
> thing.

Yes, please. Mainly for readability/familiarity; it makes the API much
more similar to the Ring API.

> BTW, right now meta[] is optional anyway.

I noticed that meta[] is optional, but it is confusing that the APIs
are so different from the Ring APIs.

With two sets of functions, the basic set will resemble the Ring APIs
much more.

> Also, I will probably get rid of the explicit 'behavior' and will
> have '_burst_' and '_bulk_' versions instead, same as rte_ring.

+1

> > Furthermore, you should consider semi-zero-copy APIs for the
> > "acquire"/"release" functions:
> >
> > The "acquire" function can use a concept similar to
> > rte_pktmbuf_read(), where a vector is provided for copying (if the
> > ring wraps), and the return value either points directly to the
> > objects in the ring (zero-copy), or to the vector where the objects
> > were copied to.
>
> You mean introducing an analog of the rte_ring '_zc_' functions?
> Yes, I considered that, but decided to leave it for the future.

Somewhat similar, but I think the (semi-)zero-copy "acquire"/"release"
APIs will be simpler than rte_ring's _zc_ functions, because we know
that no other thread can dequeue the objects out of the ring before
the processing stage has released them, i.e. no additional locking is
required.

Anyway, leave it for the future. I don't think it will require changes
to the underlying implementation, so we don't need to consider it in
advance.

> First, because we do need a generic and simple function that copies
> things anyway.
> Second, I am not so convinced that _zc_ will give much performance
> gain, while it definitely makes the API less straightforward.
>
> > And the "release" function does not need to copy the object vector
> > back if the "acquire" function returned a zero-copy pointer.
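
The basic/extended split discussed above could mirror the rte_ring
naming. These prototypes are purely hypothetical, an illustration of
the pairing rather than the RFC's actual signatures:

#include <stdint.h>

struct rte_soring;	/* opaque, as in the RFC */

/* basic variant: objects only, resembles rte_ring_enqueue_burst() */
uint32_t
rte_soring_enqueue_burst(struct rte_soring *r, void * const *objs,
	uint32_t n, uint32_t *free_space);

/* extended variant: objects plus optional per-object metadata,
 * laid out as two parallel arrays {objs[n], meta[n]} */
uint32_t
rte_soring_enqueuex_burst(struct rte_soring *r, void * const *objs,
	const void *meta, uint32_t n, uint32_t *free_space);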
[PATCH] app/testpmd: add L4 port to verbose output
To help distinguish packets we want to add more identifiable
information and print the port number for all packets. This will make
packet metadata more uniform, as previously it only printed the port
number for encapsulated packets.

Bugzilla-ID: 1517

Signed-off-by: Alex Chapman
Reviewed-by: Luca Vizzarro
Reviewed-by: Paul Szczepanek
---
 app/test-pmd/util.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index bf9b639d95..5fa05fad16 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -81,7 +81,6 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	char buf[256];
 	struct rte_net_hdr_lens hdr_lens;
 	uint32_t sw_packet_type;
-	uint16_t udp_port;
 	uint32_t vx_vni;
 	const char *reason;
 	int dynf_index;
@@ -234,49 +233,63 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	if (sw_packet_type & RTE_PTYPE_INNER_L4_MASK)
 		MKDUMPSTR(print_buf, buf_size, cur_len,
 			  " - inner_l4_len=%d", hdr_lens.inner_l4_len);
-	if (is_encapsulation) {
-		struct rte_ipv4_hdr *ipv4_hdr;
-		struct rte_ipv6_hdr *ipv6_hdr;
-		struct rte_udp_hdr *udp_hdr;
-		uint8_t l2_len;
-		uint8_t l3_len;
-		uint8_t l4_len;
-		uint8_t l4_proto;
-		struct rte_vxlan_hdr *vxlan_hdr;
-
-		l2_len = sizeof(struct rte_ether_hdr);
-
-		/* Do not support ipv4 option field */
-		if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
-			l3_len = sizeof(struct rte_ipv4_hdr);
-			ipv4_hdr = rte_pktmbuf_mtod_offset(mb,
+
+	struct rte_ipv4_hdr *ipv4_hdr;
+	struct rte_ipv6_hdr *ipv6_hdr;
+	struct rte_udp_hdr *udp_hdr;
+	struct rte_tcp_hdr *tcp_hdr;
+	uint8_t l2_len;
+	uint8_t l3_len;
+	uint8_t l4_len;
+	uint8_t l4_proto;
+	uint16_t l4_port;
+	struct rte_vxlan_hdr *vxlan_hdr;
+
+	l2_len = sizeof(struct rte_ether_hdr);
+
+	/* Do not support ipv4 option field */
+	if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
+		l3_len = sizeof(struct rte_ipv4_hdr);
+		ipv4_hdr = rte_pktmbuf_mtod_offset(mb,
 					struct rte_ipv4_hdr *, l2_len);
-			l4_proto = ipv4_hdr->next_proto_id;
-		} else {
-			l3_len = sizeof(struct rte_ipv6_hdr);
-			ipv6_hdr = rte_pktmbuf_mtod_offset(mb,
+		l4_proto = ipv4_hdr->next_proto_id;
+	} else {
+		l3_len = sizeof(struct rte_ipv6_hdr);
+		ipv6_hdr = rte_pktmbuf_mtod_offset(mb,
 					struct rte_ipv6_hdr *, l2_len);
-			l4_proto = ipv6_hdr->proto;
-		}
-		if (l4_proto == IPPROTO_UDP) {
-			udp_hdr = rte_pktmbuf_mtod_offset(mb,
+		l4_proto = ipv6_hdr->proto;
+	}
+	if (l4_proto == IPPROTO_UDP) {
+		udp_hdr = rte_pktmbuf_mtod_offset(mb,
 					struct rte_udp_hdr *, l2_len + l3_len);
+		l4_port = RTE_BE_TO_CPU_16(udp_hdr->dst_port);
+		if (is_encapsulation) {
 			l4_len = sizeof(struct rte_udp_hdr);
 			vxlan_hdr = rte_pktmbuf_mtod_offset(mb,
-					struct rte_vxlan_hdr *,
-					l2_len + l3_len + l4_len);
-			udp_port = RTE_BE_TO_CPU_16(udp_hdr->dst_port);
+						struct rte_vxlan_hdr *,
+						l2_len + l3_len + l4_len);
 			vx_vni = rte_be_to_cpu_32(vxlan_hdr->vx_vni);
 			MKDUMPSTR(print_buf, buf_size, cur_len,
 				  " - VXLAN packet: packet type =%d, "
 				  "Destination UDP port =%d, VNI = %d, "
 				  "last_rsvd = %d", packet_type,
-				  udp_port, vx_vni >> 8, vx_vni & 0xff);
+				  l4_port, vx_vni >> 8, vx_vni & 0xff);
+		} else {
+			MKDUMPSTR(p
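
Since the tail of the diff is cut off above, here is the gist of the
new logic as a standalone sketch. It is simplified relative to
testpmd's actual code, and the helper name is ours, not the patch's:

#include <netinet/in.h>
#include <rte_mbuf.h>
#include <rte_udp.h>
#include <rte_tcp.h>
#include <rte_byteorder.h>

/* Fetch the L4 destination port for both UDP and TCP packets, so the
 * verbose output can print a port for every packet, not only for
 * encapsulated ones. */
static uint16_t
l4_dst_port(struct rte_mbuf *mb, uint8_t l4_proto, uint32_t l2_len,
	    uint32_t l3_len)
{
	if (l4_proto == IPPROTO_UDP) {
		const struct rte_udp_hdr *udp = rte_pktmbuf_mtod_offset(mb,
				const struct rte_udp_hdr *, l2_len + l3_len);
		return rte_be_to_cpu_16(udp->dst_port);
	}
	if (l4_proto == IPPROTO_TCP) {
		const struct rte_tcp_hdr *tcp = rte_pktmbuf_mtod_offset(mb,
				const struct rte_tcp_hdr *, l2_len + l3_len);
		return rte_be_to_cpu_16(tcp->dst_port);
	}
	return 0;	/* no L4 port for other protocols */
}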
Re: [PATCH] app/testpmd: add L4 port to verbose output
On Thu, 15 Aug 2024 15:20:51 +0100
Alex Chapman wrote:

> To help distinguish packets we want to add more identifiable
> information and print the port number for all packets. This will
> make packet metadata more uniform, as previously it only printed the
> port number for encapsulated packets.
>
> Bugzilla-ID: 1517
>
> Signed-off-by: Alex Chapman
> Reviewed-by: Luca Vizzarro
> Reviewed-by: Paul Szczepanek

The verbose output is already too verbose.

Maybe you would like the simpler format (which does include the port
number); see the network packet dissector patches.
Re: [PATCH] net/bonding: add user callback for bond xmit policy
Recheck-request: iol-marvell-Functional

Putting in a retest for this.
RE: 22.11.6 patches review and test
> -----Original Message-----
> From: Luca Boccassi
> Sent: Thursday, August 15, 2024 2:11 PM
> To: sta...@dpdk.org
> Cc: dev@dpdk.org; Ali Alnubani; John McNamara; Raslan Darawsheh;
> NBU-Contact-Thomas Monjalon (EXTERNAL)
> Subject: Re: 22.11.6 patches review and test
>
> On Wed, 31 Jul 2024 at 20:37, wrote:
> >
> > Hi all,
> >
> > Here is a list of patches targeted for stable release 22.11.6.
> >
> > The planned date for the final release is August 20th.
> >
> > Please help with testing and validation of your use cases and
> > report any issues/results with reply-all to this mail. For the
> > final release the fixes and reported validations will be added to
> > the release notes.
> >
> > A release candidate tarball can be found at:
> >
> >     https://dpdk.org/browse/dpdk-stable/tag/?id=v22.11.6-rc1
> >
> > These patches are located at branch 22.11 of the dpdk-stable repo:
> >     https://dpdk.org/browse/dpdk-stable/
> >
> > Thanks.
> >
> > Luca Boccassi
>
> Hi Ali,
>
> As the deadline is approaching, I wanted to double-check whether
> NVIDIA is planning to run regression tests for 22.11.6. If you need
> more time, it's fine to extend the deadline, but if you do not have
> the bandwidth for this cycle that's OK too; just let me know and I'll
> go ahead with the release without waiting. Thanks!

Hi Luca,

We will report our results hopefully by Monday. Apologies for the
delay.

Thanks,
Ali
Re: 22.11.6 patches review and test
On Thu, 15 Aug 2024 at 17:19, Ali Alnubani wrote:
>
> > -----Original Message-----
> > From: Luca Boccassi
> > Sent: Thursday, August 15, 2024 2:11 PM
> > To: sta...@dpdk.org
> > Cc: dev@dpdk.org; Ali Alnubani; John McNamara; Raslan Darawsheh;
> > NBU-Contact-Thomas Monjalon (EXTERNAL)
> > Subject: Re: 22.11.6 patches review and test
> >
> > On Wed, 31 Jul 2024 at 20:37, wrote:
> > >
> > > Hi all,
> > >
> > > Here is a list of patches targeted for stable release 22.11.6.
> > >
> > > The planned date for the final release is August 20th.
> > >
> > > Please help with testing and validation of your use cases and
> > > report any issues/results with reply-all to this mail. For the
> > > final release the fixes and reported validations will be added to
> > > the release notes.
> > >
> > > A release candidate tarball can be found at:
> > >
> > >     https://dpdk.org/browse/dpdk-stable/tag/?id=v22.11.6-rc1
> > >
> > > These patches are located at branch 22.11 of the dpdk-stable repo:
> > >     https://dpdk.org/browse/dpdk-stable/
> > >
> > > Thanks.
> > >
> > > Luca Boccassi
> >
> > Hi Ali,
> >
> > As the deadline is approaching, I wanted to double-check whether
> > NVIDIA is planning to run regression tests for 22.11.6. If you need
> > more time, it's fine to extend the deadline, but if you do not have
> > the bandwidth for this cycle that's OK too; just let me know and
> > I'll go ahead with the release without waiting. Thanks!
>
> Hi Luca,
>
> We will report our results hopefully by Monday. Apologies for the
> delay.
>
> Thanks,
> Ali

No problem at all, just wanted to check. Thank you for the update.
Re: [dpdk-dev] [PATCH v3 5/5] devtools: test different build types
On Sun, 8 Aug 2021 14:51:38 +0200
Thomas Monjalon wrote:

> All builds were of type debugoptimized.
> It is kept only for builds having an ABI check.
> Others will have the default build type (release),
> except if specified differently, as in the x86 generic build,
> which will be a test of the non-optimized debug build type.
> Some static builds will test the minsize build type.
>
> Signed-off-by: Thomas Monjalon
> Acked-by: Andrew Rybchenko
>
> ---
>
> This patch cannot be merged now because it makes clang 11.1.0 crash.
> ---

Dropping this patch from patchwork because of the clang crash.
DTS WG Meeting Minutes - August 15, 2024
# August 15, 2024 Attendees

* Patrick Robb
* Jeremy Spewock
* Alex Chapman
* Juraj Linkeš
* Tomas Durovec
* Dean Marx
* Luca Vizzarro
* Paul Szczepanek
* Nicholas Pratte

# Minutes

= General Discussion

* DTS Roadmap: https://docs.google.com/document/d/1Rcp1-gZWzGGCCSkbEsigrd0-NoQmknv6ZS7V2CPdgFo/edit
  * Will be emailed out after this meeting
* Speakers are all signed up for the CI and DTS talks at the DPDK Summit

= Patch discussions

* Testpmd shell method names: should they align with existing testpmd
  runtime commands? I.e. should the "flow create" runtime command be
  implemented via a method named flow_create_*() or the more intuitive
  English create_flow_*()?
  * One option is to implement both, and have one method call the other
    * This potentially creates confusion as people read different test
      suites and see different functions used, not realizing they may
      be the same
  * The group agrees it is best to name methods in a human-readable,
    intuitive way - so create_flow_*() from the example above
* Testpmd verbose parser
  * If we read the port from testpmd to identify packets, they must
    have a TCP/UDP layer, which may be limiting. If, for whatever
    reason, packets for a test suite cannot be built with an L4,
    individual test suites may have to check based on source MAC
    address, checksum, etc.
    * In almost all cases, packets can be built with an L4
* Checksum offload suite is submitted
  * Dependency on the existing testpmd verbose parser
  * RX-side test cases work fine, but TX-side behavior is not aligning
    with what is described in the test suite, so feedback on this is
    appreciated
  * Checksum offload command:
    * csum set {layer name} hw {port number}
    * Returns "sctp offload is not supported"
    * TCP/UDP packets are working
* Port assignment:
  * Physical ports are defined in the nodes conf section, then port IDs
    are referred to in the testrun config
  * Also includes splitting the nodes and testrun configs into
    different files
    * Discussion on the ticket regarding having a conf directory to
      contain these
  * Still some work to be done removing unneeded configuration from
    conf.yaml
* VXLAN-GPE test suite is now canceled, as the feature is removed as of
  DPDK 24.07
* API docs
  * Juraj needs reviews and testing
    * UNH people, please rebuild the docs and report your experience
    * Should specifically test meson install
  * Aim is to make it simple to use (and it is)
  * It builds with the DPDK docs
* L2fwd
  * Jeremy provided a review; more people at UNH please run this and
    provide feedback
  * When reviewing, people should also review the dependency - the add
    pktgen and testpmd change series
* Tomas and Juraj have begun work on producing the testrun results JSON

= Bugzilla discussions

* None

= Any other business

* Next meeting: Aug 29, 2024
Ethdev tracepoints optimization
Hi DPDK Community,

I am currently working on developing performance analyses for
applications using the ethdev library. These analyses are being
implemented in Trace Compass, an open-source performance analyzer.

One of the views I've implemented shows the rate of traffic received
or sent by an Ethernet port, measured in packets per second. However,
I've encountered an issue with the lib.ethdev.rx.burst event, which
triggers even when no packets are polled, leading to a significant
number of irrelevant events in the trace. This becomes problematic as
these "empty" events can overwhelm the tracer buffer, potentially
causing the loss of more critical events due to their high frequency.

To address this, I've modified the DPDK code in
lib/ethdev/rte_ethdev.h to add a conditional statement that only
triggers the event when nb_rx > 0. My question to the community is
whether there are use cases where an "empty" lib.ethdev.rx.burst event
could be useful. If not, would there be interest in submitting a patch
with this modification?

Moreover, I am looking to develop an analysis that calculates the
throughput (in kb/s, Mb/s, etc.) per NIC, utilizing the same events
(i.e., lib.ethdev.rx.burst and lib.ethdev.tx.burst). These tracepoints
do not provide the packet size directly, only a pointer to the packet
array. My attempt to use an eBPF program to iterate through that array
to access the packet sizes was unsuccessful, as I found no method to
export the computed data (e.g., via a custom tracepoint). Does anyone
have suggestions or alternative approaches for achieving a throughput
measurement?

I would be grateful for any insights or suggestions you might have.

Thank you!
Adel
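
The guard being described is small; a sketch of what it could look
like follows. The tracepoint name matches recent DPDK releases, but
the surrounding context in rte_eth_rx_burst() varies between versions,
so treat this as illustrative rather than an exact patch:

--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ static inline uint16_t rte_eth_rx_burst(...) @@
-	rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
+	/* Skip tracing "empty" polls so they cannot flood the trace buffer. */
+	if (nb_rx > 0)
+		rte_ethdev_trace_rx_burst(port_id, queue_id,
+				(void **)rx_pkts, nb_rx);
 
 	return nb_rx;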
RE: [EXTERNAL] Re: [PATCH v5 1/1] examples/l2fwd-jobstats: fix lock availability
> -----Original Message-----
> From: Stephen Hemminger
> Sent: Sunday, August 11, 2024 9:47 PM
> To: Rakesh Kudurumalla
> Cc: ferruh.yi...@amd.com; andrew.rybche...@oktetlabs.ru;
> or...@nvidia.com; tho...@monjalon.net; dev@dpdk.org; Jerin Jacob;
> Nithin Kumar Dabilpuram; sta...@dpdk.org
> Subject: [EXTERNAL] Re: [PATCH v5 1/1] examples/l2fwd-jobstats: fix
> lock availability
>
> On Sun, 11 Aug 2024 21:29:57 +0530
> Rakesh Kudurumalla wrote:
>
> > The race condition between jobstats and timer metrics for
> > forwarding and flushing is handled using a spinlock.
> > Timer metrics are not displayed properly due to the frequent
> > unavailability of the lock. This patch fixes the issue by
> > introducing a delay before acquiring the lock in the loop. This
> > delay allows for better availability of the lock, ensuring that
> > show_lcore_stats() can periodically update the statistics even when
> > forwarding jobs are running.
> >
> > Fixes: 204896f8d66c ("examples/l2fwd-jobstats: add new example")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Rakesh Kudurumalla
>
> Would be better if this code used RCU and not a lock

Currently the jobstats app uses the lock only for collecting a single
snapshot of different statistics and printing it from the main core.
With RCU, since we cannot pause the worker core to collect such a
single snapshot, integrating RCU would need a full redesign of the
application and would take a lot of effort.
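
For readers following along, an illustrative sketch of the back-off
pattern the patch describes (not the exact patch; the struct below is
a reduced stand-in for the example's real lcore_queue_conf):

#include <rte_spinlock.h>
#include <rte_cycles.h>

struct worker_ctx {
	rte_spinlock_t lock;	/* shared with the stats-printing core */
};

static void
forwarding_loop(struct worker_ctx *ctx)
{
	for (;;) {
		rte_spinlock_lock(&ctx->lock);
		/* ... run forwarding and flush jobs under the lock ... */
		rte_spinlock_unlock(&ctx->lock);

		/*
		 * Small back-off before re-acquiring: without it the
		 * tight loop re-takes the lock almost immediately and
		 * starves show_lcore_stats() on the main core.
		 */
		rte_delay_us(10);
	}
}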