13/01/2023 13:11, Thomas Monjalon пишет:
13/01/2023 13:39, Amit Prakash Shukla:
From: Thomas Monjalon <tho...@monjalon.net>
23/08/2022 11:38, Amit Prakash Shukla:
From: Konstantin Ananyev <konstantin.v.anan...@yandex.ru>
06/08/2022 19:35, Honnappa Nagarahalli пишет:
Replacing memcpy with rte_memcpy fixes the GCC-12 compilation
issue.

Any reason why this replacement fixes the problem?
Do you have any performance numbers with this change?

Also it would be better to change to rte_memcpy as the function
is called in fastpath.

On Arm platforms, memcpy in the later versions has the best
performance.

I agree with Honnappa, it is better to keep memcpy() here.
Actually what is strange - why it ends up in
__rte_ring_dequeue_elems_128() at all?
Inside rxa_intr_ring_dequeue() we clearly doing: rte_ring_dequeue(),
which should boil down to ___rte_ring_dequeue_elems_64().
it should go to __rte_ring_dequeue_elems_128() at all.

I agree. After having close look and doing few experiments, ideally it
should not be going to __rte_ring_dequeue_elems_128().
Sizeof(in call of rte_ring_enqueue_elem) gets evaluated at compile
time which in this case it is evaluated to 8 bytes so
__rte_ring_dequeue_elems_128() shall not be in the path. Looks like more
of a gcc-12 bug.?

Another q - is this warning happens only on arm platforms?

Warning is observed on x86 with build type as debug.
"meson --werror --buildtype=debug build"

I confirm the compilation issue on x86 with GCC 12 in a debug build.

We need to find a workaround.
Is it reported to GCC already?

I found an old gcc bug reporting similar issue. This bug seems to be re-opened 
recently in Dec-2022. Not sure if it is reopened specifically for gcc-12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89689

Please would you like to open a bug specific to GCC 12?

Kevin has push a work around for DPDK-21.11.3.
https://git.dpdk.org/dpdk-stable/commit/?h=21.11&id=e1d728588dc73af9ed60cc0074d51a7f24b2ba60

In the meantime we could use Kevin's workaround:

#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#pragma GCC diagnostic ignored "-Wstringop-overread"
#endif

Opinions?



Yep, disable warnings should work.
Anoter way to consider - change enqueue/dequeue_elems_128()
functions to not use memcpy() at all.
Instead of that they can copy 2*num 64-bit entities directly,
same as _64_ versions do.
Something like the patch below.
That's pretty similar to what Amit initially proposed,
but without rte_memcpy() involvement.
Performance-wise I don't expect noticeable difference with
what we have right now.
But sure, we'll need to do extra checks here.


diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 83788c56e6..de79040618 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -93,25 +93,32 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
        unsigned int i;
        const uint32_t size = r->size;
        uint32_t idx = prod_head & r->mask;
-       rte_int128_t *ring = (rte_int128_t *)&r[1];
-       const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+       uint64_t *ring = (uint64_t *)&r[1];
+       const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
        if (likely(idx + n <= size)) {
-               for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
-                       memcpy((void *)(ring + idx),
-                               (const void *)(obj + i), 32);
+               idx *= 2;
+               for (i = 0; i < 2 * (n & ~0x1); i += 4, idx += 4) {
+                       ring[idx] = obj[i];
+                       ring[idx + 1] = obj[i + 1];
+                       ring[idx + 2] = obj[i + 2];
+                       ring[idx + 3] = obj[i + 3];
+               }
                switch (n & 0x1) {
                case 1:
-                       memcpy((void *)(ring + idx),
-                               (const void *)(obj + i), 16);
+                       ring[idx] = obj[i];
+                       ring[idx + 1] = obj[i + 1];
                }
        } else {
-               for (i = 0; idx < size; i++, idx++)
-                       memcpy((void *)(ring + idx),
-                               (const void *)(obj + i), 16);
+               idx *= 2;
+               for (i = 0; idx < 2 * size; i += 2, idx += 2) {
+                       ring[idx] = obj[i];
+                       ring[idx + 1] = obj[i + 1];
+               }
                /* Start at the beginning */
-               for (idx = 0; i < n; i++, idx++)
-                       memcpy((void *)(ring + idx),
-                               (const void *)(obj + i), 16);
+               for (idx = 0; i < 2 * n; i += 2, idx += 2) {
+                       ring[idx] = obj[i];
+                       ring[idx + 1] = obj[i + 1];
+               }
        }
 }

@@ -227,21 +234,32 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
        unsigned int i;
        const uint32_t size = r->size;
        uint32_t idx = prod_head & r->mask;
-       rte_int128_t *ring = (rte_int128_t *)&r[1];
-       rte_int128_t *obj = (rte_int128_t *)obj_table;
+       uint64_t *ring = (uint64_t *)&r[1];
+       unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
        if (likely(idx + n <= size)) {
-               for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
-                       memcpy((void *)(obj + i), (void *)(ring + idx), 32);
+               idx *= 2;
+               for (i = 0; i < 2 * (n & ~0x1); i += 4, idx += 4) {
+                       obj[i] = ring[idx];
+                       obj[i + 1] = ring[idx + 1];
+                       obj[i + 2] = ring[idx + 2];
+                       obj[i + 3] = ring[idx + 3];
+               }
                switch (n & 0x1) {
                case 1:
-                       memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+                       obj[i] = ring[idx];
+                       obj[i + 1] = ring[idx + 1];
                }
        } else {
-               for (i = 0; idx < size; i++, idx++)
-                       memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+               idx *= 2;
+               for (i = 0; idx < 2 * size; i += 2, idx += 2) {
+                       obj[i] = ring[idx];
+                       obj[i + 1] = ring[idx + 1];
+               }
                /* Start at the beginning */
-               for (idx = 0; i < n; i++, idx++)
-                       memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+               for (idx = 0; i < 2 * n; i += 2, idx += 2) {
+                       obj[i] = ring[idx];
+                       obj[i + 1] = ring[idx + 1];
+               }
        }
 }




Reply via email to