On 2023-10-05 13:51, Bruce Richardson wrote:
The event structure in DPDK is 16 bytes in size, and events are
regularly passed as parameters directly rather than being passed as
pointers.

When are events passed by value, rather than by reference? There are no such examples in the public eventdev API.
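For reference, here is a minimal sketch of the two conventions. It uses a hypothetical `struct demo_event` as a 16-byte stand-in for `struct rte_event`; the field names are illustrative and do not reproduce the real layout. The public burst functions, such as `rte_event_enqueue_burst()`, take an array of events, so at the API boundary events travel by reference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_event: two 64-bit words, 16 bytes. */
struct demo_event {
    uint64_t event; /* metadata word (flow id, queue id, sched type, ...) */
    uint64_t u64;   /* payload word (u64, event_ptr, mbuf, ...) */
};

static_assert(sizeof(struct demo_event) == 16, "events are 16 bytes");

/* By value: the callee gets its own 16-byte copy. On the x86-64 SysV ABI,
 * a struct made of two integer words is passed in two registers. */
static uint64_t payload_by_value(struct demo_event ev)
{
    return ev.u64;
}

/* By reference: the callee gets a pointer, like the event arrays handed
 * to the public burst enqueue/dequeue functions. */
static uint64_t payload_by_ref(const struct demo_event *ev)
{
    return ev->u64;
}
```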

To help the compiler optimize correctly, we can explicitly request
16-byte alignment for events, which means that we should be able
to do aligned vector loads/stores (e.g. with SSE or Neon) when working
with those events.
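A minimal sketch of what the patch changes, assuming `__rte_aligned(16)` expands to the GCC/Clang `__attribute__((aligned(16)))`. The struct names are hypothetical stand-ins for `struct rte_event` before and after the patch. The size stays 16 bytes and only the alignment requirement changes. The SSE2 path uses the aligned `_mm_load_si128()`/`_mm_store_si128()` forms, which is the kind of code the guaranteed alignment would permit.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_event before the patch. */
struct event_plain {
    uint64_t event;
    uint64_t u64;
};

/* Hypothetical stand-in after the patch; the attribute is assumed to be
 * what __rte_aligned(16) expands to with GCC/Clang. */
struct event_aligned {
    uint64_t event;
    uint64_t u64;
} __attribute__((aligned(16)));

/* Both are 16 bytes; only the alignment requirement differs. */
static_assert(sizeof(struct event_plain) == 16, "16-byte event");
static_assert(sizeof(struct event_aligned) == 16, "still 16 bytes");
static_assert(alignof(struct event_aligned) == 16, "16-byte aligned");

#if defined(__SSE2__)
#include <emmintrin.h>
/* Aligned SSE2 load/store: these fault on a misaligned address, so they
 * are only safe because the type now guarantees 16-byte alignment. */
static void copy_event(struct event_aligned *dst,
                       const struct event_aligned *src)
{
    __m128i v = _mm_load_si128((const __m128i *)src);
    _mm_store_si128((__m128i *)dst, v);
}
#else
/* Portable fallback for targets without SSE2. */
static void copy_event(struct event_aligned *dst,
                       const struct event_aligned *src)
{
    memcpy(dst, src, sizeof(*dst));
}
#endif
```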


That change both helps and sabotages the optimizer's work. Now every stack allocation of an event (or of a struct embedding one) needs to be 16-byte aligned, both in DPDK code and in the application.
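To illustrate the cost side: one visible effect is extra padding wherever an event is embedded after a smaller field. The sketch below uses hypothetical structs (`app_ctx8`, `app_ctx16` are illustrative, not DPDK types) and assumes an LP64 target such as x86-64 or AArch64, where `uint64_t` is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte event stand-ins, before and after the patch. */
struct ev8 { uint64_t event; uint64_t u64; };
struct ev16 { uint64_t event; uint64_t u64; } __attribute__((aligned(16)));

static_assert(sizeof(struct ev8) == 16, "16-byte event");
static_assert(sizeof(struct ev16) == 16, "16-byte event");

/* Hypothetical application state embedding one event after a 32-bit
 * field. With 8-byte alignment the event lands at offset 8 (24 bytes
 * total); with 16-byte alignment it moves to offset 16 (32 bytes total),
 * and the enclosing struct inherits the 16-byte alignment requirement. */
struct app_ctx8  { uint32_t id; struct ev8  ev; };
struct app_ctx16 { uint32_t id; struct ev16 ev; };
```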

The effect this change has on an eventdev app using DSW is a ~3 cycle/event performance degradation on an AMD Zen 3 system, and a ~4 cycle/event performance degradation on a Skylake-generation Intel CPU.

What scenarios do you have in mind, where this change would improve the generated code? Something where there are no unaligned loads available in the ISA, or they are much slower than their aligned counterparts?
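For comparison, a sketch of the unaligned alternative, which is valid for any address and so needs no alignment guarantee from the type. `struct demo_event` is again a hypothetical stand-in. Whether the compiler turns the `memcpy()` into a single 16-byte vector move is compiler- and target-dependent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with only its natural (8-byte) alignment. */
struct demo_event { uint64_t event; uint64_t u64; };

static_assert(sizeof(struct demo_event) == 16, "16-byte event");

/* memcpy() makes no alignment assumption; on x86-64 and AArch64,
 * compilers commonly lower a fixed 16-byte copy to one unaligned
 * vector load/store pair. */
static struct demo_event load_event(const unsigned char *p)
{
    struct demo_event ev;
    memcpy(&ev, p, sizeof(ev));
    return ev;
}

#if defined(__SSE2__)
#include <emmintrin.h>
/* Explicit unaligned SSE2 load/store: valid for any address. */
static struct demo_event load_event_sse2(const unsigned char *p)
{
    struct demo_event ev;
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    _mm_storeu_si128((__m128i *)&ev, v);
    return ev;
}
#endif
```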

When I looked into the same issue for the DPDK IP checksumming routines, there were basically no such scenarios, at least none that I could find.

Signed-off-by: Bruce Richardson <bruce.richard...@intel.com>
---
  lib/eventdev/rte_eventdev.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 2ba8a7b090..bb0d59b059 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1344,7 +1344,7 @@ struct rte_event {
                struct rte_event_vector *vec;
                /**< Event vector pointer. */
        };
-};
+} __rte_aligned(16);
 /* Ethdev Rx adapter capability bitmap flags */
 #define RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT    0x1
