Acked-by: Nithin Dabilpuram <ndabilpu...@marvell.com>
On Wed, Mar 27, 2024 at 2:47 PM Robin Jarry <rja...@redhat.com> wrote: > > In some cases, the node context data is used to store two pointers > because the data is larger than the reserved 16 bytes. Having to define > intermediate structures just to be able to cast is tedious. And without > intermediate structures, casting to opaque pointers is hard without > violating strict aliasing rules. > > Add an unnamed union to allow storing opaque pointers in the node > context. Unfortunately, aligning an unnamed union that contains an array > produces inconsistent results between C and C++. To preserve ABI/API > compatibility in both C and C++, move all fast-path area fields into an > unnamed struct which is cache aligned. Use __rte_cache_min_aligned to > preserve existing alignment on architectures where cache lines are 128 > bytes. > > Add a static assert to ensure that the unnamed union is not larger than > the context array (RTE_NODE_CTX_SZ). > > Signed-off-by: Robin Jarry <rja...@redhat.com> > --- > > Notes: > v5: > > * Helper functions to hide casting proved to be harder than expected. > Naive casting may even be impossible without breaking strict aliasing > rules. The only other option would be to use explicit memcpy calls. > * Unnamed union tentative again. As suggested by Tyler (thank you!), > using an intermediate unnamed struct to carry the alignment produces > consistent ABI in C and C++. > * Also, Tyler (thank you!) suggested that the fast path area alignment > size may be incorrect for architectures where the cache line is not 64 > bytes. There will be a 64 bytes hole in the structure at the end of > the unnamed struct before the zero length next nodes array. Use > __rte_cache_min_aligned to preserve existing alignment. > > v4: > > * Replaced the unnamed union with helper inline functions. > > v3: > > * Added __extension__ to the unnamed struct inside the union. > * Fixed C++ header checks. > * Replaced alignas() with an explicit static_assert. > > lib/graph/rte_graph_worker_common.h | 27 ++++++++++++++++++++------- > 1 file changed, 20 insertions(+), 7 deletions(-) > > diff --git a/lib/graph/rte_graph_worker_common.h > b/lib/graph/rte_graph_worker_common.h > index 36d864e2c14e..84d4997bbbf6 100644 > --- a/lib/graph/rte_graph_worker_common.h > +++ b/lib/graph/rte_graph_worker_common.h > @@ -12,7 +12,9 @@ > * process, enqueue and move streams of objects to the next nodes. > */ > > +#include <assert.h> > #include <stdalign.h> > +#include <stddef.h> > > #include <rte_common.h> > #include <rte_cycles.h> > @@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node { > } dispatch; > }; > /* Fast path area */ > + __extension__ struct __rte_cache_min_aligned { > #define RTE_NODE_CTX_SZ 16 > - alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node > Context. */ > - uint16_t size; /**< Total number of objects available. */ > - uint16_t idx; /**< Number of objects used. */ > - rte_graph_off_t off; /**< Offset of node in the graph reel. */ > - uint64_t total_cycles; /**< Cycles spent in this node. */ > - uint64_t total_calls; /**< Calls done to this node. */ > - uint64_t total_objs; /**< Objects processed by this node. */ > + union { > + uint8_t ctx[RTE_NODE_CTX_SZ]; > + __extension__ struct { > + void *ctx_ptr; > + void *ctx_ptr2; > + }; > + }; /**< Node Context. */ > + uint16_t size; /**< Total number of objects > available. */ > + uint16_t idx; /**< Number of objects used. */ > + rte_graph_off_t off; /**< Offset of node in the graph > reel. */ > + uint64_t total_cycles; /**< Cycles spent in this node. */ > + uint64_t total_calls; /**< Calls done to this node. */ > + uint64_t total_objs; /**< Objects processed by this node. > */ > union { > void **objs; /**< Array of object pointers. */ > uint64_t objs_u64; > @@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node { > rte_node_process_t process; /**< Process function. */ > uint64_t process_u64; > }; > + }; > alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next > nodes. */ > }; > > +static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, > ctx) == RTE_NODE_CTX_SZ, > + "rte_node context must be RTE_NODE_CTX_SZ bytes exactly"); > + > /** > * @internal > * > -- > 2.44.0 >