> On Apr 23, 2024, at 3:56 PM, Mattias Rönnblom <hof...@lysator.liu.se> wrote: > > On 2024-04-23 13:15, Ferruh Yigit wrote: >> On 4/23/2024 10:08 AM, Mattias Rönnblom wrote: >>> Cache align Rx and Tx queue struct to avoid false sharing. >>> >>> RX struct happens to be 64 bytes on x86_64 already, so cache alignment >>> makes no change there, but it does on 32-bit ISAs. >>> >>> TX struct is 56 bytes on x86_64. >>> >> Hi Mattias, >> No objection to the patch. Is the improvement theoretical or do you >> measure any improvement practically, if so how much is the improvement? > > I didn't run any benchmarks. > > Two cores storing to a (falsely) shared cache line on a per-packet basis is > going to be very expensive, at least for "light touch" applications. > >>> Both structs keep counters, and in the RX case they are updated even >>> for empty polls. >>> >> Do you think does it help if move 'rx_pkts' & 'rx_bytes' update within >> the loop? > > No, why? Wouldn't that be worse? Especially since rx_pkts and rx_bytes are > declared volatile, so you are forcing a load-modify-store cycle for every > increment. > > I would drop "volatile", or replace it with an atomic (although *not* use an > atomic add for incrementing, but rather atomic load + <n> non-atomic adds + > atomic store). (Slightly unrelated discussion) Does the atomic load + increment + atomic store help in a non-contended case like this? Some platforms have optimizations for atomic-increments as well which would be missed.
> >>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >>> --- >>> drivers/net/af_packet/rte_eth_af_packet.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c >>> b/drivers/net/af_packet/rte_eth_af_packet.c >>> index 397a32db58..28aeb7d08e 100644 >>> --- a/drivers/net/af_packet/rte_eth_af_packet.c >>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c >>> @@ -6,6 +6,7 @@ >>> * All rights reserved. >>> */ >>> +#include <rte_common.h> >>> #include <rte_string_fns.h> >>> #include <rte_mbuf.h> >>> #include <ethdev_driver.h> >>> @@ -53,7 +54,7 @@ struct pkt_rx_queue { >>> volatile unsigned long rx_pkts; >>> volatile unsigned long rx_bytes; >>> -}; >>> +} __rte_cache_aligned; >>> >> Latest location for '__rte_cache_aligned' tag is at the beginning of the >> struct [1], so something like: >> `struct __rte_cache_aligned pkt_rx_queue {` >> [1] >> https://patchwork.dpdk.org/project/dpdk/list/?series=31746&state=%2A&archive=both