On 4/23/2024 9:56 PM, Mattias Rönnblom wrote:
> On 2024-04-23 13:15, Ferruh Yigit wrote:
>> On 4/23/2024 10:08 AM, Mattias Rönnblom wrote:
>>> Cache align Rx and Tx queue struct to avoid false sharing.
>>>
>>> RX struct happens to be 64 bytes on x86_64 already, so cache alignment
>>> makes no change there, but it does on 32-bit ISAs.
>>>
>>> TX struct is 56 bytes on x86_64.
>>>
>>
>> Hi Mattias,
>>
>> No objection to the patch. Is the improvement theoretical or do you
>> measure any improvement practically, if so how much is the improvement?
>>
> 
> I didn't run any benchmarks.
> 
> Two cores storing to a (falsely) shared cache line on a per-packet basis
> is going to be very expensive, at least for "light touch" applications.
> 

ack
I expect for af_packet bottleneck is the kernel side, so I don't expect
any visible improvement practically, but OK to fix this theoretical issue.

>>> Both structs keep counters, and in the RX case they are updated even
>>> for empty polls.
>>>
>>
>> Do you think does it help if move 'rx_pkts' & 'rx_bytes' update within
>> the loop?
>>
> 
> No, why? Wouldn't that be worse? Especially since rx_pkts and rx_bytes
> are declared volatile, so you are forcing a load-modify-store cycle for
> every increment.
> 

My intention was to prevent updating stats in empty polls, I thought
stats will be hot in the cache but won't work with volatile.


> I would drop "volatile", or replace it with an atomic (although *not*
> use an atomic add for incrementing, but rather atomic load + <n>
> non-atomic adds + atomic store).
> 

As only single core polls from an Rx queue, I assume atomics is required
only for stats reset case. And for that case won't we need atomic add?

And do you know if there is a performance difference between atomic add
and keeping volatile qualifier?


>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>> ---
>>>   drivers/net/af_packet/rte_eth_af_packet.c | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
>>> b/drivers/net/af_packet/rte_eth_af_packet.c
>>> index 397a32db58..28aeb7d08e 100644
>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>> @@ -6,6 +6,7 @@
>>>    * All rights reserved.
>>>    */
>>>   +#include <rte_common.h>
>>>   #include <rte_string_fns.h>
>>>   #include <rte_mbuf.h>
>>>   #include <ethdev_driver.h>
>>> @@ -53,7 +54,7 @@ struct pkt_rx_queue {
>>>         volatile unsigned long rx_pkts;
>>>       volatile unsigned long rx_bytes;
>>> -};
>>> +} __rte_cache_aligned;
>>>   
>>
>> Latest location for '__rte_cache_aligned' tag is at the beginning of the
>> struct [1], so something like:
>> `struct __rte_cache_aligned pkt_rx_queue {`
>>
>> [1]
>> https://patchwork.dpdk.org/project/dpdk/list/?series=31746&state=%2A&archive=both

Reply via email to