On 4/24/2024 11:28 AM, Bruce Richardson wrote:
> On Wed, Apr 24, 2024 at 11:21:52AM +0100, Ferruh Yigit wrote:
>> On 4/23/2024 9:56 PM, Mattias Rönnblom wrote:
>>> On 2024-04-23 13:15, Ferruh Yigit wrote:
>>>> On 4/23/2024 10:08 AM, Mattias Rönnblom wrote:
>>>>> Cache align Rx and Tx queue struct to avoid false sharing.
>>>>>
>>>>> RX struct happens to be 64 bytes on x86_64 already, so cache
>>>>> alignment makes no change there, but it does on 32-bit ISAs.
>>>>>
>>>>> TX struct is 56 bytes on x86_64.
>>>>>
>>>>
>>>> Hi Mattias,
>>>>
>>>> No objection to the patch. Is the improvement theoretical or do you
>>>> measure any improvement practically, if so how much is the
>>>> improvement?
>>>>
>>>
>>> I didn't run any benchmarks.
>>>
>>> Two cores storing to a (falsely) shared cache line on a per-packet
>>> basis is going to be very expensive, at least for "light touch"
>>> applications.
>>>
>>
>> ack I expect for af_packet bottleneck is the kernel side, so I don't
>> expect any visible improvement practically, but OK to fix this
>> theoretical issue.
>>
>>>>> Both structs keep counters, and in the RX case they are updated even
>>>>> for empty polls.
>>>>>
>>>>
>>>> Do you think does it help if move 'rx_pkts' & 'rx_bytes' update within
>>>> the loop?
>>>>
>>>
>>> No, why? Wouldn't that be worse? Especially since rx_pkts and rx_bytes
>>> are declared volatile, so you are forcing a load-modify-store cycle for
>>> every increment.
>>>
>>
>> My intention was to prevent updating stats in empty polls, I thought
>> stats will be hot in the cache but won't work with volatile.
>>
> Yes, it will. Volatile only prevents caching in registers, it does not
> affect the storing of data within the cache hierarchy. Reads/writes of
> stats on empty polls should indeed hit the L1 as expected. However, that is
> still less efficient than just doing a register increment which could
> theoretically be the result without the volatile.
> 
> 

Thanks Bruce for clarification.

Reply via email to