On 2020-03-09 09:12, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>> Sent: Monday, March 9, 2020 1:28 PM
>> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>
>> Cc: dev@dpdk.org; Stefan Sundkvist <stefan.sundkv...@ericsson.com>;
>> ola.liljed...@arm.com
>> Subject: [EXT] Re: [PATCH 5/8] event/dsw: avoid migration waves in large
>> systems
>>
>> On 2020-03-09 08:17, Jerin Jacob Kollanukkaran wrote:
>>>> -----Original Message-----
>>>> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>>> Sent: Monday, March 9, 2020 12:21 PM
>>>> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>
>>>> Cc: dev@dpdk.org; stefan.sundkv...@ericsson.com;
>>>> ola.liljed...@arm.com; Mattias Rönnblom
>>>> <mattias.ronnb...@ericsson.com>
>>>> Subject:  [PATCH 5/8] event/dsw: avoid migration waves in large
>>>> systems
>>>>
>>>> ---------------------------------------------------------------------
>>>> - DSW limits the rate of migrations on a per-port basis. Hence, as
>>>> the number of cores grows, so does the total migration capacity.
>>>>
>>>> In high core-count systems, this allows for a situation where flows
>>>> are migrated to a lightly loaded port which recently already received
>>>> a number of new flows (from other ports). The processing load
>>>> generated by these new flows may not yet be reflected in the lightly
>>>> loaded port's load estimate. The result is that the previously lightly 
>>>> loaded
>> port is now overloaded.
>>>> This patch adds a rough estimate of the size of the inbound
>>>> migrations to a particular port, which can be factored into the
>>>> migration logic, avoiding the above problem.
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>>> ---
>>>> @@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev
>> *dsw,
>>>>            target_qfs[*targets_len] = *candidate_qf;
>>>>            (*targets_len)++;
>>>>
>>>> +  rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load,
>>>> +                   candidate_flow_load);
>>> These are the full barriers in arm64 and PowerPC.
>>> Request to change the C11 mem model[1] with Load and acquire semantics
>>> For better performance enhancement on non x86 machines.
>>>
>>> drivers/event/opdl is already moved to C11 mem model.
>>>
>>> [1]
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gcc.gnu.org_onlin
>>> edocs_gcc_-5F005f-5F005fatomic-
>> 2DBuiltins.html&d=DwIGaQ&c=nKjWec2b6R0m
>> OyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=WWfY
>> IvEKR8a
>> _FuTltGFBbtERAKU1akjXuokLpv2zSz0&s=bEjlLRgN4LriVpVzwYcdgcTV39OI_MZY
>> OG0
>>> QDhjmezw&e=
>>>
>> The performance impacts would be small, since this is in the slow path, with
>> something like a handful of memory barrier per core per ms.
> OK. If it is slow path, then yes, no point in changing.
>
> How about the other following uses in the DSW driver? Does it comes in 
> fastpath or slowpath?
>
> drivers/event/dsw/dsw_event.c:  new_total_on_loan = 
> rte_atomic32_add_return(&dsw->credits_on_loan,
> drivers/event/dsw/dsw_event.c:          
> rte_atomic32_sub(&dsw->credits_on_loan, acquired_credits);
> drivers/event/dsw/dsw_event.c:          
> rte_atomic32_sub(&dsw->credits_on_loan, return_credits);
>
Technically still the slow path, but a path much more often taken. For 
producer- and consumer-only ports, it's once per 64 events (per port). 
For ports that do both, it's less often.

Sounds like a bug that rte_atomic32_sub() needs a full barrier.

>> Arguably, it could be done for consistency reasons, but then you should 
>> change
>> all DSW atomics.
>>
>>>> +
>>>>            return true;


Reply via email to