On 2020-03-09 09:12, Jerin Jacob Kollanukkaran wrote: >> -----Original Message----- >> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >> Sent: Monday, March 9, 2020 1:28 PM >> To: Jerin Jacob Kollanukkaran <jer...@marvell.com> >> Cc: dev@dpdk.org; Stefan Sundkvist <stefan.sundkv...@ericsson.com>; >> ola.liljed...@arm.com >> Subject: [EXT] Re: [PATCH 5/8] event/dsw: avoid migration waves in large >> systems >> >> On 2020-03-09 08:17, Jerin Jacob Kollanukkaran wrote: >>>> -----Original Message----- >>>> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >>>> Sent: Monday, March 9, 2020 12:21 PM >>>> To: Jerin Jacob Kollanukkaran <jer...@marvell.com> >>>> Cc: dev@dpdk.org; stefan.sundkv...@ericsson.com; >>>> ola.liljed...@arm.com; Mattias Rönnblom >>>> <mattias.ronnb...@ericsson.com> >>>> Subject: [PATCH 5/8] event/dsw: avoid migration waves in large >>>> systems >>>> >>>> --------------------------------------------------------------------- >>>> - DSW limits the rate of migrations on a per-port basis. Hence, as >>>> the number of cores grows, so does the total migration capacity. >>>> >>>> In high core-count systems, this allows for a situation where flows >>>> are migrated to a lightly loaded port which recently already received >>>> a number of new flows (from other ports). The processing load >>>> generated by these new flows may not yet be reflected in the lightly >>>> loaded port's load estimate. The result is that the previously lightly >>>> loaded >> port is now overloaded. >>>> This patch adds a rough estimate of the size of the inbound >>>> migrations to a particular port, which can be factored into the >>>> migration logic, avoiding the above problem. >>>> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >>>> --- >>>> @@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev >> *dsw, >>>> target_qfs[*targets_len] = *candidate_qf; >>>> (*targets_len)++; >>>> >>>> + rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load, >>>> + candidate_flow_load); >>> These are the full barriers in arm64 and PowerPC. >>> Request to change the C11 mem model[1] with Load and acquire semantics >>> For better performance enhancement on non x86 machines. >>> >>> drivers/event/opdl is already moved to C11 mem model. >>> >>> [1] >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gcc.gnu.org_onlin >>> edocs_gcc_-5F005f-5F005fatomic- >> 2DBuiltins.html&d=DwIGaQ&c=nKjWec2b6R0m >> OyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=WWfY >> IvEKR8a >> _FuTltGFBbtERAKU1akjXuokLpv2zSz0&s=bEjlLRgN4LriVpVzwYcdgcTV39OI_MZY >> OG0 >>> QDhjmezw&e= >>> >> The performance impacts would be small, since this is in the slow path, with >> something like a handful of memory barrier per core per ms. > OK. If it is slow path, then yes, no point in changing. > > How about the other following uses in the DSW driver? Does it comes in > fastpath or slowpath? > > drivers/event/dsw/dsw_event.c: new_total_on_loan = > rte_atomic32_add_return(&dsw->credits_on_loan, > drivers/event/dsw/dsw_event.c: > rte_atomic32_sub(&dsw->credits_on_loan, acquired_credits); > drivers/event/dsw/dsw_event.c: > rte_atomic32_sub(&dsw->credits_on_loan, return_credits); > Technically still the slow path, but a path much more often taken. For producer- and consumer-only ports, it's once per 64 events (per port). For ports that do both, it's less often.
Sounds like a bug that rte_atomic32_sub() needs a full barrier. >> Arguably, it could be done for consistency reasons, but then you should >> change >> all DSW atomics. >> >>>> + >>>> return true;