On 05/10/2018, 10:22, "Ananyev, Konstantin" <[email protected]>
wrote:
> -----Original Message-----
> From: dev [mailto:[email protected]] On Behalf Of Gavin Hu (Arm
Technology China)
> Sent: Friday, October 5, 2018 1:47 AM
> To: Jerin Jacob <[email protected]>
> Cc: [email protected]; Honnappa Nagarahalli <[email protected]>;
Steve Capper <[email protected]>; Ola Liljedahl
> <[email protected]>; nd <[email protected]>; [email protected]
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
>
> Hi Jerin,
>
> Thanks for your review, inline comments from our internal discussions.
>
> BR. Gavin
>
> > -----Original Message-----
> > From: Jerin Jacob <[email protected]>
> > Sent: Saturday, September 29, 2018 6:49 PM
> > To: Gavin Hu (Arm Technology China) <[email protected]>
> > Cc: [email protected]; Honnappa Nagarahalli
> > <[email protected]>; Steve Capper
> > <[email protected]>; Ola Liljedahl <[email protected]>; nd
> > <[email protected]>; [email protected]
> > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> >
> > -----Original Message-----
> > > Date: Mon, 17 Sep 2018 16:17:22 +0800
> > > From: Gavin Hu <[email protected]>
> > > To: [email protected]
> > > CC: [email protected], [email protected],
> > > [email protected], [email protected],
> > > [email protected], [email protected], [email protected]
> > > Subject: [PATCH v3 1/3] ring: read tail using atomic load
> > > X-Mailer: git-send-email 2.7.4
> > >
> > > External Email
> > >
> > > In update_tail, read ht->tail using __atomic_load.Although the
> > > compiler currently seems to be doing the right thing even without
> > > _atomic_load, we don't want to give the compiler freedom to optimise
> > > what should be an atomic load, it should not be arbitarily moved
> > > around.
> > >
> > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> > > Cc: [email protected]
> > >
> > > Signed-off-by: Gavin Hu <[email protected]>
> > > Reviewed-by: Honnappa Nagarahalli <[email protected]>
> > > Reviewed-by: Steve Capper <[email protected]>
> > > Reviewed-by: Ola Liljedahl <[email protected]>
> > > ---
> > > lib/librte_ring/rte_ring_c11_mem.h | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> The read of ht->tail needs to be atomic, a non-atomic read would not be
correct.
That's a 32bit value load.
AFAIK on all CPUs that we support it is an atomic operation.
[Ola] But that the ordinary C load is translated to an atomic load for the
target architecture is incidental.
If the design requires an atomic load (which is the case here), we should use
an atomic load on the language level. Then we can be sure it will always be
translated to an atomic load for the target in question or compilation will
fail. We don't have to depend on assumptions.
> But there are no memory ordering requirements (with
> regards to other loads and/or stores by this thread) so relaxed memory
order is sufficient.
> Another aspect of using __atomic_load_n() is that the compiler cannot
"optimise" this load (e.g. combine, hoist etc), it has to be done as
> specified in the source code which is also what we need here.
I think Jerin points that rte_pause() acts here as compiler barrier too,
so no need to worry that compiler would optimize out the loop.
[Ola] Sorry missed that. But the barrier behaviour of rte_pause() is not part
of C11, is it essentially a hand-made feature to support the legacy
multithreaded memory model (which uses explicit HW and compiler barriers). I'd
prefer code using the C11 memory model not to depend on such legacy features.
Konstantin
>
> One point worth mentioning though is that this change is for the
rte_ring_c11_mem.h file, not the legacy ring. It may be worth persisting
> with getting the C11 code right when people are less excited about
sending a release out?
>
> We can explain that for C11 we would prefer to do loads and stores as per
the C11 memory model. In the case of rte_ring, the code is
> separated cleanly into C11 specific files anyway.
>
> I think reading ht->tail using __atomic_load_n() is the most appropriate
way. We show that ht->tail is used for synchronization, we
> acknowledge that ht->tail may be written by other threads without any
other kind of synchronization (e.g. no lock involved) and we require
> an atomic load (any write to ht->tail must also be atomic).
>
> Using volatile and explicit compiler (or processor) memory barriers
(fences) is the legacy pre-C11 way of accomplishing these things. There's
> a reason why C11/C++11 moved away from the old ways.
> > >
> > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > > --
> > > 2.7.4
> > >