https://en.cppreference.com/w/cpp/language/memory_model
<quote>
When an evaluation of an expression writes to a memory location and another 
evaluation reads or modifies the same memory location, the expressions are said 
to conflict. A program that has two conflicting evaluations has a data race 
unless

* both evaluations execute on the same thread or in the same signal handler, or
* both conflicting evaluations are atomic operations (see std::atomic), or
* one of the conflicting evaluations happens-before another (see 
std::memory_order)
If a data race occurs, the behavior of the program is undefined.
</quote>

C11 and C++11 have the same memory model (otherwise interoperability would be 
difficult).

Or as Jeff Preshing explains it:
"Any time two threads operate on a shared variable concurrently, and one of 
those operations performs a write, both threads must use atomic operations."
https://preshing.com/20130618/atomic-vs-non-atomic-operations/

So if ht->tail is written using e.g. __atomic_store_n(&ht->tail, val, mo), we 
need to also read it using e.g. __atomic_load_n().

-- Ola


On 05/10/2018, 13:15, "Ola Liljedahl" <ola.liljed...@arm.com> wrote:

    
    
    On 05/10/2018, 10:22, "Ananyev, Konstantin" <konstantin.anan...@intel.com> 
wrote:
    
        
        
        > -----Original Message-----
        > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Gavin Hu (Arm 
Technology China)
        > Sent: Friday, October 5, 2018 1:47 AM
        > To: Jerin Jacob <jerin.ja...@caviumnetworks.com>
        > Cc: dev@dpdk.org; Honnappa Nagarahalli 
<honnappa.nagaraha...@arm.com>; Steve Capper <steve.cap...@arm.com>; Ola 
Liljedahl
        > <ola.liljed...@arm.com>; nd <n...@arm.com>; sta...@dpdk.org
        > Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic 
load
        > 
        > Hi Jerin,
        > 
        > Thanks for your review, inline comments from our internal discussions.
        > 
        > BR. Gavin
        > 
        > > -----Original Message-----
        > > From: Jerin Jacob <jerin.ja...@caviumnetworks.com>
        > > Sent: Saturday, September 29, 2018 6:49 PM
        > > To: Gavin Hu (Arm Technology China) <gavin...@arm.com>
        > > Cc: dev@dpdk.org; Honnappa Nagarahalli
        > > <honnappa.nagaraha...@arm.com>; Steve Capper
        > > <steve.cap...@arm.com>; Ola Liljedahl <ola.liljed...@arm.com>; nd
        > > <n...@arm.com>; sta...@dpdk.org
        > > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
        > >
        > > -----Original Message-----
        > > > Date: Mon, 17 Sep 2018 16:17:22 +0800
        > > > From: Gavin Hu <gavin...@arm.com>
        > > > To: dev@dpdk.org
        > > > CC: gavin...@arm.com, honnappa.nagaraha...@arm.com,
        > > > steve.cap...@arm.com,  ola.liljed...@arm.com,
        > > > jerin.ja...@caviumnetworks.com, n...@arm.com,  sta...@dpdk.org
        > > > Subject: [PATCH v3 1/3] ring: read tail using atomic load
        > > > X-Mailer: git-send-email 2.7.4
        > > >
        > > > External Email
        > > >
        > > > In update_tail, read ht->tail using __atomic_load.Although the
        > > > compiler currently seems to be doing the right thing even without
        > > > _atomic_load, we don't want to give the compiler freedom to 
optimise
        > > > what should be an atomic load, it should not be arbitarily moved
        > > > around.
        > > >
        > > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier 
option")
        > > > Cc: sta...@dpdk.org
        > > >
        > > > Signed-off-by: Gavin Hu <gavin...@arm.com>
        > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
        > > > Reviewed-by: Steve Capper <steve.cap...@arm.com>
        > > > Reviewed-by: Ola Liljedahl <ola.liljed...@arm.com>
        > > > ---
        > > >  lib/librte_ring/rte_ring_c11_mem.h | 3 ++-
        > > >  1 file changed, 2 insertions(+), 1 deletion(-)
        > > > 
        > The read of ht->tail needs to be atomic, a non-atomic read would not 
be correct.
        
        That's a 32bit value load.
        AFAIK on all CPUs that we support it is an atomic operation.
    [Ola] But that the ordinary C load is translated to an atomic load for the 
target architecture is incidental.
    
    If the design requires an atomic load (which is the case here), we should 
use an atomic load on the language level. Then we can be sure it will always be 
translated to an atomic load for the target in question or compilation will 
fail. We don't have to depend on assumptions.
    
    
        
        > But there are no memory ordering requirements (with
        > regards to other loads and/or stores by this thread) so relaxed 
memory order is sufficient.
        > Another aspect of using __atomic_load_n() is that the compiler cannot 
"optimise" this load (e.g. combine, hoist etc), it has to be done as
        > specified in the source code which is also what we need here.
        
        I think Jerin points that rte_pause() acts here as compiler barrier too,
        so no need to worry that compiler would optimize out the loop.
    [Ola] Sorry missed that. But the barrier behaviour of rte_pause() is not 
part of C11, is it essentially a hand-made feature to support the legacy 
multithreaded memory model (which uses explicit HW and compiler barriers). I'd 
prefer code using the C11 memory model not to depend on such legacy features.
    
    
    
        Konstantin
        
        > 
        > One point worth mentioning though is that this change is for the 
rte_ring_c11_mem.h file, not the legacy ring. It may be worth persisting
        > with getting the C11 code right when people are less excited about 
sending a release out?
        > 
        > We can explain that for C11 we would prefer to do loads and stores as 
per the C11 memory model. In the case of rte_ring, the code is
        > separated cleanly into C11 specific files anyway.
        > 
        > I think reading ht->tail using __atomic_load_n() is the most 
appropriate way. We show that ht->tail is used for synchronization, we
        > acknowledge that ht->tail may be written by other threads without any 
other kind of synchronization (e.g. no lock involved) and we require
        > an atomic load (any write to ht->tail must also be atomic).
        > 
        > Using volatile and explicit compiler (or processor) memory barriers 
(fences) is the legacy pre-C11 way of accomplishing these things. There's
        > a reason why C11/C++11 moved away from the old ways.
        > > >
        > > >         __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
        > > > --
        > > > 2.7.4
        > > >
        
    
    

Reply via email to