> > > > On 07/10/2018, 06:03, "Jerin Jacob" <jerin.ja...@caviumnetworks.com> wrote:
> > > >
> > > > > In arm64 case, it will have ATOMIC_RELAXED followed by asm volatile
> > > > > ("":::"memory") of rte_pause().
> > > > > I wouldn't have any issue if the generated code were the same as or
> > > > > better than the existing case, but that is not the case, right?
> > > >
> > > > The existing case is actually not interesting (IMO) as it exposes
> > > > undefined behaviour which allows the compiler to do anything. But you
> > > > seem to be satisfied with "works for me, right here right now". I think
> > > > the cost of avoiding undefined behaviour is acceptable (actually I don't
> > > > think it will even be noticeable).
> > >
> > > I am not convinced because of the use of volatile in the head and tail
> > > indexes. For me that brings defined behavior.
> >
> > As long as you don't mix in C11 atomic accesses (just use "plain"
> > accesses to volatile objects), it is AFAIK defined behaviour (but not
> > necessarily using atomic loads and stores). But I quoted the C11 spec
> > where it explicitly mentions that mixing atomic and non-atomic accesses
> > to the same object is undefined behaviour. Don't argue with me, argue
> > with the C11 spec. If you want to disobey the spec, this should at least
> > be called out in the code with a comment.
>
> That boils down to only one question: should we follow the C11 spec? Why
> not take only the load-acquire and store-release semantics, just like the
> Linux kernel and FreeBSD? It does not look like the C11 memory model is
> super efficient in terms of the gcc implementation.
>
> > > That is the reason why I shared the generated assembly code. If you
> > > think otherwise, pick any compiler and look at the generated output.
> >
> > This is what one compiler for one architecture generates today. These
> > things change. Other things that used to work, or worked for some
> > specific architecture, have stopped working in newer versions of the
> > compiler.
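To make the "don't mix atomic and plain accesses" point concrete, here is a minimal sketch of a tail update in which every access to the shared index goes through the GCC/Clang `__atomic` builtins, so no plain or volatile access to the same object remains. The `struct headtail`, `update_tail()` and `cpu_relax()` names are hypothetical stand-ins (`cpu_relax()` is only a compiler barrier here, not `rte_pause()`), and this is not the DPDK code. Unlike the ATOMIC_RELAXED wait mentioned above, the sketch spins with an acquire load: under the C11 model that is what makes the previous producer's slot writes happen before our store-release, and therefore visible to a consumer that load-acquires the new tail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical head/tail pair (not DPDK's struct rte_ring_headtail).
 * Note: no volatile; all shared accesses go through __atomic builtins. */
struct headtail {
    uint32_t head;
    uint32_t tail;
};

/* Hypothetical stand-in for rte_pause(): only a compiler barrier here. */
static inline void cpu_relax(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Publish new_val as the tail once every earlier reservation (ending at
 * old_val) has been published by its owner. */
static void update_tail(struct headtail *ht, uint32_t old_val,
                        uint32_t new_val, int single)
{
    if (!single) {
        /* Acquire: the earlier producer's slot writes, which it ordered
         * before its own store-release of tail, now happen before our
         * store-release below, so a consumer acquiring new_val sees them. */
        while (__atomic_load_n(&ht->tail, __ATOMIC_ACQUIRE) != old_val)
            cpu_relax();
    } else {
        /* Single producer: nobody else can be publishing. */
        assert(__atomic_load_n(&ht->tail, __ATOMIC_RELAXED) == old_val);
    }

    /* Release: our own slot writes happen before any acquire load that
     * reads new_val. */
    __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
}
```

Whether the acquire in the spin loop costs anything measurable compared with a relaxed load is exactly the kind of question that only the generated code for a given target can answer.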
> > >
> > > And FreeBSD's implementation of the ring buffer (which DPDK derived
> > > from) doesn't have such logic, see
> > > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L108
> >
> > It looks like FreeBSD uses some kind of C11 atomic memory model-inspired
> > API, although I don't see exactly how e.g. atomic_store_rel_int() is
> > implemented. The code also mixes in explicit barriers, so it is definitely
> > not pure C11 memory model usage. And finally, it doesn't establish the
> > proper load-acquire/store-release relationships (e.g. a store-release of
> > cons_tail requires a load-acquire of cons_tail, same for prod_tail).
> >
> > "* multi-producer safe lock-free ring buffer enqueue"
> > The comment is also wrong. This design is not lock-free; how could it
> > be when there is spinning (waiting) for other threads in the code? If a
> > thread must wait for other threads, then by definition the design is
> > blocking.
> >
> > So you are saying that because FreeBSD is doing it wrong, DPDK can also
> > do it wrong?
> >
> > > See below too.
> > >
> > > > Skipping the compiler memory barrier in rte_pause() potentially allows
> > > > for optimisations that provide much more benefit, e.g. hiding some
> > > > cache miss latency for later loads. The DPDK ring buffer
> > > > implementation is defined so as to enable inlining of enqueue/dequeue
> > > > functions into the caller, so any code could immediately follow these
> > > > calls.
> > > >
> > > > From INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
> > > > Programming languages — C
> > > >
> > > > 5.1.2.4
> > > > 4 Two expression evaluations conflict if one of them modifies a memory
> > > > location and the other one reads or modifies the same memory location.
> > > >
> > > > 25 The execution of a program contains a data race if it contains two
> > > > conflicting actions in different threads, at least one of which is not
> > > > atomic, and neither happens before the other. Any such data race
> > > > results in undefined behavior.
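The pairing Ola describes (a store-release of cons_tail matched by a load-acquire of cons_tail, and likewise for prod_tail) can be shown on a minimal single-producer/single-consumer ring. `struct spsc_ring`, `RING_SIZE`, `ring_enqueue()` and `ring_dequeue()` are hypothetical names; this is neither FreeBSD's buf_ring nor DPDK's rte_ring, and it deliberately leaves out the multi-producer head/tail split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */

/* Hypothetical SPSC ring, for illustration only. */
struct spsc_ring {
    uint32_t prod_tail; /* written by the producer, read by the consumer */
    uint32_t cons_tail; /* written by the consumer, read by the producer */
    int slots[RING_SIZE];
};

static bool ring_enqueue(struct spsc_ring *r, int v)
{
    /* Only the producer writes prod_tail, so relaxed is enough here. */
    uint32_t head = __atomic_load_n(&r->prod_tail, __ATOMIC_RELAXED);
    /* Acquire pairs with the consumer's store-release of cons_tail: the
     * consumer's read of a freed slot happens before we overwrite it. */
    uint32_t cons = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

    assert(head - cons <= RING_SIZE); /* unsigned wrap-around is intended */
    if (head - cons == RING_SIZE)
        return false; /* full */

    r->slots[head & (RING_SIZE - 1)] = v;
    /* Release: the slot write happens before the new prod_tail is seen. */
    __atomic_store_n(&r->prod_tail, head + 1, __ATOMIC_RELEASE);
    return true;
}

static bool ring_dequeue(struct spsc_ring *r, int *v)
{
    uint32_t tail = __atomic_load_n(&r->cons_tail, __ATOMIC_RELAXED);
    /* Acquire pairs with the producer's store-release of prod_tail. */
    uint32_t prod = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);

    if (prod == tail)
        return false; /* empty */

    *v = r->slots[tail & (RING_SIZE - 1)];
    /* Release: our slot read happens before the producer reuses it. */
    __atomic_store_n(&r->cons_tail, tail + 1, __ATOMIC_RELEASE);
    return true;
}
```

The slot accesses themselves are plain, but each one happens before the conflicting access on the other side through an acquire/release pair, so there is no data race in the sense of 5.1.2.4 quoted above.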
> > >
> > > IMO, both conditions are satisfied if the variable is volatile and
> > > 32-bit reads are atomic on 32b and 64b machines. If not, the problem
> > > persists for the generic case as well
> > > (lib/librte_ring/rte_ring_generic.h).
> >
> > The read from a volatile object is not an atomic access per the C11
> > spec. It just happens to be translated to an instruction (on x86-64 and
> > AArch64/A64) that implements an atomic load. I don't think any compiler
> > would change this code generation and suddenly generate some non-atomic
> > load instruction for a program that *only* uses volatile to do "atomic"
> > accesses. But a future compiler could detect the mix of atomic and
> > non-atomic accesses and mark this expression as causing undefined
> > behaviour, and that would have consequences for code generation.
> >
> > > I agree with you on C11 memory model semantics usage. The reason why I
> > > proposed the name rte_ring_c11_mem.h for the file is that DPDK itself
> > > did not have definitions for load-acquire and store-release semantics.
> > > I was looking at taking load-acquire and store-release semantics from
> > > C11 instead of creating a new API like the Linux kernel or FreeBSD
> > > (APIs like atomic_load_acq_32(), atomic_store_rel_32()). If the file
> > > name is your concern then we could create new abstractions as well.
> > > That would help with the existing KNI problem as well.

We tried this in KNI. Creating these abstractions with optimal performance
is not possible, as release/acquire semantics are one-way barriers. We would
end up using full memory barriers.
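The one-way versus full barrier point can be sketched with a pair of FreeBSD-style wrappers. The `ring_load_acq_32()` / `ring_store_rel_32()` names are made up for illustration and are not a DPDK API. The first pair maps directly onto acquire/release builtins; the second emulates the same semantics with relaxed accesses plus standalone fences, which is roughly what an abstraction built only on barrier macros has to do. On AArch64 we would expect GCC to emit LDAR/STLR for the first pair and a plain LDR/STR plus a DMB for the second, with the release fence typically being a full DMB ISH; it is worth checking the generated assembly for the compiler in use.

```c
#include <stdint.h>

/* Hypothetical FreeBSD-style wrappers; not an existing DPDK API. */

/* One-way: later loads/stores cannot be hoisted above this load. */
static inline uint32_t ring_load_acq_32(uint32_t *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* One-way: earlier loads/stores cannot sink below this store. */
static inline void ring_store_rel_32(uint32_t *p, uint32_t v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

/* Same semantics built from a standalone fence: the fence orders *all*
 * earlier loads against later accesses, not just the load of *p. */
static inline uint32_t ring_load_acq_32_fence(uint32_t *p)
{
    uint32_t v = __atomic_load_n(p, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return v;
}

/* The release fence orders all earlier loads and stores against every
 * later store; on AArch64 this is typically a full DMB ISH. */
static inline void ring_store_rel_32_fence(uint32_t *p, uint32_t v)
{
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(p, v, __ATOMIC_RELAXED);
}
```

Both pairs are correct acquire/release implementations; the difference is only in how much unrelated reordering the fence-based pair forbids.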
> >
> > I appreciate your embrace of the C11 memory model. I think it is better
> > for describing (both to the compiler and to humans) which objects are
> > used for synchronisation and how.
> >
> > However, I don't think an API as you suggest (and as others have
> > suggested before, e.g. as done in ODP) is a good idea. There is an
> > infinite number of possible base types, an increasing number of
> > operations and a bunch of different memory orderings; a "complete" API
> > would be very large and difficult to test, and most members of the API
> > would never be used.
> > GCC and Clang both support the __atomic intrinsics. This API avoids the
> > problems I described above. Or we could use the official C11 syntax
> > (stdatomic.h). But then we have the problem with using pre-C11
> > compilers...
>
> I have no objection if everyone agrees to move to the C11 memory model
> with __atomic intrinsics. But if we need to keep both, then an
> atomic_load_acq_32() kind of API makes sense.
>
> > > I think it is currently mixed usage because the same variable
> > > declaration is used for C11 vs. non-C11 usage. Ideally we won't need
> > > "volatile" for the C11 case. Either we need to change only to C11 mode
> > > OR have APIs for atomic_load_acq_() and atomic_store_rel_() to allow
> > > both models, like the Linux kernel and FreeBSD.
> >
> > -- Ola
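For the "keep both" option, one hypothetical way to avoid a large hand-written API is a thin compile-time switch: use `<stdatomic.h>` with an `_Atomic` index type when the compiler provides C11 atomics, and fall back to the `__atomic` builtins otherwise. The `ring_idx_t`, `IDX_LOAD_ACQ()`, `IDX_STORE_REL()` and `ring_ready()` names are assumptions for this sketch. In the C11 path an accidental plain access to an `_Atomic` object is itself a sequentially consistent atomic access, so the atomic/non-atomic mix discussed above cannot occur; the fallback path relies on discipline instead, and neither path needs volatile.

```c
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
/* C11 path: the index object itself is atomic. */
typedef _Atomic uint32_t ring_idx_t;
#define IDX_LOAD_ACQ(p)     atomic_load_explicit((p), memory_order_acquire)
#define IDX_STORE_REL(p, v) atomic_store_explicit((p), (v), memory_order_release)
#else
/* Pre-C11 path: GCC/Clang __atomic builtins on a plain (non-volatile)
 * object; every access must go through these macros by convention. */
typedef uint32_t ring_idx_t;
#define IDX_LOAD_ACQ(p)     __atomic_load_n((p), __ATOMIC_ACQUIRE)
#define IDX_STORE_REL(p, v) __atomic_store_n((p), (v), __ATOMIC_RELEASE)
#endif

/* Hypothetical consumer-side helper: number of entries published by the
 * producer but not yet consumed. Unsigned wrap-around is intended. */
static uint32_t ring_ready(ring_idx_t *prod_tail, uint32_t cons_head)
{
    return (uint32_t)IDX_LOAD_ACQ(prod_tail) - cons_head;
}
```

The ring code then only ever uses the two macros, so the same source builds with either toolchain while expressing only load-acquire and store-release.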