Hi Jerin,

> > > The CPU also knows already the value that will be written to
> > > cons.tail and that value does not depend on the previous read
> > > either. The CPU does not know we are planning to do a spinlock
> > > there, so it might do things out-of-order without proper
> > > dependencies.
> > >
> > > > For __rte_ring_sc_do_dequeue(), I think you right, we might need
> > > > something stronger.
> > > > I don't want to put rte_smp_mb() here as it would cause full HW
> > > > barrier even on machines with strong memory order (IA).
> > > > I think that rte_smp_wmb() might be enough here:
> > > > it would force cpu to wait till writes in DEQUEUE_PTRS() are
> > > > become visible, which means reads have to be completed too.
> > >
> > > In practice I think that rte_smp_wmb() would work fine, even though
> > > it is not strictly according to the book. Below solution would be my
> > > proposal as a fix to the issue of sc dequeueing (and also to mc
> > > dequeueing, if we have the problem of CPU completely ignoring the
> > > spinlock in reality there):
> > >
> > > DEQUEUE_PTRS();
> > > ..
> > > rte_smp_wmb();
> > > r->cons.tail = cons_next;
> >
> > As I said in previous email - it looks good for me for
> > _rte_ring_sc_do_dequeue(), but I am interested to hear what ARM and PPC
> > maintainers think about it.
> > Jan, Jerin do you have any comments on it?
>
> Actually it is NOT performance effective and difficult to capture the ORDER
> dependency with plane store and load barriers on WEAK ordered machines.
> Beyond plane store and load barriers, We need to express #LoadLoad,
> #LoadStore,#StoreStore barrier dependency with Acquire and
> Release Semantics in Arch neutral code(Looks like this is compiler barrier on
> IA) http://preshing.com/20120913/acquire-and-release-semantics/
>
> For instance, Full barrier CAS(__sync_bool_compare_and_swap) will not be
> required for weak ordered machine in MP case.
> I can send out a RFC version of ring implementation changes required with
> acquire-and-release-semantics.
> If it has performance degradation on IA then we can separate it out through
> conditional compilation flag.
>
> GCC Built-in Functions for Memory Model Aware Atomic Operations
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

I am not sure exactly what changes you are planning, so I suppose I'll
just wait for your RFC here.
My question, though, was: what do you think about the current
_rte_ring_sc_do_dequeue()?
Do you agree that rmb() is not sufficient there, and does Juhamatti's
patch:
http://dpdk.org/dev/patchwork/patch/14846/
look good to you?
It looks good to me, and I am going to ACK it, but I thought you'd
better have a look too.
Thanks
Konstantin

> Thoughts ?
>
> Jerin
>
> > Chao, sorry but I still not sure why PPC is considered as architecture with
> > strong memory ordering?
> > Might be I am missing something obvious here.
> > Thank
> > Konstantin
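
To make the ordering question in this thread concrete, here is a minimal
sketch of a single-producer/single-consumer ring written with the GCC
__atomic builtins mentioned above. It is not the DPDK implementation and
not the RFC under discussion: every name in it (sketch_ring,
sketch_sp_enqueue, sketch_sc_dequeue, SKETCH_RING_SIZE) is hypothetical,
and the layout is deliberately simplified. The point it tries to show is
that the slot reads in the dequeue must complete before the new tail value
becomes visible, which is a load-to-store ordering, and a release store on
the tail expresses exactly that.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u                      /* hypothetical; power of two */
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)

/* Hypothetical, simplified ring: one producer thread, one consumer thread. */
struct sketch_ring {
    uint32_t prod_tail;                 /* written by the producer only */
    uint32_t cons_tail;                 /* written by the consumer only */
    void *slots[SKETCH_RING_SIZE];
};

static void sketch_ring_init(struct sketch_ring *r)
{
    /* Index masking below only works for power-of-two sizes. */
    assert((SKETCH_RING_SIZE & SKETCH_RING_MASK) == 0);
    r->prod_tail = 0;
    r->cons_tail = 0;
}

/* Returns 0 on success, -1 if the ring is full. */
static int sketch_sp_enqueue(struct sketch_ring *r, void *obj)
{
    uint32_t head = r->prod_tail;       /* own index, plain read is enough */

    /* Acquire pairs with the consumer's release store on cons_tail:
     * once we see the new tail, the consumer has finished reading
     * the slots we are about to overwrite. */
    uint32_t cons = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

    if (head - cons == SKETCH_RING_SIZE)
        return -1;

    r->slots[head & SKETCH_RING_MASK] = obj;

    /* Release: the slot write above is visible before the new index. */
    __atomic_store_n(&r->prod_tail, head + 1, __ATOMIC_RELEASE);
    return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int sketch_sc_dequeue(struct sketch_ring *r, void **obj)
{
    uint32_t tail = r->cons_tail;       /* own index, plain read is enough */

    /* Acquire pairs with the producer's release store on prod_tail. */
    uint32_t prod = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);

    if (prod == tail)
        return -1;

    *obj = r->slots[tail & SKETCH_RING_MASK];   /* plays the role of DEQUEUE_PTRS() */

    /* Release orders the slot load above before this store, i.e. the
     * load-to-store dependency that a store-only barrier does not
     * formally cover. */
    __atomic_store_n(&r->cons_tail, tail + 1, __ATOMIC_RELEASE);
    return 0;
}
```

On a strongly ordered machine such as IA I would expect the acquire loads
and release stores to compile to ordinary moves plus a compiler barrier,
while weakly ordered machines get whatever barrier or instruction the
compiler picks for them. That is an expectation, not a measurement.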