-- Damjan
On 7 May 2019, at 13:29, Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> wrote:

On 3/22/2019 11:57 AM, Jakub Grajciar wrote:

Memory interface (memif) provides high-performance packet transfer over shared memory.

Signed-off-by: Jakub Grajciar <jgraj...@cisco.com>

<...>

With that in mind, I believe that 23 Mpps is fine performance. No performance target is defined; the goal is to be as fast as possible.

Use of C11 atomics has proven to provide better performance on weakly ordered architectures (at least on Arm). IMO, C11 atomics should be used to implement at least the fast-path functions. This ensures optimal performance on all architectures supported by DPDK.

Atomics are not required by the memif driver.

Correct. The only thing we need is a store barrier once per batch of packets, to make sure that the descriptor changes are globally visible before we bump the head pointer.

Maybe I was not clear in my comments: I meant that the use of the GCC C++11 memory-model-aware atomic operations [1] shows better performance. So, instead of using full memory barriers, you can use store-release and load-acquire semantics. A similar change was done to the svm_fifo data structure in VPP [2] (though the original algorithm used was different from the one used in this memif patch).

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
[2] https://gerrit.fd.io/r/#/c/18223/

Typically we have hundreds of normal memory stores into the packet ring, then a single store fence, and then finally one more store to bump the head pointer. Sorry, I'm not getting what you are suggesting here, and how that can be faster...

[Honnappa] I will get back with a patch and we can go from there.
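To make the pattern Damjan describes concrete, here is a minimal sketch of the fence-per-batch publish: many plain stores fill the descriptor ring, one barrier orders them, and a single store bumps the head pointer. The desc_t/ring_t layout, RING_SIZE, and function names are hypothetical placeholders for illustration, not the actual memif structures.

#include <stdint.h>

#define RING_SIZE 1024  /* hypothetical ring size, power of two */

/* Illustrative layout only; the real memif descriptor and ring live in
 * the driver headers. */
typedef struct {
	uint32_t offset;
	uint32_t length;
} desc_t;

typedef struct {
	desc_t desc[RING_SIZE];
	uint16_t head;  /* written by producer, read by consumer */
	uint16_t tail;  /* written by consumer, read by producer */
} ring_t;

/* Batch enqueue with a single barrier: plain stores fill the
 * descriptors, the fence orders them, one more store publishes head. */
static void
enqueue_batch_fence(ring_t *r, const desc_t *batch, uint16_t n)
{
	uint16_t head = r->head;
	uint16_t i;

	for (i = 0; i < n; i++)
		r->desc[(head + i) & (RING_SIZE - 1)] = batch[i];

	/* One fence per batch (rte_smp_wmb() in DPDK); on Arm this emits
	 * a dmb barrier standing apart from the plain store below. */
	__atomic_thread_fence(__ATOMIC_RELEASE);

	r->head = (uint16_t)(head + n);
}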
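And a sketch of the store-release/load-acquire alternative Honnappa suggests, using the GCC __atomic builtins from [1] and reusing the hypothetical ring_t layout above. The ordering is attached to the head accesses themselves: on AArch64 the release store can compile to a single stlr instruction rather than a dmb barrier followed by a plain store, which is the basis of the claimed win on weakly ordered CPUs.

/* Producer: same descriptor fill, but head is published with a
 * store-release instead of a standalone fence. */
static void
enqueue_batch_c11(ring_t *r, const desc_t *batch, uint16_t n)
{
	/* Producer is the only writer of head; a relaxed load suffices. */
	uint16_t head = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
	uint16_t i;

	for (i = 0; i < n; i++)
		r->desc[(head + i) & (RING_SIZE - 1)] = batch[i];

	/* Store-release: all descriptor stores above become visible no
	 * later than the new head value. */
	__atomic_store_n(&r->head, (uint16_t)(head + n), __ATOMIC_RELEASE);
}

/* Consumer: pairs the release store with a load-acquire of head before
 * touching any descriptors. */
static uint16_t
ring_count(const ring_t *r)
{
	uint16_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);
	uint16_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);

	return (uint16_t)(head - tail);
}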