Hi Oliver,

>> So with the following fragment of code:
>> extern int *x;
>> extern __m128i a, *p;
>> L0:
>> _mm_stream_si128(p, a);
>> rte_compiler_barrier();
>> L1:
>> *x = 0;
>>
>> There is no guarantee that store at L0 will always be finished
>> before store at L1.
>This code fragment looks very similar to what is done in
>__rte_ring_sp_do_enqueue():
>
> [...]
> ENQUEUE_PTRS(); /* I expect it is converted to an SSE store */
> rte_compiler_barrier();
> [...]
> r->prod.tail = prod_next;
>So, according to your previous explanation, I understand that
>this code would require a write memory barrier in place of the
>compiler barrier. Am I wrong?

No, right now a compiler barrier is enough here.
ENQUEUE_PTRS() doesn't use non-temporal stores (MOVNT*), so write order should be guaranteed.
Though, if in the future we change ENQUEUE_PTRS() to use non-temporal stores,
we'll have to use sfence (or mfence).

>Moreover, if I understand well, a real wmb() is needed only if
>a SSE store is issued. But the programmer may not control that,
>it's the job of the compiler.

'Normal' SIMD writes are not reordered.
So it is ok for the compiler to use them if appropriate.

>> But now, there seems a confusion: everyone has to remember that
>> smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
>> That's why my suggestion was to simply keep using compiler_barrier()
>> for all cases, when we don't need real fence.
>I'm not sure the programmer has to know which smp_*mb() is a real fence
>or not. He just expects that it generates the proper CPU instructions
>that guarantees the effectiveness of the memory barrier.

In most cases just a compiler barrier is enough, but there are a few exceptions.
Always using fence instructions means introducing an unnecessary slowdown
in cases where order is already guaranteed.
Not using fences in cases where they are needed means introducing a race window
and possible data corruption.
That's why right now people can use either rte_compiler_barrier() or mb/rmb/wmb,
whatever is appropriate for the particular case.

Konstantin
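
For illustration only, a minimal sketch of the two cases discussed above,
assuming x86-64 with SSE2 and a GCC/Clang-style compiler. The names
compiler_barrier(), struct slot, publish_normal() and publish_nt() are
hypothetical and not part of DPDK; compiler_barrier() merely stands in for
rte_compiler_barrier().

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_sfence() */

/* Hypothetical stand-in for rte_compiler_barrier(): it only stops the
 * compiler from reordering memory accesses, no CPU instruction is emitted. */
#define compiler_barrier() __asm__ volatile ("" ::: "memory")

/* Hypothetical one-element "ring": a payload and a flag that publishes it. */
struct slot {
	__m128i data;       /* 16-byte aligned because of its type */
	volatile int ready; /* a consumer would poll this flag */
};

/* Case 1: regular SIMD store (MOVDQA). Such stores are not reordered with
 * later stores on x86, so a compiler barrier is enough. */
static inline void publish_normal(struct slot *s, __m128i v)
{
	_mm_store_si128(&s->data, v);
	compiler_barrier();
	s->ready = 1;
}

/* Case 2: non-temporal store (MOVNTDQ). It may become visible after a
 * later store, so a real store fence is placed before the flag write. */
static inline void publish_nt(struct slot *s, __m128i v)
{
	/* _mm_stream_si128() requires a 16-byte aligned destination. */
	assert(((uintptr_t)&s->data & 15) == 0);
	_mm_stream_si128(&s->data, v);
	_mm_sfence();
	s->ready = 1;
}
```

In the second function, replacing _mm_sfence() with compiler_barrier() would
give the L0/L1 fragment quoted at the top: it compiles, but the flag could
become visible to another core before the data.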