On 04/15/2014 04:08 PM, Richardson, Bruce wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
>> Sent: Tuesday, April 15, 2014 2:51 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH 1/2] mem: add write memory barrier before
>> changing heap state
>>
>> From: Didier Pallard <didier.pallard at 6wind.com>
>>
>> A write memory barrier is needed before changing the heap state value,
>> else some concurrent core may see the state change before all
>> initialization values have been written to memory, causing unpredictable
>> results in the malloc function.
>>
>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
> No barrier should be necessary here. As in a number of other places, such
> as rings, compiler barriers can be used in place of write memory barriers,
> due to IA ordering rules. However, in this case, both variables referenced
> are volatile variables, so the assignments to them cannot be reordered by
> the compiler, and no compiler barrier is necessary either.
>
> Regards,
> /Bruce
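For context, a minimal C sketch of the initialization pattern the patch
addresses. The structure layout, field names and constant values below are
illustrative only, not the actual librte_malloc code:

#include <rte_atomic.h>         /* rte_wmb() */

#define NOT_INITIALISED 0       /* illustrative values; the disassembly in */
#define INITIALISED     2       /* the reply below shows 2 being stored    */

struct malloc_heap {
        volatile int initialised;   /* heap state, polled by other cores */
        int numa_socket;            /* ordinary, non-volatile fields:    */
        void *free_head;            /* volatile does not order stores to */
                                    /* these against the state store     */
};

static void
heap_init(struct malloc_heap *heap)
{
        /* Plain stores to non-volatile fields. */
        heap->free_head = NULL;
        heap->numa_socket = 0;

        /*
         * Barrier added by the patch: without it, the compiler (and, on
         * weakly ordered CPUs, the hardware) may make the state store
         * below visible before the stores above.
         */
        rte_wmb();
        heap->initialised = INITIALISED;
}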
Hi Bruce,

Indeed, a compiler barrier is absolutely needed here. Using a volatile
variable is not a serializing operation from the compiler's point of view;
only atomic operations are serializing, because they are implemented with an
asm volatile ("memory") directive. In particular, while two volatile accesses
cannot be reordered with each other, the compiler remains free to move
ordinary (non-volatile) stores, such as the heap field initializations,
across a volatile store. Here is the assembler generated with and without
rte_wmb:

With rte_wmb:

 142:   f0 45 0f b1 07          lock cmpxchg %r8d,(%r15)
 147:   0f 94 c0                sete   %al
 14a:   84 c0                   test   %al,%al
 14c:   74 ea                   je     138 <malloc_heap_alloc+0x68>
 14e:   49 c7 47 10 00 00 00    movq   $0x0,0x10(%r15)
 155:   00
 156:   41 c7 47 18 00 00 00    movl   $0x0,0x18(%r15)
 15d:   00
 15e:   41 c7 47 08 00 00 00    movl   $0x0,0x8(%r15)
 165:   00
 166:   41 c7 47 1c 00 00 00    movl   $0x0,0x1c(%r15)
 16d:   00
 16e:   49 c7 47 20 00 00 00    movq   $0x0,0x20(%r15)
 175:   00
 176:   45 89 57 04             mov    %r10d,0x4(%r15)
 17a:   0f ae f8                sfence
 17d:   41 c7 07 02 00 00 00    movl   $0x2,(%r15)      <== state store
 184:   41 8b 37                mov    (%r15),%esi      <== state reload
 187:   83 fe 02                cmp    $0x2,%esi
 18a:   75 b4                   jne    140 <malloc_heap_alloc+0x70>
 18c:   0f 1f 40 00             nopl   0x0(%rax)
 190:   48 83 c3 3f             add    $0x3f,%rbx

Without rte_wmb:

 142:   f0 45 0f b1 07          lock cmpxchg %r8d,(%r15)
 147:   0f 94 c0                sete   %al
 14a:   84 c0                   test   %al,%al
 14c:   74 ea                   je     138 <malloc_heap_alloc+0x68>
 14e:   49 c7 47 10 00 00 00    movq   $0x0,0x10(%r15)
 155:   00
 156:   41 c7 47 08 00 00 00    movl   $0x0,0x8(%r15)
 15d:   00
 15e:   41 c7 07 02 00 00 00    movl   $0x2,(%r15)      <== state store
 165:   41 8b 37                mov    (%r15),%esi      <== state reload
 168:   41 c7 47 18 00 00 00    movl   $0x0,0x18(%r15)
 16f:   00
 170:   41 c7 47 1c 00 00 00    movl   $0x0,0x1c(%r15)
 177:   00
 178:   49 c7 47 20 00 00 00    movq   $0x0,0x20(%r15)
 17f:   00
 180:   45 89 57 04             mov    %r10d,0x4(%r15)
 184:   83 fe 02                cmp    $0x2,%esi
 187:   75 b7                   jne    140 <malloc_heap_alloc+0x70>
 189:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
 190:   48 83 c3 3f             add    $0x3f,%rbx

It's clear that the "heap->initialised = INITIALISED;" store (the movl
$0x2,(%r15) marked above) has been reordered by the compiler: without the
barrier it is emitted before the remaining heap field initializations.

Regarding rte_wmb and rte_rmb, I agree with you that on Intel architecture
those macros need be nothing more than compiler barriers, given Intel's
memory-ordering guarantees. But for code completeness, I think the memory
barriers should remain in place in the code, with rte_*mb mapped to a
compiler barrier on Intel architecture.

didier
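To make the closing point concrete, here is a sketch of the mapping proposed
above. The macro bodies and the RTE_USE_REAL_FENCES guard are hypothetical,
not the actual rte_atomic.h definitions:

/* RTE_USE_REAL_FENCES is an invented switch for this sketch only. */
#ifdef RTE_USE_REAL_FENCES
/*
 * Current behaviour on IA: emit a real store fence, as seen in the
 * "With rte_wmb" disassembly above (the sfence at offset 17a).
 */
#define rte_wmb() asm volatile ("sfence" : : : "memory")
#else
/*
 * Proposed behaviour on IA: a pure compiler barrier. It emits no
 * instruction, but still forbids the compiler reordering shown in the
 * "Without rte_wmb" disassembly; IA hardware already keeps ordinary
 * stores in order, so no fence instruction is needed.
 */
#define rte_wmb() asm volatile ("" : : : "memory")
#endif

Either way, the rte_wmb() call sites stay in the code, so weakly ordered
architectures can map the same macro to a real fence instruction.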