On 04/15/2014 04:08 PM, Richardson, Bruce wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
>> Sent: Tuesday, April 15, 2014 2:51 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH 1/2] mem: add write memory barrier before
>> changing heap state
>>
>> From: Didier Pallard <didier.pallard at 6wind.com>
>>
>> a write memory barrier is needed before changing heap state value, else some
>> concurrent core may see state changing before all initialization values are
>> written to memory, causing unpredictable results in malloc function.
>>
>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
> No barrier should be necessary here. As in a number of other places, such as 
> rings, compiler barriers can be used in place of write memory barriers, due 
> to IA ordering rules. However, in this case, both variables referenced are 
> volatile variables and so the assignments to them cannot be reordered by the 
> compiler so no compiler barrier is necessary either.
>
> Regards,
> /Bruce

Hi Bruce,

Indeed, a compiler barrier is absolutely needed here. Accessing a volatile 
variable is not a serializing operation from the compiler's point of view: 
volatile only constrains reordering with respect to other volatile accesses, 
not with respect to the ordinary stores that initialize the heap. Only the 
atomic operations serialize, because they are implemented with an 
asm volatile ("memory") clobber.
Here is the assembler generated with and without rte_wmb:


With rte_wmb

  142:   f0 45 0f b1 07          lock cmpxchg %r8d,(%r15)
  147:   0f 94 c0                sete   %al
  14a:   84 c0                   test   %al,%al
  14c:   74 ea                   je     138 <malloc_heap_alloc+0x68>
  14e:   49 c7 47 10 00 00 00    movq   $0x0,0x10(%r15)
  155:   00
  156:   41 c7 47 18 00 00 00    movl   $0x0,0x18(%r15)
  15d:   00
  15e:   41 c7 47 08 00 00 00    movl   $0x0,0x8(%r15)
  165:   00
  166:   41 c7 47 1c 00 00 00    movl   $0x0,0x1c(%r15)
  16d:   00
  16e:   49 c7 47 20 00 00 00    movq   $0x0,0x20(%r15)
  175:   00
  176:   45 89 57 04             mov    %r10d,0x4(%r15)
  17a:   0f ae f8                sfence
  17d:   41 c7 07 02 00 00 00    movl   $0x2,(%r15)     <-- state store
  184:   41 8b 37                mov    (%r15),%esi     <-- state reload
  187:   83 fe 02                cmp    $0x2,%esi
  18a:   75 b4                   jne    140 <malloc_heap_alloc+0x70>
  18c:   0f 1f 40 00             nopl   0x0(%rax)
  190:   48 83 c3 3f             add    $0x3f,%rbx


Without rte_wmb

  142:   f0 45 0f b1 07          lock cmpxchg %r8d,(%r15)
  147:   0f 94 c0                sete   %al
  14a:   84 c0                   test   %al,%al
  14c:   74 ea                   je     138 <malloc_heap_alloc+0x68>
  14e:   49 c7 47 10 00 00 00    movq   $0x0,0x10(%r15)
  155:   00
  156:   41 c7 47 08 00 00 00    movl   $0x0,0x8(%r15)
  15d:   00
  15e:   41 c7 07 02 00 00 00    movl   $0x2,(%r15)     <-- state store (reordered)
  165:   41 8b 37                mov    (%r15),%esi     <-- state reload
  168:   41 c7 47 18 00 00 00    movl   $0x0,0x18(%r15)
  16f:   00
  170:   41 c7 47 1c 00 00 00    movl   $0x0,0x1c(%r15)
  177:   00
  178:   49 c7 47 20 00 00 00    movq   $0x0,0x20(%r15)
  17f:   00
  180:   45 89 57 04             mov    %r10d,0x4(%r15)
  184:   83 fe 02                cmp    $0x2,%esi
  187:   75 b7                   jne    140 <malloc_heap_alloc+0x70>
  189:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  190:   48 83 c3 3f             add    $0x3f,%rbx

It is clear that the heap->initialised = INITIALISED store has been reordered 
by the compiler: without the barrier, the state store is emitted before the 
stores at offsets 0x18, 0x1c, 0x20 and 0x4 that finish initializing the heap, 
so a concurrent core can observe the INITIALISED state while those fields 
still hold stale values.

Regarding the use of rte_wmb and rte_rmb, I agree with you that on Intel 
architectures these macros need not do anything more than a compiler barrier, 
thanks to Intel's memory-ordering guarantees. But for code completeness and 
portability, I think the memory barriers should remain in place in the code, 
with rte_*mb mapping to a plain compiler barrier on Intel architectures.

didier

