On Jan 11, 2012, at 9:29 AM, Luigi Rizzo wrote:

> On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
>> On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
>>> On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
>>>> On 10 January 2012 13:37, Luigi Rizzo <ri...@iet.unipi.it> wrote:
>>>>> I was glancing through manpages and implementations of bus_dma(9)
>>>>> and i am a bit unclear on what this API (in particular,
>>>>> bus_dmamap_sync()) does in terms of memory barriers.
>>>>> 
>>>>> I see that the x86/amd64 and ia64 code only does the bounce buffers.
>> 
>> That is because x86 in general does not need memory barriers. ...
> 
> maybe they are not called memory barriers but for instance
> how do I make sure, even on the x86, that a write to the NIC ring
> is properly flushed before the write to the 'start' register occurs?
> 

Flushed from where?  The CPU's cache or the device memory and pci bus?  I 
already told you that x86/64 is fundamentally designed around bus snooping, and 
John already told you that we map device memory to be uncached.  Also, PCI 
guarantees that reads and writes are retired in order, and that reads are 
therefore flushing barriers.  So let's take two scenarios.  In the first 
scenario, the NIC descriptors are in device memory, so the driver has to do 
bus_space accesses to write them.

Scenario 1
1.  driver writes to the descriptors.  These may or may not hang out in the 
cpu's cache, though they probably won't because we map PCI device memory as 
uncachable.  But let's say for the sake of argument that they are cached.
2. driver writes to the 'go' register on the card.  This may or may not be in 
the cpu's cache, as in step 1.
3. The writes get flushed out of the cpu and onto the host bus.  Again, the 
x86/64 architecture guarantees that these writes won't be reordered.
4. The writes get onto the PCI bus and buffered at the first bridge.
5. PCI ordering rules keep the writes in order, and they eventually make it to 
the card in the same order that the driver executed them.
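
To make that concrete, here is a rough sketch of what Scenario 1 could look 
like in a driver.  The register offsets, the softc layout, and the function 
names are all invented for illustration, not taken from any real driver:

/*
 * Hypothetical Scenario 1: the descriptor ring lives in device (BAR)
 * memory, so the driver fills it with bus_space writes and then pokes
 * a 'go' register.  All names and offsets here are made up.
 */
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define FOO_DESC_BASE	0x0100	/* descriptor window inside the BAR */
#define FOO_REG_GO	0x0010	/* 'go'/start register */
#define FOO_REG_STATUS	0x0014	/* some harmless readable register */

struct foo_softc {
	bus_space_tag_t		sc_bst;
	bus_space_handle_t	sc_bsh;
};

static void
foo_start_tx(struct foo_softc *sc, uint32_t paddr, uint32_t len)
{
	/* Step 1: write the descriptor into device memory. */
	bus_space_write_4(sc->sc_bst, sc->sc_bsh, FOO_DESC_BASE + 0, paddr);
	bus_space_write_4(sc->sc_bst, sc->sc_bsh, FOO_DESC_BASE + 4, len);

	/*
	 * Step 2: hit the 'go' register.  Steps 3-5 are why no explicit
	 * barrier is needed between these writes on x86/64: the mapping
	 * is uncached and PCI keeps the writes in order.
	 */
	bus_space_write_4(sc->sc_bst, sc->sc_bsh, FOO_REG_GO, 1);
}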

Scenario 2
1. driver writes to the descriptors in host memory.  This memory is mapped as 
cache-able, so these writes hang out in the CPU.
2. driver writes to the 'go' register on the card.  This may or may not hang 
out in the cpu's cache, but likely won't as discussed previously.
3. The 'go' write eventually makes its way down to the card, and the card 
starts its processing.
4. the card masters a PCI read for the descriptor data, and the request goes up 
the pci bus to the host bridge
5. thanks to the fundamental design guarantees on x86/64, the pci host bridge, 
memory controller, and cpu all snoop each other.  In this case, the cpu sees 
the read come from the pci host bridge, knows that it's for data that's in its 
cache, and intercepts and fills the request.  Coherency is preserved!
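
Here is the matching sketch for Scenario 2, with the ring in host memory set 
up through bus_dma(9).  The structures and names are again invented; the point 
is the portable idiom of calling bus_dmamap_sync() with BUS_DMASYNC_PREWRITE 
before ringing the doorbell, which is essentially a no-op on x86/64 (only a 
bounce buffer would get copied) but keeps the same code correct on 
non-snooping architectures:

/*
 * Hypothetical Scenario 2: the descriptor ring is in cacheable host
 * memory allocated with bus_dma(9).  Names are made up; FOO_REG_GO is
 * the same imaginary register as in the previous sketch.
 */
#include <sys/param.h>
#include <sys/bus.h>
#include <sys/endian.h>
#include <machine/bus.h>

#define FOO_REG_GO	0x0010

struct foo_desc {
	uint64_t	fd_addr;
	uint32_t	fd_len;
	uint32_t	fd_flags;
};

struct foo_softc {
	bus_space_tag_t		sc_bst;
	bus_space_handle_t	sc_bsh;
	bus_dma_tag_t		sc_desc_tag;	/* tag/map for the ring */
	bus_dmamap_t		sc_desc_map;
	struct foo_desc		*sc_descs;	/* host-memory ring */
	int			sc_prod;	/* producer index */
};

static void
foo_post_desc(struct foo_softc *sc, uint64_t paddr, uint32_t len)
{
	/* Step 1: the descriptor writes land in cacheable host memory. */
	sc->sc_descs[sc->sc_prod].fd_addr = htole64(paddr);
	sc->sc_descs[sc->sc_prod].fd_len = htole32(len);

	/* Push the ring toward the device; bounce/flush only if needed. */
	bus_dmamap_sync(sc->sc_desc_tag, sc->sc_desc_map,
	    BUS_DMASYNC_PREWRITE);

	/* Step 2: the doorbell.  Steps 3-5 (snooping) handle the rest. */
	bus_space_write_4(sc->sc_bst, sc->sc_bsh, FOO_REG_GO, 1);
}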

Explicit barriers aren't needed in either scenario; everything will retire 
correctly and in order.  The only caveat is the buffering that happens on the 
PCI bus.  A write by the host might take a relatively long and indeterminate 
time to reach the card thanks to this buffering and the bus being busy.  To 
guarantee that you know when the write has been delivered and retired, you can 
do a read immediately after the write.  On some systems, this might also boost 
the transaction priority of the write and get it down faster, but that's really 
not a reliable guarantee.  All you'll know is that when the read completes, the 
write prior to it has also completed.
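
In code, that read-after-write flush looks something like this, reusing the 
imaginary registers and softc from the Scenario 1 sketch:

static void
foo_kick_and_flush(struct foo_softc *sc)
{
	bus_space_write_4(sc->sc_bst, sc->sc_bsh, FOO_REG_GO, 1);

	/*
	 * The read cannot complete until the buffered writes ahead of it
	 * have been delivered, so once it returns the 'go' write has
	 * reached the card.
	 */
	(void)bus_space_read_4(sc->sc_bst, sc->sc_bsh, FOO_REG_STATUS);
}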

Where barriers _are_ needed is in interrupt handlers, and I can discuss that if 
you're interested.

Scott
