On Jan 11, 2012, at 10:10 AM, Ian Lepore wrote:

> On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
>> 
>> Where barriers _are_ needed is in interrupt handlers, and I can
>> discuss that if you're interested.
>> 
>> Scott
>> 
> 
> I'd be interested in hearing about that (and in general I'm loving the
> details coming out in your explanations -- thanks!).
> 
> -- Ian
> 
> 

Well, I unfortunately wasn't as clear as I should have been.  Interrupt 
handlers need bus barriers, not CPU cache/instruction barriers.  This is 
because the interrupt signal can arrive at the CPU before the data and control 
words have finished being DMA'd up from the controller.  Also, many controllers 
require an acknowledgement write to be performed before leaving the interrupt 
handler, so the driver needs to do a bus barrier to ensure that the write 
flushes.  But these are two different topics, so let me start with the 
interrupt handler.

Legacy interrupts in PCI are carried on discrete pins and are level-triggered.  
When the device wants to signal an interrupt, it asserts the pin.  That 
assertion is seen at the IOAPIC on the host bridge and converted to an 
interrupt message, which is then sent immediately to the CPU's LAPIC.  This all 
happens very, very quickly.  Meanwhile, the interrupt condition could have 
been predicated on the device DMA'ing bytes up to host memory, and those DMA 
writes could have gotten stalled and buffered on the way up the PCI topology.  
The end result is often that the driver's interrupt handler runs before those 
writes have hit host memory.  To fix this, drivers do a read of a card register 
as the first step in the interrupt handler, even if the read is just a dummy 
and the result is thrown away.  Thanks to PCI ordering rules, that read cannot 
complete until any pending writes from the card have flushed all the way up, so 
everything will be coherent by the time the read returns.

MSI and MSI-X interrupts on modern PCI and PCIe fix this.  These interrupts are 
sent as small message writes that are DMA'd to the host bridge.  Since they are 
in-band data, they are subject to the same ordering rules as all other data on 
the bus, and thus ordering for them is implicit.  When the MSI message reaches 
the host bridge, it's converted into an LAPIC message just like before.  
However, the driver doesn't need to do a flushing read, because it knows that 
the MSI message was the last write on the bus, and therefore everything prior 
to it has arrived and everything is coherent.  Since reads are expensive in 
PCI, this saves a considerable amount of time in the driver.  Unfortunately, it 
adds non-deterministic latency to the interrupt, since the MSI message is 
in-band and has no way to force priority flushing on a busy bus.  So while 
MSI/MSI-X save some time in the interrupt handler, they actually make the 
overall latency situation potentially worse (thanks Intel!).
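For contrast, here's the same hypothetical handler when the device is
using MSI/MSI-X; again, the names are illustrative, not from any real
driver.

static void
foo_intr_msi(void *arg)
{
	struct foo_softc *sc = arg;

	/*
	 * The MSI message was itself a posted write, ordered behind
	 * the device's earlier DMA writes, so everything the card
	 * wrote before signalling is already in host memory.  No
	 * dummy read is needed, saving one expensive PCI read per
	 * interrupt.
	 */

	/* ... process completed descriptors directly ... */
}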

The acknowledgement write issue is a little more straightforward.  If the card 
requires an acknowledgement write from the driver to know that the interrupt 
has been serviced (so that it'll then know to de-assert the interrupt line), 
that write has to be flushed to the hardware before the interrupt handler 
completes.  Otherwise, the write could get stalled, the interrupt could remain 
asserted, and the interrupt could erroneously re-trigger on the host CPU.  I've 
seen cases where this devolves into the card getting out of sync with the 
driver to the point that interrupts get missed.  Also, this gets a little weird 
sometimes with buggy MSI hacks in both device and PCI bridge hardware.
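A sketch of that acknowledge-and-flush pattern, with the same caveat
that FOO_INTR_ACK and the softc are hypothetical:

#define FOO_INTR_ACK	0x14	/* interrupt acknowledge register (assumed) */

static void
foo_intr_ack(struct foo_softc *sc, uint32_t status)
{
	/* Tell the card which interrupt conditions we've serviced. */
	bus_space_write_4(sc->sc_iot, sc->sc_ioh, FOO_INTR_ACK, status);

	/*
	 * The write above is posted and can stall in the bridges.
	 * bus_space_barrier() orders it ahead of anything that
	 * follows, and the read-back actually drains it: per PCI
	 * ordering, a read can't complete until prior posted writes
	 * from the same master have landed at the device.
	 */
	bus_space_barrier(sc->sc_iot, sc->sc_ioh, FOO_INTR_ACK, 4,
	    BUS_SPACE_BARRIER_WRITE);
	(void)bus_space_read_4(sc->sc_iot, sc->sc_ioh, FOO_INTR_ACK);
}

The read-back is the portable way to guarantee the flush; the barrier
alone only constrains ordering on the host side of the bridge.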

Scott


