On Sun, Nov 22, 2009 at 08:31:07PM -0800, Roland Dreier wrote:
> The interrupt handling in ral(4) for RT2661 has a couple of problems,
> which cause the interface to get stuck under heavy load with OACTIVE
> set (the problems are likely especially severe on slow systems such as
> my 600MHz VIA system); bouncing the interface down and back up fixes
> things.  As I describe below, I think I've been able to fix it, and
> I'd be happy to see the patch below reviewed and applied.
> 
> I've seen other reports that look similar to the problems I was
> having; e.g. bug kernel/5958 starts out talking about RT2860 (which is
> completely different code) but some of the "me too" replies are for
> RT2561S, which I hope this patch fixes (I've cc'ed those reporters;
> test reports welcome!).  I've not looked at the RT2860 code due to
> lack of hardware, but if someone wants to send me a PCI card....

I found an unused RT2561 and ran some tests with it.

> 
> The first problem is that multiple TX completions may happen before
> the interrupt handler gets to rt2661_tx_intr().  When this happens,
> the TX interrupt handler only completes one entry in the TX ring,
> which leads to the driver getting behind the hardware.  To fix this, I
> extended the qid field in the TX descriptor to contain the index in
> the TX ring as well as the queue ID, and then when an interrupt is
> missed, free the earlier TX entries as well as the entry that the
> interrupt is for.  (I did see this code trigger under load)
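
To make the first fix concrete, here is a minimal sketch in C of the
catch-up idea described above. It is not the actual patch: the bit layout
of the tag, the ring size, and all names (tx_tag_pack, tx_ring,
tx_complete_upto) are hypothetical and chosen only for illustration. The
point is that once the completion report carries a ring index, the handler
can retire every entry up to that index rather than exactly one.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_COUNT 32	/* hypothetical ring size */

/*
 * Hypothetical tag layout: queue id in the high byte, ring index in the
 * low byte. The real descriptor field may be laid out differently.
 */
static inline uint16_t
tx_tag_pack(uint8_t qid, uint8_t idx)
{
	return (uint16_t)(((uint16_t)qid << 8) | idx);
}

static inline uint8_t
tx_tag_qid(uint16_t tag)
{
	return (uint8_t)(tag >> 8);
}

static inline uint8_t
tx_tag_idx(uint16_t tag)
{
	return (uint8_t)(tag & 0xff);
}

struct tx_ring {
	int stat;			/* next entry expected to complete */
	int queued;			/* entries handed to hardware, not yet retired */
	int done[TX_RING_COUNT];	/* per-slot completion count (illustration only) */
};

/*
 * Retire every outstanding entry from ring->stat up to and including the
 * reported index, so that a missed interrupt does not leave the driver
 * behind the hardware. If idx is not among the outstanding entries, the
 * loop stops once nothing is queued. Returns the number of entries retired.
 */
static int
tx_complete_upto(struct tx_ring *ring, uint8_t idx)
{
	int n = 0;

	assert(idx < TX_RING_COUNT);
	while (ring->queued > 0) {
		int cur = ring->stat;

		ring->done[cur]++;	/* a real driver would free the mbuf here */
		ring->queued--;
		ring->stat = (cur + 1) % TX_RING_COUNT;
		n++;
		if (cur == idx)
			break;
	}
	return n;
}
```

With stat at 30 and five entries queued, a report for index 0 retires
entries 30, 31 and 0 in one pass, which is the "free the earlier TX entries
as well" behaviour.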
> 
> This exposes the second problem: there is a race that is inherent in
> separating TX completion handling between TX DMA interrupts and TX
> interrupts -- the driver may handle all the TX DMAs that finished when
> it called rt2661_tx_dma_intr(), but by the time it gets to
> rt2661_tx_intr(), another TX may have completed and the driver may end
> up processing a TX completion for which it hasn't handled the TX DMA
> completion.  This ends up leaking mbufs if a new send is enqueued
> before the TX DMA interrupt has a chance to "catch up."  (This happens
> in practice on my system as well)
> 
> It is probably possible to fix this and keep the split DMA/TX
> handling, but that seems to require unneeded complexity.  Instead, we
> can just ignore TX DMA interrupts and handle everything when the TX
> actually completes.  This means we don't free the mbuf quite as soon,
> but since we can't reuse the slot in the TX ring anyway, I don't see
> this as a problem in practice.
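
As a rough illustration of the single-path approach described above, the
following C sketch releases the buffer only when the TX completion is
reported and takes no action on the TX DMA event. The status bits and the
names (INT_TX_DMA_DONE, INT_TX_DONE, tx_slot, handle_intr) are made up for
this example and do not correspond to the driver's real registers or
functions; malloc/free stand in for mbuf handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define INT_TX_DMA_DONE	0x1u	/* hypothetical status bit: DMA of frame finished */
#define INT_TX_DONE	0x2u	/* hypothetical status bit: frame actually sent */

struct tx_slot {
	void *buf;		/* stands in for the mbuf attached to the slot */
};

/* Release the buffer attached to a slot. Returns 1 if something was freed. */
static int
tx_slot_complete(struct tx_slot *slot)
{
	assert(slot != NULL);
	if (slot->buf == NULL)
		return 0;
	free(slot->buf);
	slot->buf = NULL;
	return 1;
}

/*
 * Single completion path: INT_TX_DMA_DONE is deliberately not acted on,
 * so there is only one place where a slot's buffer is released and the
 * two events cannot get out of step with each other.
 */
static int
handle_intr(unsigned int status, struct tx_slot *slot)
{
	if (status & INT_TX_DONE)
		return tx_slot_complete(slot);
	return 0;
}
```

The cost is that the buffer is held slightly longer, which matches the
trade-off stated above: the ring slot cannot be reused before the TX
completes anyway.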
> 
> With this patch applied, the ral interface on my access point is able
> to continue operating under load that would cause the interface to get
> stuck with the stock driver fairly quickly.

I don't see any difference in behaviour between your patch and -current
(the patched driver does work, with no issues).

Would you mind sharing your hostname.ral0 and the tools you use to trigger
this situation? I've tried hping, tcpbench, ping -f, rsync, etc., to no avail.

max ~8000 intr/s with hping
2.5MB/s with scp


OpenBSD 4.6-current (GENERIC) #0: Sat Dec  5 16:13:19 CET 2009
    tobi...@neodym.tmux.org:/home/tobiasu/obsd/src/sys/arch/i386/compile/GENERIC
cpu0: AMD Athlon(tm) XP 2500+ ("AuthenticAMD" 686-class, 512KB L2 cache) 1.84 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
real mem  = 1610117120 (1535MB)
avail mem = 1551433728 (1479MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 05/17/05, BIOS32 rev. 0 @ 0xfa390, SMBIOS rev. 2.3 @ 0xf0100 (38 entries)
bios0: vendor Award Software International, Inc. version "F6" date 05/17/2005
bios0: Gigabyte Technology Co., Ltd. GA-7S748
apm0 at bios0: Power Management spec V1.2 (slowidle)
apm0: AC on, battery charge unknown
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xf0000/0xc784
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfc6f0/144 (7 entries)
pcibios0: PCI Exclusive IRQs: 5 6 9 10 11
pcibios0: PCI Interrupt Router at 000:02:0 ("SiS 85C503 System" rev 0x00)
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc0000/0xf600 0xd0000/0x8000!
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "SiS 746 PCI" rev 0x10
sisagp0 at pchb0
agp0 at sisagp0: aperture at 0xe0000000, size 0x4000000
ppb0 at pci0 dev 1 function 0 "SiS 86C202 VGA" rev 0x00
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 vendor "ATI", unknown product 0x9505 rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 2 function 0 "SiS 85C503 System" rev 0x25
pciide0 at pci0 dev 2 function 5 "SiS 5513 EIDE" rev 0x00: 746: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
atapiscsi0 at pciide0 channel 0 drive 1
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <PLEXTOR, CD-R PX-W1210A, 1.02> ATAPI 5/cdrom removable
cd0(pciide0:0:1): using PIO mode 4, DMA mode 2
wd0 at pciide0 channel 1 drive 0: <ST3250620A>
wd0: 16-sector PIO, LBA48, 238474MB, 488395055 sectors
wd1 at pciide0 channel 1 drive 1: <ST3250620A>
wd1: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 5
wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 5
auich0 at pci0 dev 2 function 7 "SiS 7012 AC97" rev 0xa0: irq 11, SiS7012 AC97
ac97: codec id 0x414c4760 (Avance Logic ALC655 rev 0)
audio0 at auich0
ohci0 at pci0 dev 3 function 0 "SiS 5597/5598 USB" rev 0x0f: irq 10, version 1.0, legacy support
ohci1 at pci0 dev 3 function 1 "SiS 5597/5598 USB" rev 0x0f: irq 11, version 1.0, legacy support
ehci0 at pci0 dev 3 function 3 "SiS 7002 USB" rev 0x00: irq 6
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "SiS EHCI root hub" rev 2.00/1.00 addr 1
pciide1 at pci0 dev 10 function 0 "Promise PDC20271" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using irq 11 for native-PCI interrupt
ral0 at pci0 dev 11 function 0 "Ralink RT2561" rev 0x00: irq 5, address 00:80:5a:38:c4:0b
ral0: MAC/BBP RT2661B, RF RT2527
rl0 at pci0 dev 12 function 0 "Realtek 8139" rev 0x10: irq 10, address 00:08:54:01:0a:00
rlphy0 at rl0 phy 0: RTL internal PHY
"Philips SAA7134 TV" rev 0x01 at pci0 dev 13 function 0 not configured
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
it0 at isa0 port 0x2e/2: IT8705F rev 2, EC port 0x290
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 "SiS OHCI root hub" rev 1.00/1.00 addr 1
usb2 at ohci1: USB revision 1.0
uhub2 at usb2 "SiS OHCI root hub" rev 1.00/1.00 addr 1
biomask ff45 netmask ff65 ttymask ffff
mtrr: Pentium Pro MTRR support
uhidev0 at uhub2 port 2 configuration 1 interface 0 "Logitech USB-PS/2 Optical Mouse" rev 2.00/20.00 addr 2
uhidev0: iclass 3/1
ums0 at uhidev0: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
root on wd1a swap on wd1b dump on wd1b
