On Wed, Dec 10, 2008 at 09:07:19PM +0900, Pyun YongHyeon wrote:
> On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:
> > On Wed, Dec 10, 2008 at 07:28:00PM +0900, Pyun YongHyeon wrote:
> > > On Wed, Dec 10, 2008 at 09:59:35AM +0100, Victor Balada Diaz wrote:
> > > > On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
> > > > > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I got various machines[1] at hetzner.de and I've been having problems
> > > > > > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1 -BETA2 in amd64. I've
> > > > > > been trying to narrow the problem so someone more knowledgeable than me
> > > > > > is able to fix it. This mail is another attempt to ask a question
> > > > > > with regard to ATA code to see if this time I got something.
> > > > > >
> > > > > > For the ones that don't actually know what happened:
> > > > > >
> > > > > > With FreeBSD 7.0 -RELEASE for amd64 and default kernel
> > > > > > the system shared re0 interrupt with OHCI and this caused
> > > > > > re(4) to corrupt packets and create interrupt storms. Tried
> > > > >
> > > > > re(4) in 7.0-RELEASE had bus_dma(9) bug which could be easily
> > > > > triggered on systems with > 4GB memory. But I don't know whether
> > > > > this is related with interrupt storms.
> > > > >
> > > > > > updating to 7.1 -BETA2 and still had some problems with it.
> > > > > >
> > > > > > I've opened the PR kern/128287[2] and Remko quickly answered
> > > > > > with a workaround: that workaround was removing USB support from
> > > > > > my kernel. I did it and re(4) wasn't sharing interrupts any longer,
> > > > > > and the interrupt storms were gone. Now sometime later the interface
> > > > > > goes up and down from time to time, but less often. Also sometimes
> > > > > > the machine loses the network interface but continues to work.
> > > > > >
> > > > >
> > > > > It seems that your controller supports MSI so you can set a tunable
> > > > > hw.re.msi_disable to 0 to enable MSI. With MSI you can remove
> > > > > interrupt sharing (e.g. add hw.re.msi_disable="0" to
> > > > > /boot/loader.conf file.) However there were several issues on re(4)
> > > > > w.r.t MSI so it was off by default.
> > > >
> > > > This is undocumented and with sysctl -a I can't find the tunable. Is this
> > > > a HEAD feature or is it also in 7.1 -BETA2? Should I add
> > >
> > > Yeah it's an undocumented feature. But most drivers written by me
> > > have similar knobs. Both HEAD and stable/7 including 7.1 BETA2 have
> > > the tunable.
> >
> > I think it could be great if you could document it or at least
> > show it by default when you do sysctl -ad with a small description.
> >
>
> If MSI worked as expected I would have documented it as I did
> in msk(4)/nfe(4)/ale(4)/age(4)/jme(4) etc.
> Using MSI on RealTek does not seem to be stable. I tried hard to fix
> that but some users still reported watchdog timeouts. Working
> without documentation and hardware also made it hard to complete
> the work. This was the main reason why MSI was disabled on re(4).

What do you think about adding a note in the man page saying that it's
experimental, and that in some cases it could improve the situation but
in others it will give errors?

> > > > hw.re_msi_disable="0" to /boot/loader.conf?
> > >   ^^^^^^^^^^^^^^^^^^^^^
> > > Should be hw.re.msi_disable="0"
> > >
> > > Yes, just add it to /boot/loader.conf. Note, you should not disable
> > > system-wide MSI control (e.g. hw.pci.enable_msi == 1).
> > >
> > > > This was sharing interrupt with USB, does USB need any special MSI handling
> > > > or with re using MSI is enough to not share the interrupt?
> > >
> > > If re(4) can use MSI, you don't need to worry about interrupt
> > > sharing with USB. Check the output of "vmstat -i". You normally get
> > > an irq256 or higher for MSI enabled driver.
> > >
> > > > > >
> > > > > > I know it continues to work because some days later I can see that
> > > > > > it tried to deliver the status reports but was unable to resolve the
> > > > > > aliases hostnames. I can't ping the machine and I know the network
> > > > > > is OK. If I reboot the machine everything is working again.
> > > > > >
> > > > >
> > > > > Recently I've made small changes to re(4) which may help to detect
> > > > > link state change event. Would you try re(4) in HEAD?
> > > >
> > > > Can I just drop HEAD's /stable/7/sys/dev/re/ in -STABLE and test that
> > >
> > > Yes, you can. It should build without problems. Just replace re(4) on
> > > stable/7 with HEAD version.
> > >
> > > > or do I need to test the whole HEAD kernel?
> > > >
> > >
> > > No you don't have to do that.
> >
> > Backporting the changes I've found that it didn't compile so in
> > the end I got from HEAD the following files:
> >
> > base/head/sys/dev/re/if_re.c
> > base/head/sys/pci/if_rl.c
> > base/head/sys/pci/if_rlreg.h
> >
>
> Ah, sorry about that. Recently there were some changes. I forgot
> that.
>
> > After that I've recompiled 7.1 -BETA2 GENERIC kernel and enabled
> > the knob you suggested in /boot/loader.conf.
> >
> > With the new kernel and MSI the interrupts are like this:
> >
> > # vmstat -i
> > interrupt                          total       rate
> > irq9: acpi0                            1          0
> > irq16: ohci0                           1          0
> > irq17: ohci1 ohci3                     1          0
> > irq18: ohci2 ohci4                     1          0
> > irq22: atapci0                     19215         15
> > cpu0: timer                      2502718       1998
> > irq256: re0                      4967726       3967
> > cpu1: timer                      2502525       1998
> > Total                            9992188       7980
> >
> > The high interrupt numbers are because I've been running iperf to
> > check everything is fine, not because of interrupt storms. So far
> > I didn't find any interrupt storms related to USB or re(4) driver
> > but while doing the tests I've found this error:
> >
> > re0: watchdog timeout (missed Tx interrupts) -- recovering
> >
> > This didn't create any error on the interfaces (netstat -i).
> >
>
> This was triggered by new code in HEAD. It indicates re(4) missed
> Tx completion interrupt. It could be a bug in driver or hardware
> bug. If you can live with that message you can safely ignore that
> as now re(4) does not reinitialize the hardware if it detects
> missing Tx completion interrupt.

Yeah, it just happened once, and I'm used to receiving a lot of interface
UP/DOWN messages that are now gone, so this is an improvement.

>
> > Also I didn't see any problem with interfaces going up and down,
> > but that usually happens after some hours of uptime, so I'll let
> > you know if the error happens again.
> >
>
> Ok.
>
> > As this seems to improve the current situation, is there any
> > chance of merging -current driver in 7.1 before release?
> >
>
> I think re(4) in HEAD needs more testing. As you might know RealTek
> produced too many chipsets. :-(

Ok, I'll use the backported driver as it works better for me :-)

If I can help you testing any patches I'm more than willing to do it.

Thanks a lot for your help Pyun YongHyeon.

Regards.
-- 
The most convincing proof that intelligent life exists on other planets
is that they have not tried to contact us.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
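
The MSI check described in the thread (look for irq256 or higher in
"vmstat -i") can be sketched as a small shell helper. This is only an
illustration under stated assumptions: the function name msi_status is
hypothetical and not part of FreeBSD, the irq256 threshold is the rule of
thumb given in the thread rather than a guarantee, and the parsing assumes
the "vmstat -i" layout shown in the thread (irq label, device names, total,
rate).

```sh
#!/bin/sh
# Hypothetical helper (not shipped with FreeBSD): reads "vmstat -i" output
# on stdin and reports which interrupt line the given device is attached to.
# Assumption taken from the thread: an MSI-enabled driver normally shows up
# as irq256 or higher; anything lower is treated as a legacy interrupt.
msi_status() {
    dev="$1"
    awk -v dev="$dev" '
        # Interrupt lines look like: "irq17: ohci1 ohci3  1  0"
        $1 ~ /^irq[0-9]+:$/ {
            irq = $1
            gsub(/[^0-9]/, "", irq)      # keep only the IRQ number
            ndev = NF - 3                # fields between label and counters
            for (i = 2; i <= NF - 2; i++) {
                if ($i == dev) {
                    kind = (irq + 0 >= 256) ? "MSI" : "legacy"
                    if (ndev > 1)
                        kind = kind ", shared"
                    printf "%s: irq%d (%s)\n", dev, irq, kind
                    found = 1
                }
            }
        }
        # Exit non-zero when the device does not appear at all.
        END { exit(found ? 0 : 1) }
    '
}
```

On a machine where hw.re.msi_disable="0" has been added to
/boot/loader.conf and the system rebooted, running
"vmstat -i | msi_status re0" would be expected to name irq256 or higher if
MSI took effect; a lower, shared line suggests the interface is still on a
legacy interrupt.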