Guy Brand wrote:
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:
One thing this patch definitely did do though, is break the nvidia
driver pretty badly. Couldn't keep the X server running for more than a
minute before it froze solid. Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)
Hi,
Since rebuilding to 6.2-PRERELEASE (FreeBSD 6.2-PRERELEASE #1: Mon
Oct 2 15:24:04 CEST 2006 DEBUG i386) on a box where em shares an
IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):
interrupt                          total       rate
irq1: atkbd0                           5          0
irq14: ata0                           47          0
irq16: nvidia0 em+                 86545        185
irq17: fwohci0                         7          0
irq21: twe0                         6426         13
cpu0: timer                       927735       1986
Total                            1020765       2185
I freeze the box by starting firefox, which reloads a few tabs I keep
open in my session when under X. This is perfectly reproducible.
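For anyone wanting to check their own box for the same situation, here is a minimal Python sketch that picks shared IRQs out of a table like the one above. It assumes the input has the layout shown in this message ("irqN:", handler names, total, rate); the function name shared_irqs and the SAMPLE text are only for illustration, and the trailing "+" in "em+" is kept exactly as printed.

```python
import re

# Sample text copied from the interrupt table in this message.
SAMPLE = """\
interrupt total rate
irq1: atkbd0 5 0
irq14: ata0 47 0
irq16: nvidia0 em+ 86545 185
irq17: fwohci0 7 0
irq21: twe0 6426 13
cpu0: timer 927735 1986
Total 1020765 2185
"""

# One row: "irqN:", one or more handler names, then the total and rate columns.
ROW = re.compile(r"^irq(\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text):
    """Return {irq_number: [handler names]} for rows listing two or more handlers."""
    shared = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, cpu timer and Total rows do not match
        handlers = m.group(2).split()
        if len(handlers) > 1:
            shared[int(m.group(1))] = handlers
    return shared


if __name__ == "__main__":
    for irq, handlers in shared_irqs(SAMPLE).items():
        print(f"irq{irq} is shared by: {', '.join(handlers)}")
```

The regular expression tolerates any amount of whitespace between columns, so it should work on both aligned and single-space output.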
From the logs, first I see:
Oct 2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
Oct 2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
Oct 2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
Oct 2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
Oct 2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
Oct 2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
Oct 2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
Oct 2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
Oct 2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
then come the watchdogs:
Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
Oct 2 16:49:08 mojito kernel: em0: link state changed to UP
Oct 2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:18 mojito kernel: em0: link state changed to UP
Oct 2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
Oct 2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:29 mojito kernel: em0: link state changed to UP
Oct 2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
Oct 2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:39 mojito kernel: em0: link state changed to UP
Oct 2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:49 mojito kernel: em0: link state changed to UP
and the box ends up frozen less than a minute later. The traffic
on the Intel card can be low (pinging a host for a few dozen
seconds), medium (reloading a few pages in the tabs of Firefox) or
high (downloading several iso images from our local FTP mirror):
whatever I do, if both nvidia and em0 are used, the box freezes.
Note that I can't freeze the box when doing several simultaneous big
downloads or tarring up a lot of files but NOT running X. So I guess
it is a shared nvidia/em IRQ issue.
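To make the timing in the logs above easier to see, here is a small Python sketch that tallies the NVRM Xid messages and the em0 watchdog timeouts and reports the seconds between consecutive events of each kind. It is only an illustration: the names event_times and gaps are made up here, syslog lines carry no year so 2006 is assumed from the dates in this message, and wrapped continuation lines are simply skipped.

```python
import re
from datetime import datetime

# A few lines copied from the logs in this message.
SAMPLE = """\
Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
"""

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# month, day, HH:MM:SS, hostname, then the kernel message text.
LINE = re.compile(r"^(\w{3})\s+(\d+)\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+kernel:\s+(.*)$")


def classify(message):
    """Return 'xid', 'watchdog', or None for messages we do not count."""
    if message.startswith("NVRM: Xid"):
        return "xid"
    if "watchdog timeout" in message:
        return "watchdog"
    return None


def event_times(text, year=2006):
    """Map each event kind to the list of timestamps at which it was logged."""
    events = {"xid": [], "watchdog": []}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # wrapped continuation lines and blanks are ignored
        kind = classify(m.group(6))
        if kind is None:
            continue
        month, day, hh, mm, ss = m.group(1, 2, 3, 4, 5)
        events[kind].append(
            datetime(year, MONTHS[month], int(day), int(hh), int(mm), int(ss)))
    return events


def gaps(times):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for kind, times in event_times(SAMPLE).items():
        print(kind, len(times), "events, gaps in seconds:", gaps(times))
```

On the sample lines the Xid messages come 8 to 9 seconds apart and the watchdog timeouts 10 seconds apart.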
FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
The "DEBUG" kernconf is GENERIC + witness options enabled (but they
do not help in this case).
I traced back to find which changeset introduced the trouble. The
results are:
#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...
#*default release=cvs tag=RELENG_6
# BROKEN
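The message does not say how the intermediate dates were picked. One common approach is to bisect between a known-good and a known-bad checkout date, which could be sketched in Python as below. Everything here is an assumption for illustration: bisect_dates is a made-up helper, is_broken stands in for the manual work of checking out that date, building a kernel and trying to reproduce the freeze, and the simulated predicate simply places the break at the first commit time quoted in this message.

```python
from datetime import datetime, timedelta

FMT = "%Y.%m.%d.%H.%M.%S"  # layout of the date= field in the supfile lines


def bisect_dates(good, bad, is_broken, resolution=timedelta(minutes=1)):
    """Narrow the window [good, bad] until it is no wider than `resolution`.

    Assumes `good` is known OK, `bad` is known BROKEN, and that the tree
    stays broken once it breaks. Returns (last_good, first_bad, steps).
    """
    steps = 0
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_broken(mid):
            bad = mid   # the breakage is at or before mid
        else:
            good = mid  # the breakage is after mid
        steps += 1
    return good, bad, steps


if __name__ == "__main__":
    # Simulated test: pretend every checkout at or after this time is broken.
    culprit = datetime.strptime("2006.08.08.09.19.25", FMT)

    good = datetime.strptime("2006.06.23.17.00.00", FMT)
    bad = datetime.strptime("2006.10.02.00.00.00", FMT)  # illustrative end date
    last_good, first_bad, steps = bisect_dates(good, bad, lambda d: d >= culprit)
    print("last OK:     ", last_good.strftime(FMT))
    print("first BROKEN:", first_bad.strftime(FMT))
    print("kernels built:", steps)
```

With a window of roughly a hundred days and a one-minute resolution, this takes fewer than twenty test builds.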
From sys commitlogs the culprit commits are:
glebius     2006-08-08 09:19:25 UTC
  FreeBSD src repository
  Modified files:        (Branch: RELENG_6)
    sys/dev/em           if_em.c
  Log:
  Sync with HEAD. This includes the following changes in chronological
  order:
  o A significant performance improvements. The interrupt handler
    schedules work to a private taskqueue. The em_rxeof() function
    runs lockless.
    rev. 1.98 - 1.101 by scottl.
    rev. 1.103 by mux
    rev. 1.106 by glebius, from Andrey V. Elsukov <bu7cher yandex.ru>
    rev. 1.116 by glebius
  o Style cleanups:
    - rev. 1.102, 1.108, 1.109 by glebius
    - rev. 1.124 by pdeuskar
  o Vendor merges:
    - Merged with vendor driver version 5.1.5 by Jack Vogel.
      rev. 1.115 by glebius
    - Merged with vendor driver version 6.0.5 by Jack Vogel.
      rev. 1.123 by glebius
  o Various fixes:
    - Invalid use of BUS_DMA_ALLOCNOW
      rev. 1.104 by scott, 1.121 by yongari
    - Link state handling cleanup.
      rev. 1.110 by glebius
    - Fix if_baudrate handling.
      rev. 1.111 by glebius
    - Honor IFF_DRV_OACTIVE in em_start_locked().
      rev. 1.117 by yongari
    - Protect EEPROM access with the driver lock.
      rev. 1.118 by yongari
    - Fix link flap on SIOCGIFADDR.
      rev. 1.119 by yongari
    - Fix DMA map handling in em_encap().
      rev. 1.120,1.122 by yongari
  Revision   Changes       Path
  1.65.2.17  +1587 -1443   src/sys/dev/em/if_em.c
glebius     2006-08-08 09:20:26 UTC
  FreeBSD src repository
  Modified files:        (Branch: RELENG_6)
    sys/dev/em           LICENSE README if_em.h if_em_hw.c
                         if_em_hw.h if_em_osdep.h
  Log:
  Sync with HEAD, merging vendor drivers updates 5.1.5, 6.0.5 by Jack Vogel.
  Revision   Changes       Path
  1.3.2.1    +1 -1         src/sys/dev/em/LICENSE
  1.10.2.1   +71 -30       src/sys/dev/em/README
  1.32.2.3   +133 -157     src/sys/dev/em/if_em.h
  1.16.2.2   +3186 -906    src/sys/dev/em/if_em_hw.c
  1.15.2.3   +712 -48      src/sys/dev/em/if_em_hw.h
  1.14.2.2   +46 -15       src/sys/dev/em/if_em_osdep.h
I confirmed that by building a kernel from 2006.08.08.09.21.00 which
shows the problem and a kernel from 2006.08.08.09.18.00 which works
like a charm.
Dunno if this could be linked to the em* watchdogs reported in this
thread. Let me know if I can do something useful to help fix this
issue.
So you tested before these two changes and after these two changes, yes?
What about with just the first change and not the second? Anyways, I'm
starting to see a trend here. Problem reports are clustering around UP
systems, not SMP systems. I don't know if that's just coincidence or not.
Can you try a quick test? Reboot and press '6' at the FreeBSD loader
menu. That will drop you to a prompt. Then enter the following line:
set hint.apic.0.disabled=1
Then continue the boot by entering:
boot
The machine should boot up normally. If it doesn't boot, just reset the
machine and allow it to boot without the apic change. With the change
in place and the up-to-date em driver, see if you still get the nvidia
and other problems.
Scott