Re: FreeBSD 4.9 losing mbufs!!!
On Tue, 18 Apr 2006, Stephen Clark wrote:

> I have discovered that if I disable quagga/ospfd then I don't lose mbufs! This makes it appear that the mbuf leak is in the multicast routing logic. In fact I lose mbufs even with both systems basically idle but with 100 vpn/gre tunnels with multicast going on through the gre then the vpn.
>
> Any ideas on where to focus my continued investigation?
>
> Thanks to everybody who has responded.

Steve,

Sorry not to have caught this thread earlier; I've been on travel for the last few weeks. My general suggestion would be to narrow the code paths traversed, to eliminate as much code as possible from the search. It sounds like you've done that pretty effectively :-).

Typically, memory leaks occur in edge error cases, where the memory is not properly released, or ownership is unclear. My suggestion would be to add counters (or look at existing counters where already present) and see if there's an error case being triggered in about the same quantity that mbuf leakage is occurring. Chances are, there's an error being returned and a missing m_freem().

Based on your comments above, I might also pay attention to the routing socket path -- the rate of leak could correspond to the routing daemons talking to the network stack, rather than the rate of traffic. For example, it could be that one of the routing messages is handled improperly, resulting in a leak.

Unfortunately, tracking down memory leaks can be quite difficult, and tends to require a combination of dogged persistence and luck...

Robert N M Watson

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
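The counter comparison suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool from the thread: the counter names and all the numbers are made up, and in practice the snapshots would come from whatever counters already exist or are added to the suspect code paths. The function ranks each counter by how closely its growth between two snapshots matches the growth in allocated mbufs.

```python
def rank_counters_by_leak_match(before, after, leaked):
    """Rank counters by how closely their growth matches the leaked mbuf count.

    before/after: dicts mapping counter name -> value at two points in time.
    leaked: growth in allocated mbufs over the same interval.
    Returns a list of (name, delta) pairs, best match first.
    """
    ranked = []
    for name in sorted(before.keys() & after.keys()):
        delta = after[name] - before[name]
        # Smaller distance means the counter grew by about as much as the leak.
        ranked.append((abs(delta - leaked), name, delta))
    ranked.sort()
    return [(name, delta) for _, name, delta in ranked]


# Hypothetical snapshots taken some hours apart; names and values are invented.
before = {"ip_forward_errors": 10, "rtsock_msg_errors": 3, "gre_input_drops": 250}
after = {"ip_forward_errors": 12, "rtsock_msg_errors": 1203, "gre_input_drops": 900}
mbufs_before, mbufs_after = 4100, 5298

ranking = rank_counters_by_leak_match(before, after, mbufs_after - mbufs_before)
for name, delta in ranking:
    print(f"{name}: grew by {delta}")
```

A counter at the top of the ranking is only a hint about which error path to read first; a matching rate does not prove that path is the one missing the free.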
Re: 6.1RC system nearly freezing
On Thu, 20 Apr 2006, Henri Hennebert wrote:

> I upgraded a web, squid, mail server (SMP with 2 Pentium III) to 6.1-RC (Apr 9 2006) and encountered 2 freezes. The system still responds to http requests but I can't log in on the console or through ssh -- no shell prompt. No more mail delivery. I broke into KDB and found more than 1000 sendmail processes waiting for devfs... call boot(0) can't complete the shutdown process. I attach the KDB information. Let me know if more information is needed.

Are you running with WITNESS and INVARIANTS enabled? If not, could you do so and see if the problem is reproducible, and if so, whether or not WITNESS (and friends) generate any warnings?

It looks like something has leaked a lock, resulting in deadlock. The question is, however, which lock, and where. WITNESS may be able to provide some insight into this; if you could run "show alllocks" with WITNESS in place, that would be helpful also.

Robert N M Watson
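For readers unfamiliar with the term, a "leaked" lock is one that some code path acquires and then forgets to release, usually on an error return, so every later caller blocks on it. The sketch below is a userland analogy in Python, not kernel code and not taken from the thread; the function and variable names are hypothetical.

```python
import threading

table_lock = threading.Lock()


def update_leaky(table, key, value):
    """Buggy version: the error path returns while still holding the lock."""
    table_lock.acquire()
    if key is None:
        return False  # BUG: lock is never released on this path
    table[key] = value
    table_lock.release()
    return True


def update_safe(table, key, value):
    """Fixed version: the with-statement releases the lock on every exit path."""
    with table_lock:
        if key is None:
            return False
        table[key] = value
        return True
```

After one call to `update_leaky(table, None, 1)` the lock stays held, and any other thread that tries to take it waits forever. That is the userland equivalent of the pile of blocked processes described above, and it is the kind of mistake WITNESS is meant to help locate in the kernel.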
Re: FreeBSD 6.1-PRERELEASE (April 5, 2006) randomly rebooting on Dell Poweredge 650
On Tue, 18 Apr 2006, Matt Watson wrote:

> I'm not sure if this is the right place to be sending this or not but I figured I'd give it a shot.

This (freebsd-stable) is a good place to send the message.

Do you have a serial console on the box? If not, could you try putting one on there? Some kernel output, especially in the event of a crash, doesn't end up in the system log as (for example) the file system may not be available. If you configure a serial console being logged to another machine, you can more reliably log crash information. Information on setting up a serial console can be found in the FreeBSD Handbook, but the short of it is that you can put a null modem cable to another box on the first serial port, and add:

console="comconsole"

to /boot/loader.conf. FreeBSD will normally reboot after a kernel panic if debugging isn't enabled, in which case you should see the output. And if you don't see output, it's probably a bad hardware interaction rather than a kernel panic.

Robert N M Watson

> The subject line pretty much says it all: I have a Dell Poweredge 650 box running 6.1-PRERELEASE which was cvsup'd on April 5, 2006. The box has now twice rebooted on its own for no apparent reason. It's a fresh install as well, and appears to have been doing this ever since it was installed. The first time the box was only up for approximately 2 days and rebooted, the 2nd time it was up for approximately 10 days. I have all.log set up to log all syslog messages; however, when the reboot occurred there was no information in the log which indicates anything going wrong... Here is a small cut from the log at the time of the reboot. As can be seen, one minute there is an imapd process, the next entry is the system restarting.
>
> Apr 16 20:55:34 clearwater imapd: LOGOUT, user=XX, ip=[:::WWW.XXX.YYY.ZZZ], headers=0, body=0, time=0
> Apr 16 20:59:36 clearwater syslogd: restart
> Apr 16 20:59:36 clearwater syslogd: kernel boot file is /boot/kernel/kernel
> Apr 16 20:59:36 clearwater kernel: Copyright (c) 1992-2006 The FreeBSD Project.
> Apr 16 20:59:36 clearwater kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> Apr 16 20:59:36 clearwater kernel: The Regents of the University of California. All rights reserved.
> Apr 16 20:59:36 clearwater kernel: FreeBSD 6.1-PRERELEASE #0: Wed Apr 5 20:46:37 EDT 2006
>
> This machine previously had Linux installed on the box and did not display the same problems, so I'm going on the assumption that it's not a hardware failure. Aside from the reboots the box has been performing extremely well.
>
> If anybody can provide some insights or suggestions I'd greatly appreciate it.
>
> Thanks,
> Matt Watson
Re: 6.1 prerelease graid3 livelock?
On Sun, Apr 23, 2006 at 12:04:33PM -0700, Bradley W. Dutton wrote:
+> Hi,
+>
+> I'm experiencing a sort of livelock on a 6.1 prerelease box. It appears
+> all of the IO related activity hangs but the box continues to do
+> routing/NAT/etc for internet access from my other boxes. I can usually
+> get the lockup to occur within about 12 hours of booting.
+>
+> I've narrowed down the commits to those on March 20 (kernel before then
+> works, kernel after then causes problems) and I think the problem is
+> geom/raid related. Besides a small gmirrored root partition the rest of my
+> partitions are all graid3. I'm not sure what information to provide to
+> help troubleshoot but I'm happy to do what's needed.
+>
+> On an unrelated note the rebuild speed was about 50% faster on my box when
+> using the new geom/raid code introduced on March 20th.

Can you break into DDB (alt+ctrl+esc or send break via serial console) and send me the output of the 'traceall' command?

--
Pawel Jakub Dawidek    http://www.wheel.pl
[EMAIL PROTECTED]    http://www.FreeBSD.org
FreeBSD committer    Am I Evil? Yes, I Am!
HP DL145G2 SCSC Raid Controlle Q
Hello,

I hope this question is not off topic for this list.

In the HP config tool for the DL145 I can select an HP PL100 SCSI RAID Controller (PN 355671-B21). Does this controller work with FreeBSD? I'm not sure if this is a relabeled LSI controller.

Kind regards,
Thomas.
Re: FreeBSD 4.9 losing mbufs!!!
Robert Watson wrote:

> On Tue, 18 Apr 2006, Stephen Clark wrote:
>
>> I have discovered that if I disable quagga/ospfd then I don't lose mbufs! This makes it appear that the mbuf leak is in the multicast routing logic. In fact I lose mbufs even with both systems basically idle but with 100 vpn/gre tunnels with multicast going on through the gre then the vpn.
>>
>> Any ideas on where to focus my continued investigation?
>>
>> Thanks to everybody who has responded.
>
> Steve,
>
> Sorry not to have caught this thread earlier; I've been on travel for the last few weeks. My general suggestion would be to narrow the code paths traversed, to eliminate as much code as possible from the search. It sounds like you've done that pretty effectively :-).
>
> Typically, memory leaks occur in edge error cases, where the memory is not properly released, or ownership is unclear. My suggestion would be to add counters (or look at existing counters where already present) and see if there's an error case being triggered in about the same quantity that mbuf leakage is occurring. Chances are, there's an error being returned and a missing m_freem().
>
> Based on your comments above, I might also pay attention to the routing socket path -- the rate of leak could correspond to the routing daemons talking to the network stack, rather than the rate of traffic. For example, it could be that one of the routing messages is handled improperly, resulting in a leak.
>
> Unfortunately, tracking down memory leaks can be quite difficult, and tends to require a combination of dogged persistence and luck...
>
> Robert N M Watson

Robert,

Thanks for your response. I am in the process of moving our app to 6. stable to see if the problem still exists. If it does then maybe I can generate some enthusiasm from the FreeBSD community to take more of an interest in the problem. I have a lot of C experience but not with the *BSD network stack; I am still trying to get a good understanding of the flow of packets through the stack.

Our next release will be based on 6 but that is months away. We have some Athlon 64 X2 machines we are putting in that will be handling 100 to 200 vpn/gre tunnels, and right now ipintrq slowly grows, which eventually forces a reboot of the systems. Fortune 2000 companies don't like to see that happen.

Regards,
Steve

--
"They that give up essential liberty to obtain temporary safety, deserve neither liberty nor safety." (Ben Franklin)
"The course of history shows that as a government grows, liberty decreases." (Thomas Jefferson)
Stable Build Error
I've been seeing this for a week or so, and have deleted /usr/obj and run make
Stable Build Error
I've been seeing the following for a week or so, and have deleted /usr/obj, re-cvsupped and run

cd /usr/src && make cleandir && make cleandir

make.conf is empty. Anyone got any ideas?

===> kerberos5/tools/make-roken (all)
===> kerberos5/tools/asn1_compile (all)
cd /usr/src/kerberos5/tools/asn1_compile/../make-roken && make
cc -O2 -fno-strict-aliasing -pipe -I/usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/roken -I/usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1 -I. -DHAVE_CONFIG_H -I/usr/src/kerberos5/tools/asn1_compile/../../include -DINET6 -I/usr/obj/usr/src/tmp/legacy/usr/include -c /usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1/gen.c
In file included from ./roken.h:61,
    from /usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1/gen_locl.h:51,
    from /usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1/gen.c:34:
/usr/include/resolv.h:320: error: syntax error before "ns_tsig_key"
*** Error code 1

Stop in /usr/src/kerberos5/tools/asn1_compile.
*** Error code 1

Stop in /usr/src/kerberos5/tools.
*** Error code 1

Stop in /usr/src.
*** Error code 1
Re: Stable Build Error
On Mon, Apr 24, 2006 at 04:18:06PM +0100, Lawrence Farr wrote:
> I've been seeing this for a week or so, and have deleted
> /usr/obj and run make

My telepathy powers aren't working. What's the error? :)

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
Re: HP DL145G2 SCSC Raid Controlle Q
On Mon, 24 Apr 2006, Thomas Krause wrote:

TK> Hello,
TK> I hope, this question is not off topic for this list.
TK> In the HP config tool for the DL145 I can select a
TK> HP PL100 SCSI RAID Controller (PN 355671-B21)
TK> Does this controller work with FreeBSD? I'm not sure, if this
TK> is a relabeled LSI controller.

Should work:

mpt0: port 0x2000-0x20ff mem 0xd812-0xd813,0xd810-0xd811 irq 32 at device 1.0 on pci134

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: fsck_ufs locked in snaplk
On Mon, 24 Apr 2006, Dmitry Morozovsky wrote:

DM> KK> > one of my servers had to be rebooted uncleanly and then I have backgrounded
DM> KK> > fsck locked for more than an hour in snaplk:
DM> KK> >
DM> KK> > 742 root 1 -44 1320K 688K snaplk 0:02 0.00% fsck_ufs
DM> KK> >
DM> KK> > File system in question is 200G gmirror on SATA. Usually making a snapshot
DM> KK> > (e.g., for making dumps) consumes 3-4 minutes for that fs, so it seems to me
DM> KK> > that filesystem is in a deadlock.
DM> KK>
DM> KK> Is the process performing I/O? Background fsck deliberately runs at a
DM> KK> slow rate so it does not destroy I/O performance on the rest of the
DM> KK> system.
DM>
DM> Nope. For that case, 50+ smbds had been locked in 'ufs' state, so I've been
DM> urged to revive the machine and reboot, turning off bgfsck.
DM>
DM> This night, dump -L locks in the same position on the same filesystem:
DM>
DM> 0 2887 2886 0 -4 0 1260 692 snaplk D ??0:01.28
DM> /sbin/mksnap_ffs root0.0 0.1 5:19AM
DM>
DM> it has been started at 5:19am, and now is 9:20 - no disk activity
DM>
DM>
DM> For the reference: it's fresh RELENG_6_1/i386.

Just rechecked it: did mksnap_ffs on an otherwise idle file system:

[EMAIL PROTECTED]:/> mksnap_ffs /st /st/.snap/test_snapshot
load: 0.02 cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
load: 0.04 cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
load: 0.21 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
load: 0.20 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
load: 0.13 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.08 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.01 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k

(I hit ^T several times)

biord phase consumes about 1.5-2 mins, snaprdb phase - about 30-40 secs, and then the process died. Most disk requests succeed; however, accessing /st/.snap locks the process in ufs state forever.

What bothers me most is that it is the only machine that reproducibly hangs in snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade. Other RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: HP DL145G2 SCSC Raid Controlle Q
Dmitry Morozovsky wrote:

> On Mon, 24 Apr 2006, Thomas Krause wrote:
>
> TK> Hello,
> TK> I hope, this question is not off topic for this list.
> TK> In the HP config tool for the DL145 I can select a
> TK> HP PL100 SCSI RAID Controller (PN 355671-B21)
> TK> Does this controller work with FreeBSD? I'm not sure, if this
> TK> is a relabeled LSI controller.
>
> Should work:
>
> mpt0: port 0x2000-0x20ff mem 0xd812-0xd813,0xd810-0xd811 irq 32 at device 1.0 on pci134

Hmm, isn't mpt only a SCSI controller (not a RAID controller)?

Best regards,
Thomas.
Re: fsck_ufs locked in snaplk
On Mon, Apr 24, 2006 at 10:04:57PM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Dmitry Morozovsky wrote:
>
> DM> KK> > one of my servers had to be rebooted uncleanly and then I have backgrounded
> DM> KK> > fsck locked for more than an hour in snaplk:
> DM> KK> >
> DM> KK> > 742 root 1 -44 1320K 688K snaplk 0:02 0.00% fsck_ufs
> DM> KK> >
> DM> KK> > File system in question is 200G gmirror on SATA. Usually making a snapshot
> DM> KK> > (e.g., for making dumps) consumes 3-4 minutes for that fs, so it seems to me
> DM> KK> > that filesystem is in a deadlock.
> DM> KK>
> DM> KK> Is the process performing I/O? Background fsck deliberately runs at a
> DM> KK> slow rate so it does not destroy I/O performance on the rest of the
> DM> KK> system.
> DM>
> DM> Nope. For that case, 50+ smbds had been locked in 'ufs' state, so I've been
> DM> urged to revive the machine and reboot, turning off bgfsck.
> DM>
> DM> This night, dump -L locks in the same position on the same filesystem:
> DM>
> DM> 0 2887 2886 0 -4 0 1260 692 snaplk D ??0:01.28
> DM> /sbin/mksnap_ffs root0.0 0.1 5:19AM
> DM>
> DM> it has been started at 5:19am, and now is 9:20 - no disk activity
> DM>
> DM>
> DM> For the reference: it's fresh RELENG_6_1/i386.
>
> Just rechecked it: did mksnap_ffs on an otherwise idle file system:
>
> [EMAIL PROTECTED]:/> mksnap_ffs /st /st/.snap/test_snapshot
> load: 0.02 cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
> load: 0.04 cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
> load: 0.21 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
> load: 0.20 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
> load: 0.13 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
> load: 0.08 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
> load: 0.01 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
>
> (I hit ^T several times)
>
> biord phase consumes about 1.5-2 mins,
> snaprdb phase - about 30-40 secs, and then the process died. Most disk requests
> succeed; however, accessing /st/.snap locks the process in ufs state forever.
>
> What bothers me most is that it is the only machine that reproducibly hangs in
> snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade. Other
> RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)

Are you quite certain it's running up-to-date RELENG_6_1? All known snapshot deadlock issues were believed to have been fixed a few weeks ago. If so, we might need you to enable extra debugging to track this down.

Kris
Re: fsck_ufs locked in snaplk
Dmitry Morozovsky wrote:

> one of my servers had to be rebooted uncleanly and then I have backgrounded fsck locked for more than an hour in snaplk:

Given that this system came down uncleanly, have you tried starting up in single-user and manually doing an fsck (without '-p') on the afflicted file-system? My guess is that there's something there which can't be resolved automagically,

--
Michael Butler, CISSP
Security Architect
Protected Networks
http://www.protected-networks.net
Re: 6.1RC system nearly freezing
Robert Watson wrote:

> On Thu, 20 Apr 2006, Henri Hennebert wrote:
>
>> I upgraded a web, squid, mail server (SMP with 2 Pentium III) to 6.1-RC (Apr 9 2006) and encountered 2 freezes. The system still responds to http requests but I can't log in on the console or through ssh -- no shell prompt. No more mail delivery. I broke into KDB and found more than 1000 sendmail processes waiting for devfs... call boot(0) can't complete the shutdown process. I attach the KDB information. Let me know if more information is needed.
>
> Are you running with WITNESS and INVARIANTS enabled? If not, could you do so and see if the problem is reproducible, and if so, whether or not WITNESS (and friends) generate any warnings?
>
> It looks like something has leaked a lock, resulting in deadlock. The question is, however, which lock, and where. WITNESS may be able to provide some insight into this; if you could run "show alllocks" with WITNESS in place, that would be helpful also.

I added WITNESS and INVARIANTS to my config and the next freeze/boot will have it [see PS]. This server is in production and has been running with a newer kernel for more than 5 days now. The diff (from Apr 13) against the previous kernel [the one with the last freeze] is:

Connected to cvsup.ciger.be
Updating collection src-all/cvs
Edit src/etc/sendmail/freebsd.mc
Edit src/etc/sendmail/freebsd.submit.mc
Edit src/lib/libc/gen/vis.3
Edit src/release/doc/en_US.ISO8859-1/hardware/common/dev.sgml
Edit src/release/doc/share/misc/dev.archlist.txt
Edit src/sbin/geom/core/geom.c
Edit src/share/man/man4/Makefile
Checkout src/share/man/man4/bce.4
Edit src/share/man/man4/miibus.4
Edit src/sys/amd64/conf/GENERIC
Edit src/sys/conf/files
Edit src/sys/conf/options
Checkout src/sys/dev/bce/if_bce.c
Checkout src/sys/dev/bce/if_bcefw.h
Checkout src/sys/dev/bce/if_bcereg.h
Edit src/sys/dev/ipw/if_ipw.c
Edit src/sys/dev/ipw/if_ipwvar.h
Edit src/sys/dev/mii/brgphy.c
Edit src/sys/dev/mii/miidevs
Edit src/sys/i386/conf/GENERIC
Edit src/sys/modules/Makefile
Checkout src/sys/modules/bce/Makefile
Edit src/usr.sbin/wpa/wpa_supplicant/Packet32.c
Finished successfully

Maybe something in these changes makes things better?

Anyway, I will reboot tonight (with WITNESS and friends) but may have to revert it if the performance is too bad :-/

Thanks for your concern,
Henri

P.S. ARGH... buildkernel failed with:

cc -c -O -pipe -march=pentium3 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -g -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Werror /usr/src/sys/dev/ata/atapi-cd.c
/usr/src/sys/dev/ata/atapi-cd.c: In function `acd_geom_attach':
/usr/src/sys/dev/ata/atapi-cd.c:179: warning: implicit declaration of function `_sx_assert'
/usr/src/sys/dev/ata/atapi-cd.c:179: warning: nested extern declaration of `_sx_assert'
*** Error code 1

Stop in /usr/obj/usr/src/sys/MORZINE.
*** Error code 1

Stop in /usr/src.
*** Error code 1

I'll run cvsup and retry... I'll keep you posted.

> Robert N M Watson
Re: 6.1RC system nearly freezing
On Mon, 24 Apr 2006, Henri Hennebert wrote:

>> Are you running with WITNESS and INVARIANTS enabled? If not, could you do so and see if the problem is reproducible, and if so, whether or not WITNESS (and friends) generate any warnings?
>>
>> It looks like something has leaked a lock, resulting in deadlock. The question is, however, which lock, and where. WITNESS may be able to provide some insight into this; if you could run "show alllocks" with WITNESS in place, that would be helpful also.
>
> I added WITNESS and INVARIANTS to my config and the next freeze/boot will have it [see PS]. This server is in production and has been running with a newer kernel for more than 5 days now. The diff (from Apr 13) against the previous kernel [the one with the last freeze] is:

Make sure you are also compiling in INVARIANT_SUPPORT and WITNESS_SKIPSPIN.

Robert N M Watson

> Connected to cvsup.ciger.be
> Updating collection src-all/cvs
> Edit src/etc/sendmail/freebsd.mc
> Edit src/etc/sendmail/freebsd.submit.mc
> Edit src/lib/libc/gen/vis.3
> Edit src/release/doc/en_US.ISO8859-1/hardware/common/dev.sgml
> Edit src/release/doc/share/misc/dev.archlist.txt
> Edit src/sbin/geom/core/geom.c
> Edit src/share/man/man4/Makefile
> Checkout src/share/man/man4/bce.4
> Edit src/share/man/man4/miibus.4
> Edit src/sys/amd64/conf/GENERIC
> Edit src/sys/conf/files
> Edit src/sys/conf/options
> Checkout src/sys/dev/bce/if_bce.c
> Checkout src/sys/dev/bce/if_bcefw.h
> Checkout src/sys/dev/bce/if_bcereg.h
> Edit src/sys/dev/ipw/if_ipw.c
> Edit src/sys/dev/ipw/if_ipwvar.h
> Edit src/sys/dev/mii/brgphy.c
> Edit src/sys/dev/mii/miidevs
> Edit src/sys/i386/conf/GENERIC
> Edit src/sys/modules/Makefile
> Checkout src/sys/modules/bce/Makefile
> Edit src/usr.sbin/wpa/wpa_supplicant/Packet32.c
> Finished successfully
>
> Maybe something in these changes makes things better?
>
> Anyway, I will reboot tonight (with WITNESS and friends) but may have to revert it if the performance is too bad :-/
>
> Thanks for your concern,
> Henri
>
> P.S. ARGH... buildkernel failed with:
>
> cc -c -O -pipe -march=pentium3 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -g -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Werror /usr/src/sys/dev/ata/atapi-cd.c
> /usr/src/sys/dev/ata/atapi-cd.c: In function `acd_geom_attach':
> /usr/src/sys/dev/ata/atapi-cd.c:179: warning: implicit declaration of function `_sx_assert'
> /usr/src/sys/dev/ata/atapi-cd.c:179: warning: nested extern declaration of `_sx_assert'
> *** Error code 1
>
> Stop in /usr/obj/usr/src/sys/MORZINE.
> *** Error code 1
>
> Stop in /usr/src.
> *** Error code 1
>
> I'll run cvsup and retry... I'll keep you posted.
>
>> Robert N M Watson
Re: fsck_ufs locked in snaplk
On Mon, 24 Apr 2006, Kris Kennaway wrote:

KK> > What bothers me most is that it is the only machine that reproducibly hangs in
KK> > snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade. Other
KK> > RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)
KK>
KK> Are you quite certain it's running up-to-date RELENG_6_1? All known
KK> snapshot deadlock issues were believed to have been fixed a few weeks
KK> ago. If so, we might need you to enable extra debugging to track this
KK> down.

Yes, I'm sure it's recent RELENG_6_1:

[EMAIL PROTECTED]:/usr/src> cvs stat sys/ufs/ffs/ffs_snapshot.c
===
File: ffs_snapshot.c    Status: Up-to-date

   Working revision:    1.103.2.5
   Repository revision: 1.103.2.5 /home/ncvs/src/sys/ufs/ffs/ffs_snapshot.c,v
   Sticky Tag:          RELENG_6_1 (branch: 1.103.2.5.2)
   Sticky Date:         (none)
   Sticky Options:      (none)

[EMAIL PROTECTED]:/usr/src> cvs -R up
P etc/rc.d/SERVERS
P release/doc/en_US.ISO8859-1/errata/article.sgml
P release/doc/en_US.ISO8859-1/relnotes/common/new.sgml
P release/doc/share/sgml/release.ent
P release/doc/zh_CN.GB2312/errata/article.sgml
P release/doc/zh_CN.GB2312/relnotes/common/new.sgml
P sys/amd64/amd64/identcpu.c
P sys/amd64/amd64/initcpu.c
P sys/amd64/amd64/pmap.c
P sys/amd64/include/md_var.h
P sys/amd64/include/specialreg.h
P sys/i386/i386/identcpu.c
P sys/i386/i386/initcpu.c
P sys/i386/include/md_var.h
P sys/i386/include/specialreg.h
P sys/ia64/ia64/nexus.c

(all these changes are non-relevant, are they?)

I'll try to build a DDB kernel tomorrow evening to check. Which commands should I issue in ddb?

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: fsck_ufs locked in snaplk
On Mon, 24 Apr 2006, Michael Butler wrote:

MB> Dmitry Morozovsky wrote:
MB> > one of my servers had to be rebooted uncleanly and then I have
MB> > backgrounded fsck locked for more than an hour in snaplk:
MB>
MB> Given that this system came down uncleanly, have you tried starting up in
MB> single-user and manually doing an fsck (without '-p') on the afflicted
MB> file-system? My guess is that there's something there which can't be
MB> resolved automagically,

Yes I did, and I'd completely disabled bgfsck. However, after that I got a snapshot lock in the nightly backup, and a second one on a manual test.

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: 6.1RC system nearly freezing
Robert Watson wrote:

> On Mon, 24 Apr 2006, Henri Hennebert wrote:
>
>>> Are you running with WITNESS and INVARIANTS enabled? If not, could you do so and see if the problem is reproducible, and if so, whether or not WITNESS (and friends) generate any warnings?
>>>
>>> It looks like something has leaked a lock, resulting in deadlock. The question is, however, which lock, and where. WITNESS may be able to provide some insight into this; if you could run "show alllocks" with WITNESS in place, that would be helpful also.
>>
>> I added WITNESS and INVARIANTS to my config and the next freeze/boot will have it [see PS]. This server is in production and has been running with a newer kernel for more than 5 days now. The diff (from Apr 13) against the previous kernel [the one with the last freeze] is:
>
> Make sure you are also compiling in INVARIANT_SUPPORT and WITNESS_SKIPSPIN.

Sorry. I added this and the kernel is built now.

At the first boot I got a panic on a gif interface. I don't need it now so I commented it out of rc.conf.local and rebooted.

The new kernel is running with WITNESS and INVARIANT... The system does not seem too loaded...

I am at home and the serial console is not connected. I took a snapshot of the KVM screen showing the panic and attach it to this mail for the record.

Henri

--- remaining of previous mail clipped ---
Re: fsck_ufs locked in snaplk
On Tue, Apr 25, 2006 at 12:24:07AM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Kris Kennaway wrote:
>
> KK> > What bothers me most is that it is the only machine that reproducibly hangs in
> KK> > snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade. Other
> KK> > RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)
> KK>
> KK> Are you quite certain it's running up-to-date RELENG_6_1? All known
> KK> snapshot deadlock issues were believed to have been fixed a few weeks
> KK> ago. If so, we might need you to enable extra debugging to track this
> KK> down.
>
> Yes, I'm sure it's recent RELENG_6_1:
>
> [EMAIL PROTECTED]:/usr/src> cvs stat sys/ufs/ffs/ffs_snapshot.c
> ===
> File: ffs_snapshot.c    Status: Up-to-date
>
>    Working revision:    1.103.2.5
>    Repository revision: 1.103.2.5 /home/ncvs/src/sys/ufs/ffs/ffs_snapshot.c,v
>    Sticky Tag:          RELENG_6_1 (branch: 1.103.2.5.2)
>    Sticky Date:         (none)
>    Sticky Options:      (none)
>
> [EMAIL PROTECTED]:/usr/src> cvs -R up
> P etc/rc.d/SERVERS
> P release/doc/en_US.ISO8859-1/errata/article.sgml
> P release/doc/en_US.ISO8859-1/relnotes/common/new.sgml
> P release/doc/share/sgml/release.ent
> P release/doc/zh_CN.GB2312/errata/article.sgml
> P release/doc/zh_CN.GB2312/relnotes/common/new.sgml
> P sys/amd64/amd64/identcpu.c
> P sys/amd64/amd64/initcpu.c
> P sys/amd64/amd64/pmap.c
> P sys/amd64/include/md_var.h
> P sys/amd64/include/specialreg.h
> P sys/i386/i386/identcpu.c
> P sys/i386/i386/initcpu.c
> P sys/i386/include/md_var.h
> P sys/i386/include/specialreg.h
> P sys/ia64/ia64/nexus.c
>
> (all these changes are non-relevant, are they?)

Yes.

> I'll try to build DDB kernel tomorrow evening to check. Which commands should I
> issue in ddb ?

'show lockedvnods', 'ps' and 'alltrace' are important.

Kris
Re: 6.1RC system nearly freezing
On Mon, Apr 24, 2006 at 10:28:11PM +0200, Henri Hennebert wrote:

> At the first boot I got a panic on a gif interface. I don't need it now so
> I commented it out of rc.conf.local and reboot.

What panic? This shouldn't happen, naturally.

> I am at home and the serial console is not connected. I snapshot the KVM of
> the panic and attach it to this mail for the record.

Binary attachments are stripped by the list, so you'll need to put this online.

Kris
Re: fsck_ufs locked in snaplk
On Mon, 24 Apr 2006, Kris Kennaway wrote:

KK> > I'll try to build DDB kernel tomorrow evening to check. Which commands should I
KK> > issue in ddb ?
KK>
KK> 'show lockedvnods', 'ps' and 'alltrace' are important.

Last note: are these added lines enough? Or are some unneeded?

options KDB
options KDB_TRACE
options KDB_UNATTENDED
options DDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS

Thanks.

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: fsck_ufs locked in snaplk
On Tue, Apr 25, 2006 at 12:45:08AM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Kris Kennaway wrote:
>
> KK> > I'll try to build DDB kernel tomorrow evening to check. Which commands should I
> KK> > issue in ddb ?
> KK>
> KK> 'show lockedvnods', 'ps' and 'alltrace' are important.
>
> Last note: are these lines added enough? Or some are unneeded?
>
> options KDB_TRACE
> options KDB_UNATTENDED

These two aren't needed. Also you should add DEBUG_LOCKS and DEBUG_VFS_LOCKS on the off chance they catch the problem.

Kris
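Putting Dmitry's list and Kris's reply together, the debugging section of the kernel configuration file would look roughly like the fragment below. This is a sketch assembled only from option names mentioned in these threads, not a tested configuration. WITNESS_SKIPSPIN is an assumption carried over from Robert Watson's advice in the separate "6.1RC system nearly freezing" thread and can be left out.

```
# Debugging options for tracking down the snapshot deadlock (sketch)
options KDB                 # kernel debugger framework
options DDB                 # interactive debugger: 'show lockedvnods', 'ps', 'alltrace'
options INVARIANTS          # extra run-time consistency checks
options INVARIANT_SUPPORT   # support code required by INVARIANTS
options WITNESS             # lock order verification
options WITNESS_SKIPSPIN    # assumption: skip spin mutexes to reduce WITNESS overhead
options DEBUG_LOCKS         # suggested by Kris
options DEBUG_VFS_LOCKS     # suggested by Kris
```

KDB_TRACE and KDB_UNATTENDED are omitted because Kris said they aren't needed for this purpose.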