Re: FreeBSD 4.9 losing mbufs!!!

2006-04-24 Thread Robert Watson

On Tue, 18 Apr 2006, Stephen Clark wrote:

> I have discovered that if I disable quagga/ospfd then I don't lose mbufs!
> This makes it appear that the mbuf leak is in the multicast routing logic.
> In fact I lose mbufs even with both systems basically idle, but with 100
> vpn/gre tunnels and multicast going through the gre and then the vpn.
>
> Any ideas on where to focus my continued investigation?
>
> Thanks to everybody who has responded.


Steve,

Sorry not to have caught this thread earlier; I've been on travel for the last 
few weeks.  My general suggestion would be to try to narrow the code paths 
traversed to try to eliminate as much code as possible from the search.  It 
sounds like you've done that pretty effectively :-).


Typically, memory leaks occur in edge error cases, where the memory is not 
properly released, or ownership is unclear.  My suggestion would be to add 
counters (or look at existing counters where already present) and see if 
there's an error case being triggered in about the same quantity that mbuf 
leakage is occurring.  Chances are, there's an error being returned and a 
missing m_freem().
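
As an illustration of the failure mode described above (not the actual leak in this thread, which has not been located), here is a minimal user-space sketch in C. Everything in it is hypothetical: the fixed-size pool, buf_alloc() and buf_free() (stand-ins for mbuf allocation and m_freem()), and the counters. It only shows how an error path that returns early leaves buffers outstanding, and how an error counter then rises in step with the leak.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 16

/* Hypothetical counters, in the spirit of the "add counters" advice. */
struct buf_stats {
    unsigned long allocs;  /* buffers handed out */
    unsigned long frees;   /* buffers returned */
    unsigned long errors;  /* times the error path was taken */
};

struct buf_stats stats;
unsigned char pool_used[POOL_SIZE];

/* Reset the pool and the counters so experiments are repeatable. */
void stats_reset(void)
{
    memset(&stats, 0, sizeof(stats));
    memset(pool_used, 0, sizeof(pool_used));
}

/* Stand-in for mbuf allocation: returns a slot index, or -1 if exhausted. */
int buf_alloc(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            stats.allocs++;
            return i;
        }
    }
    return -1;
}

/* Stand-in for m_freem(): releases the slot. */
void buf_free(int slot)
{
    assert(slot >= 0 && slot < POOL_SIZE && pool_used[slot]);
    pool_used[slot] = 0;
    stats.frees++;
}

/* Buffers currently outstanding; this keeps growing if something leaks. */
unsigned long outstanding(void)
{
    return stats.allocs - stats.frees;
}

/* Buggy handler: the error path returns without releasing the buffer. */
int handle_leaky(int bad_input)
{
    int slot = buf_alloc();
    if (slot < 0)
        return -1;
    if (bad_input) {
        stats.errors++;
        return -1;          /* BUG: missing buf_free(slot) */
    }
    buf_free(slot);
    return 0;
}

/* Fixed handler: every exit path releases the buffer. */
int handle_fixed(int bad_input)
{
    int slot = buf_alloc();
    if (slot < 0)
        return -1;
    if (bad_input) {
        stats.errors++;
        buf_free(slot);     /* release before returning the error */
        return -1;
    }
    buf_free(slot);
    return 0;
}
```

With handle_leaky(), every call made with bad_input set leaves outstanding() one higher and stats.errors rises with it; with handle_fixed() the outstanding count returns to zero. That matching rate is the correlation the counters are meant to expose.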


Based on your comments above, I might also pay attention to the routing socket 
path -- the rate of leak could correspond to the routing daemons talking to 
the network stack, rather than the rate of traffic.  For example, it could be 
that one of the routing messages is handled improperly resulting in a leak.


Unfortunately, tracking down memory leaks can be quite difficult, and tends to 
require a combination of dogged persistence and luck...


Robert N M Watson
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: 6.1RC system nearly freezing

2006-04-24 Thread Robert Watson


On Thu, 20 Apr 2006, Henri Hennebert wrote:

> I upgraded a web, squid, and mail server (SMP with 2 Pentium IIIs) to 6.1-RC
> (Apr 9 2006) and encountered two freezes.
>
> The system still responds to http requests, but I can't log in on the
> console or through ssh -- no shell prompt. No more mail delivery.
>
> I broke into KDB and found more than 1000 sendmail processes waiting for
> devfs...
>
> call boot(0) can't complete the shutdown process.
>
> I attach the KDB information. Let me know if more information is needed.


Are you running with WITNESS and INVARIANTS enabled?  If not, could you do so 
and see if the problem is reproducible, and if so, whether or not WITNESS 
(and friends) generate any warnings?


It looks like something has leaked a lock, resulting in deadlock.  The 
question is, however, which lock, and where.  WITNESS may be able to provide 
some insight into this; if you could run "show alllocks" with WITNESS in 
place, that would be helpful also.
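
To make the "leaked lock" failure mode concrete, here is a minimal user-space sketch in C. It is an assumption-laden toy, not kernel code and not how WITNESS works: struct tracked_lock, lock_try(), lock_release(), and count_held() are all hypothetical names. It only shows how an early return that skips the release leaves a lock held, so that every later caller fails to get it (a blocking lock would wait forever instead).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracked lock; a toy stand-in, not a kernel mutex or sx lock. */
struct tracked_lock {
    const char *name;
    int held;               /* 1 while some caller owns the lock */
};

/* Try to take the lock; returns 0 on success, -1 if it is already held. */
int lock_try(struct tracked_lock *lk)
{
    if (lk->held)
        return -1;          /* a blocking lock would sleep here forever */
    lk->held = 1;
    return 0;
}

/* Release a lock that the caller holds. */
void lock_release(struct tracked_lock *lk)
{
    assert(lk->held);
    lk->held = 0;
}

/* Buggy update: the error path returns with the lock still held. */
int update_leaky(struct tracked_lock *lk, int fail)
{
    if (lock_try(lk) != 0)
        return -1;
    if (fail)
        return -2;          /* BUG: missing lock_release(lk) */
    lock_release(lk);
    return 0;
}

/* Fixed update: the lock is released on every exit path. */
int update_fixed(struct tracked_lock *lk, int fail)
{
    if (lock_try(lk) != 0)
        return -1;
    if (fail) {
        lock_release(lk);
        return -2;
    }
    lock_release(lk);
    return 0;
}

/* Count locks that are still held, i.e. the ones worth listing and inspecting. */
size_t count_held(const struct tracked_lock *locks, size_t n)
{
    size_t held = 0;
    for (size_t i = 0; i < n; i++) {
        if (locks[i].held)
            held++;
    }
    return held;
}
```

After one failing call to update_leaky(), the lock stays held and every later attempt returns -1. A listing of held locks is then the natural place to look for the culprit, which is why output such as "show alllocks" is useful here.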


Robert N M Watson


Re: FreeBSD 6.1-PRERELEASE (April 5, 2006) randomly rebooting on Dell Poweredge 650

2006-04-24 Thread Robert Watson

On Tue, 18 Apr 2006, Matt Watson wrote:

> I'm not sure if this is the right place to be sending this or not but I 
> figured I'd give it a shot.


This (freebsd-stable) is a good place to send the message.

Do you have a serial console on the box?  If not, could you try putting one on 
there?  Some kernel output, especially in the event of a crash, doesn't end up 
in the system log because (for example) the file system may not be available.  
If you configure a serial console that is logged on another machine, you can 
more reliably capture crash information.  Information on setting up a serial 
console can be found in the FreeBSD Handbook, but the short of it is that you 
can run a null modem cable from the first serial port to another box, and add:


console="comconsole"

to /boot/loader.conf.  FreeBSD will normally reboot after a kernel panic if 
debugging isn't enabled, in which case you should see the output.  And if you 
don't see output, it's probably a bad hardware interaction rather than a 
kernel panic.


Robert N M Watson



> The subject line pretty much says it all: I have a Dell Poweredge 650 box
> running 6.1-PRERELEASE which was cvsup'd on April 5, 2006.  The box has now
> twice rebooted on its own for no apparent reason.  It's a fresh install as
> well, and appears to have been doing this ever since it was installed.  The
> first time the box was only up for approximately 2 days and rebooted; the
> 2nd time it was up for approximately 10 days.  I have all.log set up to log
> all syslog messages; however, when the reboot occurred there was no
> information in the log indicating anything going wrong...  Here is a small
> cut from the log at the time of the reboot.  As can be seen, one minute
> there is an imapd process, the next entry is the system restarting.
>
> Apr 16 20:55:34 clearwater imapd: LOGOUT, user=XX, ip=[:::WWW.XXX.YYY.ZZZ], headers=0, body=0, time=0
>
> Apr 16 20:59:36 clearwater syslogd: restart
> Apr 16 20:59:36 clearwater syslogd: kernel boot file is /boot/kernel/kernel
> Apr 16 20:59:36 clearwater kernel: Copyright (c) 1992-2006 The FreeBSD Project.
> Apr 16 20:59:36 clearwater kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> Apr 16 20:59:36 clearwater kernel: The Regents of the University of California. All rights reserved.
> Apr 16 20:59:36 clearwater kernel: FreeBSD 6.1-PRERELEASE #0: Wed Apr  5 20:46:37 EDT 2006
>
> This machine previously had Linux installed and did not display the same
> problems, so I'm going on the assumption that it's not a hardware failure.
>
> Aside from the reboots the box has been performing extremely well.
>
> If anybody can provide some insights or suggestions I'd greatly appreciate
> it.
>
> Thanks,
> Matt Watson


Re: 6.1 prerelease graid3 livelock?

2006-04-24 Thread Pawel Jakub Dawidek
On Sun, Apr 23, 2006 at 12:04:33PM -0700, Bradley W. Dutton wrote:
+> Hi,
+> 
+> I'm experiencing a sort of livelock on a 6.1 prerelease box. It appears
+> all  of the IO related activity hangs but the box continues to do
+> routing/NAT/etc  for internet access from my other boxes. I can usually
+> get the lockup to occur within about 12 hours of booting.
+> 
+> I've narrowed down the commits to those on March 20 (kernel before then
+> works, kernel after then causes problems) and I think the problem is
+> geom/raid related. Besides a small gmirrored root partition the rest of my
+> partitions are all graid3. I'm not sure what information to provide to
+> help troubleshoot but I'm happy to do what's needed.
+> 
+> On an unrelated note the rebuild speed was about 50% faster on my box when
+> using the new geom/raid code introduced on March 20th.

Can you break into DDB (alt+ctrl+esc or send break via serial console)
and send me the output of 'traceall' command?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




HP DL145G2 SCSC Raid Controlle Q

2006-04-24 Thread Thomas Krause
Hello,
I hope this question is not off topic for this list.
In the HP config tool for the DL145 I can select a
HP PL100 SCSI RAID Controller (PN 355671-B21).
Does this controller work with FreeBSD? I'm not sure if this
is a relabeled LSI controller.

Kind regards,
Thomas.



Re: FreeBSD 4.9 losing mbufs!!!

2006-04-24 Thread Stephen Clark



Robert,

Thanks for your response. I am in the process of moving our app to 6. 
stable to see if the problem still exists. If it does, then maybe I can 
generate some enthusiasm from the FreeBSD community to take more of an 
interest in the problem. I have a lot of C experience but not with the 
*BSD network stack; I'm still trying to get a good understanding of the 
flow of packets through the stack.

Our next release will be based on 6 but that is months away. We have 
some Athlon 64 X2 systems we are putting in that will be handling 100 to 
200 vpn/gre tunnels, and right now ipintrq slowly grows, which eventually 
forces a reboot of the systems. Fortune 2000 companies don't like to see 
that happen.


Regards,
Steve

--

"They that give up essential liberty to obtain temporary safety, 
deserve neither liberty nor safety."  (Ben Franklin)


"The course of history shows that as a government grows, liberty 
decreases."  (Thomas Jefferson)






Stable Build Error

2006-04-24 Thread Lawrence Farr
I've been seeing this for a week or so, and have deleted
/usr/obj and run make




Stable Build Error

2006-04-24 Thread Lawrence Farr
I've been seeing the following for a week or so, and have 
deleted /usr/obj , re-cvsupped and run 

cd /usr/src && make cleandir && make cleandir

make.conf is empty. Anyone got any ideas? 

===> kerberos5/tools/make-roken (all)
===> kerberos5/tools/asn1_compile (all)
cd /usr/src/kerberos5/tools/asn1_compile/../make-roken && make
cc -O2 -fno-strict-aliasing -pipe -I/usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/roken -I/usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1 -I. -DHAVE_CONFIG_H -I/usr/src/kerberos5/tools/asn1_compile/../../include -DINET6  -I/usr/obj/usr/src/tmp/legacy/usr/include -c /usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1/gen.c
In file included from ./roken.h:61,
                 from /usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1/gen_locl.h:51,
                 from /usr/src/kerberos5/tools/asn1_compile/../../../crypto/heimdal/lib/asn1/gen.c:34:
/usr/include/resolv.h:320: error: syntax error before "ns_tsig_key"
*** Error code 1

Stop in /usr/src/kerberos5/tools/asn1_compile.
*** Error code 1

Stop in /usr/src/kerberos5/tools.
*** Error code 1

Stop in /usr/src.
*** Error code 1



Re: Stable Build Error

2006-04-24 Thread Brooks Davis
On Mon, Apr 24, 2006 at 04:18:06PM +0100, Lawrence Farr wrote:
> I've been seeing this for a week or so, and have deleted
> /usr/obj and run make

My telepathy powers aren't working.  What's the error? :)

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4




Re: HP DL145G2 SCSC Raid Controlle Q

2006-04-24 Thread Dmitry Morozovsky
On Mon, 24 Apr 2006, Thomas Krause wrote:

TK> Hello,
TK> I hope, this question is not off topic for this list.
TK> In the HP config tool for the DL145 I can select a
TK> HP PL100 SCSI RAID Controller (PN 355671-B21)
TK> Does this controller work with FreeBSD? I'm not sure, if this
TK> is a relabled LSI controller.

Should work:

mpt0:  port 0x2000-0x20ff mem 
0xd812-0xd813,0xd810-0xd811 irq 32 at device 1.0 on pci134


Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: fsck_ufs locked in snaplk

2006-04-24 Thread Dmitry Morozovsky
On Mon, 24 Apr 2006, Dmitry Morozovsky wrote:

DM> KK> > one of my servers had to be rebooted uncleanly and then I have
DM> KK> > backgrounded fsck locked for more than an hour in snaplk:
DM> KK> > 
DM> KK> > 742 root 1  -44  1320K   688K snaplk   0:02  0.00% fsck_ufs
DM> KK> > 
DM> KK> > File system in question is 200G gmirror on SATA. Usually making a
DM> KK> > snapshot (e.g., for making dumps) consumes 3-4 minutes for that fs,
DM> KK> > so it seems to me that filesystem is in a deadlock.
DM> KK> 
DM> KK> Is the process performing I/O?  Background fsck deliberately runs at a
DM> KK> slow rate so it does not destroy I/O performance on the rest of the
DM> KK> system.
DM> 
DM> Nope. For that case, 50+ smbds had been locked in 'ufs' state, so I've been 
DM> urged to revive the machine and reboot, turning off bgfsck.
DM> 
DM> This night, dump -L locks in the same position on the same filesystem:
DM> 
DM> 0  2887  2886   0  -4  0  1260   692 snaplk D ??0:01.28 /sbin/mksnap_ffs root0.0  0.1  5:19AM
DM> 
DM> it has been started at 5:19am, and now is 9:20 - no disk activity
DM> 
DM> 
DM> For the reference: it's fresh RELENG_6_1/i386.

Just rechecked it: did mksnap_ffs on an otherwise idle file system:

[EMAIL PROTECTED]:/> mksnap_ffs /st /st/.snap/test_snapshot
load: 0.02  cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
load: 0.04  cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
load: 0.21  cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
load: 0.20  cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
load: 0.13  cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.08  cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.01  cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k

(I hit ^T several times)

biord phase consumes about 1.5-2 mins,
snaprdb phase - about 30-40 secs, and then process died. Most disk requests
succeed; however, accessing /st/.snap locks the process in ufs state forever.

What bothers me most is that it is the only machine that reproducibly hangs in 
snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade. Other 
RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: HP DL145G2 SCSC Raid Controlle Q

2006-04-24 Thread Thomas Krause

Dmitry Morozovsky wrote:

> On Mon, 24 Apr 2006, Thomas Krause wrote:
>
> TK> Hello,
> TK> I hope, this question is not off topic for this list.
> TK> In the HP config tool for the DL145 I can select a
> TK> HP PL100 SCSI RAID Controller (PN 355671-B21)
> TK> Does this controller work with FreeBSD? I'm not sure, if this
> TK> is a relabled LSI controller.
>
> Should work:
>
> mpt0:  port 0x2000-0x20ff mem 
> 0xd812-0xd813,0xd810-0xd811 irq 32 at device 1.0 on pci134

Hmm, isn't mpt only a SCSI controller (not a RAID controller)?

best regards,
Thomas.



Re: fsck_ufs locked in snaplk

2006-04-24 Thread Kris Kennaway
On Mon, Apr 24, 2006 at 10:04:57PM +0400, Dmitry Morozovsky wrote:
> What bothers me most is that it is the only machine reproducibly hangs in 
> snapshots, and it did not hang before RELENG_5 -> RELENG_6 upgrade. Other 
> RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)

Are you quite certain it's running up-to-date RELENG_6_1?  All known
snapshot deadlock issues were believed to have been fixed a few weeks
ago.  If so, we might need you to enable extra debugging to track this
down.

Kris




Re: fsck_ufs locked in snaplk

2006-04-24 Thread Michael Butler

Dmitry Morozovsky wrote:
> one of my servers had to be rebooted uncleanly and then I have backgrounded 
> fsck locked for more than an hour in snaplk:


Given that this system came down uncleanly, have you tried starting up 
in single-user and manually doing an fsck (without '-p') on the 
afflicted file-system? My guess is that there's something there which 
can't be resolved automagically.


--
Michael Butler, CISSP
Security Architect
Protected Networks
http://www.protected-networks.net




Re: 6.1RC system nearly freezing

2006-04-24 Thread Henri Hennebert

Robert Watson wrote:



> Are you running with WITNESS and INVARIANTS enabled?  If not, could you
> do so and see if the problem is reproducible, and if so, whether or not
> WITNESS (and friends) generate any warnings?
>
> It looks like something has leaked a lock, resulting in deadlock.  The
> question is, however, which lock, and where.  WITNESS may be able to
> provide some insight into this; if you could run "show alllocks" with
> WITNESS in place, that would be helpful also.


I added WITNESS and INVARIANTS to my config, and the next freeze/boot will have 
it [see PS].

This server is in production and has been running with a newer kernel for more 
than 5 days now.

The differences (from Apr 13) from the previous kernel [the one with the last 
freeze] are:

Connected to cvsup.ciger.be
Updating collection src-all/cvs
 Edit src/etc/sendmail/freebsd.mc
 Edit src/etc/sendmail/freebsd.submit.mc
 Edit src/lib/libc/gen/vis.3
 Edit src/release/doc/en_US.ISO8859-1/hardware/common/dev.sgml
 Edit src/release/doc/share/misc/dev.archlist.txt
 Edit src/sbin/geom/core/geom.c
 Edit src/share/man/man4/Makefile
 Checkout src/share/man/man4/bce.4
 Edit src/share/man/man4/miibus.4
 Edit src/sys/amd64/conf/GENERIC
 Edit src/sys/conf/files
 Edit src/sys/conf/options
 Checkout src/sys/dev/bce/if_bce.c
 Checkout src/sys/dev/bce/if_bcefw.h
 Checkout src/sys/dev/bce/if_bcereg.h
 Edit src/sys/dev/ipw/if_ipw.c
 Edit src/sys/dev/ipw/if_ipwvar.h
 Edit src/sys/dev/mii/brgphy.c
 Edit src/sys/dev/mii/miidevs
 Edit src/sys/i386/conf/GENERIC
 Edit src/sys/modules/Makefile
 Checkout src/sys/modules/bce/Makefile
 Edit src/usr.sbin/wpa/wpa_supplicant/Packet32.c
Finished successfully

Maybe something in these changes makes things better?

Anyway, I will reboot tonight (with WITNESS and friends),
but I may have to revert it if performance is too bad :-/

Thanks for your concern,

Henri

P.S.

ARGH...

buildkernel failed with:

cc -c -O -pipe -march=pentium3 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes 
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions -std=c99 -g -nostdinc 
-I-  -I. -I/usr/src/sys -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter 
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd 
-I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param 
large-function-growth=1000  -mno-align-long-strings -mpreferred-stack-boundary=2  -mno-mmx 
-mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Werror  /usr/src/sys/dev/ata/atapi-cd.c

/usr/src/sys/dev/ata/atapi-cd.c: In function `acd_geom_attach':
/usr/src/sys/dev/ata/atapi-cd.c:179: warning: implicit declaration of function `_sx_assert'
/usr/src/sys/dev/ata/atapi-cd.c:179: warning: nested extern declaration of `_sx_assert'
*** Error code 1

Stop in /usr/obj/usr/src/sys/MORZINE.
*** Error code 1

Stop in /usr/src.
*** Error code 1

I will run cvsup and retry...

I'll keep you posted.





Re: 6.1RC system nearly freezing

2006-04-24 Thread Robert Watson


On Mon, 24 Apr 2006, Henri Hennebert wrote:

> I add WITNESS and INVARIANTS to my config and the next freeze/boot will have 
> it [see PS].
>
> This server is in production and running with a newer kernel for more than 5 
> days now.
>
> The diff (from apr 13) with the previous kernel [the one with the last 
> freeze] are:


Make sure you are also compiling in INVARIANT_SUPPORT and WITNESS_SKIPSPIN.

Robert N M Watson





Re: fsck_ufs locked in snaplk

2006-04-24 Thread Dmitry Morozovsky
On Mon, 24 Apr 2006, Kris Kennaway wrote:

KK> > What bothers me most is that it is the only machine reproducibly hangs in 
KK> > snapshots, and it did not hang before RELENG_5 -> RELENG_6 upgrade. Other 
KK> > RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)
KK> 
KK> Are you quite certain it's running up-to-date RELENG_6_1?  All known
KK> snapshot deadlock issues were believed to have been fixed a few weeks
KK> ago.  If so, we might need you to enable extra debugging to track this
KK> down.

Yes, I'm sure it's recent RELENG_6_1:

[EMAIL PROTECTED]:/usr/src> cvs stat sys/ufs/ffs/ffs_snapshot.c 
===
File: ffs_snapshot.c    Status: Up-to-date

   Working revision:    1.103.2.5
   Repository revision: 1.103.2.5   /home/ncvs/src/sys/ufs/ffs/ffs_snapshot.c,v
   Sticky Tag:          RELENG_6_1 (branch: 1.103.2.5.2)
   Sticky Date:         (none)
   Sticky Options:      (none)

[EMAIL PROTECTED]:/usr/src> cvs -R up
P etc/rc.d/SERVERS
P release/doc/en_US.ISO8859-1/errata/article.sgml
P release/doc/en_US.ISO8859-1/relnotes/common/new.sgml
P release/doc/share/sgml/release.ent
P release/doc/zh_CN.GB2312/errata/article.sgml
P release/doc/zh_CN.GB2312/relnotes/common/new.sgml
P sys/amd64/amd64/identcpu.c
P sys/amd64/amd64/initcpu.c
P sys/amd64/amd64/pmap.c
P sys/amd64/include/md_var.h
P sys/amd64/include/specialreg.h
P sys/i386/i386/identcpu.c
P sys/i386/i386/initcpu.c
P sys/i386/include/md_var.h
P sys/i386/include/specialreg.h
P sys/ia64/ia64/nexus.c

(all these changes are non-relevant, are they?)

I'll try to build DDB kernel tomorrow evening to check. Which commands should I 
issue in ddb ?

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: fsck_ufs locked in snaplk

2006-04-24 Thread Dmitry Morozovsky
On Mon, 24 Apr 2006, Michael Butler wrote:

MB> Dmitry Morozovsky wrote:
MB> > one of my servers had to be rebooted uncleanly and then I have
MB> > backgrounded fsck locked for more than an our in snaplk:
MB> 
MB> Given that this system came down uncleanly, have you tried starting up in
MB> single-user and manually doing an fsck (without '-p') on the afflicted
MB> file-system? My guess is that there's something there which can't be
MB> resolved automagically,

Yes, I did, and I'd completely disabled bgfsck. However, after that I got a 
snapshot lock in the nightly backup, and a second one in a manual test.


Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: 6.1RC system nearly freezing

2006-04-24 Thread Henri Hennebert

Robert Watson wrote:



> Make sure you are also compiling in INVARIANT_SUPPORT and WITNESS_SKIPSPIN.



Sorry. I added these and the kernel is built now.

At the first boot I got a panic on a gif interface. I don't need it now so
I commented it out of rc.conf.local and rebooted.

The new kernel is running with WITNESS and INVARIANTS... The system does not 
seem too loaded...

I am at home and the serial console is not connected. I took a snapshot of the 
panic from the KVM and attach it to this mail for the record.

Henri


--- remainder of previous mail clipped ---



Re: fsck_ufs locked in snaplk

2006-04-24 Thread Kris Kennaway
On Tue, Apr 25, 2006 at 12:24:07AM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Kris Kennaway wrote:
> 
> KK> > What bothers me most is that it is the only machine reproducibly hangs
> KK> > in snapshots, and it did not hang before RELENG_5 -> RELENG_6 upgrade.
> KK> > Other RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)
> KK> 
> KK> Are you quite certain it's running up-to-date RELENG_6_1?  All known
> KK> snapshot deadlock issues were believed to have been fixed a few weeks
> KK> ago.  If so, we might need you to enable extra debugging to track this
> KK> down.
> 
> Yes, I'm sure it's recent RELENG_6_1:
> 
> [EMAIL PROTECTED]:/usr/src> cvs stat sys/ufs/ffs/ffs_snapshot.c 
> ===
> File: ffs_snapshot.c    Status: Up-to-date
> 
>    Working revision:    1.103.2.5
>    Repository revision: 1.103.2.5   /home/ncvs/src/sys/ufs/ffs/ffs_snapshot.c,v
>    Sticky Tag:          RELENG_6_1 (branch: 1.103.2.5.2)
>    Sticky Date:         (none)
>    Sticky Options:      (none)
> 
> [EMAIL PROTECTED]:/usr/src> cvs -R up
> P etc/rc.d/SERVERS
> P release/doc/en_US.ISO8859-1/errata/article.sgml
> P release/doc/en_US.ISO8859-1/relnotes/common/new.sgml
> P release/doc/share/sgml/release.ent
> P release/doc/zh_CN.GB2312/errata/article.sgml
> P release/doc/zh_CN.GB2312/relnotes/common/new.sgml
> P sys/amd64/amd64/identcpu.c
> P sys/amd64/amd64/initcpu.c
> P sys/amd64/amd64/pmap.c
> P sys/amd64/include/md_var.h
> P sys/amd64/include/specialreg.h
> P sys/i386/i386/identcpu.c
> P sys/i386/i386/initcpu.c
> P sys/i386/include/md_var.h
> P sys/i386/include/specialreg.h
> P sys/ia64/ia64/nexus.c
> 
> (all these changes are non-relevant, are they?)

Yes.

> I'll try to build DDB kernel tomorrow evening to check. Which commands
> should I issue in ddb ?

'show lockedvnods', 'ps' and 'alltrace' are important.

Kris




Re: 6.1RC system nearly freezing

2006-04-24 Thread Kris Kennaway
On Mon, Apr 24, 2006 at 10:28:11PM +0200, Henri Hennebert wrote:

> At the first boot I got a panic on a gif interface. I don't need it now so
> I commented it out of rc.conf.local and reboot.

What panic?  This shouldn't happen, naturally.

> I am at home and the serial console is not connected. I snapshot the KVM of
> the panic and attatch it to this mail for the record.

Binary attachments are stripped by the list, so you'll need to put
this online.

Kris




Re: fsck_ufs locked in snaplk

2006-04-24 Thread Dmitry Morozovsky
On Mon, 24 Apr 2006, Kris Kennaway wrote:

KK> > I'll try to build DDB kernel tomorrow evening to check. Which commands
KK> > should I issue in ddb ?
KK> 
KK> 'show lockedvnods', 'ps' and 'alltrace' are important.

Last question: are these added lines enough? Or are some unneeded?

options KDB
options KDB_TRACE
options KDB_UNATTENDED
options DDB

options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS

Thanks.

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***



Re: fsck_ufs locked in snaplk

2006-04-24 Thread Kris Kennaway
On Tue, Apr 25, 2006 at 12:45:08AM +0400, Dmitry Morozovsky wrote:
> On Mon, 24 Apr 2006, Kris Kennaway wrote:
> 
> KK> > I'll try to build DDB kernel tomorrow evening to check. Which commands
> KK> > should I issue in ddb ?
> KK> 
> KK> 'show lockedvnods', 'ps' and 'alltrace' are important.
> 
> Last note: are these lines added enough? Or some are unneeded?
> 

> options KDB_TRACE
> options KDB_UNATTENDED

These two aren't needed.

Also you should add DEBUG_LOCKS and DEBUG_VFS_LOCKS on the off chance
they catch the problem.

Kris

