Re: repeatable crash with 5.4-RELEASE and PAE
As seen here:

http://groups-beta.google.com/group/mailing.freebsd.stable/browse_thread/thread/ad8029efaa6efe95/8f3269d6819ef0b8?lnk=st&q=panic+vm_pageout+fork_trampoline+fork_exit+freebsd+5.4&rnum=1&hl=en#8f3269d6819ef0b8

If you'd prefer, I can mail this to another list, in case stable isn't appropriate.

We're seeing exactly the same panic here, with slight changes to the addresses referenced (phew, I don't have to transcribe it!)

Our setup is:

  Dual Xeon 3.0GHz
  4 gigs of RAM
  3ware RAID10 (4 x 400GB)
  FreeBSD 5.4-RELEASE-p5

We're running a PAE kernel as well. The main difference between his kernel and ours is that we're not using ACPI, due to extremely poor network performance.

Unfortunately the kernel will not dump core (even with a 16GB swap device). When I do a 'call doadump', it gives this error:

  Dumping 4608MB
  twa0: SCSI cmd=0x2a: ERROR (0x3: 0x0100): SGL entry contains zero data: address=0x0, length=0x1, cmd=W

The original panic line is:

  panic: lockmgr: thread 0xfff0, not exclusive lock holder c5f5fa80 unlocking

Here's the lockmgr line from the panic:

  lockmgr(c61f5e14,6,c61f5d68,0,eb858a1c) lockmgr+0x421

lockmgr's 2nd argument was 6, LK_RELEASE, and lockmgr's 4th argument, the thread, was 0, so the panic was caused because a kernel thread was trying to release a lock for a non-kernel thread. Am I interpreting that correctly?

'show lockedvnods' reveals:

  0xc61f5d68: tag devfs, type VCHR, usecount 4846, writecount 0, refcount 78, flags (VV_OBJBUF), lock type devfs: EXCL (count 1) by 0xc5f5fa80 (pid 8)
  dev da0s1a

pid 8 is 'pagedaemon'.

Any ideas/pointers? I've been going through the source and trying to figure out what's calling lockmgr. I keep coming back to spec_write(), which doesn't appear to be used by any filesystems (unless it's used by procfs?)

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
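For readers following the lockmgr() argument reading in the message above, here is a minimal sketch of how the DDB frame can be picked apart. The request-type table is an assumption recalled from sys/lockmgr.h in the 5.x tree; only LK_RELEASE = 6 is confirmed by the message itself, so check the other values against your own source. The helper names are hypothetical.

```python
# Lock-request types, ASSUMED to match sys/lockmgr.h in FreeBSD 5.x.
# Only LK_RELEASE = 6 is confirmed by the message above; verify the
# rest against your own source tree.
LK_TYPE_MASK = 0x0000000F
LK_TYPES = {
    0x1: "LK_SHARED",
    0x2: "LK_EXCLUSIVE",
    0x3: "LK_UPGRADE",
    0x4: "LK_EXCLUPGRADE",
    0x5: "LK_DOWNGRADE",
    0x6: "LK_RELEASE",
    0x7: "LK_DRAIN",
    0x8: "LK_EXCLOTHER",
}


def parse_ddb_args(frame):
    """Parse the hex argument list from a DDB frame such as 'func(a,b,c) ...'."""
    inside = frame[frame.index("(") + 1:frame.index(")")]
    return [int(tok, 16) for tok in inside.split(",")]


def decode_lockmgr_flags(flags):
    """Split a lockmgr() flags word into (request type name, remaining flag bits)."""
    req = flags & LK_TYPE_MASK
    name = LK_TYPES.get(req, "unknown(0x%x)" % req)
    return name, flags & ~LK_TYPE_MASK


# Frame copied from the panic report above.
args = parse_ddb_args("lockmgr(c61f5e14,6,c61f5d68,0,eb858a1c) lockmgr+0x421")
request, extra = decode_lockmgr_flags(args[1])
print(request, hex(extra), "td=0x%x" % args[3])
```

This only decodes what the frame shows: a release request with no extra flag bits and a zero thread argument. It does not by itself say which caller passed them.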
Kernel debugging, 5.4-RELEASE
What method do kernel developers employ to debug kernel panics? The gdb that comes with 5.4-RELEASE does not have kernel debugging support and the handbook appears to be out of date with regard to KDB. I'm trying to use another gdb, out of ports, that has kernel debugging support, but I'm getting the following results:

  /usr/ports/devel/gdb6/work/gdb+dejagnu-20040810/gdb/gdb -k kernel.debug
  GNU gdb 20040810 [GDB v6.x for FreeBSD]
  Copyright 2004 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB. Type "show warranty" for details.
  This GDB was configured as "i386-portbld-freebsd5.4"...
  (kgdb) target remote /dev/cuaa0
  Remote debugging using /dev/cuaa0
  0xc035646c in kdb_enter ()
  warning: Unable to find dynamic linker breakpoint function.
  GDB will be unable to debug shared library initializers
  and track explicitly loaded dynamic code.
  warning: shared library handler failed to enable breakpoint
  (kgdb) bt
  #0 0xc035646c in kdb_enter ()
  #1 0xc033ea1f in panic ()
  #2 0xc0333181 in lockmgr ()
  #3 0xc038b08b in vop_stdunlock ()
  #4 0xc038af3b in vop_defaultop ()
  #5 0xc03010bb in spec_vnoperate ()
  #6 0xc0301648 in spec_write ()
  etc

I.e., it's not giving me any argument information. The target machine was booted with kernel.debug, an unstripped kernel built with -g.

What additional steps do the kernel developers take to get the arguments? (Unfortunately I cannot get this machine to dump core to swap, so it has to be done over remote.)
Re: Kernel debugging, 5.4-RELEASE
On Tue, 2 Aug 2005, Mitch Parks wrote:

> man kgdb
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

Thank you. I'm using kgdb now, and I get the following error:

  $ kgdb -r /dev/cuaa0 kernel.debug
  [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
  GNU gdb 6.1.1 [FreeBSD]
  Copyright 2004 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB. Type "show warranty" for details.
  This GDB was configured as "i386-marcel-freebsd".
  Switching to remote protocol
  0xc035646c in kdb_enter ()
  Segmentation fault

The two servers (the panic one and the one running kgdb) are identical, except the one running kgdb is not using PAE. The kernel.debug file is identical to the remote server however.
Re: Kernel debugging, 5.4-RELEASE
On Tue, 2 Aug 2005, dpk wrote:

> On Tue, 2 Aug 2005, Mitch Parks wrote:
>
> > man kgdb
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
>
> Thank you. I'm using kgdb now, and I get the following error:
>
> $ kgdb -r /dev/cuaa0 kernel.debug
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
> Switching to remote protocol
> 0xc035646c in kdb_enter ()
> Segmentation fault
>
> The two servers (the panic one and the one running kgdb) are identical,
> except the one running kgdb is not using PAE. The kernel.debug file is
> identical to the remote server however.

If I run it without arguments, and then enter "file kernel.debug" and "target remote /dev/cuaa0" it does not crash. However, when the remote server is set to use the GDB backend, all that comes up is:

  0x in ?? ()

bt and others do nothing to help determine the problem.

If I boot the machine with "boot -d", and attach then, hit "continue" and let it run until it panics, I get this:

  Program received signal SIGTRAP, Trace/breakpoint trap.
  [Switching to PID 100127 TID 0]
  0x in ?? ()
  (kgdb)

Seems I can't extract anything useful from that prompt, either.

Is there anything else I can try, to get some debugging information? The steps outlined in the handbook don't seem to be working. I've been at this for a couple days, so I apologize if I am curt, I'm just trying to get a useful backtrace to submit as a PR.
Re: Kernel debugging, 5.4-RELEASE
On Tue, 2 Aug 2005, Frank McConnell wrote:

> I've been using plain bog-standard /usr/bin/gdb, not out of lack
> of knowledge of kgdb but because I also find that kgdb fails with
> a segmentation fault after connecting.

Ah, OK. That solves part of the problem.

> If I'm getting more stuff out of my backtrace, it is likely because I
> have this in my kernel config:
>
> makeoptions DEBUG="-g"
>
> And likewise, sorry if I come across as being a bit too verbose or
> grumpy. Guidance is welcome.
>
> -Frank McConnell

I had that option, but one thing I did not do was a "make clean" before rebuilding with -g. I assumed "make depend" would handle what was necessary to rebuild any files that could use the -g, and I guess I just didn't pay close attention to the build process. How embarrassing.

gdb now gives a proper backtrace and soon I hope to have some useful data to submit in a PR.
Re: RELENG_5 PAE panic
On Thu, 4 Aug 2005, Frank McConnell wrote:

> Further debugging led me to the conclusion that the problem is in
> pmap_protect(), in src/sys/i386/i386/pmap.c; and has to do with a
> 32-bit-truncated pt_entry_t being passed to PHYS_TO_VM_PAGE().
> (pt_entry_t is 64 bits if the kernel is built with PAE.) This caused
> a page fault in vm_page_flag_set() which left the thread deadlocked
> while holding vm_page_queue_mtx and in turn led to a panic when
> another thread tried to acquire vm_page_queue_mtx.
>
> Then I checked the cvs logs, and saw rev 1.524, which looks like what
> I was thinking about as a fix, so I'm giving it a spin on top of
> earlier-this-week's RELENG_5. Thus far I'll say that with that change
> my usual way of provoking the problem hasn't, yet.
>
> I'm going to try to get this PC put back into co-lo where it can
> get some production-like testing this weekend. It'd be nice to get
> this fix MFC'd to RELENG_5 too.
>
> -Frank McConnell

FWIW, on a server we have which was panicking quite frequently, applying the above-mentioned modification seems to have resolved the issue. The server has been repeatedly building kernels while another process runs the server out of RAM. Before, this would cause it to panic with one of 2 (maybe 3) messages in well under an hour. Now it's been going for 24 hours straight without even a stray bus error.

This appears to resolve i386/84563, and I believe it should resolve related bugs kern/82846 (identical panic) and i386/84306.

The specific fix Frank has mentioned is this:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/pmap.c.diff?r1=1.523&r2=1.524&f=h

committed by jhb and submitted by Greg Taleck. Even though this pmap.c change was made against a later revision than the one distributed with FreeBSD 5.4, the modifications still apply.
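To illustrate the kind of truncation Frank describes, here is a small sketch of what happens when a 64-bit PAE page-table entry passes through a 32-bit variable. It is a model, not the pmap.c code: the frame mask value, the example PTE, and the function names are assumptions chosen for illustration.

```python
PAGE_SHIFT = 12
# ASSUMED PAE frame mask: physical address bits 12..51 of a 64-bit PTE.
PG_FRAME_PAE = 0x000FFFFFFFFFF000


def phys_page_number(pte):
    """Return the physical page number encoded in a page-table entry."""
    return (pte & PG_FRAME_PAE) >> PAGE_SHIFT


def truncate_to_32(pte):
    """Model storing a 64-bit PTE in a 32-bit variable: the high half is lost."""
    return pte & 0xFFFFFFFF


# Hypothetical PTE for a page at physical address 0x123456000 (above 4 GB),
# with some low flag bits (0x067) set.
pte = 0x0000000123456067

full = phys_page_number(pte)
short = phys_page_number(truncate_to_32(pte))
print("page number from 64-bit PTE:  0x%x" % full)
print("page number after truncation: 0x%x" % short)
```

The truncated value still looks like a plausible page number, just the wrong one, which would fit a failure that only shows up once pages above the 4 GB mark are in use. For any PTE whose physical address is below 4 GB, both computations agree.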
Re: Create 2.5TB file system on 5.4S?
On Sun, 14 Aug 2005, Brandon Fosdick wrote:

> Now that my shiny new 9500S is installed and not fighting for IRQs, I've
> created and initialized a ~2.5TB array using the bios utility. So the
> next step is mounting the new array.
>
> I naively tried following the regular handbook instructions for adding a
> new drive and failed miserably. And after googling a bit I now know why,
> and realized that I knew why before, but I was being stupid.
>
> I've seen a few mentions of using gpt(8) and some vague references to
> using dedicated mode. But I haven't seen anything that says "this is the
> Right Way to do it". So...what's the proper way to make a large file
> system?

Whatever you end up doing (we used auto-carving here, had to unfortunately), be sure to test the partitions fully before proceeding. Support for >2TB in FreeBSD is not yet perfect, so it'd be worth your time to find out what works and what doesn't work. The most recent general data available appears to be at:

http://www.freebsd.org/projects/bigdisk/

Things to check, in particular, are background fscks, which are automatically enabled, regular fscks with a full/nearly full partition, and backups if you use some sort of 'raw disk' method like with 'dump'.

If you can boot off a device other than the RAID, you'll save yourself a lot of hassle, for sure.
Re: Create 2.5TB file system on 5.4S?
On Mon, 15 Aug 2005, Brandon Fosdick wrote:

> Any suggestions? Are there specialized raid test suites or should I just
> write a script that writes/deletes a lot of files?

That's what I did -- lots of dds running with high block sizes, filling the partition, then forcing an unclean flag (umount, run 'fsck /dev/da0', ^C it before it's done) and rebooting. Roughly trying to reproduce what the array would go through under "normal" circumstances (while not having the clean flag is not "normal", it would be pretty awful to lose significant data because of a panic or power loss).

I still don't entirely trust the system after seeing what I saw in PR i386/84589 (I have not yet seen the UFS2 problems I alluded to in that PR) but we're at about 38% full on the two carved partitions (3 months of data, doh! :)) and it hasn't crashed yet.
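As a rough illustration of the fill-and-check idea described above, here is a minimal sketch that writes files in large blocks and then reads them back against recorded checksums. It is not the dd-based procedure from the message: the function name, file naming, and sizes are hypothetical, and the demo run uses a temporary directory with tiny sizes rather than a real array.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, file_count, blocks_per_file, block_size):
    """Write files in large blocks, fsync them, then re-read and compare digests.

    Returns the list of paths whose contents did not read back correctly.
    """
    expected = {}
    for i in range(file_count):
        path = os.path.join(directory, "fill.%04d" % i)
        digest = hashlib.sha256()
        with open(path, "wb") as f:
            for _ in range(blocks_per_file):
                block = os.urandom(block_size)
                digest.update(block)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device before moving on
        expected[path] = digest.hexdigest()

    mismatches = []
    for path, want in expected.items():
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(block_size), b""):
                digest.update(block)
        if digest.hexdigest() != want:
            mismatches.append(path)
    return mismatches


# Small demonstration run in a scratch directory (4 files of 512 KB each).
with tempfile.TemporaryDirectory() as scratch:
    bad = write_and_verify(scratch, file_count=4, blocks_per_file=8, block_size=1 << 16)
    print("mismatched files:", bad)
```

On a real array you would point this at the mounted partition with far larger counts. An immediate read-back is likely served from cache, so to exercise the disks the checksums would need to be saved and rechecked after the unmount, fsck, and reboot steps.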
Re: broken fxp driver in 4.x ...
On Mon, 22 Aug 2005, Marc G. Fournier wrote:

> Now, on one of our newer servers running a fairly recent cvsup of 4.x, the
> fxp driver is doing the exact same thing ... all the other fxp based
> servers do a nice quick 'ifconfig alias' for an IP, and arp broadcasts are
> sent out, but on this one, I get the '60 second hang' and have to get our
> network guys to clear arp caches for the changes to take effect :(
>
> Has anyone else running 4.x experienced this? Or am I just unlucky with
> these things? Is there a way of fixing it?
>
> Thanks ...

We have a similar problem with our servers using em and bge drivers, but we don't experience that exact problem with fxp (although we have a different but similarly annoying problem).

I was told for our problem we should enable 'portfast' on our switch ports -- we're in the process of trying to get that done. I don't know if the alias problem is the same, though.

It's definitely still present in 5.4-R. We aren't in a position to try 6.0 or 7.0 or whatever version people are moving on to, so I don't know if it's fixed there.

I was thinking we might be the only people seeing the huge delay in getting networking up -- wasn't finding anything on Google about it.

Can you try having your network guys enable 'portfast' on your server port, and then see if adding an IP alias still causes the hang?
Re: broken fxp driver in 4.x ...
On Mon, 22 Aug 2005, Marc G. Fournier wrote:

> > Can you try having your network guys enable 'portfast' on your server
> > port, and then see if adding an IP alias still causes the hang?
>
> I'd love to, but I know nothing about the Cisco switches other than the
> very bare minimal ... if you can give me some instructions, I can look at
> getting it done though ...
>
> Thanks ...
>

This only applies if they're using the Spanning Tree Protocol option (otherwise there's no reason I'm aware of that making a switch change would cause the problem to disappear, unfortunately).

  set spantree portfast M/P enable

where M is the module and P is the port.

http://www.cisco.com/univercd/cc/td/doc/product/lan/cat5000/rel_5_2/config/stp_enha.htm#32976

The switch will display a significant caution statement when the command is issued, but it should only apply if you're attaching a switch or some device that talks to the network like it's a switch (I think we have some hosts like that, 802.1q aware load balancers).