Re: repeatable crash with 5.4-RELEASE and PAE

2005-08-01 Thread dpk
As seen here:

http://groups-beta.google.com/group/mailing.freebsd.stable/browse_thread/thread/ad8029efaa6efe95/8f3269d6819ef0b8?lnk=st&q=panic+vm_pageout+fork_trampoline+fork_exit+freebsd+5.4&rnum=1&hl=en#8f3269d6819ef0b8

If you'd prefer, I can mail this to another list, in case stable isn't
appropriate.

We're seeing exactly the same panic here, with slight changes to the
addresses referenced (phew, I don't have to transcribe it!) Our setup is:

Dual Xeon 3.0GHz
4 gigs of ram
3ware RAID10 (4 x 400GB)
FreeBSD 5.4-RELEASE-p5

We're running a PAE kernel as well. The main difference between his
kernel and ours is that we're not using ACPI, due to extremely poor
network performance with it.

Unfortunately the kernel will not dump core (even with a 16GB swap
device). When I do a 'call doadump', it gives this error:

Dumping 4608MB
twa0: SCSI cmd=0x2a: ERROR (0x3: 0x0100): SGL entry contains zero data:
address=0x0, length=0x1, cmd=W
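A quick sanity check on the dump size, using only the figures above and treating MB/GB as binary units (an assumption). The dump is larger than 4 GiB. Whether the kernel counts only RAM or the whole range up to the highest page, it must therefore include physical addresses at or above 4 GiB, which a 32-bit address cannot express. It also fits comfortably in the 16GB swap device, so swap size itself doesn't look like the obstacle. Whether the twa error is related to those high addresses is only a hypothesis worth checking.

```python
# Figures from the message; MB/GB read as binary units (an assumption).
MIB = 2**20
GIB = 2**30

installed_ram = 4 * GIB     # "4 gigs of ram"
dump_size = 4608 * MIB      # "Dumping 4608MB"
swap_size = 16 * GIB        # "a 16GB swap device"

# Physical addresses below 4 GiB cover at most 4 GiB, so a larger dump has
# to reach addresses that need more than 32 bits (possible only with PAE).
reaches_above_4g = dump_size > 2**32
fits_in_swap = dump_size <= swap_size

print(dump_size / GIB)                      # 4.5
print((dump_size - installed_ram) // MIB)   # 512
print(reaches_above_4g, fits_in_swap)       # True True
```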

The original panic line is:

panic: lockmgr: thread 0xfff0, not exclusive lock holder c5f5fa80
unlocking

Here's the lockmgr line from the panic:

lockmgr(c61f5e14,6,c61f5d68,0,eb858a1c) lockmgr+0x421

lockmgr's second argument was 6 (LK_RELEASE) and its fourth argument, the
thread, was 0, so the panic was caused because a kernel thread was trying
to release a lock held by a non-kernel thread. Am I interpreting that
correctly?
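One way to check that reading is a small model of the release path. This is a simplified Python paraphrase, not the kernel source. The flag values are as I recall them from sys/sys/lockmgr.h in 5.x (LK_TYPE_MASK 0xf, LK_RELEASE 6). The idea that a NULL thread argument is treated as LK_KERNPROC ("owned by the kernel rather than any particular thread") is my reading of sys/kern/kern_lock.c. Verify both against your tree. The string sentinel used for LK_KERNPROC is a placeholder.

```python
# Simplified model of lockmgr()'s exclusive LK_RELEASE ownership check.
# Flag values recalled from sys/sys/lockmgr.h (5.x); verify against your tree.
LK_TYPE_MASK = 0x0000000F
LK_TYPES = {
    0x1: "LK_SHARED",
    0x2: "LK_EXCLUSIVE",
    0x3: "LK_UPGRADE",
    0x4: "LK_EXCLUPGRADE",
    0x5: "LK_DOWNGRADE",
    0x6: "LK_RELEASE",
    0x7: "LK_DRAIN",
    0x8: "LK_EXCLOTHER",
}

# Placeholder for LK_KERNPROC ("held by the kernel, not a specific thread");
# the real kernel uses a special pointer value instead of a string.
LK_KERNPROC = "LK_KERNPROC"


def decode_type(flags):
    """Return the symbolic request type encoded in the low bits of flags."""
    return LK_TYPES.get(flags & LK_TYPE_MASK, hex(flags & LK_TYPE_MASK))


def release_would_panic(holder, exclusive_count, td):
    """Model the exclusive-release check; td == 0 is treated as LK_KERNPROC."""
    thr = LK_KERNPROC if not td else td
    if exclusive_count == 0:
        return False  # not exclusively held; that path isn't modelled here
    return holder != thr and holder != LK_KERNPROC


# Values from the panic: flags=6, td=0, lock held EXCL by thread 0xc5f5fa80.
print(decode_type(6))                                               # LK_RELEASE
print(release_would_panic(holder=0xC5F5FA80, exclusive_count=1, td=0))  # True
```

Under this model, the release is issued on behalf of "the kernel" (td == 0) while the lock is exclusively held by one specific thread, 0xc5f5fa80, so the check fails.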

'show lockedvnods' reveals:

0xc61f5d68: tag devfs, type VCHR, usecount 4846, writecount 0, refcount
78, flags (VV_OBJBUF), lock type devfs: EXCL (count 1) by 0xc5f5fa80 (pid
8)
dev da0s1a

pid 8 is 'pagedaemon'.

Any ideas/pointers? I've been going through the source trying to figure
out what's calling lockmgr. I keep coming back to spec_write(), which
doesn't appear to be used by any filesystems (unless it's used by
procfs?)
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Kernel debugging, 5.4-RELEASE

2005-08-02 Thread dpk
What method do kernel developers employ to debug kernel panics? The gdb
that comes with 5.4-RELEASE does not have kernel debugging support, and the
handbook appears to be out of date with regard to KDB.

I'm trying to use another gdb, out of ports, that has kernel debugging
support but I'm getting the following results:

/usr/ports/devel/gdb6/work/gdb+dejagnu-20040810/gdb/gdb -k kernel.debug
GNU gdb 20040810 [GDB v6.x for FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-portbld-freebsd5.4"...
(kgdb) target remote /dev/cuaa0
Remote debugging using /dev/cuaa0
0xc035646c in kdb_enter ()
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint
(kgdb) bt
#0  0xc035646c in kdb_enter ()
#1  0xc033ea1f in panic ()
#2  0xc0333181 in lockmgr ()
#3  0xc038b08b in vop_stdunlock ()
#4  0xc038af3b in vop_defaultop ()
#5  0xc03010bb in spec_vnoperate ()
#6  0xc0301648 in spec_write ()
etc

I.e., it's not giving me any argument information.

The target machine was booted with kernel.debug, an unstripped kernel
built with -g. What additional steps do the kernel developers take to get
the arguments?

(Unfortunately I cannot get this machine to dump core to swap, so the
debugging has to be done remotely over the serial line.)


Re: Kernel debugging, 5.4-RELEASE

2005-08-02 Thread dpk
On Tue, 2 Aug 2005, Mitch Parks wrote:

> man kgdb
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

Thank you. I'm using kgdb now, and I get the following error:

$ kgdb -r /dev/cuaa0 kernel.debug
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-marcel-freebsd".
Switching to remote protocol
0xc035646c in kdb_enter ()
Segmentation fault

The two servers (the panicking one and the one running kgdb) are
identical, except that the one running kgdb is not using PAE. The
kernel.debug file is identical to the one on the remote server, however.


Re: Kernel debugging, 5.4-RELEASE

2005-08-02 Thread dpk
On Tue, 2 Aug 2005, dpk wrote:

> On Tue, 2 Aug 2005, Mitch Parks wrote:
>
> > man kgdb
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
>
> Thank you. I'm using kgdb now, and I get the following error:
>
> $ kgdb -r /dev/cuaa0 kernel.debug
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-marcel-freebsd".
> Switching to remote protocol
> 0xc035646c in kdb_enter ()
> Segmentation fault
>
> The two servers (the panic one and the one running kgdb) are identical,
> except the one running kgdb is not using PAE. The kernel.debug file is
> identical to the remote server however.

If I run it without arguments, and then enter "file kernel.debug" and
"target remote /dev/cuaa0" it does not crash. However, when the remote
server is set to use the GDB backend, all that comes up is:

0x in ?? ()

bt and other commands don't help determine the problem.

If I boot the machine with "boot -d", attach at that point, hit "continue"
and let it run until it panics, I get this:

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to PID 100127 TID 0]
0x in ?? ()
(kgdb)

Seems I can't extract anything useful from that prompt, either.

Is there anything else I can try to get some debugging information? The
steps outlined in the handbook don't seem to be working. I've been at this
for a couple of days, so I apologize if I'm curt; I'm just trying to get a
useful backtrace to submit as a PR.


Re: Kernel debugging, 5.4-RELEASE

2005-08-02 Thread dpk
On Tue, 2 Aug 2005, Frank McConnell wrote:

> I've been using plain bog-standard /usr/bin/gdb, not out of lack
> of knowledge of kgdb but because I also find that kgdb fails with
> a segmentation fault after connecting.

Ah, OK. That solves part of the problem.

> If I'm getting more stuff out of my backtrace, it is likely because I
> have this in my kernel config:
>
> makeoptions DEBUG="-g"
>
> And likewise, sorry if I come across as being a bit too verbose or
> grumpy.  Guidance is welcome.
>
> -Frank McConnell

I had that option, but one thing I did not do was a "make clean" before
rebuilding with -g. I assumed "make depend" would take care of rebuilding
any files affected by -g, and I guess I just didn't pay close attention to
the build process. How embarrassing.

gdb now gives a proper backtrace and soon I hope to have some useful data
to submit in a PR.
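For anyone else who hits this, here is a simplified Python model (not BSD make itself) of why changing only the compiler flags didn't help. Timestamp-based make rebuilds an object only when it is missing or older than one of its prerequisites. The prerequisites "make depend" records are source and header files, not the flags in the kernel config; that last point is my assumption about the kernel build. Object timestamps below are hypothetical.

```python
# Simplified model of timestamp-based make (not BSD make itself): an object
# is rebuilt only if it is missing or older than one of its prerequisites.
# Assumption: prerequisites are the .c/.h files "make depend" records; the
# compiler flags (e.g. makeoptions DEBUG="-g") are not among them.
from dataclasses import dataclass, field
from typing import List, Optional

NOW = 100.0  # hypothetical timestamp of the current build


@dataclass
class Obj:
    mtime: Optional[float]                   # None: the .o does not exist
    prereq_mtimes: List[float] = field(default_factory=list)
    built_with_g: bool = False


def needs_rebuild(obj: Obj) -> bool:
    if obj.mtime is None:
        return True
    return any(t > obj.mtime for t in obj.prereq_mtimes)


def build(objs: List[Obj], clean: bool = False) -> int:
    """Rebuild out-of-date objects with -g; clean=True removes them first."""
    rebuilt = 0
    for o in objs:
        if clean:
            o.mtime = None  # what "make clean" does to the .o files
        if needs_rebuild(o):
            o.mtime = NOW
            o.built_with_g = True
            rebuilt += 1
    return rebuilt


# Objects compiled earlier without -g; no source has changed since.
objs = [Obj(mtime=50.0, prereq_mtimes=[10.0, 20.0]) for _ in range(3)]
print(build(objs))              # 0: adding -g alone rebuilds nothing
print(build(objs, clean=True))  # 3: after a clean, everything gets -g
```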


Re: RELENG_5 PAE panic

2005-08-06 Thread dpk
On Thu, 4 Aug 2005, Frank McConnell wrote:

> Further debugging led me to the conclusion that the problem is in
> pmap_protect(), in src/sys/i386/i386/pmap.c; and has to do with a
> 32-bit-truncated pt_entry_t being passed to PHYS_TO_VM_PAGE().
> (pt_entry_t is 64 bits if the kernel is built with PAE.)  This caused
> a page fault in vm_page_flag_set() which left the thread deadlocked
> while holding vm_page_queue_mtx and in turn led to a panic when
> another thread tried to acquire vm_page_queue_mtx.
>
> Then I checked the cvs logs, and saw rev 1.524, which looks like what
> I was thinking about as a fix, so I'm giving it a spin on top of
> earlier-this-week's RELENG_5.  Thus far I'll say that with that change
> my usual way of provoking the problem hasn't, yet.
>
> I'm going to try to get this PC put back into co-lo where it can
> get some production-like testing this weekend.  It'd be nice to get
> this fix MFC'd to RELENG_5 too.
>
> -Frank McConnell
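Here is a userland analogy for the failure mode described above, using Python threads rather than kernel mutexes. One thread stops making progress while holding a lock (here it simply exits without releasing it), and the next thread that needs the lock can never get it. In the kernel the second acquirer led to a panic; this sketch just reports a timeout. Only the lock name is borrowed from the text; nothing here is kernel code.

```python
# Userland analogy only: a thread "faults" while holding a lock and never
# releases it, so any later thread that needs the lock is stuck.
import threading


def faulting_thread(lock):
    lock.acquire()
    try:
        raise RuntimeError("simulated page fault in vm_page_flag_set()")
    except RuntimeError:
        return  # thread exits without calling release(): lock stays held


def second_thread_can_lock(timeout=0.5):
    lock = threading.Lock()  # stands in for vm_page_queue_mtx
    t = threading.Thread(target=faulting_thread, args=(lock,))
    t.start()
    t.join()
    # The kernel panicked at this point; here we only report the timeout.
    got_it = lock.acquire(timeout=timeout)
    if got_it:
        lock.release()
    return got_it


print(second_thread_can_lock())  # False
```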

FWIW, on a server of ours that was panicking quite frequently, applying
the above-mentioned change seems to have resolved the issue. The server
has been repeatedly building kernels while another process runs it out of
RAM. Before, this would cause it to panic with one of two (maybe three)
messages in well under an hour. Now it's been going for 24 hours straight
without even a stray bus error.

This appears to resolve i386/84563, and I believe it should resolve
related bugs kern/82846 (identical panic) and i386/84306.

The specific fix Frank has mentioned is this:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/pmap.c.diff?r1=1.523&r2=1.524&f=h

committed by jhb and submitted by Greg Taleck.

Even though this pmap.c change was made against a later revision than the
one distributed with FreeBSD 5.4, the modifications still apply.
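The following Python sketch illustrates what a 32-bit-truncated pt_entry_t does. It assumes the i386 PAE layout as I understand it: 64-bit entries with the page frame in bits 12-51 (mask 0x000ffffffffff000) and flag bits below that. The PTE value is made up, for a page above 4 GiB. I haven't checked exactly what rev 1.524 changes beyond what's described above.

```python
# Sketch: what happens to a PAE page-table entry squeezed into 32 bits.
# Assumes the i386 PAE layout: 64-bit PTEs, frame address in bits 12-51.
PAGE_SHIFT = 12
PG_FRAME_PAE = 0x000FFFFFFFFFF000   # frame mask for 64-bit PAE entries


def frame_of(pte):
    """Physical address of the page an entry points at."""
    return pte & PG_FRAME_PAE


# Made-up PTE for a page at 0x123456000 (above 4 GiB), low flag bits set.
pte = 0x0000000123456000 | 0x067

correct = frame_of(pte)
truncated = frame_of(pte & 0xFFFFFFFF)   # the value after a 32-bit store

print(hex(correct))              # 0x123456000
print(hex(truncated))            # 0x23456000
print(hex(correct - truncated))  # 0x100000000, i.e. exactly 4 GiB
```

For a page between 4 and 8 GiB, the truncated address lands exactly 4 GiB lower. PHYS_TO_VM_PAGE() would therefore be handed the address of a different physical page.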


Re: Create 2.5TB file system on 5.4S?

2005-08-15 Thread dpk
On Sun, 14 Aug 2005, Brandon Fosdick wrote:

> Now that my shiny new 9500S is installed and not fighting for IRQs, I've
> created and initialized a ~2.5TB array using the bios utility. So the
> next step is mounting the new array.
>
> I naively tried following the regular handbook instructions for adding a
> new drive and failed miserably. And after googling a bit I now know why,
> and realized that I knew why before, but I was being stupid.
>
> I've seen a few mentions of using gpt(8) and some vague references to
> using dedicated mode. But I haven't seen anything that says "this is the
> Right Way to do it". So...what's the proper way to make a large file
> system?

Whatever you end up doing (we had to use auto-carving here, unfortunately),
be sure to test the partitions fully before proceeding. FreeBSD's support
for volumes over 2TB is not yet perfect, so it'd be worth your time to
find out what works and what doesn't.
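The 2TB figure comes from the partition table format. The MBR partition table stores partition start and size as 32-bit counts of 512-byte sectors, which caps a partition just under 2^32 * 512 bytes. As far as I know, the BSD disklabel in 5.x has the same limit. gpt(8) uses 64-bit LBAs and doesn't have that limit. Quick arithmetic, taking "~2.5TB" as decimal terabytes and assuming 512-byte sectors:

```python
SECTOR_BYTES = 512             # sector size assumed for this array
MAX_SECTORS = 2**32 - 1        # largest value a 32-bit sector count can hold

limit_bytes = MAX_SECTORS * SECTOR_BYTES
array_bytes = 2.5e12           # "~2.5TB", read as decimal terabytes

print(limit_bytes)                 # 2199023255040 (just under 2 TiB)
print(array_bytes > limit_bytes)   # True
```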

The most recent general data available appears to be at:

http://www.freebsd.org/projects/bigdisk/

Things to check, in particular, are background fscks, which are
automatically enabled, regular fscks with a full/nearly full partition,
and backups if you use some sort of 'raw disk' method like with 'dump'.

If you can boot off a device other than the RAID, you'll save yourself a
lot of hassle, for sure.


Re: Create 2.5TB file system on 5.4S?

2005-08-15 Thread dpk
On Mon, 15 Aug 2005, Brandon Fosdick wrote:

> Any suggestions? Are there specialized raid test suites or should I just
> write a script that writes/deletes a lot of files?

That's what I did -- lots of dds running with large block sizes, filling
the partition, then forcing the unclean flag (umount, run 'fsck /dev/da0',
^C it before it's done) and rebooting. I was roughly trying to reproduce
what the array would go through under "normal" circumstances (an unclean
filesystem isn't exactly "normal", but it would be pretty awful to lose
significant data because of a panic or power loss).
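Here is something along those lines as a Python sketch, in place of a pile of dd invocations. Several writers stream large blocks into their own files until the filesystem returns ENOSPC. The directory, file names and block size are placeholders; only point it at a scratch filesystem you intend to fill. The unclean-flag and reboot steps stay manual.

```python
# Sketch of a "fill the partition" stress writer, in the spirit of running
# several dd processes with large block sizes. The target directory is a
# placeholder; point it only at a scratch filesystem you intend to fill.
import errno
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fill(directory, block_size=1 << 20, max_blocks=None, name="fill.dat"):
    """Write block_size chunks until ENOSPC (or max_blocks); return bytes."""
    block = os.urandom(block_size)
    written = 0
    blocks = 0
    path = os.path.join(directory, name)
    with open(path, "wb", buffering=0) as f:  # unbuffered, like dd
        while max_blocks is None or blocks < max_blocks:
            try:
                n = f.write(block)
            except OSError as e:
                if e.errno == errno.ENOSPC:
                    break  # partition is full
                raise
            written += n
            blocks += 1
            if n < block_size:
                break  # short write: the filesystem is (nearly) full
        os.fsync(f.fileno())
    return written


def parallel_fill(directory, writers=4, **kwargs):
    """Run several writers at once, one file each; return total bytes."""
    with ThreadPoolExecutor(max_workers=writers) as pool:
        jobs = [pool.submit(fill, directory, name=f"fill{i}.dat", **kwargs)
                for i in range(writers)]
        return sum(j.result() for j in jobs)


if __name__ == "__main__":
    # Safe demo: a temporary directory and a small, bounded amount of data.
    with tempfile.TemporaryDirectory() as d:
        print(parallel_fill(d, writers=2, block_size=4096, max_blocks=4))
```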

I still don't entirely trust the system after seeing what I saw in PR
i386/84589 (I have not yet seen the UFS2 problems I alluded to in that PR)
but we're at about 38% full on the two carved partitions (3 months of
data, doh! :)) and it hasn't crashed yet.


Re: broken fxp driver in 4.x ...

2005-08-22 Thread dpk
On Mon, 22 Aug 2005, Marc G. Fournier wrote:

> Now, on one of our newer servers running a fairly recent cvsup of 4.x, the
> fxp driver is doing the exact same thing ... all the other fxp based
> servers do a nice quick 'ifconfig alias' for an IP, and arp broadcasts are
> sent out, but on this one, I get the '60 second hang' and have to get our
> network guys to clear arp caches for the changes to take effect :(
>
> Has anyone else running 4.x experienced this?  Or am I just unlucky with
> these things?  Is there a way of fixing it?
>
> Thanks ...

We have a similar problem with our servers using em and bge drivers, but
we don't experience that exact problem with fxp (although we have a
different but similarly annoying problem). I was told for our problem we
should enable 'portfast' on our switch ports -- we're in the process of
trying to get that done.

I don't know if the alias problem is the same, though. It's definitely
still present in 5.4-R. We aren't in a position to try 6.0 or 7.0 or
whatever version people are moving on to, so I don't know if it's fixed
there.

I was thinking we might be the only people seeing the huge delay in
getting networking up; I wasn't finding anything on Google about it.

Can you try having your network guys enable 'portfast' on your server
port, and then see if adding an IP alias still causes the hang?


Re: broken fxp driver in 4.x ...

2005-08-22 Thread dpk
On Mon, 22 Aug 2005, Marc G. Fournier wrote:

> > Can you try having your network guys enable 'portfast' on your server
> > port, and then see if adding an IP alias still causes the hang?
>
> I'd love to, but I know nothing about the Cisco switches other than the
> very bare minimal ... if you can give me some instructions, I can look at
> getting it done though ...
>
> Thanks ...

This only applies if they're running the Spanning Tree Protocol
(otherwise I'm not aware of any reason a switch change would make the
problem disappear, unfortunately).

set spantree portfast M/P enable

where M is the module and P is the port.

http://www.cisco.com/univercd/cc/td/doc/product/lan/cat5000/rel_5_2/config/stp_enha.htm#32976

The switch will display a strongly worded caution when the command is
issued, but the warning only matters if you're attaching a switch or some
device that talks to the network like a switch (I think we have some
hosts like that: 802.1Q-aware load balancers).
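Here are rough numbers for why portfast can make a difference. The model assumes the switch uses the default IEEE 802.1D timers and that the NIC driver bounces the link when an address is added; neither is confirmed here. A port that comes up spends one forward delay (15 s by default) in listening and another in learning before it forwards anything, so the host is effectively cut off for about 30 s. Portfast puts an edge port straight into forwarding. A simplified Python model:

```python
# Simplified 802.1D port bring-up; assumes default timers on the switch.
FORWARD_DELAY = 15  # seconds, IEEE 802.1D default


def seconds_until_forwarding(portfast=False, forward_delay=FORWARD_DELAY):
    """Time after link-up before the port forwards frames."""
    if portfast:
        return 0  # edge port goes straight to forwarding
    listening = forward_delay
    learning = forward_delay
    return listening + learning


def timeline(portfast=False):
    """(second, state) transitions after the link comes up."""
    if portfast:
        return [(0, "forwarding")]
    return [(0, "listening"),
            (FORWARD_DELAY, "learning"),
            (2 * FORWARD_DELAY, "forwarding")]


print(seconds_until_forwarding())               # 30
print(seconds_until_forwarding(portfast=True))  # 0
```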