On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last
> built 2012-02-08).  It will panic during the daily periodic scripts that run
> at 3am.  Here is the most recent panic message:
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0xffffffff8069d266
> stack pointer           = 0x28:0xffffff8094b90390
> frame pointer           = 0x28:0xffffff8094b903a0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 72566 (ps)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
> #1 0xffffffff805facd3 at panic+0x183
> #2 0xffffffff808e6c20 at trap_fatal+0x290
> #3 0xffffffff808e715a at trap+0x10a
> #4 0xffffffff808cec64 at calltrap+0x8
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
> #9 0xffffffff8060473f at sysctl_root+0x14f
> #10 0xffffffff80604a2a at userland_sysctl+0x14a
> #11 0xffffffff80604f1a at __sysctl+0xaa
> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
> Uptime: 3d19h6m0s
> Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
>
>
> The reason for the subject line is that I have another RELENG_8 system that
> uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs
> is not the problem.  I am wondering if it is the combination of the three
> that is deadly, here.
>
> Both RELENG_8 systems are root-on-ZFS installs.
> Each night there is a separate backup script that runs and completes before
> the regular "periodic daily" run.  This script takes a recursive snapshot of
> the ZFS pool and then mounts these snapshots via mount_nullfs to provide a
> coherent view of the filesystem under /backup.  The only difference between
> the two RELENG_8 systems is that one uses rsync to back up /backup to another
> machine and the other uses the Linux Tivoli TSM client to back up /backup to
> a TSM server.  After the backup is completed, a script runs that unmounts the
> nullfs file systems and then destroys the ZFS snapshot.
>
> The first (rsync backup) RELENG_8 system does not panic.  It has been running
> the ZFS + nullfs rsync backup job without incident for weeks now.  The second
> (Tivoli TSM) RELENG_8 will reliably panic when the subsequent "periodic
> daily" job runs.  (It is using the 32-bit TSM 6.2.4 Linux client running
> "dsmc schedule" via the linux_base-f10-10_4 package.)  The actual ZFS +
> nullfs Tivoli TSM backup job appears to run successfully, making me wonder if
> perhaps it has some memory leak or other subtle corruption that sets up the
> ensuing panic when the "periodic daily" job later gives the system a workout.
>
> If I can provide more information about the panic, please let me know.
> Despite the message about dumping in the panic output above, when the system
> reboots I get a "No core dumps found" message during boot.  (I have
> dumpdev="AUTO" set in /etc/rc.conf.)  My swap device is on separate
> partitions but is mirrored using geom_mirror as /dev/mirror/swap.  Do crash
> dumps to gmirror devices work on RELENG_8?

See gmirror(8) man page, section NOTES.  Read the full thing.

> Does anyone have any idea what is to blame for the panic, or how I can fix or
> work around it?

Does the panic always happen when "ps" is run?  That's what's shown in
the above panic message.  Quoting:

> current process         = 72566 (ps)

And I'm inclined to think it does, based on the backtrace:

> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278

But if you can go through the previous panics and confirm that, it would
be helpful to developers in tracking down the problem.

Sorry I can't be of any more assistance than this.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
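
If the earlier panic messages were saved somewhere (console logs, for example), a rough sketch like the one below could help check whether "ps" was the current process each time and whether fill_kinfo_thread shows up in each backtrace. It assumes each panic message is available as plain text in the same layout as the one quoted above. The function names summarize_panic and tally_processes and the SAMPLE string are made up for illustration; they are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches e.g. "current process = 72566 (ps)"
PROC_RE = re.compile(r"current process\s*=\s*(\d+)\s*\(([^)]*)\)")
# Matches e.g. "#5 0xffffffff805ee034 at fill_kinfo_thread+0x54"
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+at\s+(\w+)\+0x[0-9a-fA-F]+")


def summarize_panic(text):
    """Pull the current process and the backtrace function names
    out of one panic message (hypothetical helper)."""
    proc = PROC_RE.search(text)
    frames = [name for _, name in FRAME_RE.findall(text)]
    return {
        "pid": int(proc.group(1)) if proc else None,
        "process": proc.group(2) if proc else None,
        "frames": frames,
    }


def tally_processes(panic_texts):
    """Count how often each process name was current at panic time."""
    return Counter(summarize_panic(t)["process"] for t in panic_texts)


# Shortened excerpt of the panic message quoted in this thread.
SAMPLE = """\
current process = 72566 (ps)
trap number = 9
#4 0xffffffff808cec64 at calltrap+0x8
#5 0xffffffff805ee034 at fill_kinfo_thread+0x54
#6 0xffffffff805eee76 at fill_kinfo_proc+0x586
"""

if __name__ == "__main__":
    info = summarize_panic(SAMPLE)
    print(info["process"], info["pid"])
    print("fill_kinfo_thread" in info["frames"])
    print(tally_processes([SAMPLE]))
```

Feeding every saved panic message to tally_processes gives a quick count per process name. If anything other than "ps" turns up, that would be worth mentioning to the developers as well.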