On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last 
> built 2012-02-08).  It will panic during the daily periodic scripts that run 
> at 3am.  Here is the most recent panic message:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0xffffffff8069d266
> stack pointer           = 0x28:0xffffff8094b90390
> frame pointer           = 0x28:0xffffff8094b903a0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 72566 (ps)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
> #1 0xffffffff805facd3 at panic+0x183
> #2 0xffffffff808e6c20 at trap_fatal+0x290
> #3 0xffffffff808e715a at trap+0x10a
> #4 0xffffffff808cec64 at calltrap+0x8
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
> #9 0xffffffff8060473f at sysctl_root+0x14f
> #10 0xffffffff80604a2a at userland_sysctl+0x14a
> #11 0xffffffff80604f1a at __sysctl+0xaa
> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
> Uptime: 3d19h6m0s
> Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> 
> 
> The reason for the subject line is that I have another RELENG_8 system that 
> uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs 
> alone is not the problem.  I am wondering if it is the combination of the 
> three that is deadly here.
> 
> Both RELENG_8 systems are root-on-ZFS installs.  Each night there is a 
> separate backup script that runs and completes before the regular "periodic 
> daily" run.  This script takes a recursive snapshot of the ZFS pool and then 
> mounts these snapshots via mount_nullfs to provide a coherent view of the 
> filesystem under /backup.  The only difference between the two RELENG_8 
> systems is that one uses rsync to back up /backup to another machine and the 
> other uses the Linux Tivoli TSM client to back up /backup to a TSM server.  
> After the backup is completed, a script runs that unmounts the nullfs file 
> systems and then destroys the ZFS snapshot.
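[For readers following along, the nightly flow described above amounts to
roughly the following.  The pool name, snapshot name, and mountpoint are my
assumptions for illustration, not taken from the original mail:]

```shell
# Sketch of the nightly backup flow (names assumed, not from the mail):
zfs snapshot -r tank@nightly                      # recursive snapshot of the pool
mount_nullfs /tank/.zfs/snapshot/nightly /backup  # coherent view under /backup
# ... rsync or "dsmc schedule" backs up /backup here ...
umount /backup                                    # after the backup completes
zfs destroy -r tank@nightly                       # drop the snapshot
```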
> 
> The first (rsync backup) RELENG_8 system does not panic.  It has been running 
> the ZFS + nullfs rsync backup job without incident for weeks now.  The second 
> (Tivoli TSM) RELENG_8 system will reliably panic when the subsequent 
> "periodic daily" job runs.  (It is using the 32-bit TSM 6.2.4 Linux client running 
> "dsmc schedule" via the linux_base-f10-10_4 package.)  The actual ZFS + 
> nullfs Tivoli TSM backup job appears to run successfully, making me wonder if 
> perhaps it has some memory leak or other subtle corruption that sets up the 
> ensuing panic when the "periodic daily" job later gives the system a workout.
> 
> If I can provide more information about the panic, please let me know.  
> Despite the message about dumping in the panic output above, when the system 
> reboots I get a "No core dumps found" message during boot.  (I have 
> dumpdev="AUTO" set in /etc/rc.conf.)  My swap device is on separate 
> partitions but is mirrored using geom_mirror as /dev/mirror/swap.  Do crash 
> dumps to gmirror devices work on RELENG_8?

See the NOTES section of the gmirror(8) man page; read it in full.
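If dumps to the mirror provider turn out not to be supported on your branch,
one workaround worth trying is to dump to a raw component rather than the
gmirror provider.  The device name below is an assumption; substitute an
actual component of your mirror/swap:

```shell
# /etc/rc.conf -- workaround sketch, not a confirmed fix:
# dump to one underlying swap partition instead of /dev/mirror/swap.
dumpdev="/dev/ada0p2"   # device name assumed; use a real component of the mirror
```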

> Does anyone have any idea what is to blame for the panic, or how I can fix or 
> work around it?

Does the panic always happen when "ps" is run?  That's what's shown in
the above panic message.  Quoting:

> current process         = 72566 (ps)

And I'm inclined to think it does, based on the backtrace:

> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278

But if you can go through the previous panics and confirm that, it would
be helpful to developers in tracking down the problem.
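A quick way to do that check is to pull the "current process" line out of
each saved panic message.  The sample text is inlined here purely for
illustration (taken from the panic quoted above); in practice, feed grep
whatever file holds your captured console output:

```shell
# Extract the faulting process from panic text.  Sample lines are inlined
# for illustration; replace the printf with the saved console log.
printf '%s\n' \
  'Fatal trap 9: general protection fault while in kernel mode' \
  'current process         = 72566 (ps)' \
  'trap number             = 9' \
| grep 'current process'
# -> current process         = 72566 (ps)
```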

Sorry I can't be of any more assistance than this.

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
