Sorry for following up to myself. It was suggested off-list that I check whether the slightly older code on the Belenix liveCD would work with my pool. Nope: I upgraded my pool back when I thought OpenSolaris 2008.11 was going to work, so the pool is now at too new a version for the code on that liveCD. I *did* also try the 2008.11 liveCD, in addition to my installed system, and both crash when I import the pool.
And I'm not going to do anything drastic and permanent today after all. Hoping for insight from somebody!

On Sat, January 31, 2009 22:51, David Dyer-Bennet wrote:
> As I've said, I've got this ZFS pool, with 450GB of my data in it, that
> crashes the system when I import it. I've now got a dozen or so log
> entries on this, having finally gotten it to happen in a controlled enough
> environment that I can get at the log files and transfer them somewhere I
> can access them. They all appear to be exactly the same thing.
>
> This crash happens nearly instantly after I do a "zpool import -f zp1".
> Since the stack traceback suggests it's involved in a scrub, and indeed I
> do think it was scrubbing when this first started happening, I've tried
> doing "zpool import -f zp1; zpool scrub -s zp1" in hopes that it will get
> in in time to stop the hypothetical scrub.
>
> I'm running OpenSolaris 2008.11.
>
> I'd like to rescue this pool. If I can't, though, I've got to destroy it
> and restore from backup (which shouldn't make things much worse than they
> already are; my backups are in decent shape so far as I know). And I've
> spent far far too much time on this (busy at work, other things at home,
> so this has dragged on interminably). And I'd love to provide useful
> information to anybody interested in finding and fixing whatever is wrong
> in the code that left me in this position, of course. Nobody has
> responded to my image of the stack traceback from yesterday, but I'm
> hoping that now that I've managed to get more information (including the
> stuff before the stack traceback), somebody may be able to do something
> with it. I do also seem to have a dump, if that's any use to anybody:
>
> Jan 31 22:34:11 fsfs genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> Jan 31 22:34:21 fsfs genunix: [ID 409368 kern.notice] ^M100% done: 197356 pages dumped, compression ratio 3.20,
> Jan 31 22:34:21 fsfs genunix: [ID 851671 kern.notice] dump succeeded
>
> Here's how the pool shows before I try to import it:
> local...@fsfs:~$ pfexec zpool import
>   pool: zp1
>     id: 4278074723887284013
>  state: ONLINE
> action: The pool can be imported using its name or numeric identifier.
> config:
>
>         zp1         ONLINE
>           mirror    ONLINE
>             c5t0d0  ONLINE
>             c5t1d0  ONLINE
>           mirror    ONLINE
>             c6t0d0  ONLINE
>             c6t1d0  ONLINE
> local...@fsfs:~$
>
> Should that report say if a scrub was in progress?
>
> And here's the crash that happens immediately when I do import it:
>
> Jan 31 22:34:10 fsfs unix: [ID 836849 kern.notice]
> Jan 31 22:34:10 fsfs ^Mpanic[cpu1]/thread=ffffff00045fdc80:
> Jan 31 22:34:10 fsfs genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff00045fcd60 addr=4e8 occurred in module "unix" due to a NULL pointer dereference
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs unix: [ID 839527 kern.notice] sched:
> Jan 31 22:34:10 fsfs unix: [ID 753105 kern.notice] #pf Page fault
> Jan 31 22:34:10 fsfs unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x4e8
> Jan 31 22:34:10 fsfs unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffffb84e84b, sp=0xffffff00045fce58, eflags=0x10246
> Jan 31 22:34:10 fsfs unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
> Jan 31 22:34:10 fsfs unix: [ID 624947 kern.notice] cr2: 4e8
> Jan 31 22:34:10 fsfs unix: [ID 625075 kern.notice] cr3: 3400000
> Jan 31 22:34:10 fsfs unix: [ID 625715 kern.notice] cr8: c
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] rdi: 4e8 rsi: a200 rdx: ffffff00045fdc80
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] rcx: 1 r8: ffffff01599c8540 r9: 0
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] rax: 0 rbx: 0 rbp: ffffff00045fced0
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] r10: 5b40 r11: 0 r12: ffffff0161e92040
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] r13: ffffff0176684800 r14: ffffff0161e92040 r15: 0
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] fsb: 0 gsb: ffffff0149f68500 ds: 4b
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] trp: e err: 2 rip: fffffffffb84e84b
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice] cs: 30 rfl: 10246 rsp: ffffff00045fce58
> Jan 31 22:34:10 fsfs unix: [ID 266532 kern.notice] ss: 38
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcc40 unix:die+dd ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcd50 unix:trap+1752 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcd60 unix:_cmntrap+e9 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fced0 unix:mutex_enter+b ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcfe0 zfs:scrub_visitbp+61b ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd0f0 zfs:scrub_visitbp+5b3 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd200 zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd310 zfs:scrub_visitbp+282 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd420 zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd530 zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd640 zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd750 zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd860 zfs:scrub_visitbp+438 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd8b0 zfs:scrub_visit_rootbp+4f ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd910 zfs:scrub_visitds+7e ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdac0 zfs:dsl_pool_scrub_sync+123 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdb30 zfs:dsl_pool_sync+18c ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdbc0 zfs:spa_sync+2af ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdc60 zfs:txg_sync_thread+1fc ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdc70 unix:thread_start+8 ()
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs genunix: [ID 672855 kern.notice] syncing file systems...
> Jan 31 22:34:10 fsfs genunix: [ID 904073 kern.notice] done
> Jan 31 22:34:11 fsfs genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> Jan 31 22:34:21 fsfs genunix: [ID 409368 kern.notice] ^M100% done: 197356 pages dumped, compression ratio 3.20,
> Jan 31 22:34:21 fsfs genunix: [ID 851671 kern.notice] dump succeeded
> Jan 31 22:34:50 fsfs genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_101b 64-bit
> Jan 31 22:34:50 fsfs genunix: [ID 172908 kern.notice] Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
> Jan 31 22:34:50 fsfs Use is subject to license terms.
> Jan 31 22:34:50 fsfs unix: [ID 126719 kern.info] features: 613f6fff<cpuid,tscp,cmp,cx16,sse3,nx,asysc,sse2,sse,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>
> Jan 31 22:34:50 fsfs unix: [ID 168242 kern.info] mem = 2095676K (0x7fe8f000)
> Jan 31 22:34:50 fsfs unix: [ID 972737 kern.info] Skipping psm: xpv_psm
> Jan 31 22:34:50 fsfs rootnex: [ID 466748 kern.info] root nexus = i86pc
> Jan 31 22:34:50 fsfs iommulib: [ID 321598 kern.info] NOTICE: iommulib_nexus_register: rootnex-1: Succesfully registered NEXUS i86pc nexops=fffffffffbceadb0
> Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] pseudo0 at root
> Jan 31 22:34:50 fsfs genunix: [ID 936769 kern.info] pseudo0 is /pseudo
> Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] scsi_vhci0 at root
> Jan 31 22:34:50 fsfs genunix: [ID 936769 kern.info] scsi_vhci0 is /scsi_vhci
> Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] isa0 at root
>
> --
> David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
> Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
> Photos: http://dd-b.net/photography/gallery/
> Dragaera: http://dragaera.info
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss