Sorry for following up to myself.

It was suggested off-list that I try to see if the slightly older code in
the Belenix liveCD would maybe work with my pool.  Nope -- because I
upgraded my pool back when I thought OpenSolaris 2008.11 was going to
work, so it's too new a version for the code on that liveCD.  I *did* try
the 2008.11 liveCD, in addition to my installation, and they both crash
when I import the pool.

And I'm not going to do anything drastic and permanent today after all. 
Hoping for insight from somebody!

On Sat, January 31, 2009 22:51, David Dyer-Bennet wrote:
> As I've said, I've got this ZFS pool, with 450GB of my data in it, that
> crashes the system when I import it.  I've now got a dozen or so log
> entries on this, having finally gotten it to happen in a controlled enough
> environment that I can get at the log files and transfer them somewhere I
> can access them.  They all appear to be exactly the same thing.
>
> This crash happens nearly instantly after I do a "zpool import -f zp1".
> Since the stack traceback suggests it's involved in a scrub, and indeed I
> do think it was scrubbing when this first started happening, I've tried
> doing "zpool import -f zp1; zpool scrub -s zp1" in hopes that it will get
> in in time to stop the hypothetical scrub.
>
> I'm running OpenSolaris 2008.11.
>
> I'd like to rescue this pool.  If I can't, though, I've got to destroy it
> and restore from backup (which shouldn't make things much worse than they
> already are; my backups are in decent shape so far as I know).  And I've
> spent far far too much time on this (busy at work, other things at home,
> so this has dragged on interminably).  And I'd love to provide useful
> information to anybody interested in finding and fixing whatever is wrong
> in the code that left me in this position, of course.  Nobody has
> responded to my image of the stack traceback from yesterday, but I'm
> hoping that now that I've managed to get more information (including the
> stuff before the stack traceback), somebody may be able to do something
> with it.  I do also seem to have a dump, if that's any use to anybody:
>
> Jan 31 22:34:11 fsfs genunix: [ID 111219 kern.notice] dumping to
> /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> Jan 31 22:34:21 fsfs genunix: [ID 409368 kern.notice] ^M100% done: 197356
> pages dumped, compression ratio 3.20,
> Jan 31 22:34:21 fsfs genunix: [ID 851671 kern.notice] dump succeeded
>
> Here's how the pool shows before I try to import it:
> local...@fsfs:~$ pfexec zpool import
>   pool: zp1
>     id: 4278074723887284013
>  state: ONLINE
> action: The pool can be imported using its name or numeric identifier.
> config:
>
>         zp1         ONLINE
>           mirror    ONLINE
>             c5t0d0  ONLINE
>             c5t1d0  ONLINE
>           mirror    ONLINE
>             c6t0d0  ONLINE
>             c6t1d0  ONLINE
> local...@fsfs:~$
>
> Should that report say if a scrub was in progress?
>
> And here's the crash that happens immediately when I do import it:
>
> Jan 31 22:34:10 fsfs unix: [ID 836849 kern.notice]
> Jan 31 22:34:10 fsfs ^Mpanic[cpu1]/thread=ffffff00045fdc80:
> Jan 31 22:34:10 fsfs genunix: [ID 335743 kern.notice] BAD TRAP: type=e
> (#pf Page fault) rp=ffffff00045fcd60 addr=4e8 occurred in module "unix"
> due to a NULL pointer dereference
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs unix: [ID 839527 kern.notice] sched:
> Jan 31 22:34:10 fsfs unix: [ID 753105 kern.notice] #pf Page fault
> Jan 31 22:34:10 fsfs unix: [ID 532287 kern.notice] Bad kernel fault at
> addr=0x4e8
> Jan 31 22:34:10 fsfs unix: [ID 243837 kern.notice] pid=0,
> pc=0xfffffffffb84e84b, sp=0xffffff00045fce58, eflags=0x10246
> Jan 31 22:34:10 fsfs unix: [ID 211416 kern.notice] cr0:
> 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
> Jan 31 22:34:10 fsfs unix: [ID 624947 kern.notice] cr2: 4e8
> Jan 31 22:34:10 fsfs unix: [ID 625075 kern.notice] cr3: 3400000
> Jan 31 22:34:10 fsfs unix: [ID 625715 kern.notice] cr8: c
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    rdi:              4e8
> rsi:             a200 rdx: ffffff00045fdc80
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    rcx:                1
> r8: ffffff01599c8540  r9:                0
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    rax:                0
> rbx:                0 rbp: ffffff00045fced0
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    r10:             5b40
> r11:                0 r12: ffffff0161e92040
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    r13: ffffff0176684800
> r14: ffffff0161e92040 r15:                0
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    fsb:                0
> gsb: ffffff0149f68500  ds:               4b
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]     es:               4b
> fs:                0  gs:              1c3
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]    trp:                e
> err:                2 rip: fffffffffb84e84b
> Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]     cs:               30
> rfl:            10246 rsp: ffffff00045fce58
> Jan 31 22:34:10 fsfs unix: [ID 266532 kern.notice]     ss:               38
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcc40
> unix:die+dd ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcd50
> unix:trap+1752 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcd60
> unix:_cmntrap+e9 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fced0
> unix:mutex_enter+b ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcfe0
> zfs:scrub_visitbp+61b ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd0f0
> zfs:scrub_visitbp+5b3 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd200
> zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd310
> zfs:scrub_visitbp+282 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd420
> zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd530
> zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd640
> zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd750
> zfs:scrub_visitbp+223 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd860
> zfs:scrub_visitbp+438 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd8b0
> zfs:scrub_visit_rootbp+4f ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd910
> zfs:scrub_visitds+7e ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdac0
> zfs:dsl_pool_scrub_sync+123 ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdb30
> zfs:dsl_pool_sync+18c ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdbc0
> zfs:spa_sync+2af ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdc60
> zfs:txg_sync_thread+1fc ()
> Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdc70
> unix:thread_start+8 ()
> Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
> Jan 31 22:34:10 fsfs genunix: [ID 672855 kern.notice] syncing file
> systems...
> Jan 31 22:34:10 fsfs genunix: [ID 904073 kern.notice]  done
> Jan 31 22:34:11 fsfs genunix: [ID 111219 kern.notice] dumping to
> /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> Jan 31 22:34:21 fsfs genunix: [ID 409368 kern.notice] ^M100% done: 197356
> pages dumped, compression ratio 3.20,
> Jan 31 22:34:21 fsfs genunix: [ID 851671 kern.notice] dump succeeded
> Jan 31 22:34:50 fsfs genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11
> Version snv_101b 64-bit
> Jan 31 22:34:50 fsfs genunix: [ID 172908 kern.notice] Copyright 1983-2008
> Sun Microsystems, Inc.  All rights reserved.
> Jan 31 22:34:50 fsfs Use is subject to license terms.
> Jan 31 22:34:50 fsfs unix: [ID 126719 kern.info] features:
> 613f6fff<cpuid,tscp,cmp,cx16,sse3,nx,asysc,sse2,sse,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>
> Jan 31 22:34:50 fsfs unix: [ID 168242 kern.info] mem = 2095676K
> (0x7fe8f000)
> Jan 31 22:34:50 fsfs unix: [ID 972737 kern.info] Skipping psm: xpv_psm
> Jan 31 22:34:50 fsfs rootnex: [ID 466748 kern.info] root nexus = i86pc
> Jan 31 22:34:50 fsfs iommulib: [ID 321598 kern.info] NOTICE:
> iommulib_nexus_register: rootnex-1: Succesfully registered NEXUS i86pc
> nexops=fffffffffbceadb0
> Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] pseudo0 at root
> Jan 31 22:34:50 fsfs genunix: [ID 936769 kern.info] pseudo0 is /pseudo
> Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] scsi_vhci0 at root
> Jan 31 22:34:50 fsfs genunix: [ID 936769 kern.info] scsi_vhci0 is
> /scsi_vhci
> Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] isa0 at root
>
>
>
> --
> David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
> Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
> Photos: http://dd-b.net/photography/gallery/
> Dragaera: http://dragaera.info
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>


-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to