As I've said, I've got a ZFS pool with 450GB of my data in it that
crashes the system whenever I import it.  I've now collected a dozen or
so log entries for this crash, having finally reproduced it in a
controlled enough environment that I can get at the log files and
transfer them somewhere I can read them.  They all appear to show
exactly the same thing.

The crash happens almost instantly after I do a "zpool import -f zp1". 
Since the stack traceback suggests a scrub is involved, and I do believe
the pool was scrubbing when this first started happening, I've tried
doing "zpool import -f zp1; zpool scrub -s zp1" in the hope that the
scrub cancel gets in quickly enough to stop the hypothetical scrub.

I'm running OpenSolaris 2008.11.

I'd like to rescue this pool.  If I can't, though, I'll have to destroy
it and restore from backup (which shouldn't make things much worse than
they already are; my backups are in decent shape, so far as I know). 
I've already spent far too much time on this (busy at work, other things
at home, so it has dragged on interminably).  And of course I'd love to
provide useful information to anybody interested in finding and fixing
whatever bug in the code left me in this position.  Nobody has responded
to the photo of the stack traceback I posted yesterday, but now that
I've captured more information (including the messages before the
traceback), I'm hoping somebody can do something with it.  I also seem
to have a crash dump, if that's of any use to anybody:

Jan 31 22:34:11 fsfs genunix: [ID 111219 kern.notice] dumping to
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jan 31 22:34:21 fsfs genunix: [ID 409368 kern.notice] ^M100% done: 197356
pages dumped, compression ratio 3.20,
Jan 31 22:34:21 fsfs genunix: [ID 851671 kern.notice] dump succeeded
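If someone tells me what to look for, I can poke at that dump myself.  I
assume the usual approach would be something like the following sketch
(assuming savecore(1M) wrote the dump pair into /var/crash/fsfs; the
unix.N/vmcore.N numbering may well differ on my system):

```shell
# A sketch, not verified on this machine: load the saved panic dump
# into mdb and ask for the basics.  Assumes savecore(1M) put the
# unix.0/vmcore.0 pair in /var/crash/fsfs.
cd /var/crash/fsfs

# ::status - panic string and dump summary
# ::stack  - stack of the panicking thread
# ::msgbuf - console messages leading up to the panic
printf '::status\n::stack\n::msgbuf\n' | mdb unix.0 vmcore.0
```

That should at least reproduce the traceback below with symbols, and
anything else in the message buffer that didn't make it to the log.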

Here's how the pool shows up before I try to import it:
local...@fsfs:~$ pfexec zpool import
  pool: zp1
    id: 4278074723887284013
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        zp1         ONLINE
          mirror    ONLINE
            c5t0d0  ONLINE
            c5t1d0  ONLINE
          mirror    ONLINE
            c6t0d0  ONLINE
            c6t1d0  ONLINE
local...@fsfs:~$

Should that report say if a scrub was in progress?

And here's the crash that happens immediately when I do import it:

Jan 31 22:34:10 fsfs unix: [ID 836849 kern.notice]
Jan 31 22:34:10 fsfs ^Mpanic[cpu1]/thread=ffffff00045fdc80:
Jan 31 22:34:10 fsfs genunix: [ID 335743 kern.notice] BAD TRAP: type=e
(#pf Page fault) rp=ffffff00045fcd60 addr=4e8 occurred in module "unix"
due to a NULL pointer dereference
Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
Jan 31 22:34:10 fsfs unix: [ID 839527 kern.notice] sched:
Jan 31 22:34:10 fsfs unix: [ID 753105 kern.notice] #pf Page fault
Jan 31 22:34:10 fsfs unix: [ID 532287 kern.notice] Bad kernel fault at
addr=0x4e8
Jan 31 22:34:10 fsfs unix: [ID 243837 kern.notice] pid=0,
pc=0xfffffffffb84e84b, sp=0xffffff00045fce58, eflags=0x10246
Jan 31 22:34:10 fsfs unix: [ID 211416 kern.notice] cr0:
8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Jan 31 22:34:10 fsfs unix: [ID 624947 kern.notice] cr2: 4e8
Jan 31 22:34:10 fsfs unix: [ID 625075 kern.notice] cr3: 3400000
Jan 31 22:34:10 fsfs unix: [ID 625715 kern.notice] cr8: c
Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      rdi:              4e8
rsi:             a200 rdx: ffffff00045fdc80
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      rcx:                1 
r8: ffffff01599c8540  r9:                0
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      rax:                0
rbx:                0 rbp: ffffff00045fced0
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      r10:             5b40
r11:                0 r12: ffffff0161e92040
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      r13: ffffff0176684800
r14: ffffff0161e92040 r15:                0
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      fsb:                0
gsb: ffffff0149f68500  ds:               4b
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]       es:               4b 
fs:                0  gs:              1c3
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]      trp:                e
err:                2 rip: fffffffffb84e84b
Jan 31 22:34:10 fsfs unix: [ID 592667 kern.notice]       cs:               30
rfl:            10246 rsp: ffffff00045fce58
Jan 31 22:34:10 fsfs unix: [ID 266532 kern.notice]       ss:               38
Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcc40
unix:die+dd ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcd50
unix:trap+1752 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcd60
unix:_cmntrap+e9 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fced0
unix:mutex_enter+b ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fcfe0
zfs:scrub_visitbp+61b ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd0f0
zfs:scrub_visitbp+5b3 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd200
zfs:scrub_visitbp+223 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd310
zfs:scrub_visitbp+282 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd420
zfs:scrub_visitbp+223 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd530
zfs:scrub_visitbp+223 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd640
zfs:scrub_visitbp+223 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd750
zfs:scrub_visitbp+223 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd860
zfs:scrub_visitbp+438 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd8b0
zfs:scrub_visit_rootbp+4f ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fd910
zfs:scrub_visitds+7e ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdac0
zfs:dsl_pool_scrub_sync+123 ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdb30
zfs:dsl_pool_sync+18c ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdbc0
zfs:spa_sync+2af ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdc60
zfs:txg_sync_thread+1fc ()
Jan 31 22:34:10 fsfs genunix: [ID 655072 kern.notice] ffffff00045fdc70
unix:thread_start+8 ()
Jan 31 22:34:10 fsfs unix: [ID 100000 kern.notice]
Jan 31 22:34:10 fsfs genunix: [ID 672855 kern.notice] syncing file systems...
Jan 31 22:34:10 fsfs genunix: [ID 904073 kern.notice]  done
Jan 31 22:34:11 fsfs genunix: [ID 111219 kern.notice] dumping to
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jan 31 22:34:21 fsfs genunix: [ID 409368 kern.notice] ^M100% done: 197356
pages dumped, compression ratio 3.20,
Jan 31 22:34:21 fsfs genunix: [ID 851671 kern.notice] dump succeeded
Jan 31 22:34:50 fsfs genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11
Version snv_101b 64-bit
Jan 31 22:34:50 fsfs genunix: [ID 172908 kern.notice] Copyright 1983-2008
Sun Microsystems, Inc.  All rights reserved.
Jan 31 22:34:50 fsfs Use is subject to license terms.
Jan 31 22:34:50 fsfs unix: [ID 126719 kern.info] features:
613f6fff<cpuid,tscp,cmp,cx16,sse3,nx,asysc,sse2,sse,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>
Jan 31 22:34:50 fsfs unix: [ID 168242 kern.info] mem = 2095676K (0x7fe8f000)
Jan 31 22:34:50 fsfs unix: [ID 972737 kern.info] Skipping psm: xpv_psm
Jan 31 22:34:50 fsfs rootnex: [ID 466748 kern.info] root nexus = i86pc
Jan 31 22:34:50 fsfs iommulib: [ID 321598 kern.info] NOTICE:
iommulib_nexus_register: rootnex-1: Succesfully registered NEXUS i86pc
nexops=fffffffffbceadb0
Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] pseudo0 at root
Jan 31 22:34:50 fsfs genunix: [ID 936769 kern.info] pseudo0 is /pseudo
Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] scsi_vhci0 at root
Jan 31 22:34:50 fsfs genunix: [ID 936769 kern.info] scsi_vhci0 is /scsi_vhci
Jan 31 22:34:50 fsfs rootnex: [ID 349649 kern.info] isa0 at root
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss