Hi Carsten,

    Thanks for your reply. I would love to take a look at the core
    file. If there is a way this can somehow be transferred to
    the internal cores server, I can work on the bug.

    I am not sure about the modalities of transferring the core
    file though. I will ask around and see if I can help you here.

Thanks,
Deepak.

On Wednesday 28 March 2012 06:15 PM, Carsten John wrote:
-----Original message-----
To:     zfs-discuss@opensolaris.org;
From:   Deepak Honnalli <deepak.honna...@oracle.com>
Sent:   Wed 28-03-2012 09:12
Subject:        Re: [zfs-discuss] kernel panic during zfs import
Hi Carsten,

      This was supposed to be fixed in build 164 of Nevada (6742788). If you
are still seeing this issue in S11, I think you should raise a bug with the
relevant details. As Paul has suggested, this could also be due to an
incomplete snapshot.

      I have seen interrupted zfs recv's causing weird bugs.

Thanks,
Deepak.

Hi Deepak,

I just spent about an hour (or two) trying to file a bug report regarding the 
issue without success.

Seems to me that I'm too stupid to use this "MyOracleSupport" portal.

So, as I'm getting paid for keeping systems running and not for clicking through 
flash-overloaded support portals searching for CSIs, I'm giving the relevant 
information to the list now.

Perhaps someone at Oracle reading the list is able to file a bug report, or 
contact me off-list.



Background:

Machine A
- Sun X4270
- Opensolaris Build 111b
- zpool version 14
- primary file server
- sending snapshots via zfs send
- directly attached Sun J4400 SAS JBODs with a total of 40 TB of storage

Machine B
- Sun X4270
- Solaris 11
- zpool version 33
- mirror server
- receiving snapshots via zfs receive (see the sketch below)
- FC attached Storagetek FLX280 storage
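
The replication between the two machines is essentially an incremental zfs send 
piped into zfs receive on the mirror side. A minimal sketch (host, pool and 
snapshot names are placeholders; the ssh transport and the -F flag are 
assumptions, not the exact invocation):

    # on machine A (sender), assuming a previous common snapshot exists
    $>zfs snapshot tank/home@2012-03-28
    $>zfs send -i tank/home@2012-03-27 tank/home@2012-03-28 | \
          ssh machine-b zfs receive -F san_pool/home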


Incident:

After a zfs send/receive run, machine B had a hanging zfs receive process. To 
get rid of the process, I rebooted the machine. During the reboot the kernel 
panicked, resulting in a reboot loop.

To bring the system back up, I booted into single-user mode, removed 
/etc/zfs/zpool.cache and rebooted again.
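
For reference, the recovery boils down to something like this (a sketch; on 
this x86 box single user is reached e.g. by appending -s to the kernel line in 
GRUB):

    # booted to single user, then:
    $>rm /etc/zfs/zpool.cache
    $>reboot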

The damaged pool can be imported read-only, giving a warning:

    $>zpool import -o readonly=on san_pool
    cannot set property for 'san_pool/home/someuser': dataset is read-only
    cannot set property for 'san_pool/home/someotheruser': dataset is read-only

The ZFS debugger zdb does not give any additional information:

    $>zdb -d -e san_pool
    Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects
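
A more verbose zdb run against the affected child datasets might still turn up 
something (a sketch only; -e operates on the pool without importing it, and 
repeating -d increases the amount of dataset/object detail printed):

    $>zdb -e -dd san_pool
    $>zdb -e -dddd san_pool/home/someuser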


The issue can be reproduced by trying to import the pool read/write, resulting 
in a kernel panic.
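
In other words, a plain import without the readonly option (sketch; same pool 
as above) is enough to trigger it:

    $>zpool import san_pool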


The fmdump utility gives the following information for the relevant UUID:

    $>fmdump -Vp -u 91da1503-74c5-67c2-b7c1-d4e245e4d968
    TIME                           UUID                                 SUNW-MSG-ID
    Mar 28 2012 12:54:26.563203000 91da1503-74c5-67c2-b7c1-d4e245e4d968 SUNOS-8000-KL

      TIME                 CLASS                                 ENA
      Mar 28 12:54:24.2698 ireport.os.sunos.panic.dump_available 0x0000000000000000
      Mar 28 12:54:05.9826 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

    nvlist version: 0
         version = 0x0
         class = list.suspect
         uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
         code = SUNOS-8000-KL
         diag-time = 1332932066 541092
         de = fmd:///module/software-diagnosis
         fault-list-sz = 0x1
         __case_state = 0x1
         topo-uuid = 3b4117e0-0ac7-cde5-b434-b9735176d591
         fault-list = (array of embedded nvlists)
         (start fault-list[0])
         nvlist version: 0
                 version = 0x0
                 class = defect.sunos.kernel.panic
                 certainty = 0x64
                 asru = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
                 resource = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
                 savecore-succcess = 1
                 dump-dir = /var/crash
                 dump-files = vmdump.0
                 os-instance-uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
                 panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff002f6dcc50 addr=20 occurred in module "zfs" due to a NULL pointer dereference
                 panicstack = unix:die+d8 () | unix:trap+152b () | unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | zfs:fzap_cursor_retrieve+cd () | zfs:zap_cursor_retrieve+195 () | zfs:zfs_purgedir+4d () | zfs:zfs_rmnode+57 () | zfs:zfs_zinactive+b4 () | zfs:zfs_inactive+1a3 () | genunix:fop_inactive+b1 () | genunix:vn_rele+58 () | zfs:zfs_unlinked_drain+a7 () | zfs:zfsvfs_setup+f1 () | zfs:zfs_domount+152 () | zfs:zfs_mount+4e3 () | genunix:fsop_mount+22 () | genunix:domount+d2f () | genunix:mount+c0 () | genunix:syscall_ap+92 () | unix:brand_sys_sysenter+1cf () |
                 crashtime = 1332931339
                 panic-time = March 28, 2012 12:42:19 PM CEST CEST
         (end fault-list[0])

         fault-status = 0x1
         severity = Major
         __ttl = 0x1
         __tod = 0x4f72ede2 0x2191cbb8


The 'first view' debugger (mdb) output looks like this:

    $>mdb unix.0 vmcore.0
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs random idm sppp crypto sata fcip cpc fcp ufs logindmux ptm ]
    >  $c
    zap_leaf_lookup_closest+0x45(ffffff0728eac588, 0, 0, ffffff002f6dcdb0)
    fzap_cursor_retrieve+0xcd(ffffff0728eac588, ffffff002f6dced0, ffffff002f6dcf10)
    zap_cursor_retrieve+0x195(ffffff002f6dced0, ffffff002f6dcf10)
    zfs_purgedir+0x4d(ffffff072806e810)
    zfs_rmnode+0x57(ffffff072806e810)
    zfs_zinactive+0xb4(ffffff072806e810)
    zfs_inactive+0x1a3(ffffff0728075080, ffffff0715079548, 0)
    fop_inactive+0xb1(ffffff0728075080, ffffff0715079548, 0)
    vn_rele+0x58(ffffff0728075080)
    zfs_unlinked_drain+0xa7(ffffff0728c43e00)
    zfsvfs_setup+0xf1(ffffff0728c43e00, 1)
    zfs_domount+0x152(ffffff0728cca310, ffffff071de80900)
    zfs_mount+0x4e3(ffffff0728cca310, ffffff0728eab600, ffffff002f6dde20, ffffff0715079548)
    fsop_mount+0x22(ffffff0728cca310, ffffff0728eab600, ffffff002f6dde20, ffffff0715079548)
    domount+0xd2f(0, ffffff002f6dde20, ffffff0728eab600, ffffff0715079548, ffffff002f6dde18)
    mount+0xc0(ffffff06fcf7fb38, ffffff002f6dde98)
    syscall_ap+0x92()
    _sys_sysenter_post_swapgs+0x149()
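
For anyone who wants a bit more context from the same dump before the core 
files are transferred, the standard mdb dcmds ::status (panic summary), 
::msgbuf (kernel message buffer) and ::panicinfo (register state at panic time) 
should work in that session, e.g.:

    > ::status
    > ::msgbuf
    > ::panicinfo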


The relevant core files are available for investigation if someone is 
interested.



Carsten


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
