OK, the third question (localhost transmission failure) should have been posted
to storage-discuss.
I'll subscribe to that list and ask there.
Regarding the first question: after removing the LUN from the target, devfsadm -C
removes the device and the pool then shows as unavailable. I guess that's the
proper behaviour.
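For reference, the removal went roughly like this (the LU GUID is just the one
embedded in the device name further down, shown only as an illustration):

stmfadm offline-lu 600144F05DF34C0000004B51BF950003   # take the LU offline on the target
stmfadm delete-lu 600144F05DF34C0000004B51BF950003    # remove it from COMSTAR
devfsadm -C                                           # on the initiator, clean up the stale device links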
Still, the processes are hung and I can't destroy the pool.
This in turn makes it impossible to open a new session as a user that has a home
directory.
I've copy-pasted below some mdb results I gathered while looking for a way to get
rid of the pool.
Please note I had failmode=wait on the failing pool.
But since you can't change it once you're stuck, you're bound to reboot in case
of an iSCSI failure.
Or am I misunderstanding something?
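To be clear, what I mean is that the failmode property has to be chosen while the
pool is still healthy, along these lines:

zpool get failmode tsmvol            # defaults to 'wait', which blocks I/O until the device comes back
zpool set failmode=continue tsmvol   # 'continue' returns EIO to new writes instead of blocking
                                     # ('panic' would crash-dump the host instead)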
d...@nc-tanktsm:/tsmvol2# ps -ef | grep zpool
root 5 0 0 01:47:33 ? 0:06 zpool-rpool
root 327 0 0 01:47:50 ? 86:36 zpool-tank
root 4721 4042 0 15:13:27 pts/3 0:00 zpool online tsmvol c9t600144F05DF34C0000004B51BF950003d0
root 4617 0 0 14:36:35 ? 0:00 zpool-tsmvol
root 4752 0 0 15:14:40 ? 0:39 zpool-tsmvol2
root 4664 4042 0 15:08:34 pts/3 0:00 zpool destroy -f tsmvol
root 4861 4042 0 15:27:33 pts/3 0:00 grep zpool
d...@nc-tanktsm:/tsmvol2# echo "0t4721::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff040c813c20: ffffff00196a3aa0
[ ffffff00196a3aa0 _resume_from_idle+0xf1() ]
ffffff00196a3ad0 swtch+0x145()
ffffff00196a3b00 cv_wait+0x61(ffffff03f7ea4e52, ffffff03f7ea4e18)
ffffff00196a3b50 txg_wait_synced+0x7c(ffffff03f7ea4c40, 0)
ffffff00196a3b90 spa_vdev_state_exit+0x78(ffffff0402d9da80, ffffff040c832700, 0)
ffffff00196a3c00 vdev_online+0x20a(ffffff0402d9da80, abe9a540ed085f5c, 0, ffffff00196a3c14)
ffffff00196a3c40 zfs_ioc_vdev_set_state+0x83(ffffff046c08f000)
ffffff00196a3cc0 zfsdev_ioctl+0x175(0, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4)
ffffff00196a3d00 cdev_ioctl+0x45(0, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4)
ffffff00196a3d40 spec_ioctl+0x5a(ffffff03e3218180, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4, 0)
ffffff00196a3dc0 fop_ioctl+0x7b(ffffff03e3218180, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4, 0)
ffffff00196a3ec0 ioctl+0x18e(3, 5a0d, 8042310)
ffffff00196a3f10 _sys_sysenter_post_swapgs+0x149()
d...@nc-tanktsm:/tsmvol2# echo "0t4664::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff03ec9898a0: ffffff00195ccc20
[ ffffff00195ccc20 _resume_from_idle+0xf1() ]
ffffff00195ccc50 swtch+0x145()
ffffff00195ccc80 cv_wait+0x61(ffffff0403008658, ffffff0403008650)
ffffff00195cccb0 rrw_enter_write+0x49(ffffff0403008650)
ffffff00195ccce0 rrw_enter+0x22(ffffff0403008650, 0, fffffffff79da8a0)
ffffff00195ccd40 zfsvfs_teardown+0x3b(ffffff0403008580, 1)
ffffff00195ccd90 zfs_umount+0xe1(ffffff0403101b80, 400, ffffff04054f4528)
ffffff00195ccdc0 fsop_unmount+0x22(ffffff0403101b80, 400, ffffff04054f4528)
ffffff00195cce10 dounmount+0x5f(ffffff0403101b80, 400, ffffff04054f4528)
ffffff00195cce60 umount2_engine+0x5c(ffffff0403101b80, 400, ffffff04054f4528, 1)
ffffff00195ccec0 umount2+0x142(80c1fd8, 400)
ffffff00195ccf10 _sys_sysenter_post_swapgs+0x149()
d...@nc-tanktsm:/tsmvol2# ps -ef | grep iozone
root 4631 3809 0 14:37:16 pts/2 0:00 /usr/benchmarks/iozone/iozone -a -b results2.xls
root 4879 4042 0 15:28:06 pts/3 0:00 grep iozone
d...@nc-tanktsm:/tsmvol2# echo "0t4631::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff040c7683e0: ffffff001791e050
[ ffffff001791e050 _resume_from_idle+0xf1() ]
ffffff001791e080 swtch+0x145()
ffffff001791e0b0 cv_wait+0x61(ffffff04ec895328, ffffff04ec895320)
ffffff001791e0f0 zio_wait+0x5d(ffffff04ec895020)
ffffff001791e150 dbuf_read+0x1e8(ffffff0453f1ea48, 0, 2)
ffffff001791e1c0 dmu_buf_hold+0x93(ffffff03f60bdcc0, 3, 0, 0, ffffff001791e1f8)
ffffff001791e260 zap_lockdir+0x67(ffffff03f60bdcc0, 3, 0, 1, 1, 0, ffffff001791e288)
ffffff001791e2f0 zap_lookup_norm+0x55(ffffff03f60bdcc0, 3, ffffff001791e720, 8, 1, ffffff001791e438, 0, 0, 0, 0)
ffffff001791e350 zap_lookup+0x2d(ffffff03f60bdcc0, 3, ffffff001791e720, 8, 1, ffffff001791e438)
ffffff001791e3d0 zfs_match_find+0xfd(ffffff0403008580, ffffff040aeb64b0, ffffff001791e720, 0, 1, 0, 0, ffffff001791e438)
ffffff001791e4a0 zfs_dirent_lock+0x3d1(ffffff001791e4d8, ffffff040aeb64b0, ffffff001791e720, ffffff001791e4d0, 6, 0, 0)
ffffff001791e540 zfs_dirlook+0xd9(ffffff040aeb64b0, ffffff001791e720, ffffff001791e6f0, 1, 0, 0)
ffffff001791e5c0 zfs_lookup+0x25f(ffffff040b230300, ffffff001791e720, ffffff001791e6f0, ffffff001791ea30, 1, ffffff03e1776d80, ffffff03f84053a0, 0, 0, 0)
ffffff001791e660 fop_lookup+0xed(ffffff040b230300, ffffff001791e720, ffffff001791e6f0, ffffff001791ea30, 1, ffffff03e1776d80, ffffff03f84053a0, 0, 0, 0)
ffffff001791e8a0 lookuppnvp+0x281(ffffff001791ea30, 0, 1, ffffff001791ea20, ffffff001791ec00, ffffff03e1776d80, ffffff040b230300, ffffff03f84053a0)
ffffff001791e940 lookuppnatcred+0x11b(ffffff001791ea30, 0, 1, ffffff001791ea20, ffffff001791ec00, 0, ffffff03f84053a0)
ffffff001791e9d0 lookuppnat+0x69(ffffff001791ea30, 0, 1, ffffff001791ea20, ffffff001791ec00, 0)
ffffff001791eb70 vn_createat+0x13a(809cb08, 0, ffffff001791ec20, 0, 80, ffffff001791ec00, 0, 2102, 12, 0)
ffffff001791ed30 vn_openat+0x1fb(809cb08, 0, 2102, 0, ffffff001791ed78, 0, 12, 0, 3)
ffffff001791ee90 copen+0x435(ffd19553, 809cb08, 2102, 0)
ffffff001791eec0 open64+0x34(809cb08, 101, 0)
ffffff001791ef10 _sys_sysenter_post_swapgs+0x149()
-----Original Message-----
From: Arnaud Brand
Sent: Saturday, 16 January 2010 01:54
To: zfs-discuss@opensolaris.org
Subject: Zfs over iscsi bad status
I was testing ZFS over iSCSI (with COMSTAR over a zvol) and got some errors.
Target and initiator are on the same host.
I've copy-pasted an excerpt of zpool status below.
The pool (tank) containing the iSCSI-shared zvol (tank/tsmvol) is healthy and
shows no errors.
But the pool (tsmvol) on the initiator side shows errors.
The processes that were running to generate some I/O load are stuck and aren't
killable.
zpool destroy tsmvol, zpool export tsmvol and zfs list all stay stuck.
I managed to open an ssh session by connecting as a user that has no home
directory and thus doesn't run quota (which is stuck and not killable too).
My questions are the following:
1 - Why does the tsmvol pool show up as DEGRADED when the only device it
contains is itself degraded?
Shouldn't it show as FAULTED?
2 - Why are my load-generating processes (plain old cat) unkillable?
Shouldn't they have been killed by I/O errors caused by the dead pool?
I guess this is what the failmode setting is for, so I'll set it for the rest
of my tests.
But I'm a bit puzzled about the implications.
3 - How could there be errors in localhost transmission?
I followed the basic steps outlined in the COMSTAR how-to; are there some
specific settings needed for localhost iSCSI access?
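For reference, the setup followed the how-to roughly along these lines (the zvol
size, LU GUID and device name are illustrative, not the exact values):

# target side (COMSTAR serving a zvol)
zfs create -V 200G tank/tsmvol
stmfadm create-lu /dev/zvol/rdsk/tank/tsmvol          # prints the LU GUID
stmfadm add-view 600144F05DF34C0000004B50D7D80001     # export the LU to all initiators
svcadm enable -r svc:/network/iscsi/target:default
itadm create-target
# initiator side (same host)
iscsiadm add discovery-address 127.0.0.1
iscsiadm modify discovery --sendtargets enable
devfsadm -i iscsi
zpool create tsmvol c9t600144F05DF34C0000004B50D7D80001d0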
Thanks for your help,
Arnaud
Excerpt of zpool status:

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            c10t0d0p0  ONLINE       0     0     0
            c10t1d0p0  ONLINE       0     0     0
            c10t2d0p0  ONLINE       0     0     0
            c10t3d0p0  ONLINE       0     0     0
            c10t4d0p0  ONLINE       0     0     0
            c10t5d0p0  ONLINE       0     0     0
            c10t6d0p0  ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            c10t7d0p0  ONLINE       0     0     0
            c11t1d0p0  ONLINE       0     0     0
            c11t2d0p0  ONLINE       0     0     0
            c11t4d0p0  ONLINE       0     0     0
            c11t5d0p0  ONLINE       0     0     0
            c11t6d0p0  ONLINE       0     0     0
            c11t7d0p0  ONLINE       0     0     0
        logs
          c11t0d0p2    ONLINE       0     0     0
        cache
          c11t0d0p3    ONLINE       0     0     0

errors: No known data errors
  pool: tsmvol
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        tsmvol                                   DEGRADED     3 24,0K     0
          c9t600144F05DF34C0000004B50D7D80001d0  DEGRADED     1 24,3K     1  too many errors

errors: 24614 data errors, use '-v' for a list