OK, the third question (localhost transmission failure) should have been posted
to storage-discuss.
I'll subscribe to that list and ask there.
Regarding the first question: after removing the LUN from the target, devfsadm -C
removes the device and the pool then shows as unavailable. I guess that's the
proper behaviour.
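For reference, the removal went roughly like this (the LU GUID is just the one
embedded in the device name further down, shown only as an illustration):

stmfadm offline-lu 600144F05DF34C0000004B51BF950003   # take the LU offline on the target
stmfadm delete-lu 600144F05DF34C0000004B51BF950003    # remove it from COMSTAR
devfsadm -C                                           # on the initiator, clean up the stale device links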
Still, the processes are hung and I can't destroy the pool.
This in turn makes it impossible to open a new session as a user that has a home
directory.
I've copy-pasted below some mdb results I gathered while looking for a way to get
rid of the pool.
Please note I had failmode=wait on the failing pool.
But since you can't change it once you're stuck, you're bound to reboot in case
of an iSCSI failure.
Or am I misunderstanding something?
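To be clear, what I mean is that the failmode property has to be chosen while the
pool is still healthy, along these lines:

zpool get failmode tsmvol            # defaults to 'wait', which blocks I/O until the device comes back
zpool set failmode=continue tsmvol   # 'continue' returns EIO to new writes instead of blocking
                                     # ('panic' would crash-dump the host instead)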
d...@nc-tanktsm:/tsmvol2# ps -ef | grep zpool
root 5 0 0 01:47:33 ? 0:06 zpool-rpool
root 327 0 0 01:47:50 ? 86:36 zpool-tank
root 4721 4042 0 15:13:27 pts/3 0:00 zpool online tsmvol c9t600144F05DF34C0000004B51BF950003d0
root 4617 0 0 14:36:35 ? 0:00 zpool-tsmvol
root 4752 0 0 15:14:40 ? 0:39 zpool-tsmvol2
root 4664 4042 0 15:08:34 pts/3 0:00 zpool destroy -f tsmvol
root 4861 4042 0 15:27:33 pts/3 0:00 grep zpool
d...@nc-tanktsm:/tsmvol2# echo "0t4721::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff040c813c20: ffffff00196a3aa0
[ ffffff00196a3aa0 _resume_from_idle+0xf1() ]
ffffff00196a3ad0 swtch+0x145()
ffffff00196a3b00 cv_wait+0x61(ffffff03f7ea4e52, ffffff03f7ea4e18)
ffffff00196a3b50 txg_wait_synced+0x7c(ffffff03f7ea4c40, 0)
ffffff00196a3b90 spa_vdev_state_exit+0x78(ffffff0402d9da80, ffffff040c832700, 0)
ffffff00196a3c00 vdev_online+0x20a(ffffff0402d9da80, abe9a540ed085f5c, 0, ffffff00196a3c14)
ffffff00196a3c40 zfs_ioc_vdev_set_state+0x83(ffffff046c08f000)
ffffff00196a3cc0 zfsdev_ioctl+0x175(0, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4)
ffffff00196a3d00 cdev_ioctl+0x45(0, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4)
ffffff00196a3d40 spec_ioctl+0x5a(ffffff03e3218180, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4, 0)
ffffff00196a3dc0 fop_ioctl+0x7b(ffffff03e3218180, 5a0d, 8042310, 100003, ffffff04054f4528, ffffff00196a3de4, 0)
ffffff00196a3ec0 ioctl+0x18e(3, 5a0d, 8042310)
ffffff00196a3f10 _sys_sysenter_post_swapgs+0x149()
d...@nc-tanktsm:/tsmvol2# echo "0t4664::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff03ec9898a0: ffffff00195ccc20
[ ffffff00195ccc20 _resume_from_idle+0xf1() ]
ffffff00195ccc50 swtch+0x145()
ffffff00195ccc80 cv_wait+0x61(ffffff0403008658, ffffff0403008650)
ffffff00195cccb0 rrw_enter_write+0x49(ffffff0403008650)
ffffff00195ccce0 rrw_enter+0x22(ffffff0403008650, 0, fffffffff79da8a0)
ffffff00195ccd40 zfsvfs_teardown+0x3b(ffffff0403008580, 1)
ffffff00195ccd90 zfs_umount+0xe1(ffffff0403101b80, 400, ffffff04054f4528)
ffffff00195ccdc0 fsop_unmount+0x22(ffffff0403101b80, 400, ffffff04054f4528)
ffffff00195cce10 dounmount+0x5f(ffffff0403101b80, 400, ffffff04054f4528)
ffffff00195cce60 umount2_engine+0x5c(ffffff0403101b80, 400, ffffff04054f4528, 1)
ffffff00195ccec0 umount2+0x142(80c1fd8, 400)
ffffff00195ccf10 _sys_sysenter_post_swapgs+0x149()
d...@nc-tanktsm:/tsmvol2# ps -ef | grep iozone
root 4631 3809 0 14:37:16 pts/2 0:00 /usr/benchmarks/iozone/iozone -a -b results2.xls
root 4879 4042 0 15:28:06 pts/3 0:00 grep iozone
d...@nc-tanktsm:/tsmvol2# echo "0t4631::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff040c7683e0: ffffff001791e050
[ ffffff001791e050 _resume_from_idle+0xf1() ]
ffffff001791e080 swtch+0x145()
ffffff001791e0b0 cv_wait+0x61(ffffff04ec895328, ffffff04ec895320)
ffffff001791e0f0 zio_wait+0x5d(ffffff04ec895020)
ffffff001791e150 dbuf_read+0x1e8(ffffff0453f1ea48, 0, 2)
ffffff001791e1c0 dmu_buf_hold+0x93(ffffff03f60bdcc0, 3, 0, 0, ffffff001791e1f8)
ffffff001791e260 zap_lockdir+0x67(ffffff03f60bdcc0, 3, 0, 1, 1, 0, ffffff001791e288)
ffffff001791e2f0 zap_lookup_norm+0x55(ffffff03f60bdcc0, 3, ffffff001791e720, 8, 1, ffffff001791e438, 0, 0, 0, 0)
ffffff001791e350 zap_lookup+0x2d(ffffff03f60bdcc0, 3, ffffff001791e720, 8, 1, ffffff001791e438)
ffffff001791e3d0 zfs_match_find+0xfd(ffffff0403008580, ffffff040aeb64b0, ffffff001791e720, 0, 1, 0, 0, ffffff001791e438)
ffffff001791e4a0 zfs_dirent_lock+0x3d1(ffffff001791e4d8, ffffff040aeb64b0, ffffff001791e720, ffffff001791e4d0, 6, 0, 0)
ffffff001791e540 zfs_dirlook+0xd9(ffffff040aeb64b0, ffffff001791e720, ffffff001791e6f0, 1, 0, 0)
ffffff001791e5c0 zfs_lookup+0x25f(ffffff040b230300, ffffff001791e720, ffffff001791e6f0, ffffff001791ea30, 1, ffffff03e1776d80, ffffff03f84053a0, 0, 0, 0)
ffffff001791e660 fop_lookup+0xed(ffffff040b230300, ffffff001791e720, ffffff001791e6f0, ffffff001791ea30, 1, ffffff03e1776d80, ffffff03f84053a0, 0, 0, 0)
ffffff001791e8a0 lookuppnvp+0x281(ffffff001791ea30, 0, 1, ffffff001791ea20, ffffff001791ec00, ffffff03e1776d80, ffffff040b230300, ffffff03f84053a0)
ffffff001791e940 lookuppnatcred+0x11b(ffffff001791ea30, 0, 1, ffffff001791ea20, ffffff001791ec00, 0, ffffff03f84053a0)
ffffff001791e9d0 lookuppnat+0x69(ffffff001791ea30, 0, 1, ffffff001791ea20, ffffff001791ec00, 0)
ffffff001791eb70 vn_createat+0x13a(809cb08, 0, ffffff001791ec20, 0, 80, ffffff001791ec00, 0, 2102, 12, 0)
ffffff001791ed30 vn_openat+0x1fb(809cb08, 0, 2102, 0, ffffff001791ed78, 0, 12, 0, 3)
ffffff001791ee90 copen+0x435(ffd19553, 809cb08, 2102, 0)
ffffff001791eec0 open64+0x34(809cb08, 101, 0)
ffffff001791ef10 _sys_sysenter_post_swapgs+0x149()
-----Original Message-----
From: Arnaud Brand
Sent: Saturday, 16 January 2010 01:54
To: zfs-discuss@opensolaris.org
Subject: Zfs over iscsi bad status
I was testing ZFS over iSCSI (with COMSTAR over a zvol) and got some errors.
Target and initiator are on the same host.
I've copy-pasted an excerpt of zpool status below.
The pool (tank) containing the iSCSI-shared zvol (tank/tsmvol) is healthy and
shows no errors.
But the pool (tsmvol) on the initiator side shows errors.
The processes that were running to generate some I/O load are stuck and aren't
killable.
zpool destroy tsmvol, zpool export tsmvol and zfs list all stay stuck.
I managed to open an ssh session by connecting as a user that has no home
directory and thus doesn't run quota (which is stuck and not killable too).
My questions are the following:
1 - Why does the tsmvol pool show up as DEGRADED when the only device it
contains is itself degraded?
Shouldn't it show as FAULTED?
2 - Why are my load-generating processes (plain old cat) unkillable?
Shouldn't they have been killed by I/O errors caused by the dead pool?
I guess this is what the failmode setting is for, so I'll set it for the rest
of my tests.
But I'm a bit puzzled about the implications.
3 - How could there be errors in localhost transmission?
I followed the basic steps outlined in the COMSTAR how-to; are there some
specific settings needed for localhost iSCSI access?
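For reference, the setup followed the how-to roughly along these lines (the zvol
size, LU GUID and device name are illustrative, not the exact values):

# target side (COMSTAR serving a zvol)
zfs create -V 200G tank/tsmvol
stmfadm create-lu /dev/zvol/rdsk/tank/tsmvol          # prints the LU GUID
stmfadm add-view 600144F05DF34C0000004B50D7D80001     # export the LU to all initiators
svcadm enable -r svc:/network/iscsi/target:default
itadm create-target
# initiator side (same host)
iscsiadm add discovery-address 127.0.0.1
iscsiadm modify discovery --sendtargets enable
devfsadm -i iscsi
zpool create tsmvol c9t600144F05DF34C0000004B50D7D80001d0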
Thanks for your help,
Arnaud
Excerpt of zpool status:

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            c10t0d0p0  ONLINE       0     0     0
            c10t1d0p0  ONLINE       0     0     0
            c10t2d0p0  ONLINE       0     0     0
            c10t3d0p0  ONLINE       0     0     0
            c10t4d0p0  ONLINE       0     0     0
            c10t5d0p0  ONLINE       0     0     0
            c10t6d0p0  ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            c10t7d0p0  ONLINE       0     0     0
            c11t1d0p0  ONLINE       0     0     0
            c11t2d0p0  ONLINE       0     0     0
            c11t4d0p0  ONLINE       0     0     0
            c11t5d0p0  ONLINE       0     0     0
            c11t6d0p0  ONLINE       0     0     0
            c11t7d0p0  ONLINE       0     0     0
        logs
          c11t0d0p2    ONLINE       0     0     0
        cache
          c11t0d0p3    ONLINE       0     0     0

errors: No known data errors
  pool: tsmvol
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        tsmvol                                   DEGRADED     3 24,0K     0
          c9t600144F05DF34C0000004B50D7D80001d0  DEGRADED     1 24,3K     1  too many errors

errors: 24614 data errors, use '-v' for a list