> Dennis Clarke wrote:
>>> Dennis Clarke wrote:
>>>>>>> It may be because it is blocked in the kernel.
>>>>>>> Can you do something like this:
>>>>>>> echo "0t<pid of zpool import>::pid2proc|::walk thread|::findstack -v"
>>>>> So we see that it cannot complete the import here and is waiting for
>>>>> the transaction group to sync. So the spa_sync thread is probably
>>>>> stuck, and we need to find out why.
>>>> Well, the details are going to change; I had to reboot. :-(
>>>>
>>>> I'll start up the stuck thread bug again here by simply starting over.
>>>> I'll bet you would be able to learn a few things if you were to ssh
>>>> into this machine.
>>>>
>>>> Regardless, let's start over.
>>>>
>>>> dcla...@neptune:~$ uname -a
>>>> SunOS neptune 5.11 snv_111 i86pc i386 i86pc
>>>> dcla...@neptune:~$ uptime
>>>> 2:04pm up 10:13, 1 user, load average: 0.17, 0.16, 0.15
>>>> dcla...@neptune:~$ su -
>>>> Password:
>>>> Sun Microsystems Inc. SunOS 5.11 snv_111 November 2008
>>>> #
>>>> # zpool import
>>>>   pool: foo
>>>>     id: 15989070886807735056
>>>>  state: ONLINE
>>>> status: The pool was last accessed by another system.
>>>> action: The pool can be imported using its name or numeric identifier and
>>>>         the '-f' flag.
>>>>    see: http://www.sun.com/msg/ZFS-8000-EY
>>>> config:
>>>>
>>>>         foo       ONLINE
>>>>           c0d0p0  ONLINE
>>>> #
>>>>
>>>> Please see ALL the details at:
>>>>
>>>> http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README
>>> There's a corrupted space map which is being updated as part of the txg
>>> sync; in order to update it (add a few free ops to the last block), we
>>> need to read in the current content of the last block from disk first,
>>> and that fails because the block is corrupted (as indicated by the
>>> checksum errors in the fmdump output):
>>>
>>> eb5c9dc0 fec1f398 0 0 60 d38e1828
>>>   PC: _resume_from_idle+0xb1 THREAD: txg_sync_thread()
>>>   stack pointer for thread eb5c9dc0: eb5c9a28
>>>     swtch+0x188()
>>>     cv_wait+0x53()
>>>     zio_wait+0x55()
>>>     dbuf_read+0x201()
>>>     dbuf_will_dirty+0x30()
>>>     dmu_write+0xd7()
>>>     space_map_sync+0x304()
>>>     metaslab_sync+0x284()
>>>     vdev_sync+0xc6()
>>>     spa_sync+0x3d0()
>>>     txg_sync_thread+0x308()
>>>     thread_start+8()
>>>
>>> Victor
>>
>> I had to cc that back onto the ZFS list; it may be of value here.
>
> Sorry for that, I just hit the wrong button ;-)
>
>> I agree that there is something wrong, no doubt; however, we should not
>> see zpool import simply hang and become unresponsive, nor should that pid
>> be unresponsive to a SIGKILL. Good behaviour should be the norm, and that
>> is not what we see with a stuck kernel thread. Really, we should get some
>> response to the effect that "a device is corrupt" or similar.
>>
>> Right now, what the user gets is very little information other than a
>> non-responsive command.
>>
>> CTRL+C does nothing and kill -9 pid does nothing to this command.
>>
>> Feels like a bug to me.
>
> Yes, it is:
>
> http://bugs.opensolaris.org/view_bug.do?bug_id=6758902

Oh drat, I thought I had hit something new :-\

Not very likely with ZFS; I guess it is pretty well flushed out, all the way into the dark corners.

Dennis

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss