Dennis Clarke wrote:
It may be because it is blocked in kernel.
Can you do something like this:
echo "0t<pid of zpool import>::pid2proc|::walk thread|::findstack -v" | mdb -k
So we see that it cannot complete the import here and is waiting for the
transaction group to sync. So the spa_sync thread is probably stuck, and we
need to find out why.
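
For anyone repeating this, here is a minimal sketch of how that command could be assembled. The helper name findstack_cmd is made up for illustration. The 0t prefix tells mdb that the pid is decimal, and the output is meant to be piped into mdb -k as root on the affected host. The pgrep lookup in the comments assumes exactly one "zpool import" process is running.

```shell
# Hypothetical helper: print the mdb dcmd pipeline for a decimal pid.
findstack_cmd() {
    printf '0t%s::pid2proc|::walk thread|::findstack -v\n' "$1"
}

# Intended use on the affected host (as root), left commented out here:
#   pid=$(pgrep -f 'zpool import')   # assumes a single matching process
#   findstack_cmd "$pid" | mdb -k
```

Keeping the string construction separate from the mdb call makes it easy to check what will be sent to the debugger before running it against a live kernel.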
Well, the details are going to change, I had to reboot.  :-(

I'll start up the stuck thread bug again here by simply starting over.
I'll bet you would be able to learn a few things if you were to ssh into
this machine?

Regardless, let's start over.

dcla...@neptune:~$ uname -a
SunOS neptune 5.11 snv_111 i86pc i386 i86pc
dcla...@neptune:~$ uptime
  2:04pm  up 10:13,  1 user,  load average: 0.17, 0.16, 0.15
dcla...@neptune:~$ su -
Password:
Sun Microsystems Inc.   SunOS 5.11      snv_111 November 2008
#
# zpool import
  pool: foo
    id: 15989070886807735056
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        foo         ONLINE
          c0d0p0    ONLINE
#

Please see ALL the details at:

http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README
There's a corrupted space map which is being updated as part of the txg
sync; in order to update it (add a few free ops to the last block), we
need to read in the current content of the last block from disk first, and
that fails because it is corrupted (as indicated by checksum errors in
the fmdump output):

eb5c9dc0 fec1f398        0   0  60 d38e1828
   PC: _resume_from_idle+0xb1    THREAD: txg_sync_thread()
   stack pointer for thread eb5c9dc0: eb5c9a28
     swtch+0x188()
     cv_wait+0x53()
     zio_wait+0x55()
     dbuf_read+0x201()
     dbuf_will_dirty+0x30()
     dmu_write+0xd7()
     space_map_sync+0x304()
     metaslab_sync+0x284()
     vdev_sync+0xc6()
     spa_sync+0x3d0()
     txg_sync_thread+0x308()
     thread_start+8()
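
When the findstack output covers many threads, the same pattern can be picked out mechanically. This is only a rough sketch: the function name threads_blocked_in is made up, it assumes each thread's frames follow a "stack pointer for thread <addr>:" line as in the trace above, and it needs a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

```shell
# Hypothetical helper: read ::findstack output on stdin and print the address
# of every thread whose stack contains both named frames.
#   $1 = frame the thread is blocked in (e.g. zio_wait)
#   $2 = frame identifying the caller of interest (e.g. spa_sync)
threads_blocked_in() {
    awk -v inner="$1" -v outer="$2" '
        # Emit the current thread if both frames were seen in its stack.
        function report() { if (tid != "" && has_inner && has_outer) print tid }
        /stack pointer for thread/ {
            report()                      # finish the previous thread
            tid = $5; sub(/:$/, "", tid)  # "eb5c9dc0:" -> "eb5c9dc0"
            has_inner = 0; has_outer = 0
            next
        }
        # Frames look like "zio_wait+0x55()", so match on "name+".
        index($1, inner "+") == 1 { has_inner = 1 }
        index($1, outer "+") == 1 { has_outer = 1 }
        END { report() }
    '
}
```

With the trace above saved to a file, `threads_blocked_in zio_wait spa_sync < findstack.out` would print eb5c9dc0.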

Victor

I had to cc that back onto the ZFS list, it may be of value here.

Sorry for that, I've just hit the wrong button ;-)

I agree that there is something wrong, no doubt, however we should not see
zpool import simply hang and become unresponsive nor should that pid be
unresponsive to a SIGKILL. Good behaviour should be the norm and that is
not what we see with a stuck kernel thread. Really, we should get some
response to the effect that "a device is corrupt" or similar.

Right now, what the user gets is very little information other than a
non-responsive command.

CTRL+C does nothing and kill -9 pid does nothing to this command.

Feels like a bug to me.

Yes, it is:

http://bugs.opensolaris.org/view_bug.do?bug_id=6758902

Regards,
Victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
