> Dennis Clarke wrote:
>>>>> It may be because it is blocked in kernel.
>>>>> Can you do something like this:
>>>>> echo "0t<pid of zpool import>::pid2proc|::walk thread|::findstack -v"
>>> So we see that it cannot complete import here and is waiting for
>>> transaction group to sync. So probably spa_sync thread is stuck, and we
>>> need to find out why.
>>
>> Well, the details are going to change, I had to reboot.  :-(
>>
>> I'll reproduce the stuck thread bug here by simply starting over.
>> I'll bet you would be able to learn a few things if you were to ssh into
>> this machine.
>>
>> Regardless, let's start over.
>>
>> dcla...@neptune:~$ uname -a
>> SunOS neptune 5.11 snv_111 i86pc i386 i86pc
>> dcla...@neptune:~$ uptime
>>   2:04pm  up 10:13,  1 user,  load average: 0.17, 0.16, 0.15
>> dcla...@neptune:~$ su -
>> Password:
>> Sun Microsystems Inc.   SunOS 5.11      snv_111 November 2008
>> #
>> # zpool import
>>   pool: foo
>>     id: 15989070886807735056
>>  state: ONLINE
>> status: The pool was last accessed by another system.
>> action: The pool can be imported using its name or numeric identifier and
>>         the '-f' flag.
>>    see: http://www.sun.com/msg/ZFS-8000-EY
>> config:
>>
>>         foo         ONLINE
>>           c0d0p0    ONLINE
>> #
>>
>> Please see all the details at:
>>
>> http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README
>
> There's a corrupted space map which is being updated as part of the txg
> sync; in order to update it (add a few free ops to the last block), we
> need to read in current content of the last block from disk first, and
> that fails because it is corrupted (as indicated by checksum errors in
> the fmdump output):
>
> eb5c9dc0 fec1f398        0   0  60 d38e1828
>    PC: _resume_from_idle+0xb1    THREAD: txg_sync_thread()
>    stack pointer for thread eb5c9dc0: eb5c9a28
>      swtch+0x188()
>      cv_wait+0x53()
>      zio_wait+0x55()
>      dbuf_read+0x201()
>      dbuf_will_dirty+0x30()
>      dmu_write+0xd7()
>      space_map_sync+0x304()
>      metaslab_sync+0x284()
>      vdev_sync+0xc6()
>      spa_sync+0x3d0()
>      txg_sync_thread+0x308()
>      thread_start+8()
>
> Victor

I had to cc that back onto the ZFS list; it may be of value here.

I agree that something is wrong, no doubt. However, zpool import should not
simply hang and become unresponsive, nor should that pid ignore a SIGKILL.
Good behaviour should be the norm, and that is not what we see with a stuck
kernel thread. Really, we should get some response to the effect that
"a device is corrupt" or similar.
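
To make the expectation concrete, here is a minimal Python sketch of the
behaviour I would hope for. It is not ZFS code: every name in it is
hypothetical, the "disk" is just an in-memory list, and zlib.crc32 stands in
for whatever checksum the pool really uses. It only models the pattern Victor
describes (the last block must be read and verified before it can be extended)
and shows the failed read being reported to the caller instead of being waited
on forever.

```python
import zlib


class ChecksumError(Exception):
    """Raised when a block read back from the toy 'disk' fails verification."""


def write_block(disk, index, payload):
    # Store the payload together with its checksum (crc32 is only a stand-in).
    disk[index] = (payload, zlib.crc32(payload))


def read_block(disk, index):
    # Verify the recorded checksum before handing the payload to the caller.
    payload, recorded = disk[index]
    if zlib.crc32(payload) != recorded:
        raise ChecksumError(f"block {index}: checksum mismatch")
    return payload


def append_to_last_block(disk, record):
    # Read-modify-write: the current content of the last block must be read
    # successfully before a new record can be appended to it.
    last = len(disk) - 1
    current = read_block(disk, last)
    write_block(disk, last, current + record)


def sync(disk, record):
    # Hypothetical policy: surface the read failure as an error message
    # rather than blocking on the read indefinitely.
    try:
        append_to_last_block(disk, record)
        return "ok"
    except ChecksumError as err:
        return f"error: {err}"
```

With an intact last block, sync() appends the record and returns "ok". If the
stored payload no longer matches its checksum, sync() returns an error string
naming the bad block, which is roughly the kind of message I would like to
see from zpool import.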

Right now the user gets very little information other than a
non-responsive command.

CTRL+C does nothing, and kill -9 <pid> does nothing to this command either.

This feels like a bug to me.

Dennis
