> Dennis Clarke wrote:
>>> Dennis Clarke wrote:
>>>>>>> It may be because it is blocked in kernel.
>>>>>>> Can you do something like this:
>>>>>>> echo "0t<pid of zpool import>::pid2proc|::walk thread|::findstack -v"
>>>>> So we see that it cannot complete import here and is waiting for
>>>>> transaction group to sync. So probably spa_sync thread is stuck, and we
>>>>> need to find out why.
>>>> Well, the details are going to change, I had to reboot.  :-(
>>>>
>>>> I'll start up the stuck thread bug again here by simply starting over.
>>>> I'll bet you would be able to learn a few things if you were to ssh into
>>>> this machine?
>>>>
>>>> regardless, let's start over.
>>>>
>>>> dcla...@neptune:~$ uname -a
>>>> SunOS neptune 5.11 snv_111 i86pc i386 i86pc
>>>> dcla...@neptune:~$ uptime
>>>>   2:04pm  up 10:13,  1 user,  load average: 0.17, 0.16, 0.15
>>>> dcla...@neptune:~$ su -
>>>> Password:
>>>> Sun Microsystems Inc.   SunOS 5.11      snv_111 November 2008
>>>> #
>>>> # zpool import
>>>>   pool: foo
>>>>     id: 15989070886807735056
>>>>  state: ONLINE
>>>> status: The pool was last accessed by another system.
>>>> action: The pool can be imported using its name or numeric identifier and
>>>>         the '-f' flag.
>>>>    see: http://www.sun.com/msg/ZFS-8000-EY
>>>> config:
>>>>
>>>>         foo         ONLINE
>>>>           c0d0p0    ONLINE
>>>> #
>>>>
>>>> please see ALL the details at :
>>>>
>>>> http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README
>>> There's a corrupted space map which is being updated as part of the txg
>>> sync; in order to update it (add a few free ops to the last block), we
>>> need to read in current content of the last block from disk first, and
>>> that fails because it is corrupted (as indicated by checksum errors in
>>> the fmdump output):
>>>
>>> eb5c9dc0 fec1f398        0   0  60 d38e1828
>>>    PC: _resume_from_idle+0xb1    THREAD: txg_sync_thread()
>>>    stack pointer for thread eb5c9dc0: eb5c9a28
>>>      swtch+0x188()
>>>      cv_wait+0x53()
>>>      zio_wait+0x55()
>>>      dbuf_read+0x201()
>>>      dbuf_will_dirty+0x30()
>>>      dmu_write+0xd7()
>>>      space_map_sync+0x304()
>>>      metaslab_sync+0x284()
>>>      vdev_sync+0xc6()
>>>      spa_sync+0x3d0()
>>>      txg_sync_thread+0x308()
>>>      thread_start+8()
>>>
>>> Victor
>>
>> I had to cc that back onto the ZFS list, it may be of value here.
>
> Sorry for that, I just hit the wrong button ;-)
>
>> I agree that there is something wrong, no doubt, however we should not see
>> zpool import simply hang and become unresponsive nor should that pid be
>> unresponsive to a SIGKILL. Good behaviour should be the norm and that is
>> not what we see with a stuck kernel thread. Really, we should get some
>> response to the effect that "a device is corrupt" or similar.
>>
>> Right now, what the user gets, is very little information other than a
>> non-responsive command.
>>
>>  CTRL+C does nothing and kill -9 pid does nothing to this command.
>>
>> feels like a bug to me
>
> Yes, it is:
>
> http://bugs.opensolaris.org/view_bug.do?bug_id=6758902

oh drat, I thought I hit something new :-\

Not very likely with ZFS; it is pretty well flushed out all the way into
the dark corners, I guess.
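
For anyone who lands here with the same symptom, below is a rough sketch
(plain Python, not part of any ZFS or mdb tooling) for scanning saved
::findstack output for the pattern Victor pointed out: a txg_sync_thread
stack that is parked in zio_wait. The names parse_stacks and
stuck_sync_threads are made up for illustration, and the regexes assume
the output layout shown in the trace quoted above.

```python
import re

# Header printed for each thread, e.g.
#   "stack pointer for thread eb5c9dc0: eb5c9a28"
THREAD_RE = re.compile(r"stack pointer for thread ([0-9a-f]+):")
# Frame line, e.g. "zio_wait+0x55()" or "thread_start+8()"; captures the
# function name only. (Assumed layout, taken from the trace in this thread.)
FRAME_RE = re.compile(r"^([A-Za-z_]\w*)\+(?:0x)?[0-9a-f]+\(")


def parse_stacks(text):
    """Return {thread_address: [frame names, innermost first]}."""
    stacks = {}
    current = None
    for raw in text.splitlines():
        # Drop indentation and any mail quoting ("> > ") before matching.
        line = raw.lstrip("> \t")
        header = THREAD_RE.search(line)
        if header:
            current = header.group(1)
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return stacks


def stuck_sync_threads(stacks):
    """Addresses of txg_sync_thread stacks that are waiting in zio_wait."""
    return [addr for addr, frames in stacks.items()
            if "txg_sync_thread" in frames and "zio_wait" in frames]


# The trace from this thread, pasted as-is.
SAMPLE = """\
eb5c9dc0 fec1f398        0   0  60 d38e1828
   PC: _resume_from_idle+0xb1    THREAD: txg_sync_thread()
   stack pointer for thread eb5c9dc0: eb5c9a28
     swtch+0x188()
     cv_wait+0x53()
     zio_wait+0x55()
     dbuf_read+0x201()
     dbuf_will_dirty+0x30()
     dmu_write+0xd7()
     space_map_sync+0x304()
     metaslab_sync+0x284()
     vdev_sync+0xc6()
     spa_sync+0x3d0()
     txg_sync_thread+0x308()
     thread_start+8()
"""

if __name__ == "__main__":
    for addr in stuck_sync_threads(parse_stacks(SAMPLE)):
        print("sync thread blocked on I/O:", addr)
```

It only tells you that the sync thread is waiting on an I/O; it says
nothing about why the read never completes, which is the part covered by
the bug Victor linked.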

Dennis


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
