On Sun, Jun 7, 2009 at 3:50 AM, Ian Collins<i...@ianshome.com> wrote:
> Ian Collins wrote:
>>
>> Tim Haley wrote:
>>>
>>> Brent Jones wrote:
>>>>
>>>> On the sending side, I CAN kill the ZFS send process, but the remote
>>>> side leaves its processes going, and I CANNOT kill -9 them. I also
>>>> cannot reboot the receiving system, at init 6, the system will just
>>>> hang trying to unmount the file systems.
>>>> I have to physically cut power to the server, but a couple days later,
>>>> this issue will occur again.
>>>>
>>>>
>>> A crash dump from the receiving server with the stuck receives would be
>>> highly useful, if you can get it. Reboot -d would be best, but it might just
>>> hang. You can try savecore -L.
>>>
>> I tried a reboot -d (I even had kmem-flags=0xf set), but it did hang. I
>> didn't try savecore.
>>
>> One thing I didn't try was scat on the running system. What should I look
>> for (with scat) if this happens again?
>>
> I now have a system with a hanging zfs receive, any hints on debugging it?
>
> --
> Ian.

I haven't found a way to identify the cause yet; I'm still trying to
find a way to reproduce the problem reliably.
It seems that the more snapshots I send at once, the more likely this
is to happen, but correlation is not causation  :)

I might try to open a support case with Sun (I have a support contract),
but OpenSolaris doesn't seem to be well understood by the support
folks yet, so I'm not sure how far it will get.

-- 
Brent Jones
br...@servuhome.net
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
