On Sun, Jun 7, 2009 at 3:50 AM, Ian Collins<i...@ianshome.com> wrote:
> Ian Collins wrote:
>>
>> Tim Haley wrote:
>>>
>>> Brent Jones wrote:
>>>>
>>>> On the sending side, I CAN kill the ZFS send process, but the remote
>>>> side leaves its processes going, and I CANNOT kill -9 them. I also
>>>> cannot reboot the receiving system, at init 6, the system will just
>>>> hang trying to unmount the file systems.
>>>> I have to physically cut power to the server, but a couple days later,
>>>> this issue will occur again.
>>>>
>>>>
>>> A crash dump from the receiving server with the stuck receives would be
>>> highly useful, if you can get it. Reboot -d would be best, but it might just
>>> hang. You can try savecore -L.
>>>
>> I tried a reboot -d (I even had kmem-flags=0xf set), but it did hang. I
>> didn't try savecore.
>>
>> One thing I didn't try was scat on the running system. What should I look
>> for (with scat) if this happens again?
>>
> I now have a system with a hanging zfs receive, any hints on debugging it?
>
> --
> Ian.

I haven't figured out a way to identify the problem yet; I'm still trying
to find a way to reproduce it 100% of the time.
It seems that the more snapshots I send at a given time, the more likely
this is to happen, but correlation is not causation :)

I might try to open a support case with Sun (I have a support contract),
but OpenSolaris doesn't seem to be well understood by the support folks
yet, so I'm not sure how far it will get.

-- 
Brent Jones
br...@servuhome.net
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss