Tim Haley wrote:
Ian Collins wrote:
Ian Collins wrote:
Tim Haley wrote:
Brent Jones wrote:

On the sending side, I CAN kill the ZFS send process, but the remote
side leaves its processes running, and I CANNOT kill -9 them. I also
cannot reboot the receiving system; during init 6, the system just
hangs trying to unmount the file systems.
I have to physically cut power to the server, but a couple days later,
this issue will occur again.


A crash dump from the receiving server with the stuck receives would be highly useful, if you can get one. 'reboot -d' would be best, but it might just hang; you can also try 'savecore -L'.

I tried a reboot -d (I even had kmem_flags=0xf set), but it did hang. I didn't try savecore.

One thing I didn't try was scat on the running system. What should I look for (with scat) if this happens again?

I now have a system with a hanging zfs receive, any hints on debugging it?


If you've got it stuck but can still do things on the console, then
run 'mdb -K' on the console and type '::stacks -m zfs'. That will summarize all zfs-related threads running in the kernel. Perhaps there will be a clue in the stacks of the receive(s) as to where they are stuck.
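A session might look roughly like this (a sketch from memory, not output from the poster's machine; addresses, counts, and frames will differ by build):

    # mdb -K
    Welcome to kmdb
    [0]> ::stacks -m zfs
    THREAD           STATE    SOBJ                COUNT
    ffffff01c9a40c60 SLEEP    CV                      1
                     swtch+0x141
                     cv_wait+0x61
                     txg_wait_synced+0x7c
                     ...

Threads parked in something like txg_wait_synced or zio_wait are the usual suspects; '::stacks' groups identical stacks, so a large COUNT on one stack is worth a closer look.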

I've seen this again on Solaris 10u7. I'm doing a couple of full sends from one pool to another on the same host. One of the sends stopped while the other completed. All zfs commands on the source pool now hang, the destination pool is OK.

::stacks isn't recognised by mdb on Solaris 10; is there an alternative?

Also, make sure the spa is healthy and not suspended. This is an example on one of my machines.
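For reference, one way to check this is the '::spa' dcmd against the live kernel (a hypothetical session, not the example the poster was referring to; addresses and pool names are made up):

    # echo ::spa | mdb -k
    ADDR                 STATE NAME
    ffffff01c8c38000    ACTIVE tank
    ffffff01c4a9d580    ACTIVE rpool

A suspended pool would show SUSPENDED in the STATE column, and I/O to it will block until it is cleared or the underlying devices come back.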

There were.

--
Ian.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss