On Fri, Jun 5, 2009 at 4:20 PM, Tim Haley <tim.ha...@sun.com> wrote:
> Brent Jones wrote:
>>
>> Hello all,
>> I had been running snv_106 for about 3 or 4 months on a pair of X4540's.
>> I would ship snapshots from the primary server to the secondary server
>> nightly, which was working really well.
>>
>> However, I have upgraded to 2009.06, and my replication scripts appear
>> to "hang" when performing zfs send/recv.
>> When one zfs send/recv process hangs, you cannot send any other
>> snapshots from any other filesystem to the remote host.
>> I have about 20 file systems I snapshot and replicate nightly.
>>
>> The script I use to perform the snapshots is here:
>> http://www.brentrjones.com/wp-content/uploads/2009/03/replicate.ksh
>>
>> On the remote side, I end up with many "hung" processes, like this:
>>
>> bjones 11676 11661 0 01:30:03 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
>> bjones 11673 11660 0 01:30:03 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
>> bjones 11664 11653 0 01:30:03 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
>> bjones 13727 13722 0 14:21:20 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
>>
>> And so on, one for each file system.
>>
>> On the receiving end, 'zfs list' shows one filesystem attempting to
>> receive a snapshot, but I cannot stop it:
>>
>> $ zfs list
>> NAME                                    USED  AVAIL  REFER  MOUNTPOINT
>> pdxfilu02/data/fs01/%20090605-00:30:00  1.74G  27.2T  208G  /pdxfilu02/data/fs01/%20090605-00:30:00
>>
>>
>> On the sending side, I CAN kill the ZFS send process, but the remote
>> side leaves its processes going, and I CANNOT kill -9 them. I also
>> cannot reboot the receiving system; at init 6, the system will just
>> hang trying to unmount the file systems.
>> I have to physically cut power to the server, but a couple days later,
>> this issue will occur again.
>>
>>
> A crash dump from the receiving server with the stuck receives would be
> highly useful, if you can get it. Reboot -d would be best, but it might
> just hang. You can try savecore -L.
>
> -tim
>
>> If I boot to my snv_106 BE, everything works fine, this issue has
>> never occurred on that version.
>>
>> Any thoughts?
>>
>
>
Well, I think I found a specific file system that is causing this.
I kicked off a zpool scrub to see if there might be corruption on
either end, but that takes well over 40 hours on these servers.

--
Brent Jones
br...@servuhome.net
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss