On Thu, Mar 04, 2010 at 07:51:13PM -0300, Giovanni Tirloni wrote:
> On Thu, Mar 4, 2010 at 7:28 PM, Ian Collins <...@ianshome.com> wrote:
> > Gary Mills wrote:
> > > We have an IMAP e-mail server running on a Solaris 10 10/09 system.
> > > It uses six ZFS filesystems built on a single zpool with 14 daily
> > > snapshots.  Every day at 11:56, a cron command destroys the oldest
> > > snapshots and creates new ones, both recursively.  For about four
> > > minutes thereafter, the load average drops and I/O to the disk
> > > devices drops to almost zero.  Then, the load average shoots up to
> > > about ten times normal and then declines to normal over about four
> > > minutes, as disk activity resumes.  The statistics return to their
> > > normal state about ten minutes after the cron command runs.
> > >
> > > Is it destroying old snapshots or creating new ones that causes
> > > this dead time?  What does each of these procedures do that could
> > > affect the system?  What can I do to make this less visible to
> > > users?
> >
> > I have a couple of Solaris 10 boxes that do something similar
> > (hourly snaps) and I've never seen any lag in creating and
> > destroying snapshots.  One system with 16 filesystems takes 5
> > seconds to destroy the 16 oldest snaps and create 5 recursive new
> > ones.  I logged load average on these boxes and there is a small
> > spike on the hour, but this is down to sending the snaps, not
> > creating them.
>
> We've seen the behaviour that Gary describes while destroying datasets
> recursively (>600GB and with 7 snapshots).  It seems that close to the
> end the server stalls for 10-15 minutes and NFS activity stops.  For
> small datasets/snapshots that doesn't happen or is harder to notice.
>
> Does ZFS have to do something special when it's done releasing the
> data blocks at the end of the destroy operation?
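For context, the daily rotation I described above amounts to roughly the
sketch below.  The pool name, snapshot naming scheme, and retention logic
are placeholders for illustration, not the actual cron entry:

    #!/bin/sh
    # Sketch of a daily snapshot rotation: drop the oldest recursive
    # snapshot, then take a new one.  "space" is a placeholder pool name.
    POOL=space
    TODAY=`date +%Y-%m-%d`
    # Oldest pool-level snapshot (recursive snapshots share this name).
    OLDEST=`zfs list -H -t snapshot -o name -s creation | grep "^$POOL@" | head -1`
    [ -n "$OLDEST" ] && zfs destroy -r "$OLDEST"    # remove it from every filesystem
    zfs snapshot -r "$POOL@$TODAY"                  # snapshot every filesystem

The dead time shows up in the minutes right after this pair of commands
runs from cron.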
The stall that Giovanni describes does sound similar to the problem here.
The zpool is 3 TB in size with about 1.4 TB used.  It does sound as if
the stall happens during the `zfs destroy -r' rather than during the
`zfs snapshot -r'.  What can zfs be doing when the CPU load average
drops and disk I/O is close to zero?

I also had a peculiar problem here recently when I was upgrading the ZFS
filesystems on our test server from version 3 to version 4.  When I
tried `zfs upgrade -a', the command hung for a long time and could not
be interrupted, killed, or traced.  Eventually it terminated on its own.
Only the two upper-level filesystems had been upgraded.  I upgraded the
lower-level ones individually with `zfs upgrade' with no further
problems.  I had previously upgraded the zpool with no problems.  I
don't know if this behavior is related to the stall on the production
server.  I haven't attempted the upgrades there yet.

--
-Gary Mills-        -Unix Group-        -Computer and Network Services-
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss