Gary Mills wrote:
We have an IMAP e-mail server running on a Solaris 10 10/09 system.
It uses six ZFS filesystems built on a single zpool with 14 daily
snapshots. Every day at 11:56, a cron command destroys the oldest
snapshots and creates new ones, both recursively. For about four
minutes thereafter, both the load average and I/O to the disk devices
drop to almost zero. Then the load average shoots up to about ten
times normal and declines back to normal over about four minutes as
disk activity resumes. The statistics return to their normal state
about ten minutes after the cron command runs.
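For reference, the daily rotation described above presumably looks something like the following sketch. The pool name ("mailpool"), the snapshot naming scheme, and the selection logic are illustrative assumptions, not details from this post:

```shell
#!/bin/sh
# Hypothetical sketch of the 11:56 cron job described above.
# Assumptions: pool is "mailpool", daily snapshots are named daily-YYYYMMDD.
POOL=mailpool
TODAY=`date +%Y%m%d`

# Find the oldest snapshot taken at the pool root (sorted by creation time).
OLDEST=`zfs list -H -t snapshot -o name -s creation -r $POOL \
        | grep "^$POOL@" | head -1`

# Recursively destroy the oldest snapshot across all six filesystems,
# then recursively create today's snapshot.
zfs destroy -r "$OLDEST"
zfs snapshot -r "$POOL@daily-$TODAY"
```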
Is it destroying old snapshots or creating new ones that causes this
dead time? What does each of these procedures do that could affect
the system? What can I do to make this less visible to users?
Creating a snapshot shouldn't do anything much more than a regular
transaction group commit, which should be happening at least every 30
seconds anyway.
Deleting a snapshot potentially frees the space occupied by files/blocks
that aren't referenced by any other snapshot. One way to think of it:
with regular snapshots, the space that would normally be freed when you
delete files is in effect deferred until you destroy the last snapshot
that still refers to that space, so all of your space freeing gets
bunched together into the destroy.
If this is the cause (a big _if_, as I'm just speculating), then it
might be a good idea to:
a) spread out the deleting of the snapshots, and
b) create snapshots more often (and correspondingly delete them more
often), so that each one accumulates less space to be freed.
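As a sketch of (a) and (b) combined, a crontab along these lines would take snapshots every six hours instead of daily, and run the destroys at a different time from the creates so the two don't pile up together. The schedule, script name, and path are invented for illustration:

```shell
# Illustrative crontab fragment (times, script name, and path are
# assumptions, not from this thread):
#
# Take a recursive snapshot every six hours, so each one accumulates
# roughly a quarter of a day's worth of freed blocks:
56 5,11,17,23 * * * /usr/local/bin/snap-rotate create
#
# Destroy the expired snapshot an hour later, staggered away from the
# creates, spreading the freeing work across the day:
56 0,6,12,18 * * * /usr/local/bin/snap-rotate destroy-oldest
```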
--
Andrew
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss