On Thu, Mar 04, 2010 at 07:51:13PM -0300, Giovanni Tirloni wrote:
> 
>    On Thu, Mar 4, 2010 at 7:28 PM, Ian Collins <...@ianshome.com>
>    wrote:
>    
>    Gary Mills wrote:
>    
>      We have an IMAP e-mail server running on a Solaris 10 10/09 system.
>      It uses six ZFS filesystems built on a single zpool with 14 daily
>      snapshots.  Every day at 11:56, a cron command destroys the oldest
>      snapshots and creates new ones, both recursively.  For about four
>      minutes thereafter, the load average drops and I/O to the disk
>      devices drops to almost zero.  Then, the load average shoots up
>      to about ten times normal and then declines to normal over about
>      four minutes, as disk activity resumes.  The statistics return to
>      their normal state about ten minutes after the cron command runs.
>
>      Is it destroying old snapshots or creating new ones that causes
>      this dead time?  What does each of these procedures do that could
>      affect the system?  What can I do to make this less visible to
>      users?
>      
>      I have a couple of Solaris 10 boxes that do something similar
>      (hourly snaps) and I've never seen any lag in creating and
>      destroying snapshots.  One system with 16 filesystems takes 5
>      seconds to destroy the 16 oldest snaps and create 5 recursive new
>      ones.  I logged load average on these boxes and there is a small
>      spike on the hour, but this is down to sending the snaps, not
>      creating them.
>      
>    We've seen the behaviour that Gary describes while destroying datasets
>    recursively (>600GB and with 7 snapshots). It seems that close to the
>    end the server stalls for 10-15 minutes and NFS activity stops. For
>    small datasets/snapshots that doesn't happen or is harder to notice.
>    Does ZFS have to do something special when it's done releasing the
>    data blocks at the end of the destroy operation?

That does sound similar to the problem here.  The zpool is 3 TB in
size with about 1.4 TB used.  The stall does seem to happen during
the `zfs destroy -r' rather than during the `zfs snapshot -r'.  What
can zfs be doing when the CPU load average drops and disk I/O is
close to zero?
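
One thing I may try is running the two steps from the cron job
separately and timing each of them, along these lines (the pool and
snapshot names below are only placeholders, not the actual ones in
use here):

  # time the recursive destroy and the recursive snapshot separately
  # (pool "space" and the snapshot names are placeholders)
  ptime zfs destroy -r space@2010-02-18
  ptime zfs snapshot -r space@2010-03-04

That should at least confirm which of the two commands corresponds
to the dead time.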

I also had a peculiar problem here recently when I was upgrading the
ZFS filesystems on our test server from version 3 to version 4.  When
I tried `zfs upgrade -a', the command hung for a long time and could
not be interrupted, killed, or traced.  Eventually it terminated on
its own, but only the two upper-level filesystems had been upgraded.
I then upgraded the lower-level ones individually with `zfs upgrade'
with no further problems.  I had previously upgraded the zpool itself
without any problems.  I don't know if this behavior is related to
the stall on the production server.  I haven't attempted the upgrades
there yet.
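
For reference, the individual upgrades were just commands of this
form, repeated per filesystem (the dataset names below are examples,
not our real ones):

  # upgrade one filesystem at a time (dataset names are examples)
  zfs upgrade space/mail
  zfs upgrade space/users

  # then list any filesystems still at an older version,
  # and check one explicitly
  zfs upgrade
  zfs get version space/mail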

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-