> I am [trying to] perform a test prior to moving my data to
> Solaris and ZFS. Things are going very poorly. Please
> suggest what I might do to understand what is going on,
> file a meaningful bug report, fix it, whatever!
>
> Both to learn what the compression could be, and to induce
> a heavy load to expose issues, I am running with
> compress=gzip-9.
>
> I have two machines, both identical 800MHz P3s with 768MB
> memory. The disk complement and OS are different. My
> current host is Suse Linux 10.2 (2.6.18 kernel) running
> two 120GB drives under LVM. My test machine is 2008.11 B2
> with two 200GB drives on the motherboard secondary IDE,
> ZFS mirroring them, NFS exported.
>
> My "test" is to simply run "cp -rp * /testhome" on the
> Linux machine, where /testhome is the NFS-mounted ZFS file
> system on the Solaris system.
>
> It starts out with "reasonable" throughput. Although the
> heavy load makes the Solaris system pretty jerky and
> unresponsive, it does work. The Linux system is a little
> jerky and unresponsive, I assume due to waiting for
> sluggish network responses.
>
> After about 12 hours, the throughput has slowed to a
> crawl. The Solaris machine takes a minute or more to
> respond to every character typed and mouse click. The
> Linux machine is no longer jerky, which makes sense since
> it has to wait a lot for Solaris. Stuff is flowing, but
> throughput is in the range of 100K bytes/second.
>
> The Linux machine (available for tests) "gzip -9"ing a few
> multi-GB files gets 3MB/sec +/- 5% pretty consistently.
> Since it has the exact same CPU, RAM (including brand and
> model), chipset, etc., I would expect similar throughput
> from ZFS. This is in the right ballpark of what I saw when
> the copy first started. In an hour or two it moved about
> 17GB.
>
> I am also running a "vmstat" and a "top" to a log file.
> Top reports total swap size as 512MB, 510 available.
> vmstat for the first few hours reported something
> reasonable (it never seems to agree with top), but now is
> reporting around 570~580MB, and for a while was reporting
> well over 600MB free swap out of the 512MB total!
>
> I have gotten past a top memory leak (opensolaris.com bug
> 5482) and so am now running top for only one iteration, in
> a shell for loop with a sleep, instead of letting it
> repeat. This was to be my test run to see it work.
>
> What information can I capture, and how can I capture it,
> to figure this out?
>
> My goal is to gain confidence in this system. The idea is
> that Solaris and ZFS should be more reliable than Linux
> and LVM. Although I have never lost data due to Linux
> problems, I have lost it due to disk failure, and ZFS
> should cover that!
>
> Thank you ahead of time for any ideas or suggestions.
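To your question about what to capture: one useful thing is to log the ARC size alongside vmstat while the copy runs, so you can see whether the ZFS cache is squeezing out everything else as the slowdown develops. A minimal sketch (the one-minute interval and the log path are arbitrary choices, adjust as you like):

  #!/bin/sh
  # Log the ARC's current size and target alongside memory stats.
  # /var/tmp/zfs-probe.log is an arbitrary location.
  while true; do
      date
      kstat -p zfs:0:arcstats:size zfs:0:arcstats:c
      vmstat 1 2 | tail -1   # second sample = current interval, not since-boot averages
      sleep 60
  done >> /var/tmp/zfs-probe.log

If arcstats:size climbs toward physical memory right as the throughput collapses, that points at the memory contention described below.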
Solaris reports "virtual memory" as the sum of physical memory and the page file, so that is where your strange vmstat output comes from.

Running ZFS stress tests on a system with only 768MB of memory is not a good idea, since ZFS uses large amounts of memory for its cache. You can limit the size of the ARC (Adaptive Replacement Cache) using the details here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache

Try limiting the ARC size and then run the test again; if that fixes it, memory contention is the cause of the slowdown.

Also, NFS to ZFS filesystems will run slowly under certain conditions, including with the default configuration. See this link for more information:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

Cheers,
Andrew.
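P.S. For completeness, the ARC cap from the first link boils down to one line in /etc/system plus a reboot. The value below (0x10000000 = 256MB) is just an illustration for a 768MB machine, not a recommendation:

  * Cap the ZFS ARC so it leaves memory for NFS and the rest of the system.
  * 0x10000000 bytes = 256MB; tune to your workload.
  set zfs:zfs_arc_max = 0x10000000

After the reboot you can confirm the new ceiling with "kstat -p zfs:0:arcstats:c_max". One caution on the second link: the zfs_nocacheflush tuning it describes is only safe when the write cache is nonvolatile (e.g. a battery-backed array cache), so I would not apply it to plain IDE disks like yours.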