I am trying to perform a test prior to moving my data to Solaris and ZFS.
Things are going very poorly.  Please suggest what I might do to understand
what is going on, file a meaningful bug report, fix it, whatever!

Both to learn what compression ratio I could get, and to induce a heavy load
that might expose issues, I am running with compression=gzip-9.
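
For reference, the property was set along these lines (the pool/dataset name
is a placeholder for mine):

    # enable gzip-9 compression on the dataset receiving the copy
    zfs set compression=gzip-9 tank/testhome
    # verify the setting and watch the achieved ratio as data lands
    zfs get compression,compressratio tank/testhome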

I have two machines, both identical 800MHz P3s with 768MB memory.  The disk
complement and OS are different.  My current host runs SUSE Linux 10.2
(2.6.18 kernel) with two 120GB drives under LVM.  My test machine runs
OpenSolaris 2008.11 B2 with two 200GB drives on the motherboard secondary
IDE, mirrored by ZFS and NFS exported.
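
The pool and share were created roughly as follows (device names are from
memory and may not be exact):

    # mirror the two IDE disks into one pool
    zpool create tank mirror c1d0 c1d1
    zfs create tank/testhome
    # share the dataset over NFS
    zfs set sharenfs=on tank/testhome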

My "test" is to simply run "cp -rp * /testhome" on the Linux machine, where 
/testhome is the NFS mounted zfs file system on the Solaris system.
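
On the Linux side the mount and copy are nothing special (hostname and
source path are placeholders):

    # mount the Solaris share, then copy everything into it
    mount -t nfs solarisbox:/tank/testhome /testhome
    cd /data && cp -rp * /testhome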

It starts out with "reasonable" throughput.  Although the heavy load makes
the Solaris system pretty jerky and unresponsive, it does work.  The Linux
system is a little jerky and unresponsive too, which I assume is from
waiting on sluggish network responses.

After about 12 hours, the throughput has slowed to a crawl.  The Solaris
machine takes a minute or more to respond to every character typed and mouse
click.  The Linux machine is no longer jerky, which makes sense since it has
to wait a lot for Solaris.  Data is still flowing, but throughput is in the
range of 100KB/second.
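
I can watch the slowdown directly on the Solaris side with the standard
tools, for example:

    # pool throughput, sampled every 10 seconds
    zpool iostat tank 10
    # cumulative NFS server operation counts
    nfsstat -s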

The Linux machine (available for tests) gets 3MB/sec +/- 5% pretty
consistently when running "gzip -9" over a few multi-GB files.  Since the
test machine has the exact same CPU and RAM (including brand and model),
chipset, etc., I would expect similar gzip throughput from ZFS.  That is in
the right ballpark of what I saw when the copy first started: it moved about
17GB in an hour or two, i.e. roughly 2.4-4.8MB/sec.
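
The gzip figure came from timing runs along these lines (the file name is
just an example):

    # time a gzip -9 pass over a large file; MB/sec = size / real time
    time gzip -9 -c /data/bigfile.img > /dev/null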

I am also logging "vmstat" and "top" output to a file.  Top reports total
swap size as 512MB, with 510MB available.  For the first few hours vmstat
reported something reasonable (it never seems to agree with top), but it is
now reporting around 570-580MB, and for a while was reporting well over
600MB of free swap out of the 512MB total!
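
If it helps, I can cross-check with the native swap tools.  As I understand
it, Solaris counts virtual swap (RAM plus swap devices) in some of these
numbers, which might explain readings above the 512MB device size:

    # physical swap devices and free space (sizes in 512-byte blocks)
    swap -l
    # virtual swap accounting: allocated, reserved, available
    swap -s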

I have gotten past a top memory leak (opensolaris.org bug 5482), so I am now
running top for a single iteration inside a shell loop with a sleep, instead
of letting it repeat on its own.  This was to be my test run to see it all
work.
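
The workaround loop is essentially this (the exact flags vary by top build;
on mine -b is batch mode and -d 1 requests a single display):

    # one batch-mode top snapshot per minute instead of a long-running top
    while true; do
        date >> /var/tmp/top.log
        top -b -d 1 >> /var/tmp/top.log
        sleep 60
    done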

What information can I capture, and how, to figure this out?
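
For instance, I could leave a sampling loop like this running on the Solaris
box if someone can tell me what to look for in the output (log path and
intervals are arbitrary):

    #!/bin/sh
    # sample system state once a minute while the copy runs
    while true; do
        date
        vmstat 1 2 | tail -1             # 2nd sample; 1st is since-boot averages
        echo ::memstat | mdb -k          # kernel memory breakdown (needs root)
        zpool iostat tank 1 2 | tail -1  # pool I/O over the last second
        sleep 60
    done >> /var/tmp/zfs-copy.log 2>&1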

My goal is to gain confidence in this system.  The idea is that Solaris and
ZFS should be more reliable than Linux and LVM.  Although I have never lost
data due to Linux problems, I have lost data to disk failure, and ZFS should
cover that!

Thanks in advance for any ideas or suggestions.