> I am [trying to] perform a test prior to moving my data to
> Solaris and ZFS. Things are going very poorly. Please
> suggest what I might do to understand what is going on,
> file a meaningful bug report, fix it, whatever!
>
> Both to learn what the compression could be, and to induce
> a heavy load to expose issues, I am running with
> compress=gzip-9.
>
> I have two machines, both identical 800MHz P3s with 768MB
> memory. The disk complement and OS are different. My
> current host is Suse Linux 10.2 (2.6.18 kernel) running
> two 120GB drives under LVM. My test machine is 2008.11 B2
> with two 200GB drives on the motherboard secondary IDE,
> ZFS mirroring them, NFS exported.
>
> My "test" is to simply run "cp -rp * /testhome" on the
> Linux machine, where /testhome is the NFS-mounted ZFS file
> system on the Solaris system.
>
> It starts out with "reasonable" throughput. Although the
> heavy load makes the Solaris system pretty jerky and
> unresponsive, it does work. The Linux system is a little
> jerky and unresponsive, I assume due to waiting for
> sluggish network responses.
>
> After about 12 hours, the throughput has slowed to a
> crawl. The Solaris machine takes a minute or more to
> respond to every character typed and mouse click. The
> Linux machine is no longer jerky, which makes sense since
> it has to wait a lot for Solaris. Stuff is flowing, but
> throughput is in the range of 100K bytes/second.
>
> The Linux machine (available for tests) "gzip -9"ing a few
> multi-GB files gets 3MB/sec +/- 5% pretty consistently.
> Since it has the exact same CPU, RAM (including brand and
> model), chipset, etc., I would expect similar throughput
> from ZFS. This is in the right ballpark of what I saw when
> the copy first started. In an hour or two it moved about
> 17GB.
>
> I am also running a "vmstat" and a "top" to a log file.
> Top reports total swap size as 512MB, 510 available.
> vmstat for the first few hours reported something
> reasonable (it never seems to agree with top), but now is
> reporting around 570~580MB, and for a while was reporting
> well over 600MB free swap out of the 512MB total!
>
> I have gotten past a top memory leak (opensolaris.com bug
> 5482) and so am now running top for only one iteration, in
> a shell for loop with a sleep, instead of letting it
> repeat. This was to be my test run to see it work.
>
> What information can I capture, and how can I capture it,
> to figure this out?
>
> My goal is to gain confidence in this system. The idea is
> that Solaris and ZFS should be more reliable than Linux
> and LVM. Although I have never lost data due to Linux
> problems, I have lost it due to disk failure, and ZFS
> should cover that!
>
> Thank you ahead of time for any ideas or suggestions.
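To your question about what to capture: one useful thing is to log the ARC size alongside vmstat while the copy runs, so you can see whether the ZFS cache is squeezing out everything else as the slowdown develops. A minimal sketch (the one-minute interval and the log path are arbitrary choices, adjust as you like):

  #!/bin/sh
  # Log the ARC's current size and target alongside memory stats.
  # /var/tmp/zfs-probe.log is an arbitrary location.
  while true; do
      date
      kstat -p zfs:0:arcstats:size zfs:0:arcstats:c
      vmstat 1 2 | tail -1   # second sample = current interval, not since-boot averages
      sleep 60
  done >> /var/tmp/zfs-probe.log

If arcstats:size climbs toward physical memory right as the throughput collapses, that points at the memory contention described below.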
Solaris reports "virtual memory" as the sum of physical memory and the page file, so that is where your strange vmstat output comes from.

Running ZFS stress tests on a system with only 768MB of memory is not a good idea, since ZFS uses large amounts of memory for its cache. You can limit the size of the ARC (Adaptive Replacement Cache) using the details here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache

Try limiting the ARC size and then run the test again; if that fixes it, memory contention is the cause of the slowdown.

Also, NFS to ZFS filesystems will run slowly under certain conditions, including with the default configuration. See this link for more information:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

Cheers,
Andrew.
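P.S. For completeness, the ARC cap from the first link boils down to one line in /etc/system plus a reboot. The value below (0x10000000 = 256MB) is just an illustration for a 768MB machine, not a recommendation:

  * Cap the ZFS ARC so it leaves memory for NFS and the rest of the system.
  * 0x10000000 bytes = 256MB; tune to your workload.
  set zfs:zfs_arc_max = 0x10000000

After the reboot you can confirm the new ceiling with "kstat -p zfs:0:arcstats:c_max". One caution on the second link: the zfs_nocacheflush tuning it describes is only safe when the write cache is nonvolatile (e.g. a battery-backed array cache), so I would not apply it to plain IDE disks like yours.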