On Aug 11, 2012, at 5:33 PM, Chad Leigh - Pengar LLC wrote:

> Hi
>
> I have a FreeBSD 9 system with ZFS root. It is actually a VM under Xen on a
> beefy piece of HW (4-core Sandy Bridge 3GHz Xeon, total HW memory 32GB -- the
> VM has 4 vcpus and 6GB RAM). Mirrored gpart partitions. I am looking for
> data integrity more than performance, as long as performance is reasonable
> (which it has more than been the last 3 months).
>
> The other "servers" on the same HW, the other VMs on the same box, don't
> have this problem but are set up the same way. There are 4 other FreeBSD
> VMs: one running email for a one-man company and a few of his friends, as
> well as some static web pages and such for him; one runs a few low-use web
> apps for various customers; and one runs about 30 websites with apache and
> nginx, mostly just static sites. None are heavily used. There is also one
> VM with linux running a couple of low-use FrontBase databases. Not high-use
> databases -- low-use ones.
>
> The troublesome VM has been running fine for over 3 months since I
> installed it. Level of use has been pretty much constant. The server runs
> 4 jails on it, each dedicated to a different bit of email processing for a
> small number of users. One is a secondary DNS. One runs clamav and
> spamassassin. One runs exim for incoming and outgoing mail. One runs
> dovecot for imap and pop. There is no web server or database or anything
> else running.
>
> Total number of mail users on the system is approximately 50, plus or
> minus. Total mail traffic is very low compared to "real" mail servers.
>
> Earlier this week things started "freezing up". It might last a few
> minutes, or it might last 1/2 hour. Processes become unresponsive. This
> can last a few minutes or much longer. It eventually resolves itself and
> things are good for another 10 minutes or 3 hours until it happens again.
> When it happens, lots of processes are listed in "top" as being in the
>
> zfs
> zio->i
> zfs
> tx->tx
> db->db
>
> states. These processes only get listed in these states when there are
> problems. What are these states indicative of?
Ok, after much reading of ZFS blog posts, forum postings, email list postings, and trying things out, I seem to have gotten the system back down to normal and reasonable performance. In case anyone has similar issues under similar circumstances, here is what I did. Some of these changes may have had little or no effect, but this is what was changed.

The biggest effect came from the following:

  vfs.zfs.zfetch.block_cap lowered from the default 256 to 64

This was like night and day. The idea to try this came from a post by user "madtrader" in the forum http://forums.sagetv.com/forums/showthread.php?t=43830&page=2 . He was recording multiple streams of HD video while trying to play HD video off a stream from the same server/ZFS file system.

Also, setting vfs.zfs.write_limit_override to something other than the default of "0" (disabled) seems to have had a relatively significant effect. Before I worked on the block_cap above, I was focusing on this and had tried everything from 64M to 768M. It is currently set to 576M, which is around where I had the best results on my system with my amount of RAM (6GB). I tried 512M with good results, then 768M, which was still good but not quite as good as far as I could tell from testing. So I went with 576M on my last attempt, then added in the block_cap change, and things really are pretty much back to normal.

I turned on vdev caching by raising vfs.zfs.vdev.cache.size from 0 to 10M. I don't know if it helped.

I also lowered vfs.zfs.txg.timeout from 5 to 3. This seems to have had a slightly noticeable effect.

Finally, I adjusted vfs.zfs.arc_max. The default of 0 (meaning the system picks the value itself) seemed to result in an actual value of around 75-80% of RAM, which seemed high. I ended up setting it at 3072M, which seems to work well for me. I don't know what the overall effect on the problem was, though.

Thanks
Chad
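
PS: In case it helps anyone trying to reproduce this, here is roughly what the relevant /boot/loader.conf entries look like with the values above (the byte counts are just the "M" figures spelled out). This is only a sketch of my own setup on FreeBSD 9; I have not checked which of these can also be flipped live with sysctl(8) on every release, so I set them all at boot time to be safe:

  # /boot/loader.conf -- ZFS tuning on the 6GB mail VM
  # prefetch block cap, down from the default of 256
  vfs.zfs.zfetch.block_cap="64"
  # write limit of 576M (in bytes) instead of the default 0 (auto)
  vfs.zfs.write_limit_override="603979776"
  # enable the vdev cache at 10M (default is 0)
  vfs.zfs.vdev.cache.size="10485760"
  # txg sync interval, down from 5 seconds to 3
  vfs.zfs.txg.timeout="3"
  # cap the ARC at 3072M instead of letting it grow to ~75-80% of RAM
  vfs.zfs.arc_max="3221225472"

The write limit in particular can usually be changed on a running system (e.g. "sysctl vfs.zfs.write_limit_override=603979776"), which makes it easy to experiment with different values before making one permanent in loader.conf.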