Bob Friesenhahn wrote:
On Wed, 19 May 2010, Deon Cui wrote:

http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance

It recommends that for every TB of storage you have, you want 1GB of RAM just for the metadata.

Interesting conclusion.

Is this really the case that ZFS metadata consumes so much RAM?
I'm currently building a storage server which will eventually hold up to 20TB of storage; I can't fit 20GB of RAM on the motherboard!

Unless you do something like enable dedup (which is still risky to use), there is no rule of thumb that I know of. ZFS will take advantage of available RAM. You should have at least 1GB of RAM available for ZFS to use. Beyond that, it depends entirely on the size of your expected working set. The size of accessed files, the randomness of the access, the number of simultaneous accesses, and the maximum number of files per directory all make a difference to how much RAM you should have for good performance. If you have 200TB of stored data, but only actually access 2GB of it at any one time, then the caching requirements are not very high.

Bob

I'd second Bob's notes here - for non-dedup purposes, you need, at a very bare minimum, 512MB of RAM just for ZFS (Bob's recommendation of 1GB is much better; I'm quoting a real basement level, below which you're effectively crippling ZFS).
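(If you want to see what the ARC is actually consuming on a given box, the stock kstat/mdb tools will tell you, and there's a tunable to cap it if ZFS is squeezing other things out. A quick sketch - the 4GB cap below is purely an example value:

  # Current ARC size and its configured ceiling, in bytes:
  kstat -p zfs:0:arcstats:size
  kstat -p zfs:0:arcstats:c_max

  # MRU/MFU breakdown and other detail from the kernel debugger:
  echo "::arc" | mdb -k

  # To cap the ARC (e.g. at 4GB - example value only), add to /etc/system
  # and reboot:
  #   set zfs:zfs_arc_max = 0x100000000
)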

The primary determinant of RAM consumption for pools without dedup is the size of your active working set (as Bob mentioned). It's unrealistic to expect to cache /all/ metadata for every file in a large pool, and I can't really see the worth in it anyhow (you end up with very infrequently-used metadata sitting in RAM, which in most cases just gets evicted to make room for other things). Caching any more metadata than your working set needs isn't going to bring much of a performance bonus. What you need is sufficient RAM to cache your async writes (remember, this amount is relatively small in most cases - it's 3 pending transactions per pool), plus enough RAM to hold all the files (plus metadata) you expect to use (i.e. read more than once, or write to) within about 5 minutes.
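To make that concrete, a back-of-envelope sketch (every number below is made up purely for illustration):

  # async write cache  ~= 3 pending txgs x expected dirty data per txg
  # working-set cache  ~= data + metadata you expect to touch in ~5 minutes
  #
  # e.g. ~200MB dirty per txg and a 6GB five-minute working set:
  echo $(( 3 * 200 ))           # 600MB for pending writes
  echo $(( 600 + 6 * 1024 ))    # 6744MB, i.e. ~6.6GB total before OS overhead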

Here's three examples to show the differences (all without dedup):

(1) A 100TB system which contains scientific data used in a data-mining app. The system will need to frequently access very large amounts of the available data, but seldom writes much. As it is doing data-mining, any specific piece of data is seldom read, though the system needs to read large aggregate amounts continuously. In this case, you're pretty much out of luck for caching. You'll need enough RAM to cache your maximum write size, and a little bit for read-ahead, but since you're accessing the pool almost at random for large amounts of data which aren't re-used, caching really isn't going to help at all. In this case, 1-2GB of RAM is likely all that can really be used.
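(One knob that can help in a scan-mostly case like this, where data blocks are essentially never re-read: tell ZFS to keep only metadata in the ARC, so the little cache you have isn't churned by one-shot data. 'tank/mining' is just a placeholder dataset name:

  # Cache only metadata for this dataset; data blocks bypass the ARC:
  zfs set primarycache=metadata tank/mining
  zfs get primarycache tank/mining     # verify the setting
)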

(2) 1TB of data being used by a virtual machine disk server. That is, the machine exports iSCSI (or FCoE, or NFS, or whatever) volumes for use on client hardware to run VMs. Typically in this case, there are lots of effectively random read requests coming in for a bunch of "hot" files (which tend to be OS files in the VM-hosted OSes), plus fairly frequent write requests. However, the VMs will do a fair amount of read-caching of their own, so the number of read requests is lower than one might think. For performance and administrative reasons, you will likely want multiple pools rather than a single large pool. In this case, you need a reasonable amount of write cache for *each* pool, plus enough RAM to cache the frequently-used OS files across ALL the VMs. Here, dedup would actually help RAM consumption considerably, since frequently-accessed files from multiple VMs are highly likely to be identical, and with dedup you'd only need to store one copy in the cache. In any case, you'd need a few GB for write caching, plus likely a dozen or more GB for read caching, as your working set is moderately large and frequently re-used.
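(If you do want to experiment with dedup on a pool like this - with the caveat above that it's still risky - it's a per-dataset property, and zdb can estimate the ratio before you commit. 'vmpool/images' is a placeholder name:

  # Simulate the dedup ratio for an existing pool (read-only, no changes made):
  zdb -S vmpool

  # Enable dedup just on the dataset holding the VM images:
  zfs set dedup=on vmpool/images
  zpool list vmpool      # the DEDUP column shows the achieved ratio
)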

(3) 100TB of data for NFS home-directory serving. The access pattern here is likely highly random, with only small amounts of re-used data, but you'll often have non-trivial write sizes. Having a separate log device (slog) for the ZIL is probably a good idea, but in any case you'll want a couple of GB (call it 3-4) for write caching per pool, and then several dozen MB per active user as read cache. In this case, your determining factor is likely not total data size but the number of simultaneous users, since that is what will dictate your frequency of file access.
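(For reference, adding a separate log device for the ZIL looks like the sketch below; the device names are placeholders, and mirroring the slog is generally the safer choice:

  # Add a mirrored slog (dedicated ZIL device) to the home-directory pool:
  zpool add homepool log mirror c3t0d0 c3t1d0
  zpool status homepool      # the log vdev appears at the end of the config
)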


I'd say all of the recommendations/insights at the referenced link are good, except for #1. The base amount of RAM is highly variable, based on the factors discussed above, and the blanket assumption that you need to cache all pool metadata isn't valid.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
