Something occurs to me: how full is your current 4 vdev pool? I'm assuming it's not over 70% or so.

Yes, by adding another 3 vdevs, any writes will be biased towards the "empty" vdevs, but that applies to less-than-full-stripe-width writes (right, Richard?). That is, if I'm doing a write that would be full-stripe size, and I've got enough space on all vdevs (even if certain ones are much fuller than others), then it will be written across all vdevs.

So, while you can't get a virgin pool out of this, I think you can get things reasonably well balanced by copying and then deleting, say, 1TB (or less) of data at a time.
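A rough sketch of that copy-then-delete shuffle, with made-up pool/dataset names ("tank/data") and zfs send/recv as just one way of doing the copy:

    # Snapshot and copy a chunk of data; the copy's blocks get allocated
    # with the current (post-expansion) vdev weighting.
    zfs snapshot tank/data@rebalance
    zfs send tank/data@rebalance | zfs receive tank/data-rebalanced

    # Once you've verified the copy, drop the original so its old,
    # lopsided blocks are freed, then slot the copy into its place.
    zfs destroy -r tank/data
    zfs rename tank/data-rebalanced tank/data

    # Repeat in ~1TB chunks and watch per-vdev usage even out.
    zpool iostat -v tank

Obviously this works per dataset, and you need enough free space to hold each chunk twice while you shuffle it.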


Richard Elling wrote:
On Nov 20, 2009, at 12:14 PM, Jesse Stroik wrote:

There are, of course, job types where you use the same set of data for multiple jobs, but having even a small amount of extra memory seems to be very helpful in that case, as you'll have several nodes reading the same data at roughly the same time.
Yep. More, faster memory closer to the consumer is always better. You could buy machines with TBs of RAM, but high-end x86 boxes top out at 512 GB.


That was our previous approach. We're testing doing it with relatively cheap, consumer-level Sun hardware (i.e., machines with 64 or maybe 128 GB of memory today) that can be easily expanded as the pool's purpose changes.

I know what our options are for increasing performance if we want to increase the budget. My question isn't, "I have this data set, can you please tell me how to buy and configure a system." My question is, "how does ZFS balance pools during writes, and how can I force it to balance data I want balanced in the way I want it balanced?" And if the answer to that question is, "you can't reliably do this," then that is acceptable. It's something I would like to be able to plan around.
From a user's standpoint, you can't "force" ZFS to do the block layout in a manner you specify. The best you can do is understand what ZFS does in a given situation. There's no ability to TELL ZFS what to do.



Writes (allocations) are biased towards the freer (in the percentage sense) of the fully functional vdevs. However, diversity for copies and affinity for gang blocks are preserved. The starting point for understanding this in the code is metaslab.c: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c
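(You can watch that bias from userland, too; "tank" below is just a placeholder pool name. zpool iostat -v breaks out capacity and I/O per top-level vdev, so after adding vdevs you can see new writes landing mostly on the emptier ones.)

    # Per-vdev capacity and free space for a hypothetical pool "tank".
    zpool iostat -v tank

    # Sample every 5 seconds while a write workload runs to see which
    # vdevs the write ops are actually being directed to.
    zpool iostat -v tank 5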


Right now, this storage node is very small (~100TB) and in testing. I want to know how I can solve problems like this as we scale it up into a full-fledged SAN that holds a lot more data and gets moved into production. Knowing the limitations of ZFS is a critical part of properly designing and expanding the system.
For a lot of reasons, I would consider creating NEW zpools when you add new disk space in large lots, rather than adding vdevs to existing zpools. It should prove no harder to manage, and it lets you get a virgin zpool, which will provide the best performance.
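For comparison, the two paths look roughly like this (pool and device names are purely illustrative):

    # Growing the existing pool: the new top-level vdev joins "tank",
    # existing data stays put, and only new writes favor the empty vdev.
    zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

    # Richard's suggestion: a separate, virgin pool for the new disks,
    # so everything written to it is spread across all of its vdevs
    # from day one.
    zpool create tank2 raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0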




Sometimes, ignorance is bliss :-)
 -- richard
oooh, then I must be ecstatically happy!

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

