Something occurs to me: how full is your current 4-vdev pool? I'm
assuming it's not over 70% or so.
Yes, by adding another 3 vdevs, any writes will be biased towards the
"empty" vdevs, but that only applies to less-than-full-stripe-width
writes (right, Richard?). That is, if I'm doing a write that would be
full-stripe size, and I've got enough space on all vdevs (even if
certain ones are much fuller than others), then it will write across
all vdevs.
So, while you can't get a virgin pool out of this, I think you can get
things reasonably well balanced by recopying and then deleting, say,
1 TB (or less) of data at a time.
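A rough sketch of that recopy-then-delete pass, assuming a pool named
"tank" and a dataset named "tank/data" (both names invented here),
using send/receive so the rewritten blocks land under the current
allocation bias:

    # snapshot the data you want rewritten
    zfs snapshot tank/data@rebalance
    # copy it into a new dataset; these fresh writes are spread across
    # all vdevs, biased towards the emptier ones
    zfs send tank/data@rebalance | zfs receive tank/data.new
    # swap the datasets and free the old, unbalanced blocks
    zfs rename tank/data tank/data.old
    zfs rename tank/data.new tank/data
    zfs destroy -r tank/data.old

For the "1 TB at a time" approach you would just do this per dataset
(or split the data into datasets of roughly that size) rather than in
one shot.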
Richard Elling wrote:
On Nov 20, 2009, at 12:14 PM, Jesse Stroik wrote:
There are, of course, job types where you use the same set of data
for multiple jobs, but having even a small amount of extra memory
seems to be very helpful in that case, as you'll have several nodes
reading the same data at roughly the same time.
Yep. More, faster memory closer to the consumer is always better.
You could buy machines with TBs of RAM, but high-end x86 boxes top
out at 512 GB.
That was our previous approach. We're testing doing it with
relatively cheap, consumer-level Sun hardware (i.e., machines with 64
or maybe 128 GB of memory today) that can be easily expanded as the
pool's purpose changes.
I know what our options are for increasing performance if we want to
increase the budget. My question isn't, "I have this data set, can
you please tell me how to buy and configure a system." My question
is, "how does ZFS balance pools during writes, and how can I force it
to balance data I want balanced in the way I want it balanced?" And
if the answer to that question is, "you can't reliably do this," then
that is acceptable. It's something I would like to be able to plan
around.
From a user's standpoint, you can't "force" ZFS to do the block layout
in a manner you specify. The best you can do is understand what ZFS
does in a given situation. There's no ability to TELL ZFS what to do.
Writes (allocations) are biased towards the freer (in the percentage
sense) of the fully functional vdevs. However, diversity for copies
and affinity for gang blocks are preserved. The starting point for
understanding this in the code is at:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c
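If you want to watch that bias in action rather than read the code,
per-vdev allocation is visible from userland; something like the
following (the pool name "tank" is made up) while a large write is
running:

    # allocated/free space and I/O broken out per vdev, sampled every 5 s
    zpool iostat -v tank 5

The emptier vdevs should show noticeably more write activity until the
allocation percentages even out.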
Right now, this storage node is very small (~100TB) and in testing.
I want to know how I can solve problems like this as we scale it up
into a full-fledged SAN that holds a lot more data and gets moved
into production. Knowing the limitations of ZFS is a critical part
of properly designing and expanding the system.
For a lot of reasons, I would consider creating NEW zpools when you add
new disk space in large lots, rather than adding vdevs to existing
zpools. It should prove no harder to manage, and it lets you start with
a virgin zpool, which will provide the best performance.
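In command terms the difference is roughly this (pool names, raidz
level, and device names are all invented for the example):

    # growing the existing pool: adds a new top-level vdev, old data stays put
    zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

    # Richard's suggestion: a fresh, virgin pool on the new disks
    zpool create tank2 raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0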
Sometimes, ignorance is bliss :-)
-- richard
oooh, then I must be ecstatically happy!
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA