On Nov 20, 2009, at 10:16 AM, Jesse Stroik wrote:
> Thanks for the suggestions thus far,
> Erik:
>> In your case, where you had a 4 vdev stripe, and then added 3
>> vdevs, I would recommend re-copying the existing data to make sure
>> it now covers all 7 vdevs.
> Yes, this was my initial reaction as well, but I am concerned that I
> do not know how ZFS populates the vdevs. My naive guess is that it
> either fills the emptiest vdev first or (more likely) fills them at a
> rate proportional to their free space -- that is, the new devices,
> having more free space, will receive a disproportionate share of the
> data.
There is a bias towards empty vdevs during writes. However, that won't
help data previously written. The often-requested block pointer rewrite
feature could help rebalance, but do not expect it to be a trivial
endeavor for very large pools.
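
As a rough sketch of the re-copy approach (pool and dataset names below
are placeholders, not from this thread): zpool iostat -v shows how full
each top-level vdev is, and an existing dataset can be respread by
sending it to a new name on the same pool and swapping it in:

  # Per-vdev allocation shows whether the new vdevs are still mostly empty
  zpool iostat -v tank

  # Re-copy a dataset so its blocks are rewritten across all 7 vdevs
  zfs snapshot tank/data@rebalance
  zfs send tank/data@rebalance | zfs receive tank/data-new
  zfs rename tank/data tank/data-old
  zfs rename tank/data-new tank/data
  zfs destroy -r tank/data-old

Note the send/receive pass temporarily needs a second copy's worth of
space for that dataset, so it only works if the pool has the headroom.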
>> Richard's suggestion, while tongue-in-cheek, has much merit. If you
>> are only going to be working on a small portion of your total data
>> set at once, but hitting that section heavily, then you want to
>> cache (L2ARC) as much of it as possible for reads. That means either
>> buying lots of RAM or getting yourself an SSD. The good news is that
>> you can likely use one of the "cheaper" SSDs -- the Intel X25-M is a
>> good fit here as a Readzilla.
> The problem is that caching the data may often not help: we're
> storing tens of terabytes of data for some instruments, and we may
> only need to read each job's worth of data once. So you could cache
> the data, but it simply wouldn't be read again.
Use the secondarycache property to manage those file systems that
use read-once data.
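
To make Erik's SSD suggestion and the secondarycache hint concrete,
here is a sketch with placeholder pool, device, and file system names
(none of them are from the thread):

  # Add an SSD as an L2ARC (cache) device on an existing pool
  zpool add tank cache c2t0d0

  # Keep read-once instrument data from churning the L2ARC:
  # cache only its metadata, or nothing at all, in L2ARC
  zfs set secondarycache=metadata tank/instruments
  zfs set secondarycache=none tank/instruments

  # File systems whose data is re-read across jobs keep the default
  zfs set secondarycache=all tank/shared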
> There are, of course, job types where you use the same set of data
> for multiple jobs, but having even a small amount of extra memory
> seems to be very helpful in that case, as you'll have several nodes
> reading the same data at roughly the same time.
Yep. More, faster memory closer to the consumer is always better.
You could buy machines with TBs of RAM, but high-end x86 boxes top
out at 512 GB.
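
If you want to see whether the working set is actually fitting in
memory, the ARC counters are visible via kstat (these are the standard
Solaris arcstats; nothing here is specific to your setup):

  # Dump current ARC size plus hit/miss counters
  kstat -m zfs -n arcstats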
-- richard