On Nov 20, 2009, at 10:16 AM, Jesse Stroik wrote:
Thanks for the suggestions thus far,

Erik:

In your case, where you had a 4 vdev stripe, and then added 3 vdevs, I would recommend re-copying the existing data to make sure it now covers all 7 vdevs.


Yes, this was my initial reaction as well, but I am concerned that I do not know how ZFS populates the vdevs. My naive guess is that it either fills the emptiest vdev first or, more likely, fills them at a rate proportional to their free space -- that is, the new devices with more free space will receive a disproportionate share of the newly written data.

There is a bias towards empty vdevs during writes. However, that won't help data previously written. The often-requested block pointer rewrite feature could help rebalance, but do not expect it to be a trivial endeavor for very
large pools.
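
For what it's worth, a rough manual approach is to check how the existing data is spread across the top-level vdevs and then re-copy the affected file systems so new writes land on all seven; this is only a sketch, and the pool and dataset names below are placeholders:

    # show used/free space on each top-level vdev ('tank' is a placeholder pool name)
    zpool iostat -v tank

    # rebalance a dataset by re-copying it; new writes spread across all vdevs
    zfs snapshot tank/data@rebalance
    zfs send tank/data@rebalance | zfs receive tank/data-rebalanced
    # after verifying the copy, drop the old dataset and rename the new one
    zfs destroy -r tank/data
    zfs rename tank/data-rebalanced tank/data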

Richard's suggestion, while tongue-in-cheek, has much merit. If you are only going to be working on a small portion of your total data set at once, but hitting that portion heavily, then you want to read-cache (L2ARC) as much of it as possible. That means either buying lots of RAM or adding an SSD as a cache device. The good news is that you can likely use one of the "cheaper" SSDs - the Intel X25-M is a good fit here as a Readzilla.
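
As a sketch, attaching an SSD as an L2ARC device is a one-liner; the pool and device names below are placeholders:

    # add an SSD as a read cache (L2ARC) device; 'tank' and c1t2d0 are placeholders
    zpool add tank cache c1t2d0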


The problem is that caching the data often won't help: we're storing tens of terabytes of data for some instruments, and we may only need to read each job's worth of data once. So you could cache the data, but it simply wouldn't be read again.

Use the secondarycache property to manage those file systems that
use read-once data.
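
For example (the dataset name is a placeholder), the read-once instrument data could be kept out of the L2ARC entirely, or limited to metadata only:

    # keep read-once data out of the SSD cache; tank/instruments is a placeholder
    zfs set secondarycache=none tank/instruments
    # or cache only metadata for that file system
    zfs set secondarycache=metadata tank/instruments
    # check the current setting
    zfs get secondarycache tank/instruments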

There are, of course, job types where you use the same set of data for multiple jobs, but having even a small amount of extra memory seems to be very helpful in that case, as you'll have several nodes reading the same data at roughly the same time.

Yep. More, faster memory closer to the consumer is always better.
You could buy machines with TBs of RAM, but high-end x86 boxes top
out at 512 GB.
 -- richard

