Thanks for the suggestions thus far,

Erik:

In your case, where you had a 4 vdev stripe, and then added 3 vdevs, I would recommend re-copying the existing data to make sure it now covers all 7 vdevs.


Yes, this was my initial reaction as well, but I am concerned by the fact that I do not know how ZFS populates the vdevs. My naive guess is that it either fills the emptiest vdev first, or (more likely) writes to each vdev at a rate proportional to its free space -- that is, the new devices, having more free space, will receive a disproportionate share of the new data.
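To make that second guess concrete, here is a rough sketch (this is an illustrative model, not ZFS's actual metaslab allocator) of what free-space-proportional allocation would look like for 4 partly-full vdevs plus 3 new empty ones:

```python
import random

random.seed(0)

# Hypothetical layout: four original 1 TB vdevs that are 60% full,
# plus three freshly added, empty 1 TB vdevs.
free = [0.4] * 4 + [1.0] * 3      # free space in TB per vdev
written = [0.0] * 7

# Write ~1 TB of new data in 10 MB blocks, choosing a vdev for each
# block with probability proportional to its remaining free space.
for _ in range(100_000):
    i = random.choices(range(7), weights=free)[0]
    block = 0.00001               # 10 MB expressed in TB
    free[i] -= block
    written[i] += block

share_new = sum(written[4:]) / sum(written)
print(f"share of new data landing on the 3 new vdevs: {share_new:.0%}")
```

Under this model the three new vdevs absorb roughly two-thirds of the incoming writes, so new data ends up unevenly striped -- which is exactly why a re-copy of the existing data may still be worthwhile.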



Richard's suggestion, while tongue-in-cheek, has much merit. If you are only going to be doing work on a small portion of your total data set at once, but heavily hit that section, then you want to read cache (L2ARC) as much of that as possible. Which means, either buy lots of RAM, or get yourself an SSD. Good news is that you can likely use one of the "cheaper" SSDs - the Intel X25-M is a good fit here for a Readzilla.


The problem is that caching the data often would not help: we're storing tens of terabytes of data for some instruments, and we may only need to read each job's worth of data once. So you could cache the data, but it simply would never be read again.

There are, of course, job types where the same set of data is used for multiple jobs, and even a small amount of extra memory seems to be very helpful in that case, since several nodes will be reading the same data at roughly the same time.

Best,
Jesse
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss