Richard Elling wrote:
Buy a large, read-optimized SSD (or several) and add it as a cache device :-)
 -- richard

On Nov 20, 2009, at 8:44 AM, Jesse Stroik wrote:

I'm migrating to ZFS and Solaris for cluster computing storage, and have some completely static data sets that need to be as fast as possible. One of the scenarios I'm testing is the addition of vdevs to a pool.

Starting out, I populated a pool that had 4 vdevs. Then I added 3 more vdevs and would like to balance this data across the pool for performance. The data may be in subdirectories like this: /proxy_data/instrument_X/domain_Y. Because of the access pattern across the cluster, I need each of these subdirectories spread across as many disks as possible. Simply putting the data evenly on all vdevs is suboptimal, because different files within a single domain from a single instrument may well be used by 200 jobs at once.

Because this particular data is 100% static, I cannot count on reads/writes automatically balancing the pool.

Best,
Jesse Stroik
OK, maybe I'm missing something here, but ZFS should spread ALL data across ALL vdevs, with the caveat that very small files (under the minimum stripe size) will only land on a portion of the vdevs - that is, such a small file takes up only part of a single stripe. The directory structure is irrelevant to how the data is written.

That is, the only things that determine how the file /foo/bar/baz is laid out on disk are the size of baz itself and the level of fragmentation of the zpool. For static, write-once data like yours, fragmentation shouldn't be an issue.
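If you want to see where that small-file threshold sits, check the dataset's recordsize - a file smaller than one record is stored as a single block on a single vdev. Something like this (the pool/filesystem name is just a placeholder for yours):

% zfs get recordsize proxy_pool/proxy_data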

In your case, where you had a 4-vdev stripe and then added 3 vdevs, I would recommend re-copying the existing data to make sure it now covers all 7 vdevs.

Thus, I'd do something like:

# needs enough free space in the pool to hold a second copy of the
# largest instrument directory while the rsync runs
cd /proxy_data
for i in instrument_*
do
   # rewriting the blocks spreads them across all 7 vdevs
   mkdir "${i}.new"
   rsync -a "$i/" "${i}.new/"
   rm -rf "$i"
   mv "${i}.new" "$i"
done
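
Afterwards, something like zpool iostat -v will show whether the space (and the I/O) is now spread reasonably evenly across all seven vdevs - again, the pool name is a placeholder:

% zpool iostat -v proxy_pool 5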

Richard's suggestion, while tongue-in-cheek, has much merit. If you are only going to be working on a small portion of your total data set at once, but hitting that portion heavily, then you want as much of it as possible in the read cache (L2ARC). That means either buying lots of RAM or getting yourself an SSD. The good news is that you can likely use one of the "cheaper" SSDs - the Intel X25-M is a good fit for a Readzilla here.
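
For reference, hanging the SSD off the pool as an L2ARC device is a one-liner - the pool and device names below are placeholders:

% zpool add proxy_pool cache c2t0d0

Once it warms up, reads that miss the ARC get served from the SSD instead of the spinning disks.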


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
