On Nov 20, 2009, at 12:14 PM, Jesse Stroik wrote:

> There are, of course, job types where you use the same set of data for multiple jobs, but having even a small amount of extra memory seems to be very helpful in that case, as you'll have several nodes reading the same data at roughly the same time.
Yep. More, faster memory closer to the consumer is always better. You could buy machines with TBs of RAM, though high-end x86 boxes top out at 512 GB.


> That was our previous approach. We're now testing it with relatively cheap, consumer-level Sun hardware (i.e., machines with 64 or maybe 128 GB of memory today) that can be easily expanded as the pool's purpose changes.

> I know what our options are for increasing performance if we want to increase the budget. My question isn't, "I have this data set, can you please tell me how to buy and configure a system." My question is, "how does ZFS balance pools during writes, and how can I force it to balance data I want balanced in the way I want it balanced?" And if the answer to that question is, "you can't reliably do this," then that is acceptable. It's something I would like to be able to plan around.

Writes (allocations) are biased toward the freer (in the percentage sense) of the fully functional vdevs. However, diversity for copies and affinity for gang blocks are preserved. The starting point for understanding this in the code is at:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c
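
In effect, allocations gravitate toward whichever healthy vdev has the highest percentage of free space. Here is a minimal sketch of that bias in C; it is illustrative only and glosses over what metaslab.c actually does (per-metaslab weights, the allocation rotor, copies, and gang block affinity):

    /*
     * Illustrative only: favor the healthy vdev with the highest
     * percentage of free space.  Not the real allocator.
     */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct vdev {
            uint64_t total_bytes;   /* capacity of this top-level vdev */
            uint64_t alloc_bytes;   /* bytes already allocated */
            int      healthy;       /* fully functional? */
    } vdev_t;

    static vdev_t *
    pick_vdev(vdev_t *vdevs, size_t nvdevs)
    {
            vdev_t *best = NULL;
            double best_free = -1.0;

            for (size_t i = 0; i < nvdevs; i++) {
                    double free_pct;

                    if (!vdevs[i].healthy)
                            continue;
                    free_pct = 1.0 -
                        (double)vdevs[i].alloc_bytes / vdevs[i].total_bytes;
                    if (free_pct > best_free) {
                            best_free = free_pct;
                            best = &vdevs[i];
                    }
            }
            return (best);
    }

One consequence: a newly added, empty vdev will tend to soak up most new writes until the free percentages even out, and ZFS does not rewrite existing data to rebalance.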


> Right now, this storage node is very small (~100 TB) and in testing. I want to know how I can solve problems like this as we scale it up into a full-fledged SAN that holds a lot more data and gets moved into production. Knowing the limitations of ZFS is a critical part of properly designing and expanding the system.

More work is done to try to level across metaslabs when the metaslabs have less than 30% free space. There may be a reasonable rule of thumb lurking here somewhere, but I'm not sure it can be general enough, as it will depend, to some degree, on the workload.
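
For illustration, the threshold test amounts to something like the sketch below; the constant and the function name are made up for exposition and are not actual ZFS tunables:

    #include <stdint.h>

    /* Hypothetical: below ~30% free, spend more effort leveling. */
    #define LEVELING_THRESHOLD_PCT  30

    static int
    needs_aggressive_leveling(uint64_t free_bytes, uint64_t total_bytes)
    {
            /* Integer math is safe while sizes stay below ~2^57 bytes. */
            return (free_bytes * 100 < total_bytes * LEVELING_THRESHOLD_PCT);
    }

The practical upshot is that allocation behavior can change character once a pool gets past roughly 70% full, which is one reason a general rule of thumb is hard to state.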

This is pretty far down in the weeds... do many people think it would be
useful to describe this in human-grokable form?
Sometimes, ignorance is bliss :-)
 -- richard
