On Nov 20, 2009, at 12:14 PM, Jesse Stroik wrote:
> There are, of course, job types where you use the same set of data
> for multiple jobs, but having even a small amount of extra memory
> seems to be very helpful in that case, as you'll have several
> nodes reading the same data at roughly the same time.
Yep. More, faster memory closer to the consumer is always better.
You can buy machines with TBs of RAM, but even high-end x86 boxes
top out at 512 GB.
> That was our previous approach. We're testing doing it with
> relatively cheap, consumer-level Sun hardware (i.e., machines with
> 64 or maybe 128 GB of memory today) that can be easily expanded as
> the pool's purpose changes.
> I know what our options are for increasing performance if we want
> to increase the budget. My question isn't, "I have this data set,
> can you please tell me how to buy and configure a system." My
> question is, "how does ZFS balance pools during writes, and how
> can I force it to balance data I want balanced in the way I want
> it balanced?" And if the answer to that question is, "you can't
> reliably do this," then that is acceptable. It's something I would
> like to be able to plan around.
Writes (allocations) are biased towards the freer (in the percentage
sense) of the fully functional vdevs. However, diversity for copies
and affinity for gang blocks are preserved. The starting point for
understanding this in the code is:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c
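
In outline, the bias amounts to something like the following. This is
a simplified C sketch, not the actual metaslab.c logic; the struct,
the function name, and the exact formula are all illustrative
assumptions:

#include <stdint.h>

typedef struct vdev_sketch {
        uint64_t vd_alloc;      /* bytes allocated on this vdev */
        uint64_t vd_space;      /* total bytes on this vdev */
} vdev_sketch_t;

/*
 * Return a signed adjustment, in bytes, to the base quantum of data
 * written to a top-level vdev before the allocator moves on to the
 * next one: positive when the vdev is emptier than the pool average,
 * negative when it is fuller.
 */
static int64_t
vdev_alloc_bias(const vdev_sketch_t *vd, uint64_t pool_alloc,
    uint64_t pool_space, uint64_t quantum)
{
        /* percent used; the +1 guards against division by zero */
        int64_t vd_used = (int64_t)((vd->vd_alloc * 100) /
            (vd->vd_space + 1));
        int64_t pool_used = (int64_t)((pool_alloc * 100) /
            (pool_space + 1));

        /*
         * An emptier-than-average vdev (vd_used < pool_used) gets a
         * proportionally larger share of the allocation pass.
         */
        return (((pool_used - vd_used) * (int64_t)quantum) / 100);
}

From userland, zpool iostat -v will show allocated and free space per
top-level vdev, which is an easy way to watch how evenly new writes
are being spread.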
> Right now, this storage node is very small (~100TB) and in testing.
> I want to know how I can solve problems like this as we scale it up
> into a full-fledged SAN that holds a lot more data and gets moved
> into production. Knowing the limitations of ZFS is a critical part
> of properly designing and expanding the system.
More work is done to try to level allocations across metaslabs when
the metaslabs have less than 30% free space. There may be a
reasonable rule of thumb lurking here somewhere, but I'm not sure it
can be general enough, as it will depend, to some degree, on the
workload.
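
That leveling could be pictured like this. Again, only a sketch under
assumptions, not the real weighting code; the names, the activity
bonus, and the exact comparison are hypothetical:

#include <stdint.h>

#define LEVELING_THRESHOLD_PCT  30      /* the 30% figure from above */

typedef struct metaslab_sketch {
        uint64_t ms_size;       /* total bytes in this metaslab */
        uint64_t ms_free;       /* free bytes remaining */
        int      ms_active;     /* nonzero if recently used for writes */
} metaslab_sketch_t;

static uint64_t
metaslab_sketch_weight(const metaslab_sketch_t *ms)
{
        uint64_t free_pct = (ms->ms_free * 100) / (ms->ms_size + 1);
        uint64_t weight = ms->ms_free;  /* base weight: free space */

        if (free_pct >= LEVELING_THRESHOLD_PCT && ms->ms_active) {
                /*
                 * Plenty of room: prefer the active metaslab so that
                 * related writes stay physically close together.
                 */
                weight *= 2;
        }

        /*
         * Below the threshold the activity bonus disappears, so the
         * allocator drifts toward whichever metaslab is freest and
         * space levels out.
         */
        return (weight);
}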
This is pretty far down in the weeds... do many people think it would be
useful to describe this in human-grokable form?
Sometimes, ignorance is bliss :-)
-- richard