On May 28, 2009, at 2:00 PM, Patrick Angeles wrote:

> On Thu, May 28, 2009 at 10:24 AM, Brian Bockelman <[email protected]> wrote:
>
>> We do both -- push the disk image out to NFS and have mirrored SAS
>> hard drives on the namenode. The SAS drives appear to be overkill.
>
> This sounds like a nice approach, taking into account hardware, labor,
> and downtime costs... $700 for a RAID controller seems reasonable to
> minimize maintenance due to a disk failure. Alex's suggestion to go
> JBOD and write to all volumes would work as well, but is slightly more
> labor intensive.
Remember, though, that disk failure downtime is actually rather rare.
The question is "how tight is your hardware budget?": if $700 is worth
the extra day of uptime a year, then spend it. I come from an academic
background where (a) we don't lose money if things go down and (b) jobs
move to another site in the US if things are down. That perhaps explains
my somewhat relaxed attitude.

I'm not a hardware guy anymore, but I'd personally prefer software
RAID. I've seen mirrored disks go down because the RAID controller
decided to puke.
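For reference, Alex's JBOD suggestion boils down to pointing the
namenode at several independent volumes; the namenode mirrors its image
and edit log to every listed directory itself, no RAID controller
needed. A sketch using the 0.20-era property name; the paths (and the
NFS mount point) are made-up examples:

```xml
<!-- hdfs-site.xml: the namenode writes its metadata to every listed
     directory, so losing any one disk (or the NFS mount) is survivable. -->
<property>
  <name>dfs.name.dir</name>
  <value>/disk1/hdfs/name,/disk2/hdfs/name,/mnt/nfs/hdfs/name</value>
</property>
```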
>>> 2. What is a good processor-to-storage ratio for a task node with
>>> 4TB of raw storage? (The config above has 1 core per 1TB of raw
>>> storage.)
>>
>> We're data hungry locally -- I'd put in bigger hard drives. The 1.5TB
>> Seagate drives seem to have passed their teething issues, and are at
>> a pretty sweet price point. They will only scale up to 60 IOPS,
>> though, so make sure your workflows don't have lots of random I/O.
>
> I haven't seen too many vendors offering the 1.5TB option. What type
> of data are you working with? At what volumes? I sense that at
> 50GB/day, we are higher than average in terms of data volume over
> time.
We have just short of 300TB of raw disk; our daily downloads range
from a few GB to 10TB.
We bought 1.5TB drives separately from the nodes and sent students
with screwdrivers at the cluster.
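To put the 60 IOPS figure in perspective, here is a back-of-the-envelope
sketch; the 64 KB random-read size and ~100 MB/s streaming rate are
illustrative assumptions, not measurements:

```python
# Rough throughput comparison for a single SATA drive.
RANDOM_IOPS = 60          # seeks per second (assumed)
SEQ_MB_PER_S = 100.0      # streaming read rate, MB/s (assumed)
READ_KB = 64              # size of each random read, KB (assumed)

# Effective bandwidth if every read costs a seek:
random_mb_per_s = RANDOM_IOPS * READ_KB / 1024.0

print(f"random:     {random_mb_per_s:.2f} MB/s")   # 3.75 MB/s
print(f"sequential: {SEQ_MB_PER_S:.0f} MB/s")
print(f"slowdown:   {SEQ_MB_PER_S / random_mb_per_s:.0f}x")
```

A seek-heavy workflow gets well under 4 MB/s per spindle -- roughly a
27x penalty versus streaming -- which is why the random-I/O caveat
matters more than the raw capacity.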
>> As Steve mentions below, the rest is really up to your algorithm. Do
>> you need 1 CPU second / byte? If so, buy more CPUs. Do you need .1
>> CPU second / MB? If so, buy more disks.
>
> Unfortunately, we won't know until we have a cluster to test on.
> Classic catch-22. We are going to experiment with a small cluster and
> a small data set, with plans to buy more appropriately sized slave
> nodes based on what we learn.
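The CPU-vs-disk rule of thumb above can be turned into a quick sizing
check once you have profiled a job on the small cluster. All the
numbers below are illustrative assumptions:

```python
def bottleneck(cpu_sec_per_mb, cores, disk_mb_per_s):
    """Return which resource limits a node: 'cpu' or 'disk'.

    cpu_sec_per_mb -- CPU seconds needed per MB of input (from profiling)
    cores          -- CPU cores in the node
    disk_mb_per_s  -- aggregate sequential disk bandwidth of the node
    """
    cpu_mb_per_s = cores / cpu_sec_per_mb   # MB/s the CPUs can chew through
    return "cpu" if cpu_mb_per_s < disk_mb_per_s else "disk"

# A heavy job at 1 CPU second per MB on a 4-core node with 4 drives
# (~400 MB/s streaming): the CPUs cap out at 4 MB/s -- buy more CPUs.
print(bottleneck(1.0, 4, 400))    # -> cpu

# A light job at 0.001 CPU seconds per MB: the CPUs could handle
# 4,000 MB/s, ten times what the disks deliver -- buy more disks.
print(bottleneck(0.001, 4, 400))  # -> disk
```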
In that case, you're probably good! 24TB probably formats out to
20TB. With 2x replication at 50GB a day, you've got enough room for
about half a year of data. Hope your procurement process isn't too
slow!
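The capacity math works out as follows (the ~20TB formatted figure and
50GB/day rate are the estimates from this thread; 1 TB is taken as
1000 GB for simplicity):

```python
formatted_tb = 20.0   # ~24TB raw after filesystem overhead (estimate above)
replication = 2
daily_gb = 50.0

raw_gb_per_day = replication * daily_gb           # disk consumed per day
days = formatted_tb * 1000.0 / raw_gb_per_day

print(f"{days:.0f} days (~{days / 30:.1f} months) of headroom")
```

That gives 200 days -- a bit over half a year, consistent with the
estimate above, and ignoring any growth in the daily download rate.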
Brian