On May 28, 2009, at 2:00 PM, Patrick Angeles wrote:

On Thu, May 28, 2009 at 10:24 AM, Brian Bockelman <[email protected]> wrote:


We do both -- push the disk image out to NFS and have mirrored SAS hard
drives on the namenode.  The SAS drives appear to be overkill.


This sounds like a nice approach once you take hardware, labor, and downtime costs into account... $700 for a RAID controller seems reasonable to minimize maintenance after a disk failure. Alex's suggestion to go JBOD and write to
all volumes would work as well, but it is slightly more labor-intensive.
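(For reference, the JBOD approach on the namenode amounts to listing multiple directories in dfs.name.dir -- the namenode then mirrors its fsimage and edit log to each one. A sketch for hdfs-site.xml, with hypothetical mount points; adjust the paths for your layout:)

```xml
<!-- Namenode metadata written to two local JBOD volumes plus an NFS mount.
     The paths here are placeholders -- use your actual mount points. -->
<property>
  <name>dfs.name.dir</name>
  <value>/disk1/hdfs/name,/disk2/hdfs/name,/mnt/nfs/hdfs/name</value>
</property>
```

The namenode treats the comma-separated list as redundant copies, so losing any one volume leaves the metadata intact on the others.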

Remember, though, that downtime from disk failure is actually rather rare. The question is how tight your hardware budget is: if $700 is worth the extra day of uptime a year, then spend it. I come from an academic background where (a) we don't lose money when things go down and (b) jobs move to another site in the US when things are down. That perhaps explains my somewhat relaxed attitude.

I'm not a hardware guy anymore, but I'd personally prefer software RAID. I've seen mirrored disks go down because the RAID controller decided to puke.



2. What is a good processor-to-storage ratio for a task node with 4TB of
raw storage? (The config above has 1 core per 1TB of raw storage.)



We're data hungry locally -- I'd put in bigger hard drives. The 1.5TB Seagate drives seem to have gotten past their teething issues and are at a pretty sweet price point. They will only scale up to about 60 IOPS, though, so make sure
your workflows don't have lots of random I/O.
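(To put the 60 IOPS in perspective, here's the back-of-the-envelope math. The read size and sequential throughput below are illustrative assumptions for a large SATA drive, not measured figures:)

```python
# Why 60 IOPS hurts random-heavy workloads: compare random vs sequential
# throughput for one drive. Assumed numbers: 60 random IOPS, 64 KiB reads,
# ~100 MB/s sequential streaming.
IOPS = 60
READ_SIZE_KIB = 64
SEQUENTIAL_MB_S = 100

random_mb_s = IOPS * READ_SIZE_KIB / 1024  # MB/s when every read seeks
print(f"random:     {random_mb_s:.2f} MB/s")
print(f"sequential: {SEQUENTIAL_MB_S} MB/s")
print(f"penalty:    {SEQUENTIAL_MB_S / random_mb_s:.0f}x slower")
```

With these assumptions, a seek-bound workload runs well over an order of magnitude slower than a streaming one, which is why MapReduce-style sequential scans fit big SATA drives so well.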


I haven't seen many vendors offering the 1.5TB option. What type of data are you working with, and at what volumes? I sense that at 50GB/day, we are
higher than average in terms of data volume over time.


We have just short of 300TB of raw disk; our daily downloads range from a few GB to 10TB.

We bought 1.5TB drives separately from the nodes and sent students with screwdrivers at the cluster.


As Steve mentions below, the rest is really up to your algorithm. Do you need 1 CPU second / byte? If so, buy more CPUs. Do you need 0.1 CPU seconds
/ MB?  If so, buy more disks.
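(That rule of thumb turns into a one-line sizing calculation once you've measured your CPU cost per byte. The cost figures below are hypothetical placeholders -- you'd plug in numbers from a benchmark run:)

```python
# Rough sizing: given a daily data volume and a measured CPU cost per byte,
# how many cores are needed just to keep up with the incoming data?
def cores_needed(bytes_per_day: float, cpu_sec_per_byte: float) -> float:
    """Total CPU-seconds of work per day divided by seconds in a day."""
    return bytes_per_day * cpu_sec_per_byte / 86_400

daily = 50e9  # 50 GB/day, the volume mentioned earlier in the thread

# 0.1 CPU-sec per MB: disk-bound, a fraction of one core keeps up.
print(cores_needed(daily, 0.1 / 1e6))
# A far heavier hypothetical cost per byte: now CPU-bound, hundreds of cores.
print(cores_needed(daily, 1e-3))
```

If the answer comes out well under your core count, spend the budget on spindles instead.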


Unfortunately, we won't know until we have a cluster to test on -- a classic catch-22. We are going to experiment with a small cluster and a small data set, with plans to buy more appropriately sized slave nodes based on what we
learn.


In that case, you're probably good! 24TB probably formats out to 20TB. With 2x replication at 50GB a day, you've got enough room for about half a year of data. Hope your procurement process isn't too slow!
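(Spelling out that estimate, with the same assumptions Brian uses: roughly 20 TB usable after formatting and overhead, 2x HDFS replication, and 50 GB of new data per day:)

```python
# Capacity runway: usable space divided by replicated daily ingest.
usable_tb = 20      # ~24 TB raw, minus formatting/overhead (assumed)
replication = 2     # HDFS replication factor
daily_gb = 50       # new data per day

days = usable_tb * 1024 / (replication * daily_gb)
print(f"{days:.0f} days, roughly {days / 30:.1f} months of runway")
```

That lands at about half a year, so the procurement clock really does start ticking on day one.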

Brian
