so many replies! let me try and cover most points.

1) the backblaze is certainly at one corner of the price/performance/cost
manifold. it is about $10K per box (i think we bought parts for 2 systems
and 20 extra drives for $22K). the parts are easier to get these days; one
vendor sells a kit with a bunch of odd, hard-to-get parts.

2) in our configuration, we have 90TB of disk and 5 1gbps ethernet ports, and
we certainly plan to use it as a nearline storage medium. (actually, within
our milieu, we call this storage tier a "parking lot": the data is stored on
disk, but slowish disk, and we typically roll stuff out of the parking lot
onto our working disks/filesystems for real work.)

3) although i am not so worried about outright disk failure (i expect about
one disk to fail per year for the first 3 or so years), i am worried about
silent data corruption, which almost no RAID guards against. of course, i have
been paranoid about this for years (on the record, too), and at my rule of
thumb of one file corruption per 10TB-years, i expect a file to go corrupt
every 6 weeks per box.
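for those who want to check that arithmetic, here is the back-of-the-envelope
calculation (a sketch; the 90TB figure and the one-corruption-per-10TB-years
rule of thumb are from above, everything else follows from them):

```python
tb_per_box = 90                # disk per backblaze box
tb_years_per_corruption = 10   # rule of thumb: one file corruption per 10 TB-years

corruptions_per_year = tb_per_box / tb_years_per_corruption   # 9.0 per box per year
weeks_between = 52 / corruptions_per_year                     # ~5.8 weeks

print(corruptions_per_year, round(weeks_between, 1))          # prints: 9.0 5.8
```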

4) my likely solution is to have 22 4TB logical volumes, each formed by
striping together 2 disks (striping is for performance). within the box, i
will replicate files onto different volumes (so replication is at the file
level, not the block level), and a background process will continually verify
files (each file has its md5 checksum in its name). the interface for file
access to the box will likely be https, so it will be easy to take a request
and figure out a path to use to satisfy it.

5) preliminary performance figures indicate that the SATA concentrators have
a bandwidth of ~120MB/s (and each serves 5 disks), so performance is quite
modest. after one round of measurements, i haven't seen total write bandwidth
exceed 600MB/s for the whole system. this will suit our needs, but others will
find it lacking.
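for what it's worth, the gap between the per-concentrator number and the
system total works out like this (a sketch; the count of 9 concentrators is
inferred from 45 drives at 5 per concentrator, not stated above):

```python
drives = 45                    # per box
drives_per_concentrator = 5    # each SATA concentrator serves 5 disks
mb_s_per_concentrator = 120    # measured per-concentrator bandwidth

concentrators = drives // drives_per_concentrator          # 9
aggregate_ceiling = concentrators * mb_s_per_concentrator  # 1080 MB/s theoretical

print(concentrators, aggregate_ceiling)                    # prints: 9 1080
```

so the measured 600MB/s is a bit over half the concentrators' theoretical
aggregate; some other bottleneck is in play.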

6) IOPS performance will likely be quite good; after all, there are 45 heads
per box. but i generally find my load is more like sequential file access,
where bandwidth outweighs IOPS.

7) power supplies.... doug's point is well taken. one thing i would
investigate hard before building more is fabricating the wiring harness so as
to connect to the modular power supplies common today (such as my fave, the
OCZ fata1ity). then all the power could come from one beefy supply, improving
the airflow a fair bit. i have found power supplies quite reliable, and
therefore don't worry too much about them failing. in fact, given our
application sits on top of the Ningaui cluster framework, i would handle
possible power supply failures by replicating files across a pair of
backblazes, instead of within each backblaze.

as always, thanks for the feedback.

------------------
Andrew Hume  (best -> Telework) +1 623-551-2845
and...@research.att.com  (Work) +1 none currently
AT&T Labs - Research; member of USENIX and LOPSA




_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
