so many replies! let me try and cover most points.

1) the backblaze is certainly at one corner of the price/performance/cost manifold. it is about $10K per box (i think we bought parts for 2 systems and 20 extra drives for $22K). the parts are easier to get these days; one vendor sells a kit with a bunch of the odd, hard-to-get parts.

2) in our configuration, we have 90TB of disk and 5 1gbps ethernet ports, and we certainly plan to use it as a nearline storage medium. (actually, within our milieu, we call this storage tier a "parking lot": the data is stored on disk, but slowish disk, and we typically roll stuff out of the parking lot onto our working disks/filesystems for real work.)

3) although i am not so worried about outright disk failure (i expect about one disk to fail per year for the first 3 or so years), i am worried about silent data corruption, which almost no RAID guards against. of course, i have been paranoid about this for years (on the record, too), and at my rule of thumb of one file corruption per 10TB-year, i expect a file to go corrupt about every 6 weeks per box (90TB works out to roughly 9 corruptions a year).

4) my likely solution is to have 22 4TB logical volumes, each formed by striping together 2 disks (the striping is for performance). within the box, i will replicate files onto different volumes (so replication is at the file level, not the block level), and a background process will continually verify files (each file has its md5 checksum in its name). the interface for file access to the box will likely be https, so it will be easy to take a request and figure out a path to use to satisfy that request.

5) preliminary performance figures indicate that the SATA concentrators have a bandwidth of ~120MB/s each (and each serves 5 disks), so performance is quite modest. after one round of measurements, i haven't seen total write bandwidth exceed 600MB/s for the whole system. this will suit our needs, but others will find it lacking.

6) IOPS performance will likely be quite good; after all, there are 45 heads per box. but i generally find my load is more like sequential file access, where bandwidth outweighs IOPS.

7) power supplies.... doug's point is well taken.
one thing i would investigate hard before building more is fabricating the wiring harness so as to connect to the modular power supplies common today (such as my fave, the OCZ Fatal1ty). then all the power could come from one beefy supply, which would also improve the airflow a fair bit. i have found power supplies quite reliable, and therefore don't worry too much about them failing. in fact, given that our application sits on top of the Ningaui cluster framework, i would handle possible power supply failures by replicating files across a pair of backblazes, instead of within each backblaze.

as always, thanks for the feedback.

------------------
Andrew Hume (best -> Telework) +1 623-551-2845
and...@research.att.com (Work) +1 none currently
AT&T Labs - Research; member of USENIX and LOPSA
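
p.s. for anyone who wants to picture point 4, here is a rough sketch in python of file-level replication with the md5 in the file name and a background scrub. it is only an illustration: the `name.<md5>` naming convention, picking volumes from the digest, and the function names (`store`, `verify`, `scrub`) are all assumptions, not the actual implementation.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk=1 << 20):
    """return the hex md5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def store(src, volumes, copies=2):
    """copy src onto `copies` distinct volumes; the stored name embeds the md5.

    assumption: the name is '<basename>.<md5>' and the first volume is
    chosen from the digest, so a request can be mapped back to its paths.
    """
    if copies > len(volumes):
        raise ValueError("not enough volumes for the requested copies")
    digest = md5_of(src)
    name = f"{os.path.basename(src)}.{digest}"
    start = int(digest, 16) % len(volumes)
    paths = []
    for i in range(copies):
        dst = os.path.join(volumes[(start + i) % len(volumes)], name)
        shutil.copyfile(src, dst)
        paths.append(dst)
    return paths


def verify(path):
    """recompute the md5 and compare it with the one embedded in the name."""
    expected = path.rsplit(".", 1)[1]
    return md5_of(path) == expected


def scrub(volumes):
    """one pass of the background check: list every file that fails verify."""
    bad = []
    for vol in volumes:
        for entry in sorted(os.listdir(vol)):
            p = os.path.join(vol, entry)
            if os.path.isfile(p) and not verify(p):
                bad.append(p)
    return bad
```

in a scheme like this, a file flagged by `scrub` would be repaired by copying over a replica on another volume that still passes `verify`. a real version would run the scrub continuously and throttle it so it does not compete with real work.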