On Mon, 2009-08-03 at 21:23 -1000, Tim Newsham wrote:
> > 2. do we have anybody successfully managing that much storage that is
> > also spread across the nodes? And if so, what's the best practices
> > out there to make the client not worry about where does the storage
> > actually come from (IOW, any kind of proxying of I/O, etc)
>
> http://labs.google.com/papers/gfs.html
> http://hadoop.apache.org/common/docs/current/hdfs_design.html
>
> > I'm trying to see how the life after NFSv4 or AFS might look like for
> > the clients still clinging to the old ways of doing things, yet
> > trying to cooperatively use hundreds of T of storage.
>
> the two I mention above are both used in conjunction with
> distributed map/reduce calculations. Calculations are done
> on the nodes where the data is stored...
Hadoop and GFS are good examples, and they work great for the single distributed application that is *written* with them in mind. Unfortunately, I cannot stretch my imagination far enough to see them as general-purpose filesystems backing data for gazillions of non-cooperative applications, which is the sort of thing NFS and AFS were built to accomplish.

In that respect, ceph is closer to what I have in mind: it assembles storage from clusters of unrelated OSDs into a hierarchy with a single point of entry for every user/application. The question, however, is how to avoid the complexity of ceph and still have it look like a humongous kenfs or fossil from the outside.

Thanks,
Roman.
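
P.S. To make the "written with them in mind" point concrete, below is roughly what reading a file out of HDFS looks like from an application, sketched against the Hadoop FileSystem client API (the namenode address, path, and class name are made up for illustration). Every byte goes through Hadoop's own client classes rather than plain open/read, which is why existing applications can't use it transparently.

    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Sketch: cat a file stored in HDFS to stdout via the Hadoop client API.
    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // hypothetical cluster address; normally picked up from core-site.xml
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
            InputStream in = null;
            try {
                in = fs.open(new Path("/data/example.txt"));    // hypothetical path
                IOUtils.copyBytes(in, System.out, 4096, false); // stream to stdout
            } finally {
                IOUtils.closeStream(in);
                fs.close();
            }
        }
    }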