> 2. Do we have anybody successfully managing that much storage that is also spread across the nodes? And if so, what are the best practices out there for making the client not worry about where the storage actually comes from (IOW, any kind of proxying of I/O, etc.)?
http://labs.google.com/papers/gfs.html http://hadoop.apache.org/common/docs/current/hdfs_design.html
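For illustration, a minimal sketch of the client side of HDFS (assuming a stock Hadoop install, with the file path passed as the first argument; the class name HdfsCat is made up): the application just opens a path and reads a stream, while the FileSystem layer asks the namenode where the blocks live and streams them from the datanodes, so the program never sees where the storage actually comes from.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCat {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml, e.g. hdfs://namenode:8020.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The path is purely logical; which datanodes hold the blocks is
        // resolved by the namenode behind fs.open().
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path(args[0]))))) {
            String line;
            while ((line = in.readLine()) != null)
                System.out.println(line);
        }
    }
}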
> I'm trying to see what life after NFSv4 or AFS might look like for clients still clinging to the old ways of doing things, yet trying to cooperatively use hundreds of TB of storage.
The two I mention above are both used in conjunction with distributed map/reduce calculations. Calculations are done on the nodes where the data is stored...
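And a small sketch of the hook that makes "move the computation to the data" possible (again assuming a stock Hadoop client; the class name BlockHosts is made up): HDFS will report, per block of a file, which datanodes hold replicas, which is exactly the information a map/reduce scheduler uses to place a task next to its input instead of pulling the data across the network.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path(args[0]));
        // One BlockLocation per block: its offset, length and the hosts
        // holding replicas of that block.
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.printf("offset %d, %d bytes, on %s%n",
                    b.getOffset(), b.getLength(),
                    String.join(",", b.getHosts()));
        }
    }
}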
> Thanks, Roman.
Tim Newsham http://www.thenewsh.com/~newsham/