On Mon, 2009-08-03 at 21:23 -1000, Tim Newsham wrote:
> >  2. do we have anybody successfully managing that much storage that is
> >     also spread across the nodes? And if so, what's the best practices
> >     out there to make the client not worry about where does the storage
> >     actually come from (IOW, any kind of proxying of I/O, etc)
> 
> http://labs.google.com/papers/gfs.html
> http://hadoop.apache.org/common/docs/current/hdfs_design.html
> 
> > I'm trying to see how the life after NFSv4 or AFS might look like for
> > the clients still clinging to the old ways of doing things, yet
> > trying to cooperatively use hundreds of T of storage.
> 
> the two I mention above are both used in conjunction with
> distributed map/reduce calculations.  Calculations are done
> on the nodes where the data is stored...

Hadoop and GFS are good examples, and they work well for a
single distributed application that is *written* with them
in mind.

Unfortunately, I cannot stretch my imagination hard enough
to see them as general-purpose filesystems backing data
for gazillions of non-cooperating applications: the sort
of thing NFS and AFS were built to accomplish.

In that respect, ceph is closer to what I have in mind: it
assembles storage from clusters of unrelated OSDs into a
hierarchy with a single point of entry for every
user/application.
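
To make the "single point of entry" idea concrete, here is a toy
sketch of my own (not ceph's actual design or API, and every name
in it is made up): a single namespace object that stripes file data
across several unrelated object stores by hashing (path, chunk
index), so clients only ever talk to the one namespace.

    import hashlib

    class ObjectStore:
        """Stand-in for one OSD: a flat key/value store of chunks."""
        def __init__(self):
            self.chunks = {}

        def put(self, key, data):
            self.chunks[key] = data

        def get(self, key):
            return self.chunks[key]

    class SingleNamespace:
        """One entry point for all clients; placement is hash(path, chunk)."""
        CHUNK = 4096

        def __init__(self, stores):
            self.stores = stores      # unrelated OSDs pooled together
            self.index = {}           # path -> number of chunks

        def _place(self, path, i):
            h = int(hashlib.sha1(f"{path}:{i}".encode()).hexdigest(), 16)
            return self.stores[h % len(self.stores)]

        def write(self, path, data):
            n = 0
            for off in range(0, len(data), self.CHUNK):
                self._place(path, n).put((path, n), data[off:off + self.CHUNK])
                n += 1
            self.index[path] = n

        def read(self, path):
            n = self.index[path]
            return b"".join(self._place(path, i).get((path, i)) for i in range(n))

The real complexity that ceph carries (and that this sketch ignores)
is exactly what the question below is about: replication, recovery,
rebalancing, and metadata consistency behind that single entry point.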

The question, however, is how to avoid the complexity of
ceph and still have it look like a humongous kenfs or
fossil from the outside. 

Thanks,
Roman.

