We currently run a commodity cluster that supports a few petabytes of data. 
Each node in the cluster has 4 drives, currently mounted as /0 through /3. We 
have been researching alternatives for managing the storage, Ceph being one 
possibility, iRODS being another. For preservation purposes, we would like each 
file to be stored whole on a single drive (as opposed to being striped across 
multiple drives). It appears this is the default in Ceph.

So far, it has been convenient for us to run distributed jobs over SSH, for 
instance to compile a list of checksums of all files in the cluster:

dsh -Mca 'find /{0..3}/items -name \*.warc.gz | xargs md5sum >/tmp/$HOSTNAME.md5sum'

And that nicely allows each node to process its own files using the local CPU.
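
A minimal sketch of the per-node step in that job, assuming GNU find and xargs are available on each node. The function name checksum_warcs is hypothetical and only for illustration. It passes NUL-delimited names so that paths containing spaces are handled correctly:

```shell
# Hypothetical helper: checksum every *.warc.gz under the given roots.
# -print0 / -0 pass NUL-delimited names, so paths with spaces stay intact.
# -r (GNU xargs) skips running md5sum when nothing matches.
checksum_warcs() {
    find "$@" -name '*.warc.gz' -print0 | xargs -0 -r md5sum
}

# On a node, the equivalent of the one-liner above would be (bash):
#   checksum_warcs /{0..3}/items >"/tmp/$HOSTNAME.md5sum"
```

This only works as long as each node can see its own files as ordinary paths on a locally mounted file system.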

Would this scenario still be possible with Ceph managing the storage?

Thanks in advance for any feedback.

Youssef Eldakar
Bibliotheca Alexandrina
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
