Exactly. You can split a file into blocks of any size, and you can also distribute the metadata across a large set of machines. With this approach you wouldn't be limited to small files. The open question may be eventual consistency; I'm not sure that is a paradigm that would be acceptable for a file system. But that is a discussion for another day.
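A minimal sketch of the block-splitting idea, assuming a hypothetical row-key scheme of `<path>#<block index>` and a hypothetical 64 KiB block size (the thread only says the block size "can be tuned"). It builds the key/value pairs and a small metadata record in memory; distributing those rows across machines would be the partitioner's job, and this is not CassFS's actual layout.

```python
import hashlib
from typing import Dict, Iterator, Tuple

# Hypothetical default block size; tune per workload.
BLOCK_SIZE = 64 * 1024


def block_key(path: str, index: int) -> str:
    """Hypothetical row key for one block: '<path>#<zero-padded index>'."""
    return f"{path}#{index:08d}"


def split_into_blocks(path: str, data: bytes,
                      block_size: int = BLOCK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (row_key, block_bytes) pairs; the last block may be shorter."""
    for index, offset in enumerate(range(0, len(data), block_size)):
        yield block_key(path, index), data[offset:offset + block_size]


def file_metadata(path: str, data: bytes,
                  block_size: int = BLOCK_SIZE) -> Dict[str, object]:
    """Small metadata record, stored separately from the raw blocks."""
    return {
        "path": path,
        "size": len(data),
        "block_size": block_size,
        "block_count": (len(data) + block_size - 1) // block_size,
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def reassemble(blocks: Dict[str, bytes], meta: Dict[str, object]) -> bytes:
    """Read the blocks back in index order using the metadata record."""
    path = str(meta["path"])
    count = int(meta["block_count"])
    return b"".join(blocks[block_key(path, i)] for i in range(count))
```

For example, a 150,000-byte file with 64 KiB blocks becomes three rows, and only the metadata record needs to be read to know which block keys to fetch:

```python
data = b"x" * 150_000
blocks = dict(split_into_blocks("/videos/a.bin", data))
meta = file_metadata("/videos/a.bin", data)
assert meta["block_count"] == 3
assert reassemble(blocks, meta) == data
```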
Avinash

On Wed, Apr 14, 2010 at 7:15 PM, Ken Sandney <bluefl...@gmail.com> wrote:
> Large files can be split into small blocks, and the block size can be
> tuned. It may increase the complexity of writing such a file system, but it
> could then be general-purpose (not only for relatively small files).
>
> On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta <tsalora...@gmail.com> wrote:
>
>> On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi <bluefl...@gmail.com> wrote:
>> > Hi,
>> > Cassandra has a good distributed model: decentralized, auto-partition,
>> > auto-recovery. I am evaluating writing a file system over Cassandra
>> > (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
>> > Cassandra is good at such a use case.
>>
>> It sort of depends on what you are looking for. For the use cases
>> where something like S3 is good, yes, with one difference:
>> Cassandra is more geared towards lots of small files, whereas S3 is
>> more geared towards a moderate number of (possibly large) files.
>>
>> So I think it can definitely be a good use case, and I may use
>> Cassandra for this myself in the future. Having range queries allows
>> implementing directory/path structures (list keys using the path as a
>> prefix). And you can split storage so that metadata lives in an
>> OPP partition and raw data in RP.
>>
>> -+ Tatu +-
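Tatu's point about listing a directory with a prefix range query can be sketched as follows. This is an in-memory stand-in, not Cassandra's client API: a sorted list of path keys plays the role of metadata rows kept in key order (as under an order-preserving partitioner), and the names `OrderedKeyIndex` and `list_directory` are hypothetical. Raw data blocks would not need this ordering, which is why they could live under the random partitioner instead.

```python
import bisect
from typing import List


class OrderedKeyIndex:
    """Hypothetical stand-in for metadata rows stored in key order."""

    def __init__(self) -> None:
        self._keys: List[str] = []

    def put(self, key: str) -> None:
        """Insert a key, keeping the list sorted and free of duplicates."""
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def range_scan(self, start: str, end: str) -> List[str]:
        """Return keys k with start <= k < end, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]


def list_directory(index: OrderedKeyIndex, directory: str) -> List[str]:
    """Immediate children of `directory`, found with one prefix range scan."""
    prefix = directory.rstrip("/") + "/"
    # Every key that starts with `prefix` sorts before this bound:
    # e.g. "/docs/" -> "/docs0", since "0" is the character after "/".
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    children: List[str] = []
    seen = set()
    for key in index.range_scan(prefix, end):
        child = key[len(prefix):].split("/", 1)[0]
        # A set is used because one child name can reappear non-contiguously
        # (e.g. "/docs/a", "/docs/a.txt", "/docs/a/x" sort in that order).
        if child and child not in seen:
            seen.add(child)
            children.append(child)
    return children
```

Usage: note that `/docsx/e.txt` is correctly excluded from the listing of `/docs`, because the scan's upper bound stops at `/docs0`.

```python
idx = OrderedKeyIndex()
for k in ["/docs/a.txt", "/docs/b/c.txt", "/docs/b/d.txt",
          "/docsx/e.txt", "/other/f.txt"]:
    idx.put(k)
print(list_directory(idx, "/docs"))  # ['a.txt', 'b']
```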