On 15 April 2010 02:42, Zhuguo Shi <bluefl...@gmail.com> wrote:

> Hi,
>
> Cassandra has a good distributed model: decentralized, auto-partition,
> auto-recovery. I am evaluating writing a file system over Cassandra
> (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
> Cassandra is good for such a use case.
>

I have considered this too.

I think a FUSE-based filesystem could be made to work over Cassandra;
initially it could be limited to storing small files (<500M for example) so
that we could put the entire file contents in one row.
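
As a rough illustration of the one-row-per-file idea, here is a minimal
sketch in Python. It does not talk to Cassandra at all: RowStore is a
hypothetical in-memory stand-in for a column family, and the "content" and
"size" column names, write_file and read_file are all made up for the
example. Only the 500M limit comes from the suggestion above.

```python
MAX_FILE_BYTES = 500 * 1024 * 1024  # the "<500M" limit suggested above


class RowStore:
    """Hypothetical in-memory stand-in for a column family: row key -> columns."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        # Merge the given columns into the row, creating the row if needed.
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        return self.rows.get(key)


def write_file(store, path, data, limit=MAX_FILE_BYTES):
    """Store the whole file in a single row keyed by its path."""
    if len(data) > limit:
        raise ValueError("file too large for single-row storage")
    store.insert(path, {"content": data, "size": len(data)})


def read_file(store, path):
    """Return the file contents, or raise if the row does not exist."""
    row = store.get(path)
    if row is None:
        raise FileNotFoundError(path)
    return row["content"]


store = RowStore()
write_file(store, "/docs/readme.txt", b"hello")
```

Keying the row by path keeps reads to a single lookup, but it also means a
rename has to copy or re-key the row, which is part of why renames are hard.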

However, many operations are difficult to implement however you design it,
especially renames (e.g. what happens if two nodes rename different files to
the same name?).
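
To make the rename problem concrete, here is a small simulation in Python.
It is only a sketch under stated assumptions: LwwNames is a hypothetical
in-memory name table that resolves conflicting writes by highest timestamp
(a simplified version of last-write-wins), and rename is modelled as two
independent writes rather than one atomic step. None of these names come
from Cassandra or CassFS.

```python
class LwwNames:
    """Hypothetical name -> blob id table; the highest timestamp wins."""

    def __init__(self):
        self.entries = {}  # name -> (timestamp, blob_id or None)

    def write(self, name, blob_id, ts):
        current = self.entries.get(name)
        if current is None or ts > current[0]:
            self.entries[name] = (ts, blob_id)

    def lookup(self, name):
        entry = self.entries.get(name)
        return entry[1] if entry else None


def rename(names, old, new, ts):
    """Non-atomic rename: write the new name, then tombstone the old one."""
    blob_id = names.lookup(old)
    names.write(new, blob_id, ts)
    names.write(old, None, ts)


names = LwwNames()
names.write("a.txt", "blob-A", ts=1)
names.write("b.txt", "blob-B", ts=1)

rename(names, "a.txt", "c.txt", ts=2)  # client 1
rename(names, "b.txt", "c.txt", ts=3)  # client 2, slightly later

# Blob ids that some name still points at after both renames.
still_referenced = {blob for _, blob in names.entries.values() if blob}
```

After both renames, "c.txt" points at blob-B and nothing points at blob-A
any more. Neither client saw an error, and neither deleted blob-A.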

Also, the filesystem would not be POSIX-conformant; however, it could
probably provide behaviour that is useful to most applications in most
cases (think of straightforward document management, uploaded image
storage, quarantine storage, etc.).

Eventual consistency would mean that operations which are conventionally
atomic in POSIX (e.g. rename) would not be, and the user (application)
would need to tolerate this.

Depending on how you constructed it, it could be easy to "lose" files: their
data would continue to be stored but would no longer appear in the
filesystem (a broken link), and could not then be efficiently garbage
collected. The typical cases would be a file that was not completely created
(a client node failed) or two files renamed to the same name (one would be
lost, but might not get marked as deleted in Cassandra). This would cause a
resource leak.
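
For illustration, the obvious way to find such leaked files is a
mark-and-sweep pass, sketched below in Python. The layout is an assumption
made for the example: file data lives in a "blobs" table keyed by a blob id,
and a separate "names" table maps each filename to a blob id (or None for a
deleted name). find_orphans and the sample data are hypothetical.

```python
def find_orphans(blobs, names):
    """Return blob ids that no name points at.

    This walks every name and every blob, so its cost grows with the size
    of the whole store, not with the number of leaked files.
    """
    referenced = {blob_id for blob_id in names.values() if blob_id is not None}
    return sorted(set(blobs) - referenced)


# Sample state: blob-A lost its name in a rename collision, and blob-C was
# written by a client that failed before linking it to any name.
blobs = {"blob-A": b"first", "blob-B": b"second", "blob-C": b"partial"}
names = {"a.txt": None, "c.txt": "blob-B"}

orphans = find_orphans(blobs, names)
```

A real sweep would also need some grace period so that it does not collect a
blob whose creating client simply has not written the name yet.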

If you can work around these problems, it would be an attractive option for
many types of application.

Mark
