The overhead for column families was greatly reduced in 0.8 and 1.0. It should now be possible to have hundreds or thousands of column families. The 'memtable_total_space_in_mb' setting was introduced to provide a global memtable threshold, so Cassandra now handles flushing on its own.
See http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

Another thing you should consider is the lack of built-in access controls. There is an authentication/authorization interface you can plug into, and there are examples in the examples/ directory of the source download.

On Wed, Dec 21, 2011 at 10:36 AM, Ryan Lowe <ryanjl...@gmail.com> wrote:
> What we have done to avoid creating multiple column families is to sort of
> namespace the row key. So if we have a column family of Users and accounts
> "AccountA" and "AccountB", we do the following:
>
> Column Family User:
>    "AccountA/ryan" : { first: Ryan, last: Lowe }
>    "AccountB/ryan" : { first: Ryan, last: Smith }
>
> etc.
>
> For our needs, this did the same thing as having 2 "User" column families
> for "AccountA" and "AccountB".
>
> Ryan
>
>
> On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti <f.baro...@list-group.com>
> wrote:
>>
>> Hi,
>>
>> based on my experience with Cassandra 0.7.4, I strongly discourage you from
>> doing that: we tried dynamic creation of column families, and it was a
>> nightmare.
>> First of all, the operation cannot be done concurrently, therefore you
>> must find a way to avoid parallel creation (over the whole cluster, not
>> just on a single node).
>> The main problem however is with timestamps. The structure of your
>> keyspace is versioned with a time-dependent id, which is assigned by the
>> host where you perform the schema update based on the local machine time. If
>> you do two updates in close succession on two different nodes, and their
>> clocks are not perfectly synchronized (and they never will be), Cassandra
>> might be confused by their relative ordering, and stop working altogether.
>>
>> Bottom line: don't.
>>
>> Flavio
>>
>> On 12/21/2011 14:45 PM, Rafael Almeida wrote:
>>
>>> Hello,
>>>
>>> I am evaluating the usage of Cassandra for my system. I will have several
>>> clients who won't share data with each other. My idea is to create one
>>> column family per client. When a new client comes in and adds data to the
>>> system, I'd like to create a column family dynamically. Is that reliable?
>>> Can I create a column family on a node and immediately add new data to that
>>> column family and be confident that the data added will eventually become
>>> visible to a read?
>>>
>>> []'s
>>> Rafael
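
For reference, the row-key namespacing that Ryan describes in the quoted thread could look roughly like the sketch below. It is only an illustration: the helper names (make_row_key, split_row_key) and the "/" separator constant are assumptions made up for this example, and a plain Python dict stands in for the single "User" column family, so no Cassandra client API is shown.

```python
from typing import Dict, Tuple

# Assumption: "/" separates account and user, as in the thread's
# "AccountA/ryan" example.
SEPARATOR = "/"


def make_row_key(account: str, user: str) -> str:
    """Build a namespaced row key such as "AccountA/ryan" (hypothetical helper)."""
    # Reject account names that would make the key ambiguous when split again.
    if not account or SEPARATOR in account:
        raise ValueError("account must be non-empty and must not contain the separator")
    return f"{account}{SEPARATOR}{user}"


def split_row_key(row_key: str) -> Tuple[str, str]:
    """Split a namespaced row key back into (account, user) (hypothetical helper)."""
    # Split only on the first separator, so user names may contain "/".
    account, user = row_key.split(SEPARATOR, 1)
    return account, user


# A plain dict stands in for the single "User" column family.
users: Dict[str, Dict[str, str]] = {}
users[make_row_key("AccountA", "ryan")] = {"first": "Ryan", "last": "Lowe"}
users[make_row_key("AccountB", "ryan")] = {"first": "Ryan", "last": "Smith"}

# The same user name under two accounts maps to two distinct rows.
for key, columns in users.items():
    account, user = split_row_key(key)
    print(account, user, columns["last"])
```

Keeping the account prefix inside the key means a new client needs no schema change at all, which sidesteps the concurrent schema-update problem Flavio describes.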