Hi.

I am interested in using Cassandra not because I have a large amount of
data, but for the following reasons.

1) It's easy to administer and to handle fail-over (and to scale, of course)
2) It's easy to write an application that makes sense to developers
(developers are fully in control of how data is organized: indexed, queried,
etc.)
3) It's easy to extend an application to some extent, as long as the changes
only involve adding or removing columns (not column families)

Are these good enough reasons to start experimenting with Cassandra as a
general-purpose data store? Or does Cassandra, or any NoSQL solution, really
make no sense unless you have, or expect to have, terabytes of data?

Regarding point 3) above: if I have 100 nodes running Cassandra and want to
add a new table (a ColumnFamily), does that mean I have to update storage.xml
on all 100 nodes and restart them? For example, if a user wants me to add the
ability to sort "stuff" in a way that I don't support yet, I might have to
do the following.

1. Create a new ColumnFamily that orders "stuff" by a new foreign key
currently stored inside one of the columns of "stuff".
2. Populate this new ColumnFamily from all the "stuff" records that
currently exist.
3. Update the application so that it reads this new ColumnFamily for the new
sort options.
4. Update the application so that every time "stuff" is added or removed, it
also updates this new ColumnFamily.
5. Update storage.xml on ALL nodes in the cluster and restart them!
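
To make steps 1 to 4 concrete, here is a rough sketch in Python of the kind
of bookkeeping I mean. It uses plain dicts to stand in for column families
and is not a real Cassandra client; the names stuff_by_owner and owner_id
are made up for illustration.

```python
from collections import defaultdict

# Plain dicts stand in for column families; this is NOT a Cassandra client API.
# "stuff" maps row key -> columns; "owner_id" is a hypothetical foreign key column.
stuff = {}
# Step 1: hypothetical index column family, foreign key -> set of "stuff" row keys.
stuff_by_owner = defaultdict(set)


def backfill_index():
    """Step 2: populate the new index from every existing "stuff" record."""
    stuff_by_owner.clear()
    for key, columns in stuff.items():
        stuff_by_owner[columns["owner_id"]].add(key)


def remove_stuff(key):
    """Step 4: delete the record and its index entry together."""
    columns = stuff.pop(key, None)
    if columns is None:
        return
    owner = columns["owner_id"]
    stuff_by_owner[owner].discard(key)
    if not stuff_by_owner[owner]:
        del stuff_by_owner[owner]


def add_stuff(key, columns):
    """Step 4: write the record and keep the index in sync."""
    remove_stuff(key)  # drop any stale index entry if the key already exists
    stuff[key] = dict(columns)
    stuff_by_owner[columns["owner_id"]].add(key)


def list_stuff_sorted_by_owner():
    """Step 3: read through the index to get the new sort order."""
    return [
        key
        for owner in sorted(stuff_by_owner)
        for key in sorted(stuff_by_owner[owner])
    ]
```

In this sketch the application owns every index update, which is exactly
the extra work in steps 2 to 4 that a regular DB would do for me.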

With a regular DB, I would only have to do step 3. Does this mean that,
unless my application is *very* stable and no such user requirement could
come up, I should stick to a regular DB? If so, Cassandra only makes sense
in the special case where the data is simply too large for a regular DB
(meaning that if data size is not an issue, I should stick to a regular DB).

Thanks,
Soichi
