I would suggest you build one cluster, using all your nodes, and create one keyspace for all users.
There are lots of reasons, here are a few:

* many nodes in a single cluster spread the load and give you fault tolerance.
* read and write requests can be distributed in a many-node cluster.
* Cassandra caches and OS-level file caches will be shared.
* Cassandra does not suffer from locking and contention during reads and writes.
* you can prefix row keys to create "virtual keyspaces".

Hope that helps.

Aaron

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/04/2012, at 4:33 AM, Trevor Francis wrote:

> We are launching a data-intensive application that will store upwards of
> 50 million 150-byte records per day per user. We have identified Cassandra as
> our database technology and Flume as what we will use to seed the data from
> log files into the database.
>
> Each user is given their own server instance, but the schema of the data for
> each user will be the same.
>
> We will be performing realtime analysis on this information as part of our
> application and were considering the advantages/disadvantages of all users
> using the same keyspace. All data will be treated the same as far as
> replication factor and the only difference is we won't be displaying one
> user's info to another user. They will be compartmentalized and one user's
> data will not affect or ever be compared against another user.
>
> Conceptualize this as each user has their own Apache server and that server
> spits out 50 million records per day and each user will only be analyzing the
> data for their particular server, not anyone else's. The log formats are
> exactly the same.
>
> My experience lies in relational databases and not key-value stores, like
> Cassandra. So, in the MySQL world we would put each user in their own
> database to avoid the locking contention and to make queries faster.
>
> If we don't post info into different keyspaces, I assume we will have to add
> an additional field to our records to identify the user that owns that
> particular record. How does a single large Keyspace affect query speed, etc.
> etc.
>
> Trevor Francis
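
On the last point in the reply, prefixing row keys to create "virtual keyspaces": a minimal sketch of what that could look like, assuming a ":" separator and string keys. The function names, the separator, and the key layout are all hypothetical and not from the thread. It only shows how the keys are built and checked in application code, with no Cassandra client calls.

```python
# Hypothetical sketch: one shared keyspace, with the owning user encoded
# as a prefix of every row key. The separator and names are assumptions.
SEPARATOR = ":"


def make_row_key(user_id: str, record_key: str) -> str:
    """Build a row key that is namespaced by the owning user."""
    if SEPARATOR in user_id:
        # A separator inside the user id would make the prefix ambiguous.
        raise ValueError("user_id must not contain the separator")
    return f"{user_id}{SEPARATOR}{record_key}"


def split_row_key(row_key: str) -> tuple[str, str]:
    """Recover (user_id, record_key) from a prefixed row key."""
    user_id, record_key = row_key.split(SEPARATOR, 1)
    return user_id, record_key


def belongs_to(row_key: str, user_id: str) -> bool:
    """Check ownership before returning a row to a user."""
    return row_key.startswith(user_id + SEPARATOR)


if __name__ == "__main__":
    key = make_row_key("user42", "2012-04-19:000001")
    print(key)
    print(split_row_key(key))
    print(belongs_to(key, "user4"))
```

Including the separator in the ownership check keeps "user4" from matching rows that belong to "user42".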