If you are starting out small, one logical/physical cluster is probably the best and only approach.
Long term, this is very much case by case, but I generally believe Cluster per Application is the best approach, although I think of it as "Cluster per QoS". In our use cases, I find that two applications often have very different data sizes and quality-of-service requirements. For example, one application may have a small dataset with a high rate of repeated reads (a high cache hit rate), while another may have a large, sparse dataset with a random read pattern. Also, one application may demand fast reads (under 3 ms), while the other may find 10 or 20 ms reads acceptable.

When those two applications are placed on the same set of hardware, you end up scaling them both even though, at any given time, only one of them needs to be scaled. In extreme cases, applications 1 and 2 contend for resources and make each other unhappy.

The best thing to do is to architect your systems so that moving an individual column family to a new set of hardware is not difficult. This might involve something like a MapReduce program that can bulk load existing data between two clusters, while your front-end application sends writes, updates, and deletes to both the old and the new cluster. Also make sure your application does not have too many hard-coded touch points that assume a single cluster.

As you mentioned, one thing gained from keeping everything in the same keyspace is connection pooling. However, unlike the RDBMS world, where coordinated transactions have to happen in order, etc., that is not the case with C*, so getting all data into the same physical "system" is not as important.

On Wed, Aug 22, 2012 at 8:25 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> Just an opinion here, as we are having to do this ourselves, loading tons of
> researchers' datasets into one cluster. We are going the path of one
> keyspace, as it makes it easier if you ever want to mine the data, so you
> don't have to keep building different clients for another keyspace.
> We ended up adding our own security layer as well, so researchers can expose
> their datasets to other researchers, and once exposed, other researchers can
> join that data with their existing data.
>
> This of course is just one use case, but if 10 applications use Cassandra,
> you still may find a benefit in having an 11th data-mining app look at the
> data from all 10 apps.
>
> Later,
> Dean
>
> playOrm Developer
>
> From: Ersin Er <ersin...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, August 22, 2012 12:44 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Cluster per Application vs. Multi-Application Clusters
>
> Hi all,
>
> What are the advantages of allocating a cluster for a single application vs
> running multiple applications on the same Cassandra cluster? Is either of
> these models recommended over the other?
>
> Thanks.
>
> --
> Ersin Er
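The reply's advice about keeping column families easy to move between clusters can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not a real Cassandra client: `FakeCluster`, `ColumnFamilyRouter`, and `backfill` are hypothetical names, each "cluster" is an in-memory dict, and `backfill` merely stands in for the MapReduce bulk-load job. It shows a single routing point that sends writes, updates, and deletes for a column family to both the old and the new cluster while it moves, then cuts reads over to the new cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeCluster:
    """Hypothetical stand-in for a cluster client; data is an in-memory dict.

    A real application would wrap its actual client/connection pool here.
    Nothing in this class mirrors a specific Cassandra driver API.
    """
    name: str
    # column family -> {row key: value}
    data: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def write(self, cf: str, key: str, value: str) -> None:
        self.data.setdefault(cf, {})[key] = value

    def delete(self, cf: str, key: str) -> None:
        self.data.get(cf, {}).pop(key, None)

    def read(self, cf: str, key: str) -> Optional[str]:
        return self.data.get(cf, {}).get(key)


class ColumnFamilyRouter:
    """The one touch point that decides which cluster(s) serve a column family."""

    def __init__(self, default: FakeCluster) -> None:
        self.default = default
        self.primary: Dict[str, FakeCluster] = {}       # serves reads after cut-over
        self.migrating_to: Dict[str, FakeCluster] = {}  # extra write target during a move

    def start_migration(self, cf: str, new_cluster: FakeCluster) -> None:
        # From now on, writes/updates/deletes for `cf` go to both clusters.
        self.migrating_to[cf] = new_cluster

    def cut_over(self, cf: str) -> None:
        # After the backfill finishes, the new cluster becomes the only home for `cf`.
        self.primary[cf] = self.migrating_to.pop(cf)

    def _write_targets(self, cf: str) -> List[FakeCluster]:
        targets = [self.primary.get(cf, self.default)]
        if cf in self.migrating_to:
            targets.append(self.migrating_to[cf])
        return targets

    def write(self, cf: str, key: str, value: str) -> None:
        for cluster in self._write_targets(cf):
            cluster.write(cf, key, value)

    def delete(self, cf: str, key: str) -> None:
        for cluster in self._write_targets(cf):
            cluster.delete(cf, key)

    def read(self, cf: str, key: str) -> Optional[str]:
        # Reads stay on the old cluster until cut_over() is called.
        return self.primary.get(cf, self.default).read(cf, key)


def backfill(src: FakeCluster, dst: FakeCluster, cf: str) -> None:
    """Hypothetical stand-in for the bulk-load job that copies existing rows."""
    for key, value in list(src.data.get(cf, {}).items()):
        dst.write(cf, key, value)


# Usage: move the "sessions" column family from a shared cluster to its own.
old, new = FakeCluster("shared"), FakeCluster("low-latency")
router = ColumnFamilyRouter(default=old)
router.write("sessions", "u1", "a")
router.start_migration("sessions", new)  # dual writes begin
backfill(old, new, "sessions")           # copy pre-existing rows
router.write("sessions", "u2", "b")      # lands on both clusters
router.delete("sessions", "u1")          # delete applied to both clusters
router.cut_over("sessions")              # reads now come from `new`
```

Keeping the routing in one place is what avoids hard-coded touch points that assume a single cluster. In a real deployment, the bulk copy would likely need to preserve the original write timestamps; otherwise Cassandra's last-write-wins resolution could let a copied value override a newer dual-written one.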