I couldn't resist responding. Having run some experiments that deliberately compared many keyspaces against a single keyspace, the only good reasons I see for many keyspaces are:

1. Each keyspace needs a different replication factor. Even in this case, I personally can't justify having hundreds of different replication factor settings. Beyond a replication factor of 4, my biased take is that the highest number would be the number of datacenters, and 1 for local workstation development.

2. Using keyspaces to logically organize schema to support things like multi-tenant applications.

I'm sure there are other valid reasons, but those are the ones that come to mind.

On Tue, Mar 11, 2014 at 11:58 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> The mathematical overhead is one thing. I would guess if you tried some
> design with 10,000 keyspaces and then you ran into a bug/performance
> problem the first thing someone would say to you is "WTF do you have that
> many keyspaces" :) Don't let that be you.
>
> On Tue, Mar 11, 2014 at 11:38 AM, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
>
>> Also, in terms of overhead, server side the overhead is pretty much all
>> at the Column Family (CF)/Table level, so 100 keyspaces, 1 CF each, is the
>> same as 1 keyspace, 100 CF's.
>>
>> -Jeremiah
>>
>> On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
>>
>> The use of more than one keyspace is not uncommon. Using 100's of them
>> is. That being said, different keyspaces let you specify different
>> replication and different authentication. If you are not going to be doing
>> one of those things, then there really is no point to multiple keyspaces.
>> If you do want to do one of those things, then go for it, make multiple
>> keyspaces.
>>
>> -Jeremiah
>>
>> On Mar 11, 2014, at 10:17 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>> I am not sure. As stated the only benefit of multiple keyspaces is if you
>> need:
>>
>> 1) different replication per keyspace
>> 2) different multiple data center configurations per keyspace
>>
>> Unless you have one of these cases you do not need to do this.
>> I would always tackle this problem at the application level using something like:
>>
>> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
>>
>> Client issues aside, it is not a very common case and I would advise
>> against uncommon setups.
>>
>> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright <kwri...@nanigans.com> wrote:
>>
>>> Does this hold true for the native protocol? I've noticed that you can
>>> create a session object in the DataStax driver without specifying a
>>> keyspace and so long as you include the keyspace in all queries instead of
>>> just the table name, it works fine. In that case, I assume there's only one
>>> connection pool for all keyspaces.
>>>
>>> From: Edward Capriolo <edlinuxg...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Tuesday, March 11, 2014 at 11:05 AM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: How expensive are additional keyspaces?
>>>
>>> The biggest expense of them is that you need to be authenticated to a
>>> keyspace to perform an operation. Thus connection pools are bound to
>>> keyspaces. Switching a keyspace is an RPC operation. In the Thrift client,
>>> if you have 100 keyspaces you need 100 connection pools, which starts to be a
>>> pain very quickly.
>>>
>>> I suggest keeping everything in one keyspace unless you really need
>>> different replication factors and/or network replication settings per
>>> keyspace.
>>>
>>> On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer <elreydet...@gmail.com> wrote:
>>>
>>>> Hey all -
>>>>
>>>> My company is working on introducing a configuration service system to
>>>> provide config data to several of our applications, to be backed by
>>>> Cassandra. We're already using Cassandra for other services, and at
>>>> the moment our pending design just puts all the new tables (9 of them,
>>>> I believe) in one of our pre-existing keyspaces.
>>>>
>>>> I've got a few questions about keyspaces that I'm hoping for input on.
>>>> Some Google hunting didn't turn up obvious answers, at least not for
>>>> recent versions of Cassandra.
>>>>
>>>> 1) What trade offs are being made by using a new keyspace versus
>>>> re-purposing an existing one (that is in active use by another
>>>> application)? Organization is the obvious answer, I'm looking for any
>>>> technical reasons.
>>>>
>>>> 2) Is there any per-keyspace overhead incurred by the cluster?
>>>>
>>>> 3) Does it impact on-disk layout at all for tables to be in a
>>>> different keyspace from others? Is any sort of file fragmentation
>>>> potentially introduced just by doing this in a new keyspace as opposed
>>>> to an existing one?
>>>>
>>>> 4) Does it add any metadata overhead to the system keyspace?
>>>>
>>>> 5) Why might we *not* want to make a separate keyspace for this?
>>>>
>>>> 6) Does anyone have experience with creating additional keyspaces to
>>>> the point that Cassandra can no longer handle it? Note that we're
>>>> *not* planning to do this, I'm just curious.
>>>>
>>>> Cheers,
>>>> Martin
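
To make the two points from this thread concrete (replication is set per keyspace, and a keyspace-less session works as long as every table name is keyspace-qualified), here is a rough Python sketch that only builds the CQL strings. It does not talk to a cluster or use any driver. The keyspace names (`config_service`, `config_dev`), the table name (`settings`), the datacenter names (`dc1`, `dc2`) and both helper functions are made up for illustration; only the `CREATE KEYSPACE ... WITH replication = {...}` form and the `keyspace.table` notation are standard CQL.

```python
def create_keyspace_cql(name, replication):
    """Build a CREATE KEYSPACE statement from a replication map (hypothetical helper)."""
    parts = []
    for key, value in replication.items():
        # CQL map literals single-quote strings; integer values stay bare.
        rendered = "'" + value + "'" if isinstance(value, str) else str(value)
        parts.append("'" + key + "': " + rendered)
    options = "{" + ", ".join(parts) + "}"
    return "CREATE KEYSPACE IF NOT EXISTS " + name + " WITH replication = " + options + ";"


def qualified(keyspace, table):
    """Return a keyspace-qualified table name (hypothetical helper)."""
    return keyspace + "." + table


# Case 1: a separate keyspace is justified because replication differs.
# Datacenter names and replica counts below are assumptions.
prod = create_keyspace_cql(
    "config_service",
    {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 3},
)
dev = create_keyspace_cql(
    "config_dev",
    {"class": "SimpleStrategy", "replication_factor": 1},
)

# Case 2: one session, no default keyspace, every query names its keyspace.
select = "SELECT * FROM " + qualified("config_service", "settings") + " WHERE app = ?;"

print(prod)
print(dev)
print(select)
```

If the new tables can live with the replication settings of an existing keyspace, none of the `CREATE KEYSPACE` statements are needed and the tables simply go into that keyspace, which is the default recommendation in the replies above.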