Re: How expensive are additional keyspaces?

Peter Lin Tue, 11 Mar 2014 09:07:06 -0700

I couldn't resist responding.

Having done some experiments where I purposely created lots of keyspaces
versus 1 keyspace, the only good reasons I see for many keyspaces are:


1. Each keyspace needs a different replication factor. Even in this case,
I personally can't justify having hundreds of different replication factor
settings. Beyond a replication factor of 4, my biased take is that the
number of distinct settings would top out at the number of datacenters,
plus 1 for local workstation development.

2. Using keyspaces to logically organize schema to support things like
multi-tenant applications.

I'm sure there are other valid reasons, but those are the ones that come to
my mind.
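Point 1 can be made concrete with a small Python sketch that emits CQL `CREATE KEYSPACE` statements, one per replication profile. It only builds strings (you would run the output through cqlsh or a driver), and the keyspace and datacenter names are hypothetical. It assumes `NetworkTopologyStrategy` for multi-datacenter keyspaces and `SimpleStrategy` with a replication factor of 1 for a local workstation.

```python
from __future__ import annotations


def create_keyspace_cql(name: str, dc_rf: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement for NetworkTopologyStrategy.

    dc_rf maps a datacenter name to the replication factor wanted there.
    Datacenters are sorted so the output is deterministic.
    """
    if not dc_rf:
        raise ValueError("need at least one datacenter")
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_rf.items()))
    return (f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}};")


def local_dev_keyspace_cql(name: str) -> str:
    """Single-node workstation keyspace: SimpleStrategy, replication factor 1."""
    return (f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': 1}};")


# Hypothetical profiles: a handful of replication settings, not hundreds.
profiles = {
    "app_data": {"dc_east": 3, "dc_west": 3},
    "analytics": {"dc_east": 2},
}
for keyspace, dcs in profiles.items():
    print(create_keyspace_cql(keyspace, dcs))
print(local_dev_keyspace_cql("app_data_dev"))
```

Each entry in `profiles` stands for a genuinely different replication need; if two entries would be identical, they could share one keyspace.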


On Tue, Mar 11, 2014 at 11:58 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> The mathematical overhead is one thing. I would guess if you tried some
> design with 10,000 keyspaces and then you ran into a bug/performance
> problem the first thing someone would say to you is "WTF do you have that
> many keyspaces" :) Don't let that be you.
>
>
>
> On Tue, Mar 11, 2014 at 11:38 AM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
>> Also, in terms of overhead, on the server side the overhead is almost all
>> at the Column Family (CF)/table level, so 100 keyspaces with 1 CF each are
>> essentially the same as 1 keyspace with 100 CFs.
>>
>> -Jeremiah
>>
>> On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan <
>> jeremiah.jor...@gmail.com> wrote:
>>
>> The use of more than one keyspace is not uncommon. Using hundreds of them
>> is. That being said, different keyspaces let you specify different
>> replication and different access permissions. If you are not going to be
>> doing one of those things, then there really is no point to multiple
>> keyspaces. If you do want to do one of those things, then go for it: make
>> multiple keyspaces.
>>
>>
>> -Jeremiah
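Permissions are one of the two reasons Jeremiah gives for a separate keyspace: CQL can grant access at keyspace granularity. A minimal Python sketch that builds such statements, assuming an authorizer such as `CassandraAuthorizer` is enabled in cassandra.yaml; the keyspace and user names are hypothetical.

```python
from __future__ import annotations


def keyspace_grants(keyspace: str, readers: list[str], writers: list[str]) -> list[str]:
    """Build CQL GRANT statements scoped to a single keyspace.

    Readers get SELECT; writers get MODIFY (INSERT/UPDATE/DELETE/TRUNCATE).
    """
    stmts = [f"GRANT SELECT ON KEYSPACE {keyspace} TO {user};" for user in readers]
    stmts += [f"GRANT MODIFY ON KEYSPACE {keyspace} TO {user};" for user in writers]
    return stmts


for stmt in keyspace_grants("config_service", ["reporting"], ["config_writer"]):
    print(stmt)
```

If every application would receive the same grants anyway, this reason for a separate keyspace does not apply.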
>>
>> On Mar 11, 2014, at 10:17 AM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>>
>> I am not sure. As stated, the only benefits of multiple keyspaces come if
>> you need:
>>
>> 1) different replication per keyspace
>> 2) different multi-datacenter configurations per keyspace
>>
>> Unless you have one of these cases, you do not need multiple keyspaces. I
>> would always tackle this problem at the application level using something
>> like:
>>
>>
>> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
>>
>> Client issues aside, it is not a very common case, and I would advise
>> against uncommon setups.
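Edward's application-level alternative can be illustrated with a small Python model of a "virtual keyspace": each tenant's row keys get a tenant prefix inside one shared keyspace. This is a hypothetical sketch in the spirit of the Hector page linked above, not Hector's API; the dict stands in for a real Cassandra table, and all names are made up.

```python
from __future__ import annotations


class VirtualKeyspace:
    """Hypothetical tenant view over one shared store.

    Every row key is stored as "<tenant>:<key>", so tenants share one real
    keyspace/table but cannot see each other's rows through this view.
    """

    SEP = ":"

    def __init__(self, store: dict[str, dict[str, str]], tenant: str) -> None:
        # Forbid the separator in tenant ids so "a" can never see "a:b"'s rows.
        if self.SEP in tenant:
            raise ValueError("tenant id must not contain the separator")
        self._store = store
        self._prefix = tenant + self.SEP

    def put(self, key: str, column: str, value: str) -> None:
        self._store.setdefault(self._prefix + key, {})[column] = value

    def get(self, key: str, column: str) -> str | None:
        return self._store.get(self._prefix + key, {}).get(column)

    def keys(self) -> list[str]:
        n = len(self._prefix)
        return sorted(k[n:] for k in self._store if k.startswith(self._prefix))


shared: dict[str, dict[str, str]] = {}
acme = VirtualKeyspace(shared, "acme")
globex = VirtualKeyspace(shared, "globex")
acme.put("feature_flags", "dark_mode", "on")
globex.put("feature_flags", "dark_mode", "off")
print(acme.get("feature_flags", "dark_mode"))  # on
print(sorted(shared))  # ['acme:feature_flags', 'globex:feature_flags']
```

The trade-off is that isolation is enforced by the application rather than by Cassandra, so replication and permissions stay the same for every tenant.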
>>
>>
>>
>> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright <kwri...@nanigans.com> wrote:
>>
>>> Does this hold true for the native protocol? I've noticed that you can
>>> create a session object in the DataStax driver without specifying a
>>> keyspace, and so long as you include the keyspace in all queries instead
>>> of just the table name, it works fine. In that case, I assume there's only
>>> one connection pool for all keyspaces.
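Keith's approach amounts to always writing `keyspace.table` in the query text. With the DataStax drivers, calling `connect()` on a `Cluster` without a keyspace argument gives a session that is not bound to any keyspace, and the qualified names tell Cassandra where each table lives. The Python helper below only builds such query strings; the keyspace, table and column names are hypothetical, and preparing and executing the statement needs a live cluster, so that part is not shown.

```python
import re

# Conservative check for unquoted CQL identifiers: a letter, then letters,
# digits or underscores.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def qualified(keyspace: str, table: str) -> str:
    """Return "keyspace.table", refusing anything that is not a plain identifier."""
    for part in (keyspace, table):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"not a plain CQL identifier: {part!r}")
    return f"{keyspace}.{table}"


def select_by_key(keyspace: str, table: str, key_column: str) -> str:
    """A SELECT with a ? bind marker, suitable for preparing on an unbound session."""
    if not _IDENT.fullmatch(key_column):
        raise ValueError(f"not a plain CQL identifier: {key_column!r}")
    return f"SELECT * FROM {qualified(keyspace, table)} WHERE {key_column} = ?"


print(select_by_key("config_service", "settings", "app_id"))
```

Validating identifiers matters here because table and keyspace names cannot be bound as parameters; they are spliced into the query text.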
>>>
>>> From: Edward Capriolo <edlinuxg...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Tuesday, March 11, 2014 at 11:05 AM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: How expensive are additional keyspaces?
>>>
>>> The biggest expense of them is that you need to be authenticated to a
>>> keyspace to perform an operation. Thus connection pools are bound to
>>> keyspaces, and switching a keyspace is an RPC operation. In the Thrift
>>> client, if you have 100 keyspaces you need 100 connection pools, which
>>> starts to be a pain very quickly.
>>>
>>> I suggest keeping everything in one keyspace unless you really need
>>> different replication factors and/or network replication settings per
>>> keyspace.
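Edward's connection-pool arithmetic can be shown with a toy Python model (not real client code): a client whose connections are bound to one keyspace needs a pool per keyspace, while an unbound session using qualified table names can share one. The pool size of 8 is an arbitrary assumption, and per-host pools are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PoolRegistry:
    """Toy model: how many connection pools a client ends up holding.

    bind_to_keyspace=True mimics a client whose connections are each set to
    one keyspace, so it keeps one pool per keyspace. bind_to_keyspace=False
    mimics an unbound session using qualified table names, so one pool
    serves every keyspace.
    """

    bind_to_keyspace: bool
    connections_per_pool: int = 8  # arbitrary assumption
    pools: dict[str, int] = field(default_factory=dict)

    def pool_for(self, keyspace: str) -> str:
        name = keyspace if self.bind_to_keyspace else "<shared>"
        self.pools.setdefault(name, self.connections_per_pool)
        return name

    def total_connections(self) -> int:
        return sum(self.pools.values())


keyspaces = [f"tenant_{i}" for i in range(100)]
bound = PoolRegistry(bind_to_keyspace=True)
unbound = PoolRegistry(bind_to_keyspace=False)
for ks in keyspaces:
    bound.pool_for(ks)
    unbound.pool_for(ks)
print(len(bound.pools), bound.total_connections())  # 100 800
print(len(unbound.pools), unbound.total_connections())  # 1 8
```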
>>>
>>>
>>> On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer <elreydet...@gmail.com> wrote:
>>>
>>>> Hey all -
>>>>
>>>> My company is working on introducing a configuration service system to
>>>> provide config data to several of our applications, to be backed by
>>>> Cassandra. We're already using Cassandra for other services, and at
>>>> the moment our pending design just puts all the new tables (9 of them,
>>>> I believe) in one of our pre-existing keyspaces.
>>>>
>>>> I've got a few questions about keyspaces that I'm hoping for input on.
>>>> Some Google hunting didn't turn up obvious answers, at least not for
>>>> recent versions of Cassandra.
>>>>
>>>> 1) What trade-offs are being made by using a new keyspace versus
>>>> re-purposing an existing one (that is in active use by another
>>>> application)? Organization is the obvious answer; I'm looking for any
>>>> technical reasons.
>>>>
>>>> 2) Is there any per-keyspace overhead incurred by the cluster?
>>>>
>>>> 3) Does putting tables in a different keyspace from others affect the
>>>> on-disk layout at all? Is any sort of file fragmentation potentially
>>>> introduced just by doing this in a new keyspace as opposed to an
>>>> existing one?
>>>>
>>>> 4) Does it add any metadata overhead to the system keyspace?
>>>>
>>>> 5) Why might we *not* want to make a separate keyspace for this?
>>>>
>>>> 6) Does anyone have experience with creating additional keyspaces to
>>>> the point that Cassandra can no longer handle it? Note that we're
>>>> *not* planning to do this, I'm just curious.
>>>>
>>>> Cheers,
>>>> Martin
>>>>
>>>
>>>
>>
>>
>>
>
