Nick,

Assuming I have a tenant that has only one CF, and I am using NetworkTopologyStrategy so that the keys of this CF are replicated 3 times, each copy in a different DC (DC1, DC2, DC3). Now let's assume the cluster holds 5 DCs. As far as I understand, only the servers that belong to the three DCs that hold a copy will build this CF's memtable. The servers that belong to the other 2 DCs (DC4, DC5) won't have any trace of this CF or this keyspace. Am I correct?
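To make the setup concrete, this is roughly how I picture defining such a keyspace from pycassa (just a sketch; the exact SystemManager argument names may differ between pycassa versions, and the keyspace/CF names are made up):

    # Sketch: a per-tenant keyspace whose replicas live only in DC1, DC2 and DC3.
    # Assumes pycassa's SystemManager API; argument spellings may vary by version.
    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sys_mgr = SystemManager('localhost:9160')

    sys_mgr.create_keyspace(
        'tenant_a',                                   # hypothetical per-tenant keyspace
        replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
        strategy_options={'DC1': '1', 'DC2': '1', 'DC3': '1'},  # nothing placed in DC4/DC5
    )
    sys_mgr.create_column_family('tenant_a', 'data')  # the tenant's single CF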
I have an additional, more basic question: is there a way to define two clusters on the same node? Is it done by configuration in the storage-conf file, or does it mean an additional Cassandra daemon?

Thanks a lot,
Miriam

On Fri, Feb 18, 2011 at 12:08 PM, Nick Telford <nick.telf...@gmail.com> wrote:

> Large numbers of keyspaces/column-families are not a good idea, as each
> column-family memtable requires its own memory. If you have 1000 tenants in
> the same cluster, each with only 1 CF, then regardless of the cluster size
> *every* node will require 1 memtable per tenant CF - 1000 memtables.
>
> This limitation is the primary reason for workarounds (such as "virtual
> keyspaces") to enable multi-tenant setups.
>
> You might have more luck partitioning tenants into different clusters, but
> then you end up with potential hot-spots (where more active tenants generate
> more load on a specific cluster).
>
> Regards,
> Nick
>
>
> On 18 February 2011 09:55, Mimi Aluminium <mimi.alumin...@gmail.com> wrote:
>
>> Thanks a lot for your suggestions.
>> I will check the virtual keyspace solution - btw, currently I am using a
>> Thrift client with Pycassa and I am not familiar with Hector. Does that mean
>> we'll need to move to the Hector client?
>>
>> I thought of using a keyspace for each tenant, but I don't understand how to
>> define the whole cluster. Meaning, assuming the tenants are distributed
>> (replicated) across hundreds of DCs, each consisting of tens of racks and
>> servers, can I define a single Cassandra cluster for all the servers? It
>> does not seem reasonable, which is why I thought of separating
>> the clusters. Please let me know how you would solve it.
>> Thanks,
>> Miriam
>>
>>
>> On Thu, Feb 17, 2011 at 10:30 PM, Nate McCall <n...@datastax.com> wrote:
>>
>>> Hector's virtual keyspaces would work well for what you describe. Ed
>>> Anuff, who added this feature to Hector, showed me a working
>>> multi-tenancy-based app the other day and it worked quite well.
>>>
>>> On Thu, Feb 17, 2011 at 1:44 PM, Norman Maurer <nor...@apache.org> wrote:
>>>> Maybe you could make use of "Virtual Keyspaces".
>>>>
>>>> See this wiki page for the idea:
>>>> https://github.com/rantav/hector/wiki/Virtual-Keyspaces
>>>>
>>>> Bye,
>>>> Norman
>>>>
>>>> 2011/2/17 Frank LoVecchio <fr...@isidorey.com>:
>>>>> Why not just create some sort of ACL on the client side and use one
>>>>> Keyspace? It's a lot less management.
>>>>>
>>>>> On Thu, Feb 17, 2011 at 12:34 PM, Mimi Aluminium <mimi.alumin...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I really need your help in this matter.
>>>>>> I will try to simplify my problem and ask specific questions.
>>>>>>
>>>>>> I am thinking of solving the multi-tenancy problem by providing a separate
>>>>>> cluster for each tenant. Does that sound reasonable?
>>>>>> I could end up with one node belonging to several clusters.
>>>>>> Does Cassandra support several clusters per node? Does it mean several
>>>>>> Cassandra daemons on each node? Do you recommend doing that? What is the
>>>>>> overhead? Is there a link that explains how to do it?
>>>>>>
>>>>>> Thanks a lot,
>>>>>> Mimi
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 16, 2011 at 6:43 PM, Mimi Aluminium <mimi.alumin...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> We are interested in a multi-tenancy environment that may consist of up
>>>>>>> to hundreds of data centers. The current design requires cross-rack and
>>>>>>> cross-DC replication.
>>>>>>> Specifically, the per-tenant CFs will be replicated 6
>>>>>>> times: in three racks, with 2 copies inside each rack; the racks will be
>>>>>>> located in at least two different DCs. In the future, other replication
>>>>>>> policies will be considered. The application will decide where (which racks
>>>>>>> and DCs) to place each tenant's replicas, and one rack may
>>>>>>> hold more than one tenant.
>>>>>>>
>>>>>>> Separating each tenant into a different keyspace, as was suggested
>>>>>>> in a previous mail thread on this subject, seems to be a good approach
>>>>>>> (assuming the memtable problem will be solved somehow).
>>>>>>> But then we had a concern with regard to the cluster size,
>>>>>>> and here are my questions:
>>>>>>>
>>>>>>> 1) Given the above, should I define one Cassandra cluster that holds all
>>>>>>> the DCs? That does not sound reasonable given hundreds of DCs with tens of
>>>>>>> servers in each DC, etc. Where is the bottleneck here - keep-alive messages,
>>>>>>> the gossip, request routing? What is the largest number of servers a cluster
>>>>>>> can bear?
>>>>>>>
>>>>>>> 2) Now assuming that I can create the per-tenant keyspace only on the
>>>>>>> servers in the three racks where the replicas are held, does such a
>>>>>>> definition reduce the messaging traffic among the other servers? Does
>>>>>>> Cassandra optimize the message transfer in such a case?
>>>>>>>
>>>>>>> 3) An additional possible solution was to create a separate cluster for
>>>>>>> each tenant. But that can cause a situation where one server has to run two
>>>>>>> or more Cassandra clusters. Can we run more than one cluster in parallel?
>>>>>>> Does it mean two Cassandra daemons/instances on one server? What would the
>>>>>>> overhead be? Do you have a link that explains how to deal with it?
>>>>>>>
>>>>>>> Please help me decide which of these solutions can work, or
>>>>>>> feel free to suggest something else.
>>>>>>> Thanks a lot,
>>>>>>> Mimi
>>>>>
>>>>>
>>>>> --
>>>>> Frank LoVecchio
>>>>> Senior Software Engineer | Isidorey, LLC
>>>>> Google Voice +1.720.295.9179
>>>>> isidorey.com | facebook.com/franklovecchio | franklovecchio.com
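P.S. From the virtual-keyspaces wiki page linked above, the idea seems to boil down to sharing one physical keyspace/CF and namespacing every row key with a tenant id on the client side. My rough pycassa-flavoured sketch of that concept (only an illustration of the idea, not Hector's actual API; the keyspace/CF names are made up):

    # Sketch of the client-side "virtual keyspace" idea: all tenants share one
    # physical keyspace and CF, and each row key is prefixed with a tenant id.
    # Illustrative only - not Hector's API; keyspace/CF names are hypothetical.
    import pycassa

    pool = pycassa.ConnectionPool('shared_ks', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'shared_cf')

    def tenant_key(tenant_id, row_key):
        # Namespace the row key so tenants cannot collide with each other.
        return '%s:%s' % (tenant_id, row_key)

    cf.insert(tenant_key('tenant_a', 'user42'), {'name': 'Alice'})
    print(cf.get(tenant_key('tenant_a', 'user42')))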