Hi,

We are interested in a multi-tenant environment that may consist of up to hundreds of data centers. The current design requires cross-rack and cross-DC replication. Specifically, each tenant's CFs will be replicated 6 times: across three racks, with 2 copies per rack, and with the racks located in at least two different DCs. Other replication policies will be considered in the future. The application will decide where (which racks and DCs) to place each tenant's replicas, and one rack may hold more than one tenant.
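To make the placement rule concrete, here is a minimal Python sketch of the check our application would apply before placing a tenant. It is only an illustration of the policy described above: the names `Replica` and `is_valid_placement` are hypothetical and are not part of Cassandra, and a rack is assumed to be identified by its (DC, rack) pair.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class Replica(NamedTuple):
    """Hypothetical description of where one copy of a tenant's data lives."""
    dc: str
    rack: str
    node: str


def is_valid_placement(
    replicas: Sequence[Replica],
    racks: int = 3,
    copies_per_rack: int = 2,
    min_dcs: int = 2,
) -> bool:
    """Check the rule: `racks` racks, `copies_per_rack` copies in each,
    spread over at least `min_dcs` data centers."""
    # Every copy must sit on a different node.
    if len(set(replicas)) != len(replicas):
        return False

    # Count copies per rack; a rack is assumed unique only within its DC.
    per_rack = Counter((r.dc, r.rack) for r in replicas)
    dcs = {dc for dc, _rack in per_rack}

    return (
        len(per_rack) == racks
        and all(count == copies_per_rack for count in per_rack.values())
        and len(dcs) >= min_dcs
    )


# Example: two racks in dc1 and one rack in dc2, two copies in each rack.
example = [
    Replica("dc1", "r1", "n1"), Replica("dc1", "r1", "n2"),
    Replica("dc1", "r2", "n3"), Replica("dc1", "r2", "n4"),
    Replica("dc2", "r1", "n5"), Replica("dc2", "r1", "n6"),
]
print(is_valid_placement(example))  # True
```

The defaults encode the current 6-copy policy; the parameters are there because other replication policies may be considered later.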
Separating each tenant into its own keyspace, as was suggested in a previous mail thread on this subject, seems to be a good approach (assuming the memtable problem will be solved somehow). But then we have a concern about the cluster size, and here are my questions:

1) Given the above, should I define one Cassandra cluster that holds all the DCs? That does not sound reasonable given hundreds of DCs, tens of servers in each DC, etc. Where is the bottleneck here: keep-alive messages, gossip, request routing? What is the largest number of servers a cluster can bear?

2) Now, assuming that I can create the per-tenant keyspace only on the servers in the three racks where the replicas are held, does such a definition reduce the message traffic among the other servers? Does Cassandra optimize message transfer in such a case?

3) An additional possible solution is to create a separate cluster for each tenant. But that can cause a situation where one server has to run two or more Cassandra clusters. Can we run more than one cluster in parallel? Does that mean two Cassandra daemons/instances on one server? What would the overhead be? Do you have a link that explains how to deal with it?

Could you please help me decide which of these solutions can work? You are also welcome to suggest something else.

Thanks a lot,
Mimi