On Mon, Aug 1, 2011 at 6:08 PM, Yang <teddyyyy...@gmail.com> wrote:
> for example my data consists of "salary", "office stationery list",
>
> let's say I do use the same replicationStrategy for them, these 2 data sets have
> different key ranges, key distributions,
>
> then is it better to use separate keyspaces for each of them? or use a single one?
>
> the factors I can think of:
> separate: have to call set_keyspace() if your calls switch between datasets
>           potential to change to different replication factor in the future
>
> any thoughts?
>
> Thanks a lot
> Yang

Ah, interesting question. In the old days, operations like get() took the keyspace as their first string argument. Now changing keyspace requires calling setKeyspace(String), which is an extra RPC operation. If you want to interact with two keyspaces, you either need to keep two connection pools open or make an RPC call every time you want to change keyspaces. While the smaller signature for get() is nice, having the extra RPC call is not good.

However, as you mentioned, the replication factor can only be set at the keyspace level, so separate keyspaces are the only way to give data sets different replication factors. That is nice, especially if you find one column family is not as important as another. Since a keyspace is a folder on disk, you can also mount a keyspace on a different physical device.

I still like one column family per keyspace, but having N connection pools for N keyspaces complicates things.
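
To make the trade-off concrete, here is a rough sketch that only counts round trips. It does not use any real client library: FakeConnection, its setKeyspace()/get() methods, and the "salary"/"stationery" keyspace names are all hypothetical stand-ins, and it assumes every setKeyspace() and every get() costs exactly one RPC.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyspaceRpcSketch {

    /** Hypothetical stand-in for a client connection; it only counts round trips. */
    static class FakeConnection {
        private String currentKeyspace; // null until setKeyspace() is called
        private int rpcCount = 0;

        void setKeyspace(String keyspace) {
            rpcCount++; // assumption: switching keyspace costs one round trip
            currentKeyspace = keyspace;
        }

        String get(String key) {
            if (currentKeyspace == null) {
                throw new IllegalStateException("no keyspace set");
            }
            rpcCount++; // the read itself is one round trip
            return currentKeyspace + ":" + key;
        }

        String currentKeyspace() { return currentKeyspace; }
        int rpcCount() { return rpcCount; }
    }

    /** Option 1: one shared connection, switching keyspace whenever the target changes. */
    static int sharedConnectionRpcs(String[][] requests) {
        FakeConnection conn = new FakeConnection();
        for (String[] request : requests) {
            if (!request[0].equals(conn.currentKeyspace())) {
                conn.setKeyspace(request[0]);
            }
            conn.get(request[1]);
        }
        return conn.rpcCount();
    }

    /** Option 2: one connection per keyspace, each bound to its keyspace once. */
    static int perKeyspaceRpcs(String[][] requests) {
        Map<String, FakeConnection> connections = new HashMap<>();
        for (String[] request : requests) {
            FakeConnection conn = connections.get(request[0]);
            if (conn == null) {
                conn = new FakeConnection();
                conn.setKeyspace(request[0]); // paid once per keyspace
                connections.put(request[0], conn);
            }
            conn.get(request[1]);
        }
        int total = 0;
        for (FakeConnection conn : connections.values()) {
            total += conn.rpcCount();
        }
        return total;
    }

    public static void main(String[] args) {
        // Each request is {keyspace, key}; the calls alternate between the two data sets.
        String[][] requests = {
            {"salary", "alice"}, {"stationery", "pens"},
            {"salary", "bob"}, {"stationery", "paper"}
        };
        System.out.println("shared connection RPCs: " + sharedConnectionRpcs(requests));
        System.out.println("per-keyspace connection RPCs: " + perKeyspaceRpcs(requests));
    }
}
```

With calls that alternate between the two data sets, the shared connection pays for a keyspace switch on every request, while the per-keyspace connections pay for it once each. The price of the second option is the extra connections (or pools) you have to manage, which is the complication mentioned above.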