Is this the proper use of OPP?

AJ Mon, 13 Jun 2011 14:07:30 -0700

I'm just becoming aware of the restrictions of using an OPP as compared to Random. Please let me know if I understand this correctly.

First off, if using the OPP only for an increased performance of range queries, then it will probably be very hard to predict if you will end up with hotspots or not and thus where and even how the data may be clustered together in a particular node. This is because all the various keys of the various CFs may or may not have any correlation with one another. So, in effect, you just have a big mess of keys of various ranges and formats, but they all are partitioned according to one global set of tokens that apply to ALL CFs of ALL keyspaces.


[main reason for post below...]

OTOH, if you want to use OPP to purposely cluster certain data together on specific nodes, such as for geographic partitioning, then you have to choose a prefix for all of the keys of ALL CFs and ALL keyspaces! This is because they will all be partitioned based on the tokens assigned to the nodes. IOW, if I had two datacenters, one in the US and another in Europe, then for all rows in all KSs and in all CFs, I would need to prepend a prefix to the keys, such as "US:" and "EU:". The problem is I may not want ALL of my CFs to be partitioned this way; only specific ones. Also, it may be very difficult if not impossible for all keys of all keyspaces and CFs to use keys of this form. I'm not sure if Cass is designed for this.
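
To make the prefix idea concrete, here is a minimal Python sketch of how I picture order-preserving placement. The two-node ring, the tokens "EU;" and "US;", the node names and the function opp_owner are all made up for illustration. This is not Cassandra's actual code, and plain Python string comparison is only standing in for however the partitioner really orders keys.

```python
import bisect

# Hypothetical two-node ring, sorted by token. Assumption: a node owns the
# keys in (previous token, its own token], with wrap-around at the end.
# ';' sorts just after ':', so the token "EU;" sits above every "EU:..." key.
RING = [("EU;", "node-eu"), ("US;", "node-us")]
TOKENS = [token for token, _ in RING]


def opp_owner(key):
    """Return the node whose token is the first one >= key, wrapping around."""
    index = bisect.bisect_left(TOKENS, key)
    return RING[index % len(RING)][1]


if __name__ == "__main__":
    # Prefixed keys land where intended; unprefixed keys from other CFs
    # land wherever their sort order happens to fall on the same ring.
    for key in ["EU:alice", "US:bob", "12345", "Paris", "alice"]:
        print(key, "->", opp_owner(key))
```

If this picture is right, it shows my worry: the prefixed keys go where I want, but every unprefixed key from any other CF is still placed by the same two tokens, whether I care about its location or not.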

However, if using the random partitioner, then there is no problem. You can use any key of any type you want (UTF8, Long, etc.) since they are all hashed before deciding which node gets the key/row.
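
A rough sketch of the hashed case as I understand it, again in Python. The four node names, the evenly spaced tokens, and the helpers token_for and random_owner are hypothetical, and reducing an MD5 digest modulo 2**127 is only my approximation of what the random partitioner does, not its real implementation.

```python
import bisect
import hashlib

# Assumed token space and a hypothetical four-node ring with evenly
# spaced tokens; none of these values come from a real cluster.
RING_SIZE = 2 ** 127
NODES = ["node-1", "node-2", "node-3", "node-4"]
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(key):
    """Hash the raw key bytes to a position on the ring (approximation)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def random_owner(key):
    """Return the node whose token is the first one >= the key's hash."""
    index = bisect.bisect_left(TOKENS, token_for(key))
    return NODES[index % len(NODES)]


if __name__ == "__main__":
    # Keys sharing the "US:" prefix are spread over the ring, because only
    # the hash of the key decides placement, not the key's sort order.
    for name in ["alice", "bob", "carol", "dave"]:
        key = ("US:" + name).encode("utf-8")
        print(key, "->", random_owner(key))
```

The key's type and format no longer matter here, but for the same reason a prefix like "US:" gives no control over which node holds the row.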

Do I understand things correctly or am I missing something? Is Cass designed to use OPP this way or am I hacking it? If so, is there an acceptable way to do geographic partitioning?


Also, what is OPP really good for?

Thanks!
