I would point you to this article, which does a good job of describing OPP and pretty much answers the specific questions you asked:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

-Eric

On Mon, Jun 13, 2011 at 5:06 PM, AJ <a...@dude.podzone.net> wrote:

> I'm just becoming aware of the restrictions of using an OPP as compared to
> Random. Please let me know if I understand this correctly.
>
> First off, if using the OPP only for an increased performance of range
> queries, then it will probably be very hard to predict if you will end up
> with hotspots or not and thus where and even how the data may be clustered
> together in a particular node. This is because all the various keys of the
> various CFs may or may not have any correlation with one another. So, in
> effect, you just have a big mess of keys of various ranges and formats, but
> they all are partitioned according to one global set of tokens that apply to
> ALL CFs of ALL keyspaces.
>
> [main reason for post below...]
> OTOH, if you want to use OPP to purposely cluster certain data together on
> specific nodes, such as for geographic partitioning, then you have to choose
> a prefix for all of the keys of ALL CFs and ALL keyspaces! This is because
> they will all be partitioned based on the tokens assigned to the nodes.
> IOW, if I had two datacenters, one in the US and another in Europe, then
> for all rows in all KSs and in all CFs, I would need to prepend a prefix to
> the keys, such as "US:" and "EU:". The problem is I may not want ALL of my
> CFs to be partitioned this way; only specific ones. Also, it may be very
> difficult if not impossible for all keys of all keyspaces and CFs to use
> keys of this form. I'm not sure if Cass is designed for this.
>
> However, if using the random partitioner, then there is no problem. You can
> use any key of any type you want (UTF8, Long, etc.) since they are all
> hashed before deciding which node gets the key/row.
>
> Do I understand things correctly or am I missing something? Is Cass
> designed to use OPP this way or am I hacking it? If so, is there an
> acceptable way to do geographic partitioning?
>
> Also, what is OPP really good for?
>
> Thanks!
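
To make the difference between the two partitioners concrete, here is a minimal Python sketch. It is not Cassandra code: the two-node ring, the node names, the "EU;" and "US;" tokens, and the helpers owner() and md5_token() are all made up for illustration, and the token range is simplified. It only assumes that a node owns every token up to and including its own, and that the ring wraps around.

```python
import bisect
import hashlib


def owner(ring, token):
    """Return the node that owns `token`.

    `ring` is a list of (token, node) pairs sorted by token. A node owns
    every token greater than the previous node's token and up to its own;
    anything past the last token wraps around to the first node.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def md5_token(key):
    """Hash a key to an integer token, as a random partitioner would (simplified)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Order-preserving placement: the key itself is the token.
# ';' sorts right after ':', so "EU;" is an upper bound for every "EU:..." key.
opp_ring = [("EU;", "eu-node"), ("US;", "us-node")]

# Random placement: the MD5 hash of the key is the token.
# An MD5 value is below 2**128, so these two tokens split the range in half.
random_ring = [(2**127, "eu-node"), (2**128, "us-node")]

if __name__ == "__main__":
    for key in ["EU:alice", "US:bob", "users:42"]:
        print(key, owner(opp_ring, key), owner(random_ring, md5_token(key)))
```

In this sketch the prefixed keys "EU:alice" and "US:bob" land on eu-node and us-node under the order-preserving ring, while the unprefixed "users:42" sorts past the last token and wraps around to eu-node whether you wanted it there or not. Under the hashed ring the prefix has no influence on placement at all.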