I would point you to this article; it does a good job of describing OPP
and pretty much answers the specific questions you asked.

http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

-Eric


On Mon, Jun 13, 2011 at 5:06 PM, AJ <a...@dude.podzone.net> wrote:
> I'm just becoming aware of the restrictions of using an OPP as compared to
> Random.  Please let me know if I understand this correctly.
>
> First off, if using the OPP only for increased performance of range
> queries, then it will probably be very hard to predict whether you will end
> up with hotspots, and thus where and even how the data may be clustered
> together in a particular node.  This is because all the various keys of the
> various CFs may or may not have any correlation with one another.  So, in
> effect, you just have a big mess of keys of various ranges and formats, but
> they all are partitioned according to one global set of tokens that apply to
> ALL CFs of ALL keyspaces.
>
> [main reason for post below...]
> OTOH, if you want to use OPP to purposely cluster certain data together on
> specific nodes, such as for geographic partitioning, then you have to choose
> a prefix for all of the keys of ALL CFs and ALL keyspaces!  This is because
> they will all be partitioned based on the tokens assigned to the nodes.
>  IOW, if I had two datacenters, one in the US and another in Europe, then
> for all rows in all KSs and in all CFs, I would need to prepend a prefix to
> the keys, such as "US:" and "EU:".  The problem is I may not want ALL of my
> CFs to be partitioned this way; only specific ones.  Also, it may be very
> difficult if not impossible for all keys of all keyspaces and CFs to use
> keys of this form.  I'm not sure if Cass is designed for this.
>
> However, if using the random partitioner, then there is no problem.  You can
> use any key of any type you want (UTF8, Long, etc.) since they are all
> hashed before deciding which node gets the key/row.
>
> Do I understand things correctly or am I missing something?  Is Cass
> designed to use OPP this way or am I hacking it?  If so, is there an
> acceptable way to do geographic partitioning?
>
> Also, what is OPP really good for?
>
> Thanks!
>
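
To make the contrast in the quoted message concrete, here is a minimal sketch of how key placement differs between the two partitioning styles. It is an illustration under stated assumptions, not Cassandra's actual code: the node names, the token values, the `owner`, `opp_owner` and `random_owner` helpers, and the ring size of 2**127 are all hypothetical. It only assumes that each node owns the token range ending at its own token, and that a random-style partitioner derives the token from an MD5 hash of the key.

```python
import bisect
import hashlib


def owner(ring, token):
    """Return the node whose token is the first one >= `token`, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Order-preserving style: the key itself is the token, so a key prefix
# decides placement. Tokens and node names here are hypothetical.
opp_ring = sorted([("EU:\uffff", "eu-node"), ("US:\uffff", "us-node")])


def opp_owner(key):
    return owner(opp_ring, key)


# Random style: the token is a hash of the key (MD5 in this sketch), so the
# key's prefix and format have no bearing on placement.
RING_SIZE = 2 ** 127  # assumed token space for this sketch
random_ring = sorted([(RING_SIZE // 2, "node-a"), (RING_SIZE, "node-b")])


def random_owner(key):
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big") % RING_SIZE
    return owner(random_ring, token)


if __name__ == "__main__":
    for key in ["EU:alice", "US:bob", "zebra"]:
        print(key, "->", opp_owner(key), "|", random_owner(key))
```

In this sketch, keys prefixed with "EU:" or "US:" land on the matching node under the order-preserving ring, while an unprefixed key such as "zebra" from some other column family simply falls wherever it sorts (here it wraps around to the first node). That is the concern raised above: the one token assignment governs every key in every CF. Under the hashed ring, the prefix does not influence placement at all.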
