Re: Order Preserving Partitioner

Jonathan Shook Wed, 26 May 2010 11:59:11 -0700

I don't think that queries on a key range are valid unless you are using OPP.
As far as hashing the key for OPP goes, I take it to be the same as not
using OPP. It's really a matter of where it gets done, but it has much
the same effect.
(I think)
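
To make the "where it gets done" point concrete, here is a minimal sketch of
hashing the key on the client before it is stored under OPP. The helper name
hashed_row_key and the sample date keys are made up for illustration, and
nothing here calls a Cassandra API; it only shows what happens to the keys.

```python
import hashlib


def hashed_row_key(natural_key: str) -> str:
    """Return the MD5 hex digest of the natural key (hypothetical helper).

    Storing this digest as the row key under OPP spreads rows around the
    ring by hash value instead of by the natural key's sort order.
    """
    return hashlib.md5(natural_key.encode("utf-8")).hexdigest()


# Date-like keys that would sit next to each other under plain OPP.
date_keys = ["2010-05-24", "2010-05-25", "2010-05-26"]

# Map each stored (hashed) key back to the natural key it came from.
stored = {hashed_row_key(k): k for k in date_keys}

# OPP orders rows by the stored key, i.e. by digest, so walking the keys
# in stored order no longer follows the dates in general.
for digest in sorted(stored):
    print(digest, stored[digest])
```

Since the digest order is unrelated to the date order, a range query over the
hashed keys is no more useful than one under Random Partitioner, which is why
I'd call the two equivalent.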


Jonathan

On Wed, May 26, 2010 at 12:51 PM, Peter Hsu <pe...@motivecast.com> wrote:
> Correct me if I'm wrong here.  Even though you can get your results with
> Random Partitioner, it's a lot less efficient if you're going across
> different machines to get your results.  If you're doing a lot of range
> queries, it makes sense to have things ordered sequentially so that if you
> do need to go to disk, the reads will be faster, rather than lots of random
> reads across your system.
> It's also my understanding that if you go with the OPP, you could hash your
> key yourself using md5 or sha-1 to effectively get random partitioning.  So
> it's a bit of a pain, but not impossible to do a split between OPP and RP
> for your different columnfamily/keyspaces.
> On May 26, 2010, at 2:32 AM, David Boxenhorn wrote:
>
> Just in case you don't know: You can do range searches on keys even with
> Random Partitioner, you just won't get the results in order. If this is good
> enough for you (e.g. if you can order the results on the client, or if you
> just need to get the right answer, but not the right order), then you should
> use Random Partitioner.
>
> (I bring this up because it confused me until recently.)
>
> On Wed, May 26, 2010 at 5:14 AM, Steve Lihn <stevel...@gmail.com> wrote:
>>
>> I have a question on using Order Preserving Partitioner.
>>
>> Many rowKeys in my system will be related to dates, so it seems natural to
>> use Order Preserving Partitioner instead of the default Random Partitioner.
>> However, I have been warned that special attention has to be applied for
>> Order Preserving Partitioner to work properly (basically to ensure a good
>> key distribution and avoid "hot spots"), and reverting it back to Random may
>> not be easy. Also, not every rowKey is related to dates; for those, using
>> Random Partitioner is okay, but there is only one place to set the Partitioner.
>>
>> (Note: The intention of this warning is actually to discredit Cassandra
>> and persuade me not to use it.)
>>
>> It seems the choice of Partitioner is defined in the storage-conf.xml and
>> is a global property. My question is: why does it have to be a global property?
>> Is there a future plan to make it customizable per KeySpace (just like you
>> would choose hash or range partition for different table/data in RDBMS) ?
>>
>> Thanks,
>> Steve
>
>
>
