Re: What is wrong in this token function

Jack Krupansky Thu, 10 Mar 2016 14:28:58 -0800

From the doc: "When using the RandomPartitioner or Murmur3Partitioner,
Cassandra rows are ordered by the hash of their value and hence the order
of rows is not meaningful... The ByteOrdered partitioner arranges tokens
the same way as key values, but the RandomPartitioner and
Murmur3Partitioner distribute tokens in a completely unordered manner. The
token function makes it possible to page through these unordered
partitioner results."


See:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html (for 2.1)
https://docs.datastax.com/en/cql/3.3/cql/cql_using/usePaging.html (for 2.2
and 3.x)
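
To make the doc passage concrete, here is a minimal Python sketch, not Cassandra code. It assumes a stand-in token() built from SHA-256 (Cassandra's default partitioner uses Murmur3, which is not in the Python standard library) and made-up data of one event per hour for customer '289'. It compares a range on the key values (the Query1 idea) with a range on the hashed tokens (the Query2 idea):

```python
import hashlib
from datetime import datetime, timedelta

def token(customer_id: str, event_time: str) -> int:
    """Stand-in token: signed 64-bit integer taken from a SHA-256 digest.
    This is NOT Murmur3; it only mimics 'hash of the whole partition key'."""
    digest = hashlib.sha256(f"{customer_id}:{event_time}".encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Hypothetical data: one event per hour for customer '289' during March 2016.
start = datetime(2016, 3, 1)
rows = [
    ("289", (start + timedelta(hours=h)).strftime("%Y-%m-%d %H:%M:%S+0000"))
    for h in range(31 * 24)
]

lo, hi = "2016-03-01 18:45:00+0000", "2016-03-12 19:05:00+0000"

# Query1 analogue: range on the event_time value itself.
# Plain string comparison is chronological here because the format is fixed-width.
by_key = [r for r in rows if lo <= r[1] <= hi]

# Query2 analogue: range on the hash of (customer_id, event_time).
# If t_lo happens to be greater than t_hi, this list is simply empty in this sketch.
t_lo, t_hi = token("289", lo), token("289", hi)
by_token = [r for r in rows if t_lo <= token(*r) <= t_hi]

print("rows matched by key range:  ", len(by_key))
print("rows matched by token range:", len(by_token))
print("same set of rows?", set(by_key) == set(by_token))
```

The token range selects rows whose hashes happen to fall between two essentially arbitrary numbers, so its row count has no relation to the time window. That is the same kind of mismatch as the 99K versus 163K counts reported below.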


-- Jack Krupansky

On Thu, Mar 10, 2016 at 5:14 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:

> I am using default Murmur3.  So are you saying in case of Murmur3 the
> following two queries
>
> select count(*)
> where customer_id = '289'
> and event_time >= '2016-03-01 18:45:00+0000'
> and event_time <= '2016-03-12 19:05:00+0000';
> and
> select count(*)
> where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
> and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000');
>
> are not the same?
>
> And yes I am aware of how to change the clustering_key to get the first
> query. This question is more of academic exercise for me.
>
>
> -----Original Message-----
> From: Jack Krupansky <jack.krupan...@gmail.com>
> To: user <user@cassandra.apache.org>
> Sent: Thu, Mar 10, 2016 4:55 pm
> Subject: Re: What is wrong in this token function
>
> What partitioner are you using? The default partitioner is not "ordered",
> so it will randomly order the hashes/tokens, so that tokens will not be
> ordered even if your PKs are ordered. You probably want to use customer as
> your partition key and event time as a clustering column - then you can use
> RDBMS-like WHERE conditions to select a slice of the partition.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:
>
>>
>> typo: the primary key was (customer_id + event_time )
>>
>>
>> -----Original Message-----
>> From: Rakesh Kumar <dcrunch...@aim.com>
>> To: user <user@cassandra.apache.org>
>> Sent: Thu, Mar 10, 2016 4:44 pm
>> Subject: What is wrong in this token function
>>
>> C*  3.0.3
>>
>> I have a table table1 which has the primary key on
>> ((customer_id,event_id)).
>>
>> I loaded 1.03 million rows from a csv file.
>>
>> Business case: Show me all events for a given customer in a given time
>> frame
>>
>> In RDBMS it will be
>>
>> (Query1)
>> where customer_id = '289'
>> and event_time >= '2016-03-01 18:45:00+0000'
>> and event_time <= '2016-03-12 19:05:00+0000';
>>
>> But C* does not allow >= <= on PKY cols. It suggested token function.
>>
>> So I did this:
>>
>> (Query2)
>> where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
>> and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000');
>>
>> I am seeing 75% more rows than what it should be. It should be 99K rows,
>> it shows 163K.
>>
>> I checked the output with the csv file itself.  To double check I loaded
>> the csv in another table
>> with modified PKY so that the first query (Query1) can be executed. It
>> also showed 99K rows.
>>
>> Am I using token function incorrectly ?
>>
>>
>>
>>
>
