Re: What is wrong in this token function

Rakesh Kumar Thu, 10 Mar 2016 15:13:40 -0800

thanks. that explains it.

-----Original Message-----
From: Jack Krupansky <jack.krupan...@gmail.com>
To: user <user@cassandra.apache.org>
Sent: Thu, Mar 10, 2016 5:28 pm
Subject: Re: What is wrong in this token function

From the doc: "When using the RandomPartitioner or Murmur3Partitioner, 
Cassandra rows are ordered by the hash of their value and hence the order of 
rows is not meaningful... The ByteOrdered partitioner arranges tokens the same 
way as key values, but the RandomPartitioner and Murmur3Partitioner distribute 
tokens in a completely unordered manner. The token function makes it possible 
to page through these unordered partitioner results."

See:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html (for 2.1)

https://docs.datastax.com/en/cql/3.3/cql/cql_using/usePaging.html (for 2.2 and 3.x)

-- Jack Krupansky

On Thu, Mar 10, 2016 at 5:14 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:

I am using the default Murmur3 partitioner. So are you saying that with Murmur3 
the following two queries

select count(*)

where customer_id = '289'
and event_time >= '2016-03-01 18:45:00+0000'
and event_time <= '2016-03-12 19:05:00+0000'   ;
and
select count(*)

where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000')  ;

are not the same?

And yes I am aware of how to change the clustering_key to get the first query. 
This question is more of academic exercise for me.

-----Original Message-----
From: Jack Krupansky <jack.krupan...@gmail.com>
To: user <user@cassandra.apache.org>

Sent: Thu, Mar 10, 2016 4:55 pm
Subject: Re: What is wrong in this token function

What partitioner are you using? The default partitioner is not "ordered", so it 
will randomly order the hashes/tokens, so that tokens will not be ordered even 
if your PKs are ordered. You probably want to use customer as your partition 
key and event time as a clustering column - then you can use RDBMS-like WHERE 
conditions to select a slice of the partition.
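
To illustrate the point above, here is a minimal Python sketch (not Cassandra code) of why a range over hashed tokens does not match a range over the key values. Everything in it is an assumption made for illustration: the `token` helper is hypothetical and uses MD5 as a stand-in hash (Murmur3Partitioner uses a different hash function), the rows are made-up hourly events for three made-up customers, and the range bounds are rounded to the hour.

```python
import hashlib
from datetime import datetime, timedelta


def token(customer_id: str, event_time: datetime) -> int:
    # Hypothetical stand-in for CQL token(): hash the whole composite
    # partition key into a signed 64-bit integer. MD5 is used only because
    # it ships with Python; it is NOT the hash Murmur3Partitioner uses.
    raw = f"{customer_id}|{event_time.isoformat()}".encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big", signed=True)


# Made-up data: three customers, one event per hour for 60 days.
START = datetime(2016, 2, 1)
rows = [(c, START + timedelta(hours=h))
        for c in ("288", "289", "290")
        for h in range(60 * 24)]

LO = datetime(2016, 3, 1, 18, 0)
HI = datetime(2016, 3, 12, 19, 0)

# Query1 semantics: filter on the key values themselves.
by_value = [r for r in rows if r[0] == "289" and LO <= r[1] <= HI]

# Query2 semantics: filter on the hashed tokens of the composite key.
t_lo, t_hi = token("289", LO), token("289", HI)
by_token = [r for r in rows if t_lo <= token(*r) <= t_hi]

print("rows matched by value range:", len(by_value))
print("rows matched by token range:", len(by_token))
print("customers in token range:  ", sorted({c for c, _ in by_token}))
```

The two counts should generally differ, and the token-range result can include rows from other customers or from outside the time window, because a token range selects a slice of hash space rather than a slice of time.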

-- Jack Krupansky

On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:

typo: the primary key was (customer_id + event_time )

-----Original Message-----
From: Rakesh Kumar <dcrunch...@aim.com>
To: user <user@cassandra.apache.org>
Sent: Thu, Mar 10, 2016 4:44 pm
Subject: What is wrong in this token function

C*  3.0.3

I have a table table1 which has the primary key on ((customer_id,event_id)).

I loaded 1.03 million rows from a csv file.

Business case: Show me all events for a given customer in a given time frame

In RDBMS it will be

(Query1)

where customer_id = '289'
and event_time >= '2016-03-01 18:45:00+0000'
and event_time <= '2016-03-12 19:05:00+0000'   ;

But C* does not allow >= or <= on PKY cols. It suggested the token function.

So I did this:

(Query2)

where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000')  ;

I am seeing about 65% more rows than there should be. It should be 99K rows, 
but it shows 163K.

I checked the output against the csv file itself. To double check, I loaded the 
csv into another table with a modified PKY so that the first query (Query1) can 
be executed. It also showed 99K rows.

Am I using the token function incorrectly?
