I am using default Murmur3. So are you saying in case of Murmur3 the following
two queries
select count(*)
where customer_id = '289'
and event_time >= '2016-03-01 18:45:00+0000'
and event_time <= '2016-03-12 19:05:00+0000' ;
and
select count(*)
where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000') ;
are not the same?
And yes, I am aware of how to change the clustering key to get the first query.
This question is more of an academic exercise for me.
-----Original Message-----
From: Jack Krupansky <[email protected]>
To: user <[email protected]>
Sent: Thu, Mar 10, 2016 4:55 pm
Subject: Re: What is wrong in this token function
What partitioner are you using? The default partitioner is not "ordered": it
hashes the partition key, so the resulting tokens will not be ordered even
if your PKs are ordered. You probably want to use customer as your partition
key and event time as a clustering column - then you can use RDBMS-like WHERE
conditions to select a slice of the partition.
-- Jack Krupansky
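The point about hashed tokens can be illustrated with a small Python sketch. It is only an approximation of the idea: blake2b is used as a stand-in 64-bit hash (it is NOT Cassandra's Murmur3 implementation), and the customer ids 287-291 and the hourly event times are made-up sample data. It compares a slice by key (Query1) with a slice by token range (Query2) over the same rows.

```python
import hashlib
import struct
from datetime import datetime, timedelta, timezone


def token(customer_id: str, event_time: datetime) -> int:
    """Stand-in for a hashing partitioner (assumption: not real Murmur3)."""
    raw = f"{customer_id}|{event_time.isoformat()}".encode("utf-8")
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    # Interpret the 8 bytes as a signed 64-bit integer, like a token value.
    return struct.unpack(">q", digest)[0]


# Hypothetical sample data: 5 customers, one event per hour for 20 days.
start = datetime(2016, 3, 1, tzinfo=timezone.utc)
rows = [
    (str(cust), start + timedelta(hours=h))
    for cust in (287, 288, 289, 290, 291)
    for h in range(480)
]

lo = datetime(2016, 3, 1, 18, 45, tzinfo=timezone.utc)
hi = datetime(2016, 3, 12, 19, 5, tzinfo=timezone.utc)

# Query1 semantics: one customer, event_time inside [lo, hi].
by_key = {r for r in rows if r[0] == "289" and lo <= r[1] <= hi}

# Query2 semantics: every row whose hash lands between the two boundary hashes.
t_lo, t_hi = token("289", lo), token("289", hi)
by_token = {r for r in rows if t_lo <= token(*r) <= t_hi}

print("rows by key range:  ", len(by_key))
print("rows by token range:", len(by_token))
print("other customers matched:", sum(1 for r in by_token if r[0] != "289"))
print("in-range rows missed:   ", len(by_key - by_token))
```

The token-range filter selects whatever rows happen to hash between the two boundary tokens, regardless of customer or time, and it returns nothing at all if the first boundary hashes higher than the second. Its row count therefore has no fixed relationship to the key-range count.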
On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <[email protected]> wrote:
typo: the primary key was (customer_id + event_time )
-----Original Message-----
From: Rakesh Kumar <[email protected]>
To: user <[email protected]>
Sent: Thu, Mar 10, 2016 4:44 pm
Subject: What is wrong in this token function
C* 3.0.3
I have a table table1 which has the primary key on ((customer_id,event_id)).
I loaded 1.03 million rows from a csv file.
Business case: Show me all events for a given customer in a given time frame
In RDBMS it will be
(Query1)
where customer_id = '289'
and event_time >= '2016-03-01 18:45:00+0000'
and event_time <= '2016-03-12 19:05:00+0000' ;
But C* does not allow >= and <= on PKY cols. It suggested the token function.
So I did this:
(Query2)
where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000') ;
I am seeing about 65% more rows than there should be. It should be 99K rows,
but it shows 163K.
I checked the output with the csv file itself. To double check I loaded the
csv in another table
with modified PKY so that the first query (Query1) can be executed. It also
showed 99K rows.
Am I using the token function incorrectly?