thanks. that explains it.

-----Original Message-----
From: Jack Krupansky <jack.krupan...@gmail.com>
To: user <user@cassandra.apache.org>
Sent: Thu, Mar 10, 2016 5:28 pm
Subject: Re: What is wrong in this token function

From the doc: "When using the RandomPartitioner or Murmur3Partitioner, Cassandra rows are ordered by the hash of their value and hence the order of rows is not meaningful... The ByteOrdered partitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner distribute tokens in a completely unordered manner. The token function makes it possible to page through these unordered partitioner results."

See:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html (for 2.1)
https://docs.datastax.com/en/cql/3.3/cql/cql_using/usePaging.html (for 2.2 and 3.x)

-- Jack Krupansky

On Thu, Mar 10, 2016 at 5:14 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:

I am using the default Murmur3. So are you saying that, with Murmur3, the following two queries are not the same?

select count(*) where customer_id = '289' and event_time >= '2016-03-01 18:45:00+0000' and event_time <= '2016-03-12 19:05:00+0000' ;

select count(*) where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000') and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000') ;

And yes, I am aware of how to change the clustering key so that the first query works. This question is more of an academic exercise for me.

-----Original Message-----
From: Jack Krupansky <jack.krupan...@gmail.com>
To: user <user@cassandra.apache.org>
Sent: Thu, Mar 10, 2016 4:55 pm
Subject: Re: What is wrong in this token function

What partitioner are you using? The default partitioner is not "ordered": it hashes each key, so the order of the hashes/tokens is effectively random, and tokens will not be ordered even if your PKs are ordered. You probably want to use customer as your partition key and event time as a clustering column; then you can use RDBMS-like WHERE conditions to select a slice of the partition.
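
A minimal Python sketch of the layout suggested above: customer_id as the partition key and event_time as a clustering column (in CQL, PRIMARY KEY ((customer_id), event_time)). The ClusteredTable class and the ts helper are hypothetical, an in-memory model rather than Cassandra's storage engine. It only mimics the one property that matters here: within a partition, rows are kept sorted by the clustering column, so a time-range query becomes a contiguous slice found by binary search.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class ClusteredTable:
    """Hypothetical in-memory model of PRIMARY KEY ((customer_id), event_time).

    Rows are grouped by partition key (customer_id) and each partition is
    kept sorted by the clustering column (event_time). This mimics the
    ordering guarantee only; it is not how Cassandra stores data.
    """

    def __init__(self):
        self._partitions = defaultdict(list)  # customer_id -> sorted event_times

    def insert(self, customer_id, event_time):
        times = self._partitions[customer_id]
        i = bisect.bisect_left(times, event_time)
        # A write with an existing primary key overwrites (upsert), so the
        # model never stores the same (customer_id, event_time) twice.
        if i == len(times) or times[i] != event_time:
            times.insert(i, event_time)

    def slice(self, customer_id, start, end):
        """Events of one partition with start <= event_time <= end."""
        times = self._partitions.get(customer_id, [])
        lo = bisect.bisect_left(times, start)   # first time >= start
        hi = bisect.bisect_right(times, end)    # first time > end
        return times[lo:hi]


def ts(text):
    # Timestamps in the thread look like '2016-03-01 18:45:00+0000'.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")


table = ClusteredTable()
for cust, when in [
    ("289", "2016-03-01 18:00:00+0000"),
    ("289", "2016-03-05 12:00:00+0000"),
    ("289", "2016-03-05 12:00:00+0000"),  # same key again: overwrite, not a new row
    ("289", "2016-03-12 19:05:00+0000"),
    ("289", "2016-03-20 08:00:00+0000"),
    ("300", "2016-03-05 12:00:00+0000"),  # other customer, other partition
]:
    table.insert(cust, ts(when))

result = table.slice("289", ts("2016-03-01 18:45:00+0000"),
                     ts("2016-03-12 19:05:00+0000"))
print(len(result))  # prints 2
```

Because the slice never leaves the '289' partition, it cannot pick up other customers' events, which is why this layout lets the first (RDBMS-style) query run as written.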
-- Jack Krupansky

On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:

typo: the primary key was (customer_id + event_time)

-----Original Message-----
From: Rakesh Kumar <dcrunch...@aim.com>
To: user <user@cassandra.apache.org>
Sent: Thu, Mar 10, 2016 4:44 pm
Subject: What is wrong in this token function

C* 3.0.3

I have a table table1 which has the primary key on ((customer_id,event_id)). I loaded 1.03 million rows from a csv file.

Business case: show me all events for a given customer in a given time frame.

In an RDBMS it would be (Query1):

where customer_id = '289' and event_time >= '2016-03-01 18:45:00+0000' and event_time <= '2016-03-12 19:05:00+0000' ;

But C* does not allow >= and <= on partition key columns. It suggested the token function, so I did this (Query2):

where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000') and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000') ;

I am seeing about 65% more rows than there should be: it should be 99K rows, but it shows 163K. I checked the output against the csv file itself. To double check, I loaded the csv into another table with a modified PKY so that the first query (Query1) can be executed. It also showed 99K rows.

Am I using the token function incorrectly?
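
A minimal Python sketch of why Query1 and Query2 disagree under a hashing partitioner. It assumes a toy hash in place of Murmur3: the function toy_token, its key serialization, and the use of hashlib.md5 truncated to a signed 64-bit integer are all hypothetical stand-ins, not Cassandra's implementation. What it shares with Murmur3Partitioner is the point made in the thread: token() hashes the whole composite partition key, so the interval between token('289', start) and token('289', end) is a slice of hash space, not of event_time. It can include other customers' rows and skip rows of customer '289'.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def toy_token(customer_id: str, event_time: datetime) -> int:
    """Hypothetical stand-in for token(customer_id, event_time).

    Hashes the whole composite partition key to a signed 64-bit integer.
    NOT Cassandra's Murmur3 hash or key serialization; it only shares the
    property that token order is unrelated to the order of key values.
    """
    key = f"{len(customer_id)}:{customer_id}|{event_time.isoformat()}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big", signed=True)


UTC = timezone.utc
START = datetime(2016, 3, 1, tzinfo=UTC)

# Hypothetical data: one event per hour in March 2016 for three customers.
# With PRIMARY KEY ((customer_id, event_time)) every row is its own partition.
rows = [(cust, START + timedelta(hours=h))
        for cust in ("289", "300", "512")
        for h in range(24 * 31)]

lo = ("289", datetime(2016, 3, 1, 18, 45, tzinfo=UTC))
hi = ("289", datetime(2016, 3, 12, 19, 5, tzinfo=UTC))

# Query1 semantics: filter on the actual column values.
value_rows = [r for r in rows if r[0] == lo[0] and lo[1] <= r[1] <= hi[1]]

# Query2 semantics: filter on the hashed tokens. In this toy, an inverted
# range (t_lo > t_hi) simply selects nothing.
t_lo, t_hi = toy_token(*lo), toy_token(*hi)
token_rows = [r for r in rows if t_lo <= toy_token(*r) <= t_hi]

# Tokens of customer 289's events, taken in event_time order.
tokens_289 = [toy_token(c, t) for c, t in rows if c == "289"]

print("value range rows:", len(value_rows))
print("token range rows:", len(token_rows))
print("token range rows from other customers:",
      sum(1 for r in token_rows if r[0] != "289"))
print("tokens sorted by event_time?", tokens_289 == sorted(tokens_289))
```

The exact counts depend on the hash, but nothing ties the token interval to customer '289' or to the time window, which is consistent with Query2 returning a different number of rows (163K) than Query1 (99K).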