You can change the table to support Multi-column slice restrictions CREATE TABLE routes ( start text, end text, year int, month int, day int, PRIMARY KEY (start, end, year, month, day) );
Then using Multi-column slice restrictions you can query: SELECT * from routes where start = 'New York' and end = 'Washington' and (year,month,day) >= (2016,1,1) and (year,month,day) <= (2016,1,31); For more details about Multi-column slice restrictions read http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause Oded From: Chris Martin [mailto:ch...@cmartinit.co.uk] Sent: Thursday, March 24, 2016 9:40 AM To: user@cassandra.apache.org Subject: Re: DataModelling to query date range Ah- that looks interesting! I'm actaully still on cassandra 2.x but I was planning on updgrading anyway. Once I do so I'll check this one out. Chris On Thu, Mar 24, 2016 at 2:57 AM, Henry M <henrymanm...@gmail.com<mailto:henrymanm...@gmail.com>> wrote: I haven't tried the new SASI indexer but it may help: https://github.com/apache/cassandra/blob/trunk/doc/SASI.md On Wed, Mar 23, 2016 at 2:08 PM, Chris Martin <ch...@cmartinit.co.uk<mailto:ch...@cmartinit.co.uk>> wrote: Hi all, I have a table that represents a train timetable and looks a bit like this: CREATE TABLE routes ( start text, end text, validFrom timestamp, validTo timestamp, PRIMARY KEY (start, end, validFrom, validTo) ); In this case validFrom is the date that the route becomes valid and validTo is the date that the route that stops becoming valid. If this was SQL I could write a query to find all valid routes between New York and Washington from Jan 1st 2016 to Jan 31st 2016 using something like: SELECT * from routes where start = 'New York' and end = 'Washington' and validFrom <= 2016-01-31 and validTo >= 2016-01-01. As far as I can tell such a query is impossible with CQL and my current table structure. I'm considering running a query like: SELECT * from routes where start = 'New York' and end = 'Washington' and validFrom <= 2016-01-31 And then filtering the rest of the data app side. This doesn't seem ideal though as I'm going to end up fetching much more data (probably around an order of magnitude more) from Cassandra than I really want. Is there a better way to model the data? thanks, Chris