Hi Vidur,

I had a go at your solution but the problem is that it doesn't match routes
which are valid all throughtout the range queried.  For example if I have
 route that is valid for all of Jan 2016. I will have a table that looks
something like this:

start           | end                | valid
New York   Washington     2016-01-01
New York   Washington     2016-01-31

So if I query for ranges that have at least one bound outside Jan (e.g Jan
15 - Feb 15) then the query you gave will work fine.  If, however, I query
for a range that is completely inside Jan e.g all routes valid on Jan 15th,
 The I think I'll end up with a query like:

SELECT * from routes where start = 'New York' and end = 'Washington'
and valid <= 2016-01-15 and valid >= 2016-01-15.

which will return 0 results as it would only match routes that have a valid
of 2016-01-15 exactly.

 thanks,

Chris


On Wed, Mar 23, 2016 at 11:19 PM, Vidur Malik <vi...@shopkeep.com> wrote:

> Flip the problem over. Instead of storing validTo and validFrom, simply
> store a valid field and partition by (start, end). This may sound wasteful,
> but disk is cheap:
>
> CREATE TABLE routes (
> start text,
> end text,
> valid timestamp,
> PRIMARY KEY ((start, end), valid)
> );
>
> Now, you can execute something like:
>
> SELECT * from routes where start = 'New York' and end = 'Washington' and 
> valid <= 2016-01-31 and valid >= 2016-01-01.
>
>
> On Wed, Mar 23, 2016 at 5:08 PM, Chris Martin <ch...@cmartinit.co.uk>
> wrote:
>
>> Hi all,
>>
>> I have a table that represents a train timetable and looks a bit like
>> this:
>>
>> CREATE TABLE routes (
>> start text,
>> end text,
>> validFrom timestamp,
>> validTo timestamp,
>> PRIMARY KEY (start, end, validFrom, validTo)
>> );
>>
>> In this case validFrom is the date that the route becomes valid and
>> validTo is the date that the route that stops becoming valid.
>>
>> If this was SQL I could write a query to find all valid routes between
>> New York and Washington from Jan 1st 2016 to Jan 31st 2016 using something
>> like:
>>
>> SELECT * from routes where start = 'New York' and end = 'Washington' and 
>> validFrom <= 2016-01-31 and validTo >= 2016-01-01.
>>
>> As far as I can tell such a query is impossible with CQL and my current
>> table structure.  I'm considering running a query like:
>>
>> SELECT * from routes where start = 'New York' and end = 'Washington' and 
>> validFrom <= 2016-01-31
>>
>> And then filtering the rest of the data app side.  This doesn't seem
>> ideal though as I'm going to end up fetching much more data (probably
>> around an order of magnitude more) from Cassandra than I really want.
>>
>> Is there a better way to model the data?
>>
>> thanks,
>>
>> Chris
>>
>>
>>
>>
>
>
> --
>
> Vidur Malik
>
> [image: ShopKeep] <http://www.shopkeep.com>
>
> 800.820.9814
> <8008209814>
> [image: ShopKeep] <https://www.facebook.com/ShopKeepPOS> [image: ShopKeep]
> <https://twitter.com/shopkeep> [image: ShopKeep]
> <https://instagram.com/shopkeep/>
>

Reply via email to