Thanks for the response, Peter. I used the temperature table because it's the most common example in CQL time-series tutorials and I thought I would reuse it. From some of the responses, it looks like I was wrong.

event_time is the time the event happened, so yes, it is the valid time. I was trying to see if I could get away with not having valid_from and valid_to in Cassandra. transaction_time is the time the database record was written. Let's take an example:

INSERT INTO temperatures (weatherstation_id, event_time, transaction_time, temperature)
VALUES ('1234ABCD', '2015-02-18 07:01:00', '2015-02-18 07:01:00', '72F');

INSERT INTO temperatures (weatherstation_id, event_time, transaction_time, temperature)
VALUES ('1234ABCD', '2015-02-18 08:01:00', '2015-02-18 08:01:00', '72F');

If I get an update for the first record tomorrow and want to keep both versions, I would also have:

INSERT INTO temperatures (weatherstation_id, event_time, transaction_time, temperature)
VALUES ('1234ABCD', '2015-02-18 07:01:00', '2015-02-19 07:01:00', '72F');

(note the transaction_time is now the 19th).

I fundamentally need to execute two types of queries:

1. Select the latest values for the weather station for a given event-time period, which should ideally return just the first and third records.
2. Select the values as of a particular transaction time (say '2015-02-18 08:01:00'), in which case I would expect to get the first and second records back.
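To make those concrete, this is roughly what I am after against the temperatures table from the bottom of this thread (treat these as intent rather than working CQL; the second one is exactly the problem this thread is about):

-- 1. all versions for an event-time window; narrowing to the latest version
--    per event_time would still have to happen on the client
SELECT * FROM temperatures
 WHERE weatherstation_id = '1234ABCD'
   AND event_time >= '2015-02-18 00:00:00'
   AND event_time <  '2015-02-19 00:00:00';

-- 2. state as of a transaction time; with event_time as the first clustering
--    column, CQL only lets me restrict transaction_time if event_time is
--    pinned with an equals, so this one fails
SELECT * FROM temperatures
 WHERE weatherstation_id = '1234ABCD'
   AND transaction_time <= '2015-02-18 08:01:00';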
About your comment on having valid_time in the keys, do I have a choice in Cassandra, unless you are suggesting secondary indexes? I am new to bi-temporal data modeling, so please advise if you think building this on top of Cassandra is a stupid idea.

-Rajesh

On Sun, Feb 15, 2015 at 10:03 AM, Peter Lin <wool...@gmail.com> wrote:

>
> I've built several different bi-temporal databases over the years for a
> variety of applications, so I have to ask "why are you modeling it this
> way?"
>
> Having a temperatures table doesn't make sense to me. Normally a
> bi-temporal database has transaction time and valid time. The transaction
> time is the timestamp of when the data is saved, and valid time is usually
> expressed as effective & expiration time.
>
> In the example, is event time supposed to be valid time, and what is the
> granularity (seconds, hours, or days)?
> Which kinds of queries do you need to run most of the time?
> Why are event_time and transaction_time part of the key?
>
> If you take time to study temporal databases, having valid time in the
> keys will cause a lot of headaches. Basically, it's an anti-pattern for
> temporal databases. The key should only be the unique identifier and it
> shouldn't have transaction or valid time. The use case you describe looks
> more like regular time series data and not really bi-temporal.
>
> I would suggest taking time to understand the kinds of queries you need to
> run and then change the table.
>
> peter
>
>
> On Sun, Feb 15, 2015 at 9:00 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> I had forgotten, but there is a new tuple notation to iterate over more
>> than one clustering column in C* 2.0.6:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>
>> For example,
>>
>> SELECT ... WHERE (c1, c2) > (1, 0)
>>
>> There's an example in the CQL spec:
>> https://cassandra.apache.org/doc/cql3/CQL.html
>>
>>
>> -- Jack Krupansky
>>
>> On Sat, Feb 14, 2015 at 6:29 PM, Dave Brosius <dbros...@mebigfatguy.com>
>> wrote:
>>
>>> As you point out, there's not really a node-based problem with your
>>> query from a performance point of view. This is a limitation of CQL in
>>> that CQL wants to slice one section of a partition's row (no matter how
>>> big the section is). In your case, you are asking to slice multiple
>>> sections of a partition's row, which currently isn't supported.
>>>
>>> It seems silly perhaps that this is the case, as certainly in your
>>> example it would seem useful, and not too difficult, but the problem is
>>> that you could wind up with n-depth slicing of that partitioned row given
>>> an arbitrary query, if range queries on clustering keys were allowed
>>> anywhere.
>>>
>>> At present, you can either duplicate the data using the other clustering
>>> key (transaction_time) as the primary clusterer for this use case, or omit
>>> the 3rd criterion (transaction_time = 'xxxx') in the query, get all
>>> the range query results, and filter on the client.
>>>
>>> hth,
>>> dave
>>>
>>>
>>> On 02/14/2015 06:05 PM, Raj N wrote:
>>>
>>> I don't think that solves my problem. The question really is why can't
>>> we use ranges for both time columns when they are part of the primary key.
>>> They are in one row after all. Is this just a CQL limitation?
>>>
>>> -Raj
>>>
>>> On Sat, Feb 14, 2015 at 3:35 AM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> "I am trying to get the state as of a particular transaction_time"
>>>>
>>>> --> In that case you should probably define your primary key with the
>>>> clustering columns in another order:
>>>>
>>>> PRIMARY KEY (weatherstation_id, transaction_time, event_time)
>>>>
>>>> Then:
>>>>
>>>> select * from temperatures where weatherstation_id = 'foo' and
>>>> event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02
>>>> 00:00:00' and transaction_time = 'xxxx'
>>>>
>>>>
>>>> On Sat, Feb 14, 2015 at 3:06 AM, Raj N <raj.cassan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Has anyone designed a bi-temporal table in Cassandra? It doesn't look
>>>>> like I can do this using CQL for now. Taking the time series example
>>>>> from well-known Cassandra modeling tutorials:
>>>>>
>>>>> CREATE TABLE temperatures (
>>>>>   weatherstation_id text,
>>>>>   event_time timestamp,
>>>>>   temperature text,
>>>>>   PRIMARY KEY (weatherstation_id, event_time)
>>>>> ) WITH CLUSTERING ORDER BY (event_time DESC);
>>>>>
>>>>> If I add another column, transaction_time:
>>>>>
>>>>> CREATE TABLE temperatures (
>>>>>   weatherstation_id text,
>>>>>   event_time timestamp,
>>>>>   transaction_time timestamp,
>>>>>   temperature text,
>>>>>   PRIMARY KEY (weatherstation_id, event_time, transaction_time)
>>>>> ) WITH CLUSTERING ORDER BY (event_time DESC, transaction_time DESC);
>>>>>
>>>>> and then try to run a query using the following CQL, it throws an error:
>>>>>
>>>>> select * from temperatures where weatherstation_id = 'foo' and
>>>>> event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02
>>>>> 00:00:00' and transaction_time < '2015-01-02 00:00:00'
>>>>>
>>>>> It works if I use an equals clause for event_time. I am trying
>>>>> to get the state as of a particular transaction_time.
>>>>>
>>>>> -Raj
>>>>>
>>>>
>>>
>>
>
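PS: Based on Dave's and DuyHai's suggestions above, this is roughly the layout I am thinking of trying next: a second, duplicated table with transaction_time as the first clustering column, kept in sync with the original one. An untested sketch, and the table name is just a placeholder:

CREATE TABLE temperatures_by_txn (
  weatherstation_id text,
  transaction_time timestamp,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, transaction_time, event_time)
) WITH CLUSTERING ORDER BY (transaction_time DESC, event_time DESC);

-- query #2 ("as of" a transaction time): everything written on or before the
-- given time; picking the latest version per event_time still happens client-side
SELECT * FROM temperatures_by_txn
 WHERE weatherstation_id = '1234ABCD'
   AND transaction_time <= '2015-02-18 08:01:00';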