Re: Storing bi-temporal data in Cassandra

Peter Lin Sun, 15 Feb 2015 07:06:06 -0800

I've built several different bi-temporal databases over the year for a
variety of applications, so I have to ask "why are you modeling it this
way?"


Having a temperatures table doesn't make sense to me. Normally a
bi-temporal database has transaction time and valid time. The transaction
time is the timestamp of when the data is saved and valid time is usually
expressed as effective & expiration time.

In the example, is event time suppose to be valid time and what is the
granularity (seconds, hours, or day)?
Which kind of queries do you need to query most of the time?
Why is event_time and transaction_time part of the key?

If you take time to study temporal databases, having valid time in the keys
will cause a lot of headaches. Basically, it's an anti-pattern for temporal
databases. The key should only be the unique identifier and it shouldn't
have transaction or valid time. The use case you describe looks more like
regular time series data and not really bi-temporal.

I would suggest take time to understand the kinds of queries you need to
run and then change the table.

peter


On Sun, Feb 15, 2015 at 9:00 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> I had forgotten, but there is a new tuple notation to iterate over more
> than one clustering column in C* 2.0.6:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4851
>
> For example,
>
> SELECT ... WHERE (c1, c2) > (1, 0)
>
> There's an example in the CQL spec:
> https://cassandra.apache.org/doc/cql3/CQL.html
>
>
> -- Jack Krupansky
>
> On Sat, Feb 14, 2015 at 6:29 PM, Dave Brosius <dbros...@mebigfatguy.com>
> wrote:
>
>>  As you point out, there's not really a node-based problem with your
>> query from a performance point of view. This is a limitation of CQL in
>> that, cql wants to slice one section of a partition's row (no matter how
>> big the section is). In your case, you are asking to slice multiple
>> sections of a partition's row, which currently isn't supported.
>>
>> It seems silly perhaps that this is the case, as certainly in your
>> example it would seem useful, and not to difficult, but the problem is that
>> you can wind up with n-depth slicing of that partitioned row given an
>> arbitrary query syntax if range queries on clustering keys was allowed
>> anywhere.
>>
>> At present, you can either duplicate the data using the other clustering
>> key (transaction_time) as primary clusterer for this use case, or omit
>> the 3rd criterion (transaction_time = 'xxxx')in the query and get all
>> the range query results and filter on the client.
>>
>> hth,
>> dave
>>
>>
>>
>> On 02/14/2015 06:05 PM, Raj N wrote:
>>
>> I don't think thats solves my problem. The question really is why can't
>> we use ranges for both time columns when they are part of the primary key.
>> They are on 1 row after all. Is this just a CQL limitation?
>>
>>  -Raj
>>
>> On Sat, Feb 14, 2015 at 3:35 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>>  "I am trying to get the state as of a particular transaction_time"
>>>
>>>   --> In that case you should probably define your primary key in
>>> another order for clustering columns
>>>
>>>  PRIMARY KEY (weatherstation_id,transaction_time,event_time)
>>>
>>>  Then, select * from temperatures where weatherstation_id = 'foo' and
>>> event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02
>>> 00:00:00' and transaction_time = 'xxxx'
>>>
>>>
>>>
>>>  On Sat, Feb 14, 2015 at 3:06 AM, Raj N <raj.cassan...@gmail.com> wrote:
>>>
>>>> Has anyone designed a bi-temporal table in Cassandra? Doesn't look like
>>>> I can do this using CQL for now. Taking the time series example from well
>>>> known modeling tutorials in Cassandra -
>>>>
>>>>  CREATE TABLE temperatures (
>>>> weatherstation_id text,
>>>> event_time timestamp,
>>>> temperature text,
>>>> PRIMARY KEY (weatherstation_id,event_time),
>>>> ) WITH CLUSTERING ORDER BY (event_time DESC);
>>>>
>>>>  If I add another column transaction_time
>>>>
>>>>  CREATE TABLE temperatures (
>>>> weatherstation_id text,
>>>> event_time timestamp,
>>>> transaction_time timestamp,
>>>>  temperature text,
>>>> PRIMARY KEY (weatherstation_id,event_time,transaction_time),
>>>> ) WITH CLUSTERING ORDER BY (event_time DESC, transaction_time DESC);
>>>>
>>>>  If I try to run a query using the following CQL, it throws an error -
>>>>
>>>>  select * from temperatures where weatherstation_id = 'foo' and
>>>> event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02
>>>> 00:00:00' and transaction_time < '2015-01-02 00:00:00'
>>>>
>>>>  It works if I use an equals clause for the event_time. I am trying to
>>>> get the state as of a particular transaction_time
>>>>
>>>>  -Raj
>>>>
>>>
>>>
>>
>>
>

Re: Storing bi-temporal data in Cassandra

Reply via email to