Hi,

We are developing a system that computes profiles of the things it observes. 
Observations arrive in the form of events. Each observed thing has an id and 
contains a set of subthings, each of which has a measurement of some kind. 
There are roughly 500 subthings within each thing. We receive events 
containing measurements of these 500 subthings every 10 seconds or so.

As we receive events, we read the old profile value, calculate the new 
profile based on the new measurement, and save it back. We use the following 
schema to hold the profile:

CREATE TABLE myprofile (
    id text,
    month text,
    day text,
    hour text,
    subthings text,
    lastvalue double,
    count int,
    stddev double,
    PRIMARY KEY ((id, month, day, hour), subthings)
) WITH CLUSTERING ORDER BY (subthings ASC);
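
To make the read/update/save cycle described above concrete, here is a rough 
sketch of what it amounts to for a single subthing in the monthly profile. The 
subthing name 'subthing-001' and the numbers are made up for illustration; in 
practice the statements would be issued from application code.

```cql
-- 1. Read the old profile values for one subthing (hypothetical name).
SELECT * FROM myprofile
 WHERE id = '1' AND month = 'Oct' AND day = '' AND hour = ''
   AND subthings = 'subthing-001';

-- 2. The application computes the new values, then writes them back.
--    The values below are placeholders, not real measurements.
UPDATE myprofile
   SET lastvalue = 42.5, count = 1201, stddev = 3.7
 WHERE id = '1' AND month = 'Oct' AND day = '' AND hour = ''
   AND subthings = 'subthing-001';
```

With about 500 subthings and an event every 10 seconds or so, the same 
monthly partition is overwritten in this way over and over.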


This profile is then used for certain analytics, either in the context of the 
whole 'thing' or in the context of a specific thing and subthing. 

A profile can be monthly, daily, or hourly. For a monthly profile, month is 
set to the current month (e.g. 'Oct') and day and hour are set to the empty 
string ''.
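
By the same convention, the daily and hourly profiles would be addressed 
roughly as sketched below. The day value '21' and the hour value '14' are 
assumed formats used only for illustration; only the monthly case is spelled 
out above.

```cql
-- Daily profile: month and day set, hour left as the empty string.
SELECT * FROM myprofile
 WHERE id = '1' AND month = 'Oct' AND day = '21' AND hour = '';

-- Hourly profile: month, day and hour all set.
SELECT * FROM myprofile
 WHERE id = '1' AND month = 'Oct' AND day = '21' AND hour = '14';
```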


The problem we have observed is that over time (actually in just a matter of 
hours) query response time for the monthly profile degrades badly. At the 
start it responds in 10-100 ms, and after a couple of hours it takes 
2000-3000 ms. If you leave it running for a couple of days, you start 
getting read timeouts. The query is basically just:

select * from myprofile where id='1' and month='Oct' and day='' and hour='';

This returns only about 500 rows or so.


I believe this is caused by the fact that so many updates are made to this 
one partition. What do you think can be done to resolve this?

BTW, I am using Cassandra 2.2.1. Since this is a test, it is running on just 
a single node.



