So in that case I would create a different column family for each app id and then a "time bucket" key as the row key with perhaps an hour resolution? Something like this:
create 123_table organic_events ( hour timestamp, event_id UUID, app_id INT, event_time TIMESTAMP, user_id INT, Š. PRIMARY KEY (hour, event_time, event_id) ) WITH CLUSTERING ORDER BY (event_time desc); Is this what others are doing? On 5/7/13 4:18 PM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote: >We use PlayOrm to do 60,000 different streams which are all time series >and use the virtual column families of PlayOrm so they are all in one >column family. We then partition by time as well. I don't believe that >we really have any hotspots from what I can tell. > >Dean > >From: Keith Wright <kwri...@nanigans.com<mailto:kwri...@nanigans.com>> >Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" ><user@cassandra.apache.org<mailto:user@cassandra.apache.org>> >Date: Tuesday, May 7, 2013 2:02 PM >To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" ><user@cassandra.apache.org<mailto:user@cassandra.apache.org>> >Subject: CQL3 Data Model Question > >Hi all, > > I was hoping you could provide some assistance with a data modeling >question (my apologies if a similar question has already been posed). I >have time based data that I need to store on a per customer (aka app id ) >basis so that I can easily return it in sorted order by event time. The >data in question is being written at high volume (~50K / sec) and I am >concerned about the cardinality of using either app id or event time as >the row key as either will likely result in hot spots. Here are is the >table definition I am considering: > >create table organic_events ( >event_id UUID, >app_id INT, >event_time TIMESTAMP, >user_id INT, >Š. >PRIMARY KEY (app_id, event_time, event_id) >) WITH CLUSTERING ORDER BY (app_id asc,event_time desc); > >So that I can be able to query as follows which will naturally sort the >results by time descending: > >select * from organic_events where app_id = 1234 and event_time <= >'2012-01-01' and event_time > '2012-01-01'; > >Anyone have an idea of the best way to accomplish this? I was >considering the following: > > * Making the row key a concatenation of app id and 0-100 using a mod >on event id to get the value. When getting data I would just fetch all >keys given the mods (app_id in (1234_0,1234_1,1234_2, etc). This would >alleviate the "hot" key issue but still seems expensive and a little hacky > * I tried removing app_id from the primary key all together (using >primary key of user_id, event_time, event_id) and making app_id a >secondary index. I would need to sort by time on the client. The above >query is valid however running a query is VERY slow as I believe it needs >to fetch every row key that matches the index which is quite expensive (I >get a timeout in cqlsh). > * Create a different column family for each app id (I.e. >1234_organic_events). Note that we could easily have 1000s of >application ids. > >Thanks!