So in that case I would create a different column family for each app id
and then a "time bucket" key as the row key with perhaps an hour
resolution?  Something like this:

create 123_table organic_events (
   hour timestamp,
   event_id UUID,
   app_id INT,
   event_time TIMESTAMP,
   user_id INT,
   Š.
   PRIMARY KEY (hour, event_time, event_id)
)  WITH CLUSTERING ORDER BY (event_time desc);



Is this what others are doing?


On 5/7/13 4:18 PM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

>We use PlayOrm to do 60,000 different streams which are all time series
>and use the virtual column families of PlayOrm so they are all in one
>column family.  We then partition by time as well.  I don't believe that
>we really have any hotspots from what I can tell.
>
>Dean
>
>From: Keith Wright <kwri...@nanigans.com<mailto:kwri...@nanigans.com>>
>Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>Date: Tuesday, May 7, 2013 2:02 PM
>To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>Subject: CQL3 Data Model Question
>
>Hi all,
>
>    I was hoping you could provide some assistance with a data modeling
>question (my apologies if a similar question has already been posed).  I
>have time based data that I need to store on a per customer (aka app id )
>basis so that I can easily return it in sorted order by event time.  The
>data in question is being written at high volume (~50K / sec) and I am
>concerned about the cardinality of using either app id or event time as
>the row key as either will likely result in hot spots.  Here are is the
>table definition I am considering:
>
>create table organic_events (
>event_id UUID,
>app_id INT,
>event_time TIMESTAMP,
>user_id INT,
>Š.
>PRIMARY KEY (app_id, event_time, event_id)
>)  WITH CLUSTERING ORDER BY (app_id asc,event_time desc);
>
>So that I can be able to query as follows which will naturally sort the
>results by time descending:
>
>select * from organic_events where app_id = 1234 and event_time <=
>'2012-01-01' and event_time > '2012-01-01';
>
>Anyone have an idea of the best way to accomplish this?  I was
>considering the following:
>
> *   Making the row key a concatenation of app id and 0-100 using a mod
>on event id to get the value.  When getting data I would just fetch all
>keys given the mods (app_id in (1234_0,1234_1,1234_2, etc).  This would
>alleviate the "hot" key issue but still seems expensive and a little hacky
> *   I tried removing app_id from the primary key all together (using
>primary key of user_id, event_time, event_id) and making app_id a
>secondary index.  I would need to sort by time on the client.  The above
>query is valid however running a query is VERY slow as I believe it needs
>to fetch every row key that matches the index which is quite expensive (I
>get a timeout in cqlsh).
> *   Create a different column family for each app id (I.e.
>1234_organic_events).  Note that we could easily have 1000s of
>application ids.
>
>Thanks!

Reply via email to