Hi, we plan to have 10^8 users and each user could generate 10 events per day.
So we have:
10^9 records per day
10^9*30 = 3*10^10 records per month.
Our time window for analysis could be from 1 to 6 months.
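Spelling the estimate out (10^8 users times 10 events each), here is a rough sketch of the arithmetic in Python. It uses only the figures stated above; the constant and function names are just mine for illustration, and the 30-day month is an approximation.

```python
# Back-of-the-envelope sizing from the figures in this message.
USERS = 10**8                  # planned number of users
EVENTS_PER_USER_PER_DAY = 10   # upper bound on events per user per day
DAYS_PER_MONTH = 30            # assumption: treat every month as 30 days


def events_per_day(users=USERS, per_user=EVENTS_PER_USER_PER_DAY):
    """Total events written per day across all users."""
    return users * per_user


def events_in_window(months, days_per_month=DAYS_PER_MONTH):
    """Total events covered by an analysis window of `months` months."""
    return events_per_day() * days_per_month * months


if __name__ == "__main__":
    print(f"per day:     {events_per_day():.1e}")
    # The analysis window is 1 to 6 months.
    for months in (1, 6):
        print(f"{months} month(s): {events_in_window(months):.1e}")
```

So a 6-month analysis would have to cover on the order of 10^11 rows, which is why I care about not touching data outside the requested range.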
Right now the PK is PRIMARY KEY (user_id, event_ts), where event_ts is the exact timestamp of the event.

So you suggest this approach:

PRIMARY KEY ((ymd, user_id), event_ts)
WITH CLUSTERING ORDER BY (event_ts DESC);

where ymd=20150102 (the second of January)?

What happens to writes:
Do SSTables with past days (ymd < current_day) stay untouched and not take part in the compaction process, since there are no changes to them?

What happens to reads:
I issue the query:
select some_attributes
from events
where ymd >= 20150101 and ymd < 20150301
Does Cassandra skip SSTables which don't have ymd in the specified range and give me a kind of partition elimination, like in traditional DBs?

2015-04-04 14:41 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>:

> It depends on the actual number of events per user, but simply bucketing
> the partition key can give you the same effect - clustering rows by time
> range. A composite partition key could be comprised of the user name and
> the date.
>
> It also depends on the data rate - is it many events per day or just a few
> events per week, or over what time period. You need to be careful - you
> don't want your Cassandra partitions to be too big (millions of rows) or
> too small (just a few or even one row per partition.)
>
> -- Jack Krupansky
>
> On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
>> Hi, I switched from HBase to Cassandra and try to find problem solution
>> for timeseries analysis on top Cassandra.
>> I have an entity named "Event".
>> "Event" has attributes:
>> user_id - a guy who triggered event
>> event_ts - when event happened
>> event_type - type of event
>> some_other_attr - some other attrs we don't care about right now.
>>
>> The DDL for entity event looks this way:
>>
>> CREATE TABLE user_plans (
>>
>> id timeuuid,
>> user_id timeuuid,
>> event_ts timestamp,
>> event_type int,
>> some_other_attr text,
>>
>> PRIMARY KEY (user_id, event_ts)
>> );
>>
>> Table is "infinite", it would grow continuously during application
>> lifetime.
>> I want to ask question:
>> Cassandra, give me all events where event_ts >= xxx and event_ts <= yyy.
>>
>> Right now it would lead to full table scan.
>>
>> There is a trick in HBase. HBase has table abstraction and HBase has
>> Column Family abstraction.
>> Column family should be declared in advance.
>> Column family - physically is a pack of HFiles ("SSTables in C*").
>> So I can easily add partitioning for my HBase table:
>> alter table hbase_events add column family '2015_01'
>> and store all 2015 January data to Column family named '2015_01'.
>>
>> When I want to get January data, I would directly access column family
>> named '2015_01' and I won't massage all data in table, just this piece.
>>
>> What is approach in C* in this case?
>> I have an idea create several tables: event_2015_01, event_2015_02,
>> etc. but it looks rather ugly from my current understanding how it works.
>>