It sounds like your time bucket should be a month, but it depends on the amount of data per user per day and your main query range. Within the partition you can then query for a range of days.
Yes, all of the rows within a partition are stored on one physical node as well as the replica nodes. -- Jack Krupansky On Sat, Apr 4, 2015 at 1:38 PM, Serega Sheypak <serega.shey...@gmail.com> wrote: > >non-equal relation on a partition key is not supported > Ok, can I generate select query: > select some_attributes > from events where ymd = 20150101 or ymd = 20150102 or 20150103 ... or > 20150331 > > > The partition key determines which node can satisfy the query > So you mean that all rows with the same *(ymd, user_id)* would be on one > physical node? > > > 2015-04-04 16:38 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>: > >> Unfortunately, a non-equal relation on a partition key is not supported. >> You would need to bucket by some larger unit, like a month, and then use >> the date/time as a clustering column for the row key. Then you could query >> within the partition. The partition key determines which node can satisfy >> the query. Designing your partition key judiciously is the key (haha!) to >> performant Cassandra applications. >> >> -- Jack Krupansky >> >> On Sat, Apr 4, 2015 at 9:33 AM, Serega Sheypak <serega.shey...@gmail.com> >> wrote: >> >>> Hi, we plan to have 10^8 users and each user could generate 10 events >>> per day. >>> So we have: >>> 10^8 records per day >>> 10^8*30 records per month. >>> Our timewindow analysis could be from 1 to 6 months. >>> >>> Right now PK is PRIMARY KEY (user_id, ends) where endts is exact ts of >>> event. >>> >>> So you suggest this approach: >>> *PRIMARY KEY ((ymd, user_id), event_ts ) * >>> *WITH CLUSTERING ORDER BY (**event_ts* >>> * DESC);* >>> >>> where ymd=20150102 (the Second of January)? >>> >>> *What happens to writes:* >>> SSTable with past days (ymd < current_day) stay untouched and don't take >>> part in Compaction process since there are o changes to them? >>> >>> What happens to read: >>> I issue query: >>> select some_attributes >>> from events where ymd >= 20150101 and ymd < 20150301 >>> Does Cassandra skip SSTables which don't have ymd in specified range and >>> give me a kind of partition elimination, like in traditional DBs? >>> >>> >>> 2015-04-04 14:41 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>: >>> >>>> It depends on the actual number of events per user, but simply >>>> bucketing the partition key can give you the same effect - clustering rows >>>> by time range. A composite partition key could be comprised of the user >>>> name and the date. >>>> >>>> It also depends on the data rate - is it many events per day or just a >>>> few events per week, or over what time period. You need to be careful - you >>>> don't want your Cassandra partitions to be too big (millions of rows) or >>>> too small (just a few or even one row per partition.) >>>> >>>> -- Jack Krupansky >>>> >>>> On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak < >>>> serega.shey...@gmail.com> wrote: >>>> >>>>> Hi, I switched from HBase to Cassandra and try to find problem >>>>> solution for timeseries analysis on top Cassandra. >>>>> I have a entity named "Event". >>>>> "Event" has attributes: >>>>> user_id - a guy who triggered event >>>>> event_ts - when even happened >>>>> event_type - type of event >>>>> some_other_attr - some other attrs we don't care about right now. >>>>> >>>>> The DDL for entity event looks this way: >>>>> >>>>> CREATE TABLE user_plans ( >>>>> >>>>> id timeuuid, >>>>> user_id timeuuid, >>>>> event_ts timestamp, >>>>> event_type int, >>>>> some_other_attr text >>>>> >>>>> PRIMARY KEY (user_id, ends) >>>>> ); >>>>> >>>>> Table is "infinite", It would grow continuously during application >>>>> lifetime. >>>>> I want to ask question: >>>>> Cassandra, give me all event where event_ts >= xxx and event_ts <=yyy. >>>>> >>>>> Right now it would lead to full table scan. >>>>> >>>>> There is a trick in HBase. HBase has table abstraction and HBase has >>>>> Column Family abstraction. >>>>> Column family should be declared in advance. >>>>> Column family - physically is a pack of HFiles ("SSTables in C*"). >>>>> So I can easily add partitioning for my HBase table: >>>>> alter table hbase_events add column familiy '2015_01' >>>>> and store all 2015 January data to Column familiy named '2015_01'. >>>>> >>>>> When I want to get January data, I would directly access column family >>>>> named '2015_01' and I won't massage all data in table, just this piece. >>>>> >>>>> What is approach in C* in this case? >>>>> I have an idea create several tables: event_2015_01, event_2015_02, >>>>> e.t.c. but it looks rather ugly from my current understanding how it >>>>> works. >>>>> >>>>> >>>>> >>>> >>> >> >