> non-equal relation on a partition key is not supported

OK, can I generate a select query like this:
select some_attributes from events where ymd = 20150101 or ymd = 20150102 or ymd = 20150103 ... or ymd = 20150331
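CQL has no OR in a WHERE clause, so one way to cover 20150101 through 20150331 is to enumerate the day buckets and issue one single-partition equality query per bucket, then merge the results client-side. This is a minimal sketch, assuming the `((ymd, user_id), event_ts)` key proposed later in the thread. `day_buckets`, `per_bucket_queries` and `SELECT_ONE_BUCKET` are hypothetical names. The `?` markers are CQL bind placeholders for prepared statements. Executing the statements through a driver is deliberately left out.

```python
from datetime import date, timedelta
from uuid import UUID

# Hypothetical statement for the bucketed schema discussed in this thread:
# PRIMARY KEY ((ymd, user_id), event_ts). Both partition-key columns are
# restricted with equality, so each statement targets exactly one partition.
SELECT_ONE_BUCKET = "SELECT some_attributes FROM events WHERE ymd = ? AND user_id = ?"


def day_buckets(start: date, end: date) -> list[int]:
    """Return the yyyymmdd bucket value for every day from start to end, inclusive."""
    buckets = []
    current = start
    while current <= end:
        buckets.append(current.year * 10000 + current.month * 100 + current.day)
        current += timedelta(days=1)
    return buckets


def per_bucket_queries(user_id: UUID, start: date, end: date) -> list[tuple[str, tuple[int, UUID]]]:
    """Pair the single-partition statement with bind values for each day bucket.

    The caller would execute these (for example concurrently) and merge the
    results client-side; driver code is intentionally not shown.
    """
    return [(SELECT_ONE_BUCKET, (ymd, user_id)) for ymd in day_buckets(start, end)]


if __name__ == "__main__":
    from uuid import uuid1

    queries = per_bucket_queries(uuid1(), date(2015, 1, 1), date(2015, 3, 31))
    print(len(queries))  # 90: one statement per day from Jan 1 to Mar 31, 2015
```

The fan-out replaces the OR chain with 90 independent single-partition reads for one user. Note that this still needs `user_id`, because the proposed partition key includes it.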
> The partition key determines which node can satisfy the query

So you mean that all rows with the same *(ymd, user_id)* would be on one physical node?

2015-04-04 16:38 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>:

> Unfortunately, a non-equal relation on a partition key is not supported.
> You would need to bucket by some larger unit, like a month, and then use
> the date/time as a clustering column for the row key. Then you could query
> within the partition. The partition key determines which node can satisfy
> the query. Designing your partition key judiciously is the key (haha!) to
> performant Cassandra applications.
>
> -- Jack Krupansky
>
> On Sat, Apr 4, 2015 at 9:33 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
>> Hi, we plan to have 10^8 users, and each user could generate 10 events
>> per day.
>> So we have:
>> 10^9 records per day
>> 10^9*30 records per month.
>> Our time-window analysis could be from 1 to 6 months.
>>
>> Right now the PK is PRIMARY KEY (user_id, event_ts), where event_ts is
>> the exact ts of the event.
>>
>> So you suggest this approach:
>> PRIMARY KEY ((ymd, user_id), event_ts)
>> WITH CLUSTERING ORDER BY (event_ts DESC);
>>
>> where ymd = 20150102 (the second of January)?
>>
>> *What happens to writes:*
>> SSTables with past days (ymd < current_day) stay untouched and don't take
>> part in the compaction process, since there are no changes to them?
>>
>> *What happens to reads:*
>> I issue the query:
>> select some_attributes
>> from events where ymd >= 20150101 and ymd < 20150301
>> Does Cassandra skip SSTables which don't have ymd in the specified range
>> and give me a kind of partition elimination, like in traditional DBs?
>>
>>
>> 2015-04-04 14:41 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>:
>>
>>> It depends on the actual number of events per user, but simply bucketing
>>> the partition key can give you the same effect: clustering rows by time
>>> range.
>>> A composite partition key could be composed of the user name and
>>> the date.
>>>
>>> It also depends on the data rate: is it many events per day, or just a
>>> few events per week, or over what time period? You need to be careful:
>>> you don't want your Cassandra partitions to be too big (millions of rows)
>>> or too small (just a few rows, or even one row, per partition).
>>>
>>> -- Jack Krupansky
>>>
>>> On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak <serega.shey...@gmail.com>
>>> wrote:
>>>
>>>> Hi, I switched from HBase to Cassandra and am trying to find a solution
>>>> for time-series analysis on top of Cassandra.
>>>> I have an entity named "Event".
>>>> "Event" has attributes:
>>>> user_id - the guy who triggered the event
>>>> event_ts - when the event happened
>>>> event_type - type of event
>>>> some_other_attr - some other attrs we don't care about right now.
>>>>
>>>> The DDL for the entity Event looks like this:
>>>>
>>>> CREATE TABLE user_plans (
>>>>     id timeuuid,
>>>>     user_id timeuuid,
>>>>     event_ts timestamp,
>>>>     event_type int,
>>>>     some_other_attr text,
>>>>     PRIMARY KEY (user_id, event_ts)
>>>> );
>>>>
>>>> The table is "infinite": it will grow continuously during the
>>>> application's lifetime.
>>>> I want to ask this question:
>>>> Cassandra, give me all events where event_ts >= xxx and event_ts <= yyy.
>>>>
>>>> Right now it would lead to a full table scan.
>>>>
>>>> There is a trick in HBase. HBase has a table abstraction and a column
>>>> family abstraction. A column family must be declared in advance.
>>>> Physically, a column family is a pack of HFiles ("SSTables" in C*).
>>>> So I can easily add partitioning to my HBase table:
>>>> alter table hbase_events add column family '2015_01'
>>>> and store all January 2015 data in the column family named '2015_01'.
>>>>
>>>> When I want to get January data, I directly access the column family
>>>> named '2015_01' and don't scan all the data in the table, just this piece.
>>>>
>>>> What is the approach in C* in this case?
>>>> I have an idea to create several tables: event_2015_01, event_2015_02,
>>>> etc., but it looks rather ugly given my current understanding of how it works.
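Jack's warning about partitions that are too big or too small can be checked with back-of-the-envelope arithmetic. Rather than creating one table per month, the bucket can live in a partition-key column computed from `event_ts` at write time. This is a sketch under stated assumptions:

- It uses the figures from the thread: 10^8 users and at most 10 events per user per day.
- It treats every user as hitting that maximum, which real traffic won't do.
- `bucket_of` is a hypothetical helper, not part of any Cassandra API.
- It assumes buckets are cut in UTC.

```python
from datetime import datetime, timezone

# Figures quoted in the thread; assumed uniform, as an upper bound.
USERS = 10**8
MAX_EVENTS_PER_USER_PER_DAY = 10


def rows_per_day() -> int:
    """Upper bound on rows written per day across all users."""
    return USERS * MAX_EVENTS_PER_USER_PER_DAY


def rows_per_partition(days_per_bucket: int) -> int:
    """Upper bound on rows in one (bucket, user_id) partition."""
    return MAX_EVENTS_PER_USER_PER_DAY * days_per_bucket


def bucket_of(event_ts: datetime, granularity: str) -> int:
    """Hypothetical helper: map an aware event_ts to a yyyymmdd or yyyymm bucket (UTC)."""
    if event_ts.tzinfo is None:
        raise ValueError("event_ts must be timezone-aware")
    ts = event_ts.astimezone(timezone.utc)
    if granularity == "day":
        return ts.year * 10000 + ts.month * 100 + ts.day
    if granularity == "month":
        return ts.year * 100 + ts.month
    raise ValueError(f"unknown granularity: {granularity!r}")


if __name__ == "__main__":
    print(rows_per_day())          # 1000000000
    print(rows_per_partition(1))   # day buckets: 10 rows per partition
    print(rows_per_partition(30))  # 30-day month buckets: 300 rows per partition
    print(bucket_of(datetime(2015, 1, 2, 15, 30, tzinfo=timezone.utc), "month"))  # 201501
```

Under these assumptions, day buckets hold at most about 10 rows each, which is the "too small" case Jack describes. Month buckets hold roughly 300 rows, and a 6-month analysis window then touches 6 partitions per user instead of roughly 180. The month granularity follows Jack's "like a month" suggestion and is not a measured recommendation.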