Thanks, Ryan. I wasn't aware of static column support, and indeed they get me most of what I need. I think the only potential inefficiency is still at query time. Using Thrift, I could design the column family to get all the static and dynamic content in a single query. If event_source and total_events are instead implemented as CQL3 statics, I probably need to do two queries to get data for a given event_type:
To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):

    SELECT event_source, total_events FROM timeseries WHERE event_type = 'some-type'

To get the events:

    SELECT insertion_time, event FROM timeseries WHERE event_type = 'some-type'

If I combine these into a single query, my concern is the overhead of repeating event_type/event_source/total_events in every row (along with potentially many other pieces of static information).

More generally, do you find that tuned applications tend to use Thrift, a combination of Thrift and CQL3, or is CQL3 really expected to replace Thrift?

Thanks again!

On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

> Don't static columns get you what you want?
>
> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
> On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com> wrote:
>
>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3. Pages such
>> as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest new
>> projects should use CQL3.
>>
>> I'm wondering, however, if there are certain use cases not well covered
>> by CQL3. Consider the standard timeseries example:
>>
>> CREATE TABLE timeseries (
>>   event_type text,
>>   insertion_time timestamp,
>>   event blob,
>>   PRIMARY KEY (event_type, insertion_time)
>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>
>> What happens if I want to store additional information that is shared by
>> all events in the given series (but that I don't want to include in the row
>> ID): e.g. the event source, a cached count of the number of events logged
>> to date, etc.? I might try updating the definition as follows:
>>
>> CREATE TABLE timeseries (
>>   event_type text,
>>   event_source text,
>>   total_events int,
>>   insertion_time timestamp,
>>   event blob,
>>   PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>
>> Is this not inefficient? When inserting or querying via CQL3, say in
>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>> times? Please let me know if I'm misunderstanding something, or if I
>> should be sticking to Thrift for situations like this involving mixed
>> static/dynamic data.
>>
>> Thanks!
>>
>
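
For reference, a rough sketch of the static-column variant of the timeseries table discussed in this thread. It assumes a Cassandra release with static column support (2.0.6 or later, as far as I know); the literal values in the statements are placeholders, not real data. Static columns are stored once per partition, so event_source and total_events stay out of the primary key and are not written with every event.

```cql
-- Sketch only: same table as in the thread, with the shared fields declared STATIC.
CREATE TABLE timeseries (
  event_type text,
  event_source text STATIC,   -- one value per event_type partition
  total_events int STATIC,    -- one value per event_type partition
  insertion_time timestamp,
  event blob,
  PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Set the shared metadata without touching any event row (placeholder values).
UPDATE timeseries
   SET event_source = 'some-source', total_events = 1000
 WHERE event_type = 'some-type';

-- Add one event (placeholder timestamp and blob).
INSERT INTO timeseries (event_type, insertion_time, event)
VALUES ('some-type', '2014-12-22 21:50:00+0000', 0xcafe);

-- Metadata only; LIMIT 1 keeps the result to a single row.
SELECT event_source, total_events
  FROM timeseries
 WHERE event_type = 'some-type'
 LIMIT 1;

-- Metadata and events together: one round trip, with the static values
-- appearing on each returned row of the result set.
SELECT event_source, total_events, insertion_time, event
  FROM timeseries
 WHERE event_type = 'some-type';
```

The last query is the single-query option raised above: the statics are stored once on disk, but they do appear on each row of the result, so whether one query or two is cheaper likely depends on how much static data there is relative to the page of events being read.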