Thanks, Ryan.  I wasn't aware of static column support, and indeed they get
me most of what I need.  I think the only potential inefficiency is still
at query time.  Using Thrift, I could design the column family to get
all the static and dynamic content in a single query.
If event_source and total_events are instead implemented as CQL3 statics, I
probably need to do two queries to get the data for a given event_type.

To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
SELECT event_source, total_events FROM timeseries WHERE event_type =
'some-type'

To get the events:
SELECT insertion_time, event FROM timeseries WHERE event_type =
'some-type'

If I instead combine these into a single query, my concern is the overhead
of repeating event_type/event_source/total_events (and potentially many
other pieces of static information) in every returned row.
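
For concreteness, here is a minimal sketch of the static-column version of
the table and the combined query I have in mind.  It assumes I'm reading the
static column documentation correctly; the LIMIT value of 1000 is arbitrary
and only mirrors the batch size from my original message.

```cql
-- Sketch only: event_source and total_events are declared STATIC, so they
-- are shared by every row in the event_type partition and stay out of the
-- primary key.
CREATE TABLE timeseries (
   event_type text,
   event_source text STATIC,
   total_events int STATIC,
   insertion_time timestamp,
   event blob,
   PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Combined query: one round trip, but each returned row carries the
-- static values alongside its own insertion_time and event.
SELECT event_source, total_events, insertion_time, event
FROM timeseries
WHERE event_type = 'some-type'
LIMIT 1000;
```

The statics are stored once per partition, so the repetition I'm worried
about would only be in the result set, not on disk.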

More generally, do you find that tuned applications tend to use Thrift, a
combination of Thrift and CQL3, or is CQL3 really expected to replace
Thrift?

Thanks again!

On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

> Don't static columns get you what you want?
>
>
> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>  On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com> wrote:
>
>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages such
>> as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest new
>> projects should use CQL3.
>>
>> I'm wondering, however, if there are certain use cases not well covered
>> by CQL3.  Consider the standard timeseries example:
>>
>> CREATE TABLE timeseries (
>>    event_type text,
>>    insertion_time timestamp,
>>    event blob,
>>    PRIMARY KEY (event_type, insertion_time)
>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>
>> What happens if I want to store additional information that is shared by
>> all events in the given series (but that I don't want to include in the row
>> ID): e.g. the event source, a cached count of the number of events logged
>> to date, etc.?  I might try updating the definition as follows:
>>
>> CREATE TABLE timeseries (
>>    event_type text,
>>    event_source text,
>>    total_events int,
>>    insertion_time timestamp,
>>    event blob,
>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>
>> Is this not inefficient?  When inserting or querying via CQL3, say in
>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>> times?  Please let me know if I'm misunderstanding something, or if I
>> should be sticking to Thrift for situations like this involving mixed
>> static/dynamic data.
>>
>> Thanks!
>>
>
