Basically, any time you want to store maps of maps, lists of lists, or actual Java objects, CQL is not a good fit. CQL is really only good for primitive types and flat lists, maps, and sets.
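As a rough sketch of what I mean by "flat" (the table and column names below are made up for illustration, not taken from anything in this thread), this is the kind of schema CQL3 handles comfortably: primitive columns plus one-level collections.

```cql
-- Hypothetical example; table and column names are illustrative only.
CREATE TABLE user_profile (
    user_id uuid PRIMARY KEY,
    name text,
    emails set<text>,               -- flat set of primitives
    recent_logins list<timestamp>,  -- flat list of primitives
    preferences map<text, text>     -- flat map, primitive keys and values
);
```

A map whose values are themselves maps, or a list of lists, is the case I described above as a poor fit.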
Using Cassandra purely with static columns is perfectly valid, but I don't live in that world. Most of what I do requires dynamic columns mixed with static columns in a single column family. This will sound like heresy, but for a use case that fits perfectly in the SQL model, you're better off using something like VoltDB, which gives you 100% SQL with ACID.

On Wed, Dec 24, 2014 at 10:38 AM, Kai Wang <dep...@gmail.com> wrote:

> Ryan,
>
> Can you elaborate a little on "Thrift over CQL is modeling clustering
> columns in different nesting between rows is trivial in Thrift and not
> really doable in CQL"?
> On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvi...@datastax.com> wrote:
>
>> I'm not entirely certain how you can't model that to solve your use case
>> (wouldn't you be filtering the events as well, and therefore be able to get
>> all that in one query).
>>
>> What you describe there has a number of avenues (collections, just
>> heavier use of statics in a different order than you specified, object dump
>> of events in a single column, switching up the clustering columns) of
>> getting your question answered in one query. End of the day cql resolves to
>> a given SStable format, you can still open up cassandra-cli and view what a
>> given model looks like, when you've grokked this adequately you basically
>> can bend CQL to fit your logical thrift modeling, at some point like
>> learning any new language you'll learn to speak in both ( something I have
>> to do nearly daily).
>>
>> FWIW other than the primary valid complaint remaining for Thrift over CQL
>> is modeling clustering columns in different nesting between rows is trivial
>> in Thrift and not really doable in CQL (clustering columns enforce a
>> nesting order by logical construct), I've yet to not be able to swap a
>> client from thrift to CQL ,and it's always ended up faster (so far).
>>
>> The main reason for this is performance on modern Cassandra and the
>> native protocol is substantially better than pure thrift for many query
>> types (see
>> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster) , so
>> your mileage may vary, but I'd test it out first before proclaiming that
>> thrift is faster for your use case (and make liberal use of cql features
>> with cassandra-cli to make sure you know what's going on internally,
>> remember it's all just sstables underneath).
>>
>> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.clim...@gmail.com>
>> wrote:
>>
>>> Thanks, Ryan. I wasn't aware of static column support, and indeed they
>>> get me most of what I need. I think the only potential inefficiency is
>>> still at query time. Using Thrift, I could design the column family to get
>>> the all the static and dynamic content in a single query.
>>> If event_source and total_events are instead implemented as CQL3
>>> statics, I probably need to do two queries to get data for a given
>>> event_type
>>>
>>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>>> SELECT event_source, total_events FROM timeseries WHERE event_type =
>>> 'some-type'
>>>
>>> To get the events:
>>> SELECT insertion_time, event FROM timeseries
>>>
>>> As a combined query, my concern is related to the overhead of repeating
>>> event_type/source/total_events (although with potentially many other pieces
>>> of static information).
>>>
>>> More generally, do you find that tuned applications tend to use Thrift,
>>> a combination of Thrift and CQL3, or is CQL3 really expected to replace
>>> Thrift?
>>>
>>> Thanks again!
>>>
>>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>>> wrote:
>>>
>>>> Don't static columns get you what you want?
>>>>
>>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>> On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com>
>>>> wrote:
>>>>
>>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3. Pages
>>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>>> new projects should use CQL3.
>>>>>
>>>>> I'm wondering, however, if there are certain use cases not well
>>>>> covered by CQL3. Consider the standard timeseries example:
>>>>>
>>>>> CREATE TABLE timeseries (
>>>>>     event_type text,
>>>>>     insertion_time timestamp,
>>>>>     event blob,
>>>>>     PRIMARY KEY (event_type, insertion_time)
>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>
>>>>> What happens if I want to store additional information that is shared
>>>>> by all events in the given series (but that I don't want to include in the
>>>>> row ID): e.g. the event source, a cached count of the number of events
>>>>> logged to date, etc.? I might try updating the definition as follows:
>>>>>
>>>>> CREATE TABLE timeseries (
>>>>>     event_type text,
>>>>>     event_source text,
>>>>>     total_events int,
>>>>>     insertion_time timestamp,
>>>>>     event blob,
>>>>>     PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>
>>>>> Is this not inefficient? When inserting or querying via CQL3, say in
>>>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>>>> times? Please let me know if I'm misunderstanding something, or if I
>>>>> should be sticking to Thrift for situations like this involving mixed
>>>>> static/dynamic data.
>>>>>
>>>>> Thanks!
>>
>> --
>> Ryan Svihla
>> Solution Architect