Ryan,

Can you elaborate a little on "Thrift over CQL is modeling clustering columns in different nesting between rows is trivial in Thrift and not really doable in CQL"?

On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvi...@datastax.com> wrote:
> I'm not entirely certain how you can't model that to solve your use case
> (wouldn't you be filtering the events as well, and therefore be able to get
> all that in one query).
>
> What you describe there has a number of avenues (collections, just
> heavier use of statics in a different order than you specified, object dump
> of events in a single column, switching up the clustering columns) of
> getting your question answered in one query. End of the day CQL resolves to
> a given SSTable format, you can still open up cassandra-cli and view what a
> given model looks like, when you've grokked this adequately you basically
> can bend CQL to fit your logical Thrift modeling, at some point like
> learning any new language you'll learn to speak in both (something I have
> to do nearly daily).
>
> FWIW other than the primary valid complaint remaining for Thrift over CQL
> is modeling clustering columns in different nesting between rows is trivial
> in Thrift and not really doable in CQL (clustering columns enforce a
> nesting order by logical construct), I've yet to not be able to swap a
> client from Thrift to CQL, and it's always ended up faster (so far).
>
> The main reason for this is performance on modern Cassandra and the native
> protocol is substantially better than pure Thrift for many query types (see
> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster), so
> your mileage may vary, but I'd test it out first before proclaiming that
> Thrift is faster for your use case (and make liberal use of CQL features
> with cassandra-cli to make sure you know what's going on internally,
> remember it's all just SSTables underneath).
>
> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.clim...@gmail.com>
> wrote:
>
>> Thanks, Ryan. I wasn't aware of static column support, and indeed they
>> get me most of what I need. I think the only potential inefficiency is
>> still at query time.
>> Using Thrift, I could design the column family to get
>> all the static and dynamic content in a single query.
>> If event_source and total_events are instead implemented as CQL3 statics,
>> I probably need to do two queries to get data for a given event_type.
>>
>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>> SELECT event_source, total_events FROM timeseries WHERE event_type = 'some-type'
>>
>> To get the events:
>> SELECT insertion_time, event FROM timeseries
>>
>> As a combined query, my concern is related to the overhead of repeating
>> event_type/source/total_events (although with potentially many other pieces
>> of static information).
>>
>> More generally, do you find that tuned applications tend to use Thrift, a
>> combination of Thrift and CQL3, or is CQL3 really expected to replace
>> Thrift?
>>
>> Thanks again!
>>
>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>> wrote:
>>
>>> Don't static columns get you what you want?
>>>
>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>
>>> On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com> wrote:
>>>
>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3. Pages
>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>> new projects should use CQL3.
>>>>
>>>> I'm wondering, however, if there are certain use cases not well covered
>>>> by CQL3. Consider the standard timeseries example:
>>>>
>>>> CREATE TABLE timeseries (
>>>>   event_type text,
>>>>   insertion_time timestamp,
>>>>   event blob,
>>>>   PRIMARY KEY (event_type, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> What happens if I want to store additional information that is shared
>>>> by all events in the given series (but that I don't want to include in the
>>>> row ID): e.g. the event source, a cached count of the number of events
>>>> logged to date, etc.?
>>>> I might try updating the definition as follows:
>>>>
>>>> CREATE TABLE timeseries (
>>>>   event_type text,
>>>>   event_source text,
>>>>   total_events int,
>>>>   insertion_time timestamp,
>>>>   event blob,
>>>>   PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> Is this not inefficient? When inserting or querying via CQL3, say in
>>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>>> times? Please let me know if I'm misunderstanding something, or if I
>>>> should be sticking to Thrift for situations like this involving mixed
>>>> static/dynamic data.
>>>>
>>>> Thanks!
>
> --
> Ryan Svihla
> Solution Architect
> http://www.datastax.com/
> https://twitter.com/foundev
> http://www.linkedin.com/pub/ryan-svihla/12/621/727/
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world's most innovative enterprises.
> DataStax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the world's
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
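
For anyone following the thread, the static-column layout suggested above can be sketched roughly as follows. This is only a sketch, not something posted by either author: the table and column names are taken from David's example, the LIMIT of 1000 simply mirrors the batch size he mentions, and it assumes a Cassandra release with static column support (2.0.6 or later, as far as I know).

```sql
-- Sketch only: names come from the thread; static columns assume Cassandra 2.0.6+.
CREATE TABLE timeseries (
    event_type     text,
    event_source   text static,   -- one value shared by the whole event_type partition
    total_events   int static,    -- one value shared by the whole event_type partition
    insertion_time timestamp,
    event          blob,
    PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Series metadata only. LIMIT 1 caps the result at a single row either way.
SELECT event_source, total_events
FROM timeseries
WHERE event_type = 'some-type'
LIMIT 1;

-- Metadata and events in one query, newest events first.
SELECT event_source, total_events, insertion_time, event
FROM timeseries
WHERE event_type = 'some-type'
LIMIT 1000;
```

Note that in the combined query the static values still appear in every row of the result set, so David's concern about repetition on the wire is worth measuring rather than assuming either way. Compared with the four-column primary key, though, event_source and total_events no longer have to be part of each row's key.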