Re: CQL3 vs Thrift

Eric Stevens Wed, 24 Dec 2014 08:44:53 -0800

As Ryan mentioned, CQL is simply a translation layer to the underlying
storage mechanism you're already familiar with from Thrift.
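
A rough way to picture that translation layer, as a simplified Python model
rather than anything Cassandra actually runs: each CQL row is flattened into
cells inside one partition, with cell names built from the clustering values
plus the column name. The function name `to_storage_cells`, the `"<static>"`
marker, and the sample values are made up for illustration; the column names
come from the timeseries table quoted further down the thread. Row markers,
timestamps, and the binary encoding are left out.

```python
from collections import defaultdict


def to_storage_cells(rows, partition_key, clustering_keys, static_columns=()):
    """Flatten CQL-style rows into per-partition {cell_name: value} maps.

    Hypothetical, simplified model of the wide-row layout; not a real
    Cassandra or driver API.
    """
    partitions = defaultdict(dict)
    for row in rows:
        cells = partitions[row[partition_key]]
        # Clustering values become the leading components of each cell name.
        prefix = tuple(row[key] for key in clustering_keys)
        for column, value in row.items():
            if column == partition_key or column in clustering_keys:
                continue
            if column in static_columns:
                # A static value is kept once per partition, not once per row.
                cells[("<static>", column)] = value
            else:
                cells[prefix + (column,)] = value
    return {key: dict(cells) for key, cells in partitions.items()}


rows = [
    {"event_type": "login", "event_source": "web", "insertion_time": 1, "event": b"a"},
    {"event_type": "login", "event_source": "web", "insertion_time": 2, "event": b"b"},
]
layout = to_storage_cells(
    rows, "event_type", ["insertion_time"], static_columns={"event_source"}
)
for name, value in layout["login"].items():
    print(name, value)
```

In this model the two rows end up sharing a single `event_source` cell in the
`login` partition, next to one `event` cell per insertion time.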


There are definitely corner cases where it's not possible to get a
one-for-one equivalent in CQL vs Thrift, and even when there are equivalents,
the underlying data might not look exactly the same (e.g., if you used string
composites instead of native composites, or several mixed composite types,
and so on).
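
A hedged illustration of why the bytes can differ: going by the layout
described for CompositeType, a native composite stores each component as a
2-byte length, the value, and a one-byte end-of-component marker, whereas a
hand-rolled string composite is just the parts joined with a separator. The
helper names and the ':' separator below are assumptions for this sketch, not
part of any driver.

```python
import struct


def encode_native_composite(components):
    """Pack text components as <2-byte big-endian length><bytes><0x00> each."""
    out = bytearray()
    for component in components:
        raw = component.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return bytes(out)


def encode_string_composite(components, separator=":"):
    """Join components into one string, as an application-level convention."""
    return separator.join(components).encode("utf-8")


native = encode_native_composite(["a", "bc"])
joined = encode_string_composite(["a", "bc"])
print(native)
print(joined)
```

The same logical name produces two different byte strings, so data written
with one convention does not line up byte-for-byte with the other.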

CQL is not meant to provide SQL equivalency.  It's not only missing many
SQL constructs, it's also got a number of unique constructs of its own.
It's meant to be familiar looking to people comfortable with SQL, but you
cannot reason about it the same way.

Everyone is of course free to use the access layer they prefer, but
personally I would recommend building all new features using a CQL-oriented
approach.  The Thrift interface is frozen and will not get new features;
there are some really awesome features already released only for CQL,
and more are coming.  Find a path that works for you in CQL; we had to
change our thinking about a number of things, but it's worth the effort.

On Wed, Dec 24, 2014 at 8:48 AM, Peter Lin <wool...@gmail.com> wrote:

>
> Basically, any time you want to store maps of maps, lists of lists, or
> actual Java objects, CQL is not a good fit. CQL is really only good for
> primitive types, flat lists, maps and sets.
>
> Using Cassandra purely with static columns is perfectly valid, but I don't
> live in that world. Most of what I do requires dynamic columns mixed with
> static columns in a single column family. This will sound like heresy, but
> for a use case that fits perfectly in the SQL model, you're better off using
> something like VoltDB, which gives you 100% SQL with ACID.
>
>
>
> On Wed, Dec 24, 2014 at 10:38 AM, Kai Wang <dep...@gmail.com> wrote:
>
>> Ryan,
>>
>> Can you elaborate a little on "Thrift over CQL is modeling clustering
>> columns in different nesting between rows is trivial in Thrift and not
>> really doable in CQL"?
>> On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvi...@datastax.com> wrote:
>>
>>> I'm not entirely certain how you can't model that to solve your use case
>>> (wouldn't you be filtering the events as well, and therefore be able to get
>>> all that in one query?).
>>>
>>> What you describe there has a number of avenues for getting your question
>>> answered in one query: collections, heavier use of statics in a different
>>> order than you specified, an object dump of events in a single column, or
>>> switching up the clustering columns. At the end of the day CQL resolves to
>>> a given SSTable format, and you can still open up cassandra-cli and view
>>> what a given model looks like. Once you've grokked this adequately you can
>>> basically bend CQL to fit your logical Thrift modeling; at some point, like
>>> learning any new language, you'll learn to speak in both (something I have
>>> to do nearly daily).
>>>
>>> FWIW, the primary valid complaint remaining for Thrift over CQL is that
>>> modeling clustering columns in different nesting between rows is trivial
>>> in Thrift and not really doable in CQL (clustering columns enforce a
>>> nesting order by logical construct). Other than that, I've yet to be unable
>>> to swap a client from Thrift to CQL, and it's always ended up faster (so far).
>>>
>>> The main reason for this is that performance on modern Cassandra with the
>>> native protocol is substantially better than pure Thrift for many query
>>> types (see
>>> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster). Your
>>> mileage may vary, but I'd test it out first before proclaiming that
>>> Thrift is faster for your use case (and make liberal use of CQL features
>>> with cassandra-cli to make sure you know what's going on internally;
>>> remember it's all just SSTables underneath).
>>>
>>>
>>>
>>>
>>> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.clim...@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Ryan.  I wasn't aware of static column support, and indeed they
>>>> get me most of what I need.  I think the only potential inefficiency is
>>>> still at query time.  Using Thrift, I could design the column family to get
>>>> all the static and dynamic content in a single query.
>>>> If event_source and total_events are instead implemented as CQL3
>>>> statics, I probably need to do two queries to get data for a given
>>>> event_type.
>>>>
>>>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>>>> SELECT event_source, total_events FROM timeseries WHERE event_type =
>>>> 'some-type'
>>>>
>>>> To get the events:
>>>> SELECT insertion_time, event FROM timeseries
>>>>
>>>> As a combined query, my concern is related to the overhead of repeating
>>>> event_type/source/total_events (although with potentially many other pieces
>>>> of static information).
>>>>
>>>> More generally, do you find that tuned applications tend to use Thrift,
>>>> a combination of Thrift and CQL3, or is CQL3 really expected to replace
>>>> Thrift?
>>>>
>>>> Thanks again!
>>>>
>>>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>>>> wrote:
>>>>
>>>>> Don't static columns get you what you want?
>>>>>
>>>>>
>>>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>>>  On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>>>> new projects should use CQL3.
>>>>>>
>>>>>> I'm wondering, however, if there are certain use cases not well
>>>>>> covered by CQL3.  Consider the standard timeseries example:
>>>>>>
>>>>>> CREATE TABLE timeseries (
>>>>>>    event_type text,
>>>>>>    insertion_time timestamp,
>>>>>>    event blob,
>>>>>>    PRIMARY KEY (event_type, insertion_time)
>>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>>
>>>>>> What happens if I want to store additional information that is shared
>>>>>> by all events in the given series (but that I don't want to include in
>>>>>> the row ID): e.g. the event source, a cached count of the number of
>>>>>> events logged to date, etc.?  I might try updating the definition as
>>>>>> follows:
>>>>>>
>>>>>> CREATE TABLE timeseries (
>>>>>>    event_type text,
>>>>>>    event_source text,
>>>>>>    total_events int,
>>>>>>    insertion_time timestamp,
>>>>>>    event blob,
>>>>>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>>
>>>>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>>>>> batches of up to 1000 events, won't the type/source/count be repeated
>>>>>> 1000 times?  Please let me know if I'm misunderstanding something, or if
>>>>>> I should be sticking to Thrift for situations like this involving mixed
>>>>>> static/dynamic data.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Ryan Svihla
>>>
>>> Solution Architect
>>>
>>>
>
