Re: CQL3 vs Thrift

Peter Lin Wed, 24 Dec 2014 07:49:28 -0800

Basically, any time you want to store maps of maps, lists of lists, or actual
Java objects, CQL is not a good fit. CQL is really only good for primitive
types and flat lists, maps, and sets.


Using pure Cassandra with static columns is perfectly valid, but I don't
live in that world. Most of what I do requires dynamic columns mixed with
static columns in a single column family. This will sound like heresy, but
for a use case that fits perfectly in the SQL model, you're better off using
something like VoltDB, which gives you 100% SQL with ACID.



On Wed, Dec 24, 2014 at 10:38 AM, Kai Wang <dep...@gmail.com> wrote:

> Ryan,
>
> Can you elaborate a little on "Thrift over CQL is modeling clustering
> columns in different nesting between rows is trivial in Thrift and not
> really doable in CQL"?
> On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvi...@datastax.com> wrote:
>
>> I'm not entirely certain why you can't model that to solve your use case
>> (wouldn't you be filtering the events as well, and therefore be able to get
>> all that in one query?).
>>
>> What you describe has a number of avenues for getting your question
>> answered in one query (collections, heavier use of statics in a different
>> order than you specified, an object dump of events in a single column,
>> switching up the clustering columns). At the end of the day, CQL resolves
>> to a given SSTable format, and you can still open up cassandra-cli and view
>> what a given model looks like. Once you've grokked this adequately, you can
>> basically bend CQL to fit your logical Thrift modeling. At some point, as
>> with learning any new language, you'll learn to speak in both (something I
>> have to do nearly daily).
>>
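One of the avenues listed above, an object dump of events in a single column, can be sketched as follows. This is a hypothetical layout, not something described in the thread: the helper names `dump_series` and `append_event` and the JSON encoding are assumptions, and the resulting string would live in an ordinary `text` or `blob` column.

```python
import json


def dump_series(event_source, events):
    """Pack series metadata and its events into one JSON string.

    Hypothetical "object dump" layout for a single text/blob column.
    `events` is a list of (insertion_time, payload) pairs with ISO-8601
    string timestamps; they are stored newest first, mirroring
    CLUSTERING ORDER BY (insertion_time DESC).
    """
    doc = {
        "event_source": event_source,
        "total_events": len(events),
        "events": [
            {"insertion_time": t, "event": p}
            for t, p in sorted(events, reverse=True)
        ],
    }
    return json.dumps(doc, sort_keys=True)


def append_event(blob, insertion_time, payload):
    """Read-modify-write: the whole value is decoded and rewritten."""
    doc = json.loads(blob)
    pairs = [(e["insertion_time"], e["event"]) for e in doc["events"]]
    pairs.append((insertion_time, payload))
    return dump_series(doc["event_source"], pairs)


blob = dump_series("sensor-a", [("2014-12-23T10:00", "e1")])
blob = append_event(blob, "2014-12-24T08:00", "e2")
```

The trade-off shows in `append_event`: every new event means reading, rewriting, and re-sending the whole value, so this layout fits small or rarely updated series better than an append-heavy timeseries.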
>> FWIW, the primary valid complaint remaining for Thrift over CQL is that
>> modeling clustering columns in different nesting between rows is trivial
>> in Thrift and not really doable in CQL (clustering columns enforce a
>> nesting order by logical construct). Other than that, I've yet to find a
>> client I couldn't swap from Thrift to CQL, and it's always ended up faster
>> (so far).
>>
>> The main reason for this is that performance on modern Cassandra with the
>> native protocol is substantially better than pure Thrift for many query
>> types (see
>> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster). Your
>> mileage may vary, so I'd test it out before proclaiming that Thrift is
>> faster for your use case (and make liberal use of CQL features with
>> cassandra-cli to make sure you know what's going on internally; remember,
>> it's all just SSTables underneath).
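The fixed nesting order that clustering columns impose can be illustrated with a toy model. This is an assumption for illustration, not Cassandra code: within one partition, CQL rows are kept sorted by the tuple of clustering-column values (here David's `(event_source, total_events, insertion_time)`), so only a prefix of that tuple selects one contiguous slice. In Thrift, by contrast, each row could use its own column-name layout. The helpers `slice_by_prefix` and `filter_by_time` are hypothetical.

```python
from bisect import bisect_left

# Toy partition: clustering tuples (event_source, total_events,
# insertion_time), kept in sorted order as a CQL partition would be.
rows = sorted([
    ("web", 3, "2014-12-23T10:00"),
    ("web", 1, "2014-12-22T09:00"),
    ("api", 2, "2014-12-24T08:00"),
])


def slice_by_prefix(rows, prefix):
    """Return rows whose clustering tuple starts with `prefix`.

    Sorted order guarantees the matches are contiguous, so a binary
    search finds the start and a short forward walk finds the end.
    """
    start = bisect_left(rows, prefix)
    end = start
    while end < len(rows) and rows[end][:len(prefix)] == prefix:
        end += 1
    return rows[start:end]


def filter_by_time(rows, insertion_time):
    """Restricting only a later clustering column gives no contiguous
    range to seek to, so this model has to scan every row."""
    return [r for r in rows if r[2] == insertion_time]


print(slice_by_prefix(rows, ("web",)))
print(filter_by_time(rows, "2014-12-24T08:00"))
```

CQL in Cassandra 2.x generally rejects a restriction on a clustering column whose preceding clustering columns are unrestricted, which is the query-side face of the same single, table-wide nesting order.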
>>
>>
>>
>>
>> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.clim...@gmail.com>
>> wrote:
>>
>>> Thanks, Ryan. I wasn't aware of static column support, and indeed they
>>> get me most of what I need. I think the only potential inefficiency is
>>> still at query time. Using Thrift, I could design the column family to get
>>> all the static and dynamic content in a single query.
>>> If event_source and total_events are instead implemented as CQL3
>>> statics, I probably need to do two queries to get data for a given
>>> event_type.
>>>
>>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>>> SELECT event_source, total_events FROM timeseries WHERE event_type =
>>> 'some-type'
>>>
>>> To get the events:
>>> SELECT insertion_time, event FROM timeseries WHERE event_type =
>>> 'some-type'
>>>
>>> As a combined query, my concern is the overhead of repeating
>>> event_type/source/total_events in every row (and potentially many other
>>> pieces of static information).
>>>
>>> More generally, do you find that tuned applications tend to use Thrift,
>>> a combination of Thrift and CQL3, or is CQL3 really expected to replace
>>> Thrift?
>>>
>>> Thanks again!
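The static-column suggestion and the combined-query concern above can be illustrated with a small Python model. It is a simplification, not Cassandra's storage format: the `Partition` class is hypothetical, and the schema in its comment is David's original table with `event_source` and `total_events` declared `STATIC`. The model shows static values stored once per partition but copied onto every row of a SELECT that also returns clustered columns.

```python
from dataclasses import dataclass, field

# Models David's table with the shared fields declared STATIC:
#   CREATE TABLE timeseries (
#     event_type text,
#     event_source text STATIC,
#     total_events int STATIC,
#     insertion_time timestamp,
#     event blob,
#     PRIMARY KEY (event_type, insertion_time)
#   ) WITH CLUSTERING ORDER BY (insertion_time DESC);
# One Partition instance stands for one event_type.


@dataclass
class Partition:
    static: dict = field(default_factory=dict)  # stored once per partition
    rows: dict = field(default_factory=dict)    # insertion_time -> columns

    def select(self, static_cols, regular_cols):
        """Mimic a SELECT mixing static and regular columns: every
        returned row carries its own copy of the static values."""
        out = []
        for ck in sorted(self.rows, reverse=True):  # insertion_time DESC
            row = {"insertion_time": ck}
            row.update({c: self.static.get(c) for c in static_cols})
            row.update({c: self.rows[ck].get(c) for c in regular_cols})
            out.append(row)
        return out


p = Partition(static={"event_source": "sensor-a", "total_events": 2})
p.rows["2014-12-23T10:00"] = {"event": b"e1"}
p.rows["2014-12-24T08:00"] = {"event": b"e2"}
result = p.select(["event_source", "total_events"], ["event"])
```

In this model the repetition lives in the result set, not in storage; whether it matters in practice depends on how large the static data is relative to each event, weighed against the extra round trip of issuing two separate queries.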
>>>
>>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>>> wrote:
>>>
>>>> Don't static columns get you what you want?
>>>>
>>>>
>>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>>  On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com>
>>>> wrote:
>>>>
>>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>>> new projects should use CQL3.
>>>>>
>>>>> I'm wondering, however, if there are certain use cases not well
>>>>> covered by CQL3.  Consider the standard timeseries example:
>>>>>
>>>>> CREATE TABLE timeseries (
>>>>>    event_type text,
>>>>>    insertion_time timestamp,
>>>>>    event blob,
>>>>>    PRIMARY KEY (event_type, insertion_time)
>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>
>>>>> What happens if I want to store additional information that is shared
>>>>> by all events in the given series (but that I don't want to include in the
>>>>> row ID): e.g. the event source, a cached count of the number of events
>>>>> logged to date, etc.?  I might try updating the definition as follows:
>>>>>
>>>>> CREATE TABLE timeseries (
>>>>>    event_type text,
>>>>>    event_source text,
>>>>>    total_events int,
>>>>>    insertion_time timestamp,
>>>>>    event blob,
>>>>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>>> ) WITH CLUSTERING ORDER BY (event_source ASC, total_events ASC,
>>>>>    insertion_time DESC);
>>>>>
>>>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>>>> times?  Please let me know if I'm misunderstanding something, or if I
>>>>> should be sticking to Thrift for situations like this involving mixed
>>>>> static/dynamic data.
>>>>>
>>>>> Thanks!
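The repetition concern for the second schema can be made concrete with a toy model of the pre-3.0 storage layout. This is an assumption for illustration: it ignores row markers, timestamps, and SSTable compression. In that engine, each CQL3 cell is stored under a composite name made of the clustering values followed by the CQL column name, so putting `event_source` and `total_events` in the clustering key copies them into every event's cell name. `cell_names` is a hypothetical helper.

```python
def cell_names(clustering, regular_columns):
    """Composite cell names for one CQL row: clustering values followed
    by the CQL column name (simplified pre-3.0 layout; no row markers,
    timestamps, or compression)."""
    return [clustering + (col,) for col in regular_columns]


# 1000 events in one partition. event_type is the partition key, i.e. the
# storage row key, so it does not appear inside the cell names.
clustering_keys = [("sensor-a", 1000, f"t{i:04d}") for i in range(1000)]

names = [n for ck in clustering_keys for n in cell_names(ck, ["event"])]
repeated = sum(1 for n in names if n[:2] == ("sensor-a", 1000))
print(len(names), repeated)
```

A second side effect of this schema: because `total_events` is part of the primary key, CQL will not let you UPDATE it in place, so changing the cached count would mean rewriting every event row under a new clustering key. Declaring the shared fields `STATIC` avoids both the per-cell repetition and the key rewrite.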
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Ryan Svihla
>>
>> Solution Architect
