Re: CQL3 vs Thrift

Ryan Svihla Wed, 24 Dec 2014 05:33:07 -0800

Peter,

Can you come up with some specifics? I'm always interested in finding more
corner cases, but it's also possible I have a modeling alternative that you
may not have considered yet. Either way, it's good practice and background
for me.


On Tue, Dec 23, 2014 at 12:26 PM, Peter Lin <wool...@gmail.com> wrote:

>
> I'm biased in favor of using both Thrift and CQL3, though many people on the
> list probably think I'm crazy.
>
> CQL3 is good if what you need fits nicely in static columns, but it
> doesn't if you want to use dynamic columns and/or mix & match both in the
> same columnFamily. For a lot of what I use Cassandra for, CQL3 currently
> doesn't provide all the functionality. It is possible to extend CQL3
> further to make it handle 100% of the use cases that Thrift supports today.
>
> Whether that will happen is anyone's guess. SQL-like syntax is popular
> and many people understand it, but it doesn't necessarily line up perfectly
> with NoSQL column databases.
>
>
> On Tue, Dec 23, 2014 at 1:00 PM, David Broyles <sj.clim...@gmail.com>
> wrote:
>
>> Thanks, Ryan.  I wasn't aware of static column support, and indeed they
>> get me most of what I need.  I think the only potential inefficiency is
>> still at query time.  Using Thrift, I could design the column family to get
>> all the static and dynamic content in a single query.
>> If event_source and total_events are instead implemented as CQL3 statics,
>> I probably need to do two queries to get data for a given event_type.
>>
>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>> SELECT event_source, total_events FROM timeseries WHERE event_type =
>> 'some-type'
>>
>> To get the events:
>> SELECT insertion_time, event FROM timeseries
>>
>> As a combined query, my concern is related to the overhead of repeating
>> event_type/source/total_events (although with potentially many other pieces
>> of static information).
>>
>> More generally, do you find that tuned applications tend to use Thrift, a
>> combination of Thrift and CQL3, or is CQL3 really expected to replace
>> Thrift?
>>
>> Thanks again!
>>
>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>> wrote:
>>
>>> Don't static columns get you what you want?
>>>
>>>
>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>  On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com> wrote:
>>>
>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>> new projects should use CQL3.
>>>>
>>>> I'm wondering, however, if there are certain use cases not well covered
>>>> by CQL3.  Consider the standard timeseries example:
>>>>
>>>> CREATE TABLE timeseries (
>>>>    event_type text,
>>>>    insertion_time timestamp,
>>>>    event blob,
>>>>    PRIMARY KEY (event_type, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> What happens if I want to store additional information that is shared
>>>> by all events in the given series (but that I don't want to include in the
>>>> row ID): e.g. the event source, a cached count of the number of events
>>>> logged to date, etc.?  I might try updating the definition as follows:
>>>>
>>>> CREATE TABLE timeseries (
>>>>    event_type text,
>>>>    event_source text,
>>>>    total_events int,
>>>>    insertion_time timestamp,
>>>>    event blob,
>>>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>>> times?  Please let me know if I'm misunderstanding something, or if I
>>>> should be sticking to Thrift for situations like this involving mixed
>>>> static/dynamic data.
>>>>
>>>> Thanks!
>>>>
>>>
>>
>
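
For reference, the static-column alternative discussed above can be sketched
roughly as follows. This is only an illustration under stated assumptions: it
reuses the table and column names from the quoted messages, the literal values
('some-type', 'some-source', 42) are made up, and the behaviour should be
checked against the Cassandra version in use (static columns need 2.0.6 or
later).

```cql
-- Sketch: event_source and total_events become static columns, so they are
-- stored once per partition (per event_type) rather than once per event.
CREATE TABLE timeseries (
   event_type text,
   event_source text static,
   total_events int static,
   insertion_time timestamp,
   event blob,
   PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Static values can be written with only the partition key in the WHERE clause.
UPDATE timeseries SET event_source = 'some-source', total_events = 42
 WHERE event_type = 'some-type';

-- Metadata only: LIMIT 1 keeps the result to a single row.
SELECT event_source, total_events FROM timeseries
 WHERE event_type = 'some-type' LIMIT 1;

-- Events and metadata in one query: the static values appear in every
-- returned row of the result set, although they are stored only once.
SELECT insertion_time, event, event_source, total_events FROM timeseries
 WHERE event_type = 'some-type' LIMIT 1000;
```

The last query trades some repetition in the result set for a single round
trip; the two-query form avoids that repetition at the cost of an extra
request.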


-- 

Ryan Svihla

Solution Architect

