"I would love to see Cassandra get to the point where users can define
complex queries with subqueries, like, group by and joins" --> Have you had
a look at Intravert? I think it does union and intersection on the server
side for you. Not sure about joins, though.


On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin <wool...@gmail.com> wrote:

>
> Hi Ed,
>
> I agree Solr is deeply integrated into DSE. I've looked at Solandra in the
> past and studied the code.
>
> My understanding is DSE uses Cassandra for storage and the user has both
> APIs available. I do think it can be integrated further to make moderate to
> complex queries easier and probably faster. That's why we built our own
> JPA-like object query API. I would love to see Cassandra get to the point
> where users can define complex queries with subqueries, like, group by and
> joins. Clearly lots of people want these features and even Google built
> their own tools to do these types of queries.
>
> I see lots of people trying to improve this with Presto, Impala, Drill,
> etc. To me, it's a natural progression as NoSql databases mature. For most
> people, at some point you want to be able to report/analyze the data. Today
> some people use MapReduce to summarize the data and ETL it into a
> relational database or OLAP database for reporting. Even though I don't
> need CAS or atomic batch for what I do in Cassandra today, I'm sure in the
> future it will be handy. From my experience in the financial and insurance
> sector, features like CAS and "select for update" are important for the
> kinds of transactions they handle. I'm biased, but these kinds of features
> are a useful and good addition to Cassandra.
>
> These are interesting times in database land!
>
>
>
>
> On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo
> <edlinuxg...@gmail.com> wrote:
>
>> Peter,
>> Solr is deeply integrated into DSE. Seemingly this cannot efficiently be
>> done client side (CQL/Thrift, whatever), but the Solandra approach was to
>> embed Solr in Cassandra. I think that is actually the future of client
>> development: allowing users to embed custom server-side logic into their
>> own API.
>>
>> Things like this take a while. Back in the day no one wanted Cassandra to
>> be heavyweight, and ideas like read-before-write operations were rejected.
>> The common advice was "do them client side". Now, in the case of
>> collections, some operations do read-before-write, and it is the "stuff
>> users want".
>>
>>
>>
>> On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin <wool...@gmail.com> wrote:
>>
>>>
>>> I'll give you a concrete example.
>>>
>>> One of the things we often need to do is run a keyword search on
>>> unstructured text. What we did in our tooling is combine Solr with
>>> Cassandra, but we put an Object API in front of it. The API is inspired by
>>> JPA, but designed specifically to fit our needs.
>>>
>>> The user can do queries with like %blah%, and behind the scenes we issue
>>> a query to Solr to find the keys and then query Cassandra for the records.
>>>
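The two-step lookup described above (ask the search index for matching keys,
then fetch those rows from the store) can be sketched roughly as follows.
This is only an illustration under stated assumptions: plain dicts stand in
for Solr and Cassandra, and the names find_keys, fetch_records and like_query
are hypothetical, not part of the tooling discussed in this thread or of any
Solr or Cassandra client API.

```python
from typing import Dict, List

# Hypothetical stand-ins: `search_index` plays the role of the Solr index
# (row key -> indexed text) and `table` plays the role of a Cassandra
# table (row key -> record). Neither is a real client object.


def find_keys(search_index: Dict[str, str], term: str) -> List[str]:
    """Step 1: return the row keys whose text contains `term` (like %term%)."""
    needle = term.lower()
    return sorted(k for k, text in search_index.items() if needle in text.lower())


def fetch_records(table: Dict[str, dict], keys: List[str]) -> List[dict]:
    """Step 2: look up each key in the store, skipping keys with no row."""
    return [table[k] for k in keys if k in table]


def like_query(search_index: Dict[str, str], table: Dict[str, dict], term: str) -> List[dict]:
    """Hide the 'which system first, and in what order' detail from the caller."""
    return fetch_records(table, find_keys(search_index, term))
```

The caller only sees like_query; the ordering of the two systems stays an
implementation detail, which is the point of putting an object API in front.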
>>> With plain Cassandra, the developer has to manually do all of this stuff
>>> and integrate Solr. Then they have to know which system to query and in
>>> what order. Our tooling lets the user define the schema in a modeler. Once
>>> the model is done, it compiles the classes, configuration files, data
>>> access objects and unit tests.
>>>
>>> When the application makes a call, our query classes handle the details
>>> behind the scenes. I know lots of people would like to see Solr integrated
>>> more deeply into Cassandra and CQL. I hope it happens in the future. If
>>> DataStax accepts my talk, we will be showing our temporal database and
>>> modeler in September.
>>>
>>>
>>>
>>>
>>> On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt <
>>> srobe...@stanford.edu> wrote:
>>>
>>>> I should add that I'm not trying to ignite a flame war. Just trying to
>>>> understand your intentions.
>>>>
>>>>
>>>> On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt <
>>>> srobe...@stanford.edu> wrote:
>>>>
>>>>> Okay, I'm officially lost on this thread. If you plan on forking
>>>>> Cassandra to preserve and continue to enhance the Thrift interface, you
>>>>> would also want to add a bunch of relational features to CQL as part of
>>>>> that same fork?
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo <
>>>>> edlinuxg...@gmail.com> wrote:
>>>>>
>>>>>> "One of the things I'd like to see happen is for Cassandra to support
>>>>>> queries with disjunction, exists, subqueries, joins and like. In theory
>>>>>> CQL could support these features in the future. Cassandra would need a
>>>>>> new query compiler and query planner. I don't see how the current design
>>>>>> could do these things without a significant redesign/enhancement. In a
>>>>>> past life, I implemented an inference rule engine, so I've spent over a
>>>>>> decade studying and implementing query optimizers. All of these things
>>>>>> can be done, it's just a matter of people finding the time to do it."
>>>>>>
>>>>>> I see what you're saying. CQL started as a way to make slices easier, but
>>>>>> it is not even a query language; retrofitting these things is going to be
>>>>>> very hard.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I have no problem maintaining my own fork :) or joining others in
>>>>>>> forking Cassandra.
>>>>>>>
>>>>>>> I'd be happy to work with you or anyone else to add features to
>>>>>>> Thrift. That's the great thing about open source. Each person can
>>>>>>> scratch a technical itch and do what they love. I see lots of potential
>>>>>>> for Cassandra, and much of it involves improving Thrift to make it
>>>>>>> happen. Some of the features in theory "could" be done in CQL, but not
>>>>>>> with the current design.
>>>>>>>
>>>>>>> One of the things I'd like to see happen is for Cassandra to support
>>>>>>> queries with disjunction, exists, subqueries, joins and like. In theory
>>>>>>> CQL could support these features in the future. Cassandra would need a
>>>>>>> new query compiler and query planner. I don't see how the current
>>>>>>> design could do these things without a significant
>>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>>> engine, so I've spent over a decade studying and implementing query
>>>>>>> optimizers. All of these things can be done, it's just a matter of
>>>>>>> people finding the time to do it.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo <
>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Peter,
>>>>>>>>
>>>>>>>> My advice: do not bother. I have become very active recently in
>>>>>>>> attempting to add features to Thrift. I had 4 open tickets I was
>>>>>>>> actively working on. (I even found two bugs in Cassandra in the
>>>>>>>> process.)
>>>>>>>>
>>>>>>>> People were aware of this and still called this vote. Several
>>>>>>>> committers have voted +1, and my -1 vote is non-binding. It is a
>>>>>>>> clear message: the committers are unwilling to accept new Thrift
>>>>>>>> features even if said features are contributed by others.
>>>>>>>>
>>>>>>>> Edward
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> My biased opinion: even though some members of the Cassandra
>>>>>>>>> development team want to abandon Thrift, I see benefits in continuing
>>>>>>>>> to improve it.
>>>>>>>>>
>>>>>>>>> The great thing about open source is that as long as some people
>>>>>>>>> want to keep working on it and improve it, it can happen. I plan to
>>>>>>>>> do my best to keep Thrift going, since it gives me the fine-grained
>>>>>>>>> control that I want and need. If the ultimate goal of Cassandra is to
>>>>>>>>> be "as close to SQL" as practical, my biased take is to use a NewSQL
>>>>>>>>> database that gives you the full power of subqueries, like, exists
>>>>>>>>> and disjunction.
>>>>>>>>>
>>>>>>>>> When customers ask me which database to choose and they really
>>>>>>>>> want a relational model, I tell them to use NewSql. I love that
>>>>>>>>> Cassandra sits between NoSql and NewSql. There are things I do in
>>>>>>>>> Cassandra today that are much harder in NewSql or NoSql document
>>>>>>>>> databases. NewSql databases can scale to similar sizes, so the "big"
>>>>>>>>> part of big data won't be a significant advantage forever. Looking at
>>>>>>>>> some of the recent NewSql performance numbers, it's clear the gap is
>>>>>>>>> closing.
>>>>>>>>>
>>>>>>>>> peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs
>>>>>>>>> <ty...@datastax.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang <
>>>>>>>>>> shaochuan.w...@bloomreach.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So, does anyone know how to do "describing the splits" and
>>>>>>>>>>> "describing the local rings" using the native protocol?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For a ring description, you would do something like "select peer,
>>>>>>>>>> tokens from system.peers".  I'm not sure about describe_splits().
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Also, cqlsh uses the Python client, which talks over the Thrift
>>>>>>>>>>> protocol too. Does that mean it will be migrated to the native
>>>>>>>>>>> protocol soon as well?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Tyler Hobbs
>>>>>>>>>> DataStax <http://datastax.com/>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Steve Robenalt
>>>>> Software Architect
>>>>>  HighWire | Stanford University
>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>
>>>>> srobe...@stanford.edu
>>>>> http://highwire.stanford.edu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
