"I would love to see Cassandra get to the point where users can define complex queries with subqueries, like, group by and joins" --> Did you have a look at Intravert ? I think it does union & intersection on server side for you. Not sure about join though..
On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin <wool...@gmail.com> wrote:

> Hi Ed,
>
> I agree Solr is deeply integrated into DSE. I've looked at Solandra in the
> past and studied the code.
>
> My understanding is DSE uses Cassandra for storage and the user has both
> API available. I do think it can be integrated further to make moderate to
> complex queries easier and probably faster. That's why we built our own
> JPA-like object query API. I would love to see Cassandra get to the point
> where users can define complex queries with subqueries, like, group by and
> joins. Clearly lots of people want these features and even google built
> their own tools to do these types of queries.
>
> I see lots of people trying to improve this with Presto, Impala, drill,
> etc. To me, it's a natural progression as NoSql databases mature. For most
> people, at some point you want to be able to report/analyze the data. Today
> some people use MapReduce to summarize the data and ETL it into a
> relational database or OLAP database for reporting. Even though I don't
> need CAS or atomic batch for what I do in cassandra today, I'm sure in the
> future it will be handy. From my experience in the financial and insurance
> sector, features like CAS and "select for update" are important for the
> kinds of transactions they handle. I'm bias, these kinds of features are
> useful and good addition to cassandra.
>
> These are interesting times in database land!
>
> On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo
> <edlinuxg...@gmail.com> wrote:
>
>> Peter,
>> Solr is deeply integrated into DSE. Seemingly this can not efficiently be
>> done client side (CQL/Thrift whatever) but the Solandra approach was to
>> embed Solr in Cassandra. I think that is actually the future client dev,
>> allowing users to embedded custom server side logic into there own API.
>>
>> Things like this take a while. Back in the day no one wanted cassandra to
>> be heavy-weight and rejected ideas like read-before write operations. The
>> common advice was "do them client side". Now in the case of collections
>> sometimes they do read-before-write and it is the "stuff users want".
>>
>> On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin <wool...@gmail.com> wrote:
>>
>>> I'll give you a concrete example.
>>>
>>> One of the things we often need to do is do a keyword search on
>>> unstructured text. What we did in our tooling is we combined solr with
>>> cassandra, but we put an Object API infront of it. The API is inspired by
>>> JPA, but designed specifically to fit our needs.
>>>
>>> the user can do queries with like %blah% and behind the scenes we issues
>>> a query to solr to find the keys and then query cassandra for the records.
>>>
>>> With plain Cassandra, the developer has to manually do all of this stuff
>>> and integrate solr. Then they have to know which system to query and in
>>> what order. Our tooling lets the user define the schema in a modeler. Once
>>> the model is done, it compiles the classes, configuration files, data
>>> access objects and unit tests.
>>>
>>> when the application makes a call, our query classes handle the details
>>> behind the scene. I know lots of people would like to see Solr integrated
>>> more deeply into Cassandra and CQL. I hope it happens in the future. If
>>> DataStax accepts my talk, we will be showing our temporal database and
>>> modeler in september.
>>>
>>> On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt
>>> <srobe...@stanford.edu> wrote:
>>>
>>>> I should add that I'm not trying to ignite a flame war. Just trying to
>>>> understand your intentions.
>>>>
>>>> On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt
>>>> <srobe...@stanford.edu> wrote:
>>>>
>>>>> Okay, I'm officially lost on this thread. If you plan on forking
>>>>> Cassandra to preserve and continue to enhance the Thrift interface, you
>>>>> would also want to add a bunch of relational features to CQL as part of
>>>>> that same fork?
>>>>>
>>>>> On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo
>>>>> <edlinuxg...@gmail.com> wrote:
>>>>>
>>>>>> "one of the things I'd like to see happen is for Cassandra to support
>>>>>> queries with disjunction, exist, subqueries, joins and like. In theory
>>>>>> CQL could support these features in the future. Cassandra would need a
>>>>>> new query compiler and query planner. I don't see how the current
>>>>>> design could do these things without a significant
>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>> engine, so I've spent over decade studying and implementing query
>>>>>> optimizers. All of these things can be done, it's just a matter of
>>>>>> people finding the time to do it."
>>>>>>
>>>>>> I see what your saying. CQL started as a way to make slice easier but
>>>>>> it is not even a query language, retrofitting these things is going to
>>>>>> be very hard.
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>
>>>>>>> I have no problems maintain my own fork :) or joining others forking
>>>>>>> cassandra.
>>>>>>>
>>>>>>> I'd be happy to work with you or anyone else to add features to
>>>>>>> thrift. That's the great thing about open source. Each person can
>>>>>>> scratch a technical itch and do what they love. I see lots of
>>>>>>> potential for Cassandra and many of them include improving thrift to
>>>>>>> make it happen. Some of the features in theory "could" be done in
>>>>>>> CQL, but not with the current design.
>>>>>>>
>>>>>>> one of the things I'd like to see happen is for Cassandra to support
>>>>>>> queries with disjunction, exist, subqueries, joins and like. In
>>>>>>> theory CQL could support these features in the future. Cassandra
>>>>>>> would need a new query compiler and query planner. I don't see how
>>>>>>> the current design could do these things without a significant
>>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>>> engine, so I've spent over decade studying and implementing query
>>>>>>> optimizers. All of these things can be done, it's just a matter of
>>>>>>> people finding the time to do it.
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo
>>>>>>> <edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Peter,
>>>>>>>>
>>>>>>>> My advice. Do not bother. I have become very active recently in
>>>>>>>> attempting to add features to thrift. I had 4 open tickets I was
>>>>>>>> actively working on. (I even found two bugs in the Cassandra in the
>>>>>>>> process).
>>>>>>>>
>>>>>>>> People were aware of this and still called this vote. Several
>>>>>>>> commit people have voted in a +1 and my -1 vote is non binding. It
>>>>>>>> is a clear message: The committers are unwilling to accept new
>>>>>>>> thrift features even if said features are contributed by others.
>>>>>>>>
>>>>>>>> Edward
>>>>>>>>
>>>>>>>> On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> My bias opinion, just because some member of cassandra develop
>>>>>>>>> want to abandon Thrift, I see benefits of continuing to improve it.
>>>>>>>>>
>>>>>>>>> The great thing about open source is that as long as some people
>>>>>>>>> want to keep working on it and improve it, it can happen. I plan
>>>>>>>>> to do my best to keep Thrift going, since it gives me fine grain
>>>>>>>>> control that I want and need. If the ultimate goal of Cassandra is
>>>>>>>>> to be "as close to SQL" as practical, my bias take is use a NewSQL
>>>>>>>>> database that gives you the full power of subqueries, like, exists
>>>>>>>>> and disjunction.
>>>>>>>>>
>>>>>>>>> When customers ask me which database to choose and they really
>>>>>>>>> want Relational model, I tell them use NewSql. I love that
>>>>>>>>> Cassandra sits between NoSql and NewSql. There are things I do in
>>>>>>>>> Cassandra today that are much harder in NewSql or NoSql document
>>>>>>>>> databases. NewSql database can scale to similar sizes, so the
>>>>>>>>> "big" part of big data won't be a significant advantage forever.
>>>>>>>>> Looking at some of the recent NewSql performance numbers, it's
>>>>>>>>> clear the gap is closing.
>>>>>>>>>
>>>>>>>>> peter
>>>>>>>>>
>>>>>>>>> On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs
>>>>>>>>> <ty...@datastax.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang
>>>>>>>>>> <shaochuan.w...@bloomreach.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So, does anyone know how to do "describing the splits" and
>>>>>>>>>>> "describing the local rings" using native protocol?
>>>>>>>>>>
>>>>>>>>>> For a ring description, you would do something like "select peer,
>>>>>>>>>> tokens from system.peers". I'm not sure about describe_splits().
>>>>>>>>>>
>>>>>>>>>>> Also, cqlsh uses python client, which is talking via thrift
>>>>>>>>>>> protocol too. Does it mean that it will be migrated to native
>>>>>>>>>>> protocol soon as well?
>>>>>>>>>>
>>>>>>>>>> Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Tyler Hobbs
>>>>>>>>>> DataStax <http://datastax.com/>
>>>>>
>>>>> --
>>>>> Steve Robenalt
>>>>> Software Architect
>>>>> HighWire | Stanford University
>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>
>>>>> srobe...@stanford.edu
>>>>> http://highwire.stanford.edu
>>>>
>>>> --
>>>> Steve Robenalt
>>>> Software Architect
>>>> HighWire | Stanford University
>>>> 425 Broadway St, Redwood City, CA 94063
>>>>
>>>> srobe...@stanford.edu
>>>> http://highwire.stanford.edu
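
For anyone skimming the thread: the "query solr to find the keys and then query cassandra for the records" flow Peter describes might look roughly like the sketch below. This is not his tooling. The two dictionaries are hypothetical in-memory stand-ins (TEXT_INDEX playing the role of Solr, RECORD_STORE the role of Cassandra), and every function name and sample row is made up for illustration.

```python
from typing import Dict, List

# Hypothetical stand-ins: in the setup described in the thread, TEXT_INDEX
# would be Solr and RECORD_STORE would be Cassandra. Sample data is invented.
TEXT_INDEX: Dict[str, str] = {
    "row1": "quarterly insurance claims report",
    "row2": "trade settlement notes",
    "row3": "insurance policy renewal",
}
RECORD_STORE: Dict[str, dict] = {
    "row1": {"id": "row1", "owner": "alice"},
    "row2": {"id": "row2", "owner": "bob"},
    "row3": {"id": "row3", "owner": "carol"},
}


def find_keys(substring: str) -> List[str]:
    """Step 1: ask the search index which row keys match (like %substring%)."""
    needle = substring.lower()
    return sorted(k for k, text in TEXT_INDEX.items() if needle in text.lower())


def fetch_records(keys: List[str]) -> List[dict]:
    """Step 2: fetch the full records for those keys from the primary store."""
    return [RECORD_STORE[k] for k in keys if k in RECORD_STORE]


def like_query(substring: str) -> List[dict]:
    """Facade that hides which system is queried and in what order."""
    return fetch_records(find_keys(substring))
```

The caller only sees like_query("insurance"); the ordering of the two lookups stays inside the facade, which is the part Peter says a developer on plain Cassandra has to wire up by hand.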