@Tupshin I like that approach; right now I think of that piece as the "StorageProxy". I agree, over the years people have taken that approach. Solandra is a good example, and I am guessing DSE Solr works this way. This says something about the entire "Thrift vs. CQL" thing, as there are clearly power users writing applications that use neither.
I do feel this vote was called to shoot down any attempt to add a feature that was non-CQL. However, if you think you can drive something like this forward, more power to you; I will help out. On Wed, Mar 12, 2014 at 12:11 PM, Tupshin Harper <tups...@tupshin.com> wrote: > I agree that we are way off the initial topic, but I think we are spot on > the most important topic. As seen in various tickets, including #6704 (wide > row scanners), #6167 (end-slice termination predicate), the existence > of intravert-ug (Cassandra interface to intravert), and a number of others, > there is an increasing desire to do more complicated processing, > server-side, on a Cassandra cluster. > > I very much share those goals, and would like to propose the following > only partially hand-wavey path forward. > > Instead of creating a pluggable interface for Thrift, I'd like to create a > pluggable interface for arbitrary app-server deep integration. > > Inspired by both the existence of intravert-ug, as well as there being a > long history of various parties embedding tomcat or jetty servlet engines > inside Cassandra, I'd like to propose the creation of an internal, somewhat > stable (versioned?) interface that could allow any app server to achieve > deep integration with Cassandra, and as a result, these servers could > 1) host their own apis (REST, for example) > 2) extend core functionality by having limited (see triggers and wide row > scanners) access to the internals of cassandra > > The hand wavey part comes because while I have been mulling this over for > a while, I have not spent any significant time looking at the actual > surface area of intravert-ug's integration. But, using it as a model, and > also keeping in mind the general needs of your more traditional > servlet/j2ee containers, I believe we could come up with a reasonable > interface to allow any jvm app server to be integrated and maintained in or > out of the Cassandra tree. 
> > This would satisfy the need that many of us (both Ed and I, for example) > have for a much greater degree of control over server-side execution, and > let us start building much more interestingly (and simply) tiered > applications. > > Anybody interested in working on a coherent proposal with me? > > -Tupshin > > > On Wed, Mar 12, 2014 at 10:12 AM, Brian O'Neill <b...@alumni.brown.edu> wrote: > >> >> just when you thought the thread died... >> >> >> First, let me say we are *WAY* off topic. But that is a good thing. >> I love this community because there are a ton of passionate, smart >> people. (often with differing perspectives ;) >> >> RE: Reporting against C* (@Peter Lin) >> We've had the same experience. Pig + Hadoop is painful. We are >> experimenting with Spark/Shark, operating directly against the data. >> http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html >> >> The Shark layer gives you SQL and caching capabilities that make it easy >> to use and fast (for smaller data sets). In front of this, we are going to >> add dimensional aggregations so we can operate at larger scales. (then the >> Hive reports will run against the aggregations) >> >> RE: REST Server (@Russel Bradbury) >> We had moderate success with Virgil, a REST server that we built >> directly on top of Thrift so that one day it >> could be easily embedded in the C* server itself. It could be deployed >> separately, or run an embedded C*. More often than not, we ended up >> running it separately to separate the layers. (just like Titan and >> Rexster) I've started on a rewrite of Virgil called Memnon that rides on >> top of CQL. (I'd love some help) >> https://github.com/boneill42/memnon >> >> RE: CQL vs. Thrift >> We've hitched our wagons to CQL. CQL != Relational. >> We've had success translating our "native" schemas into CQL, including >> all the NoSQL goodness of wide-rows, etc. 
You just need a good >> understanding of how things translate into storage and underlying CFs. If >> anything, I think we could add some DESCRIBE information, which would help >> users with this, along the lines of: >> (https://issues.apache.org/jira/browse/CASSANDRA-6676) >> >> CQL does open up the *opportunity* for users to articulate more complex >> queries using more familiar syntax. (including future things such as >> joins, grouping, etc.) To me, that is exciting, and again -- one of the >> reasons we are leaning on it. >> >> my two cents, >> brian >> >> --- >> >> Brian O'Neill >> >> Chief Technology Officer >> >> >> *Health Market Science* >> >> *The Science of Better Results* >> >> 2700 Horizon Drive * King of Prussia, PA * 19406 >> >> M: 215.588.6024 * @boneill42 <http://www.twitter.com/boneill42> * >> >> healthmarketscience.com >> >> >> From: Peter Lin <wool...@gmail.com> >> Reply-To: <user@cassandra.apache.org> >> Date: Wednesday, March 12, 2014 at 8:44 AM >> To: "user@cassandra.apache.org" <user@cassandra.apache.org> >> Subject: Re: Proposal: freeze Thrift starting with 2.1.0 >> >> >> yes, I was looking at intravert last nite. >> >> For the kinds of reports my customers ask us to do, joins and subqueries >> are important. Having tried to do a simple join in PIG, the level of pain >> is high. 
I'm a masochist, so I don't mind breaking a simple join into >> multiple MR tasks, though I do find myself asking "why the hell does it >> need to be so painful in PIG?" Many of my friends say "what is this crap!" >> or "this is better than writing sql queries to run reports?" >> >> Plus, using ETL techniques to extract summaries only works for cases >> where the data is small enough. Once it gets beyond a certain size, it's >> not practical, which means we're back to crappy reporting languages that >> make life painful. Lots of big healthcare companies have thousands of MOLAP >> cubes on dozens of mainframes. The old OLTP -> DW/OLAP approach creates its own set >> of management headaches. >> >> Being able to report directly on the raw data avoids many of the issues, >> but that's my biased perspective. >> >> >> >> >> On Wed, Mar 12, 2014 at 8:15 AM, DuyHai Doan <doanduy...@gmail.com> wrote: >> >>> "I would love to see Cassandra get to the point where users can define >>> complex queries with subqueries, like, group by and joins" --> Did you have >>> a look at Intravert? I think it does union & intersection on the server side >>> for you. Not sure about join, though. >>> >>> >>> On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin <wool...@gmail.com> wrote: >>> >>>> >>>> Hi Ed, >>>> >>>> I agree Solr is deeply integrated into DSE. I've looked at Solandra in >>>> the past and studied the code. >>>> >>>> My understanding is DSE uses Cassandra for storage and the user has >>>> both APIs available. I do think it can be integrated further to make >>>> moderate to complex queries easier and probably faster. That's why we built >>>> our own JPA-like object query API. I would love to see Cassandra get to the >>>> point where users can define complex queries with subqueries, like, group >>>> by and joins. Clearly lots of people want these features and even Google >>>> built their own tools to do these types of queries. 
>>>> >>>> I see lots of people trying to improve this with Presto, Impala, Drill, >>>> etc. To me, it's a natural progression as NoSQL databases mature. For most >>>> people, at some point you want to be able to report/analyze the data. Today >>>> some people use MapReduce to summarize the data and ETL it into a >>>> relational database or OLAP database for reporting. Even though I don't >>>> need CAS or atomic batch for what I do in Cassandra today, I'm sure in the >>>> future it will be handy. From my experience in the financial and insurance >>>> sector, features like CAS and "select for update" are important for the >>>> kinds of transactions they handle. I'm biased, but these kinds of features are >>>> useful and a good addition to Cassandra. >>>> >>>> These are interesting times in database land! >>>> >>>> >>>> >>>> >>>> On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo < >>>> edlinuxg...@gmail.com> wrote: >>>> >>>>> Peter, >>>>> Solr is deeply integrated into DSE. Seemingly this cannot efficiently >>>>> be done client side (CQL/Thrift, whatever), but the Solandra approach was to >>>>> embed Solr in Cassandra. I think that is actually the future of client dev: >>>>> allowing users to embed custom server-side logic into their own API. >>>>> >>>>> Things like this take a while. Back in the day no one wanted Cassandra >>>>> to be heavy-weight, and ideas like read-before-write operations were rejected. >>>>> The common advice was "do them client side". Now in the case of >>>>> collections >>>>> sometimes they do read-before-write and it is the "stuff users want". >>>>> >>>>> >>>>> >>>>> On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin <wool...@gmail.com> wrote: >>>>> >>>>>> >>>>>> I'll give you a concrete example. >>>>>> >>>>>> One of the things we often need to do is a keyword search on >>>>>> unstructured text. What we did in our tooling is we combined Solr with >>>>>> Cassandra, but we put an Object API in front of it. 
The API is inspired by >>>>>> JPA, but designed specifically to fit our needs. >>>>>> >>>>>> The user can do queries with like %blah% and behind the scenes we >>>>>> issue a query to Solr to find the keys and then query Cassandra for the >>>>>> records. >>>>>> >>>>>> With plain Cassandra, the developer has to manually do all of this >>>>>> stuff and integrate Solr. Then they have to know which system to query >>>>>> and >>>>>> in what order. Our tooling lets the user define the schema in a modeler. >>>>>> Once the model is done, it compiles the classes, configuration files, >>>>>> data >>>>>> access objects and unit tests. >>>>>> >>>>>> When the application makes a call, our query classes handle the >>>>>> details behind the scenes. I know lots of people would like to see Solr >>>>>> integrated more deeply into Cassandra and CQL. I hope it happens in the >>>>>> future. If DataStax accepts my talk, we will be showing our temporal >>>>>> database and modeler in September. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt < >>>>>> srobe...@stanford.edu> wrote: >>>>>> >>>>>>> I should add that I'm not trying to ignite a flame war. Just trying >>>>>>> to understand your intentions. >>>>>>> >>>>>>> >>>>>>> On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt < >>>>>>> srobe...@stanford.edu> wrote: >>>>>>> >>>>>>>> Okay, I'm officially lost on this thread. If you plan on forking >>>>>>>> Cassandra to preserve and continue to enhance the Thrift interface, you >>>>>>>> would also want to add a bunch of relational features to CQL as part of >>>>>>>> that same fork? >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo < >>>>>>>> edlinuxg...@gmail.com> wrote: >>>>>>>> >>>>>>>>> "one of the things I'd like to see happen is for Cassandra to >>>>>>>>> support queries with disjunction, exist, subqueries, joins and like. >>>>>>>>> In >>>>>>>>> theory CQL could support these features in the future. 
Cassandra >>>>>>>>> would need >>>>>>>>> a new query compiler and query planner. I don't see how the current >>>>>>>>> design >>>>>>>>> could do these things without a significant redesign/enhancement. In >>>>>>>>> a past >>>>>>>>> life, I implemented an inference rule engine, so I've spent over >>>>>>>>> a decade >>>>>>>>> studying and implementing query optimizers. All of these things can be >>>>>>>>> done, it's just a matter of people finding the time to do it." >>>>>>>>> >>>>>>>>> I see what you're saying. CQL started as a way to make slice easier >>>>>>>>> but it is not even a query language; retrofitting these things is >>>>>>>>> going to >>>>>>>>> be very hard. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin <wool...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I have no problem maintaining my own fork :) or joining others >>>>>>>>>> forking Cassandra. >>>>>>>>>> >>>>>>>>>> I'd be happy to work with you or anyone else to add features to >>>>>>>>>> Thrift. That's the great thing about open source. Each person can >>>>>>>>>> scratch a >>>>>>>>>> technical itch and do what they love. I see lots of potential for >>>>>>>>>> Cassandra >>>>>>>>>> and much of it involves improving Thrift to make it happen. Some of >>>>>>>>>> the >>>>>>>>>> features in theory "could" be done in CQL, but not with the current >>>>>>>>>> design. >>>>>>>>>> >>>>>>>>>> One of the things I'd like to see happen is for Cassandra to >>>>>>>>>> support queries with disjunction, exist, subqueries, joins and like. >>>>>>>>>> In >>>>>>>>>> theory CQL could support these features in the future. Cassandra >>>>>>>>>> would need >>>>>>>>>> a new query compiler and query planner. I don't see how the current >>>>>>>>>> design >>>>>>>>>> could do these things without a significant redesign/enhancement. In >>>>>>>>>> a past >>>>>>>>>> life, I implemented an inference rule engine, so I've spent over >>>>>>>>>> a decade >>>>>>>>>> studying and implementing query optimizers. 
All of these things can >>>>>>>>>> be >>>>>>>>>> done, it's just a matter of people finding the time to do it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo < >>>>>>>>>> edlinuxg...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Peter, >>>>>>>>>>> >>>>>>>>>>> My advice: do not bother. I have become very active recently in >>>>>>>>>>> attempting to add features to Thrift. I had 4 open tickets I was >>>>>>>>>>> actively >>>>>>>>>>> working on. (I even found two bugs in Cassandra in the process.) >>>>>>>>>>> >>>>>>>>>>> People were aware of this and still called this vote. Several >>>>>>>>>>> committers have voted +1 and my -1 vote is non-binding. It >>>>>>>>>>> is a >>>>>>>>>>> clear message: the committers are unwilling to accept new Thrift >>>>>>>>>>> features >>>>>>>>>>> even if said features are contributed by others. >>>>>>>>>>> >>>>>>>>>>> Edward >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin <wool...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> My biased opinion: even though some Cassandra developers >>>>>>>>>>>> want to abandon Thrift, I see benefits in continuing to improve it. >>>>>>>>>>>> >>>>>>>>>>>> The great thing about open source is that as long as some >>>>>>>>>>>> people want to keep working on it and improve it, it can happen. I >>>>>>>>>>>> plan to >>>>>>>>>>>> do my best to keep Thrift going, since it gives me the fine-grained >>>>>>>>>>>> control that >>>>>>>>>>>> I want and need. If the ultimate goal of Cassandra is to be "as >>>>>>>>>>>> close to >>>>>>>>>>>> SQL" as practical, my biased take is to use a NewSQL database that >>>>>>>>>>>> gives you the >>>>>>>>>>>> full power of subqueries, like, exists and disjunction. >>>>>>>>>>>> >>>>>>>>>>>> When customers ask me which database to choose and they really >>>>>>>>>>>> want a relational model, I tell them to use NewSQL. 
I love that >>>>>>>>>>>> Cassandra sits >>>>>>>>>>>> between NoSQL and NewSQL. There are things I do in Cassandra today >>>>>>>>>>>> that are >>>>>>>>>>>> much harder in NewSQL or NoSQL document databases. NewSQL databases >>>>>>>>>>>> can >>>>>>>>>>>> scale to similar sizes, so the "big" part of big data won't be a >>>>>>>>>>>> significant advantage forever. Looking at some of the recent NewSQL >>>>>>>>>>>> performance numbers, it's clear the gap is closing. >>>>>>>>>>>> >>>>>>>>>>>> peter >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs < >>>>>>>>>>>> ty...@datastax.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang < >>>>>>>>>>>>> shaochuan.w...@bloomreach.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, does anyone know how to do "describing the splits" and >>>>>>>>>>>>>> "describing the local rings" using the native protocol? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> For a ring description, you would do something like "select >>>>>>>>>>>>> peer, tokens from system.peers". I'm not sure about >>>>>>>>>>>>> describe_splits(). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also, cqlsh uses the Python client, which talks over the Thrift >>>>>>>>>>>>>> protocol too. Does that mean it will be migrated to the native >>>>>>>>>>>>>> protocol soon >>>>>>>>>>>>>> as well? 
>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Tyler Hobbs >>>>>>>>>>>>> DataStax <http://datastax.com/> >>>>>>>> >>>>>>>> -- >>>>>>>> Steve Robenalt >>>>>>>> Software Architect >>>>>>>> HighWire | Stanford University >>>>>>>> 425 Broadway St, Redwood City, CA 94063 >>>>>>>> >>>>>>>> srobe...@stanford.edu >>>>>>>> http://highwire.stanford.edu