Re: Proposal: freeze Thrift starting with 2.1.0

Edward Capriolo Wed, 12 Mar 2014 09:43:39 -0700

@Tupshin

I like that approach. Right now I think of that piece as the
"StorageProxy". I agree; over the years people have taken that approach.
Solandra is a good example, and I am guessing DSE Solr works this way.
This says something about the whole "Thrift vs. CQL" debate, as there are
clearly power users writing applications that use neither.


I do feel this vote was called to shoot down any attempt to add a non-CQL
feature. However, if you think you can drive something like this forward,
more power to you; I will help out.





On Wed, Mar 12, 2014 at 12:11 PM, Tupshin Harper <tups...@tupshin.com> wrote:

> I agree that we are way off the initial topic, but I think we are spot on
> the most important topic. As seen in various tickets, including #6704 (wide
> row scanners) and #6167 (end-slice termination predicate), in the existence
> of intravert-ug (a Cassandra interface to intravert), and in a number of
> other places, there is an increasing desire to do more complicated
> processing, server-side, on a Cassandra cluster.
>
> I very much share those goals, and would like to propose the following,
> only partially hand-wavy, path forward.
>
> Instead of creating a pluggable interface for Thrift, I'd like to create a
> pluggable interface for arbitrary app-server deep integration.
>
> Inspired both by the existence of intravert-ug and by the long history of
> various parties embedding Tomcat or Jetty servlet engines inside
> Cassandra, I'd like to propose the creation of an internal, somewhat
> stable (versioned?) interface that would allow any app server to achieve
> deep integration with Cassandra. As a result, these servers could:
> 1) host their own APIs (REST, for example)
> 2) extend core functionality by having limited (see triggers and wide row
> scanners) access to the internals of Cassandra
>
> The hand-wavy part comes in because, while I have been mulling this over
> for a while, I have not spent any significant time looking at the actual
> surface area of intravert-ug's integration. But using it as a model, and
> also keeping in mind the general needs of more traditional servlet/J2EE
> containers, I believe we could come up with a reasonable interface that
> would allow any JVM app server to be integrated and maintained in or out
> of the Cassandra tree.
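A rough sketch of the kind of versioned contract Tupshin describes, written in Python for brevity even though a real one would be a Java interface inside the JVM. Everything here is hypothetical: `StorageFacade`, `AppServerPlugin`, `PluginHost` and the major/minor compatibility rule are illustrative names and policies, not existing Cassandra APIs (the real internals, such as StorageProxy, are far larger).

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class StorageFacade(ABC):
    """Hypothetical, deliberately narrow view of storage handed to a plugin."""

    @abstractmethod
    def read(self, table: str, key: str) -> Optional[Dict[str, str]]: ...

    @abstractmethod
    def write(self, table: str, key: str, columns: Dict[str, str]) -> None: ...


class AppServerPlugin(ABC):
    """Hypothetical contract an embedded app server would implement."""

    # (major, minor) version of the integration API the plugin was built against.
    api_version: Tuple[int, int] = (1, 0)

    @abstractmethod
    def start(self, storage: StorageFacade) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class PluginHost:
    """Starts plugins, refusing any built against an incompatible API version."""

    API_VERSION: Tuple[int, int] = (1, 2)

    def __init__(self, storage: StorageFacade) -> None:
        self.storage = storage
        self.running: List[AppServerPlugin] = []

    def is_compatible(self, plugin: AppServerPlugin) -> bool:
        major, minor = plugin.api_version
        # Same major version, and no newer minor version than the host offers.
        return major == self.API_VERSION[0] and minor <= self.API_VERSION[1]

    def load(self, plugin: AppServerPlugin) -> bool:
        if not self.is_compatible(plugin):
            return False
        plugin.start(self.storage)
        self.running.append(plugin)
        return True

    def shutdown(self) -> None:
        # Stop plugins in reverse start order.
        for plugin in reversed(self.running):
            plugin.stop()
        self.running.clear()


class InMemoryStorage(StorageFacade):
    """Toy storage backend used only to exercise the sketch."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, Dict[str, str]]] = {}

    def read(self, table: str, key: str) -> Optional[Dict[str, str]]:
        return self.tables.get(table, {}).get(key)

    def write(self, table: str, key: str, columns: Dict[str, str]) -> None:
        self.tables.setdefault(table, {}).setdefault(key, {}).update(columns)


class EchoRestPlugin(AppServerPlugin):
    """Toy plugin standing in for an embedded REST server."""

    def start(self, storage: StorageFacade) -> None:
        self.storage = storage
        self.started = True

    def stop(self) -> None:
        self.started = False

    def handle_get(self, table: str, key: str) -> Optional[Dict[str, str]]:
        # A real server would map an HTTP path such as /table/key to this call.
        return self.storage.read(table, key)
```

Keeping the surface this narrow is what would let such a contract stay stable across releases; widening it toward the real internals trades stability for power.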
>
> This would satisfy the need that many of us (both Ed and I, for example)
> have for a much greater degree of control over server-side execution, and
> would let us start building much more interesting (and simpler) tiered
> applications.
>
> Anybody interested in working on a coherent proposal with me?
>
> -Tupshin
>
>
> On Wed, Mar 12, 2014 at 10:12 AM, Brian O'Neill <b...@alumni.brown.edu> wrote:
>
>>
>> Just when you thought the thread had died...
>>
>>
>> First, let me say we are *WAY* off topic.  But that is a good thing.
>> I love this community because there are a ton of passionate, smart
>> people. (often with differing perspectives ;)
>>
>> RE: Reporting against C* (@Peter Lin)
>> We've had the same experience.  Pig + Hadoop is painful.  We are
>> experimenting with Spark/Shark, operating directly against the data.
>> http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
>>
>> The Shark layer gives you SQL and caching capabilities that make it easy
>> to use and fast (for smaller data sets).  In front of this, we are going to
>> add dimensional aggregations so we can operate at larger scales.  (then the
>> Hive reports will run against the aggregations)
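A minimal sketch of the dimensional pre-aggregation Brian mentions, assuming rows arrive as Python dicts; the column names (`state`, `year`, `claims`) are hypothetical, and in practice this would run in Spark rather than plain Python. Storing (count, sum) rather than an average is what lets a coarser rollup be derived from a finer one without rescanning the raw data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple

Agg = Dict[Tuple, Tuple[int, float]]


def rollup(rows: Iterable[Mapping], dims: Sequence[str], measure: str) -> Agg:
    """Aggregate raw rows into {dimension values: (row count, sum of measure)}."""
    acc = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[d] for d in dims)
        acc[key][0] += 1
        acc[key][1] += row[measure]
    return {k: (c, s) for k, (c, s) in acc.items()}


def coarsen(agg: Agg, dims: Sequence[str], keep: Sequence[str]) -> Agg:
    """Roll an existing aggregate up to a subset of its dimensions."""
    positions = [list(dims).index(d) for d in keep]
    acc = defaultdict(lambda: [0, 0])
    for key, (count, total) in agg.items():
        coarse = tuple(key[i] for i in positions)
        acc[coarse][0] += count
        acc[coarse][1] += total
    return {k: (c, s) for k, (c, s) in acc.items()}
```

Reports would then read the much smaller aggregate, and an average is just `total / count` at query time.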
>>
>> RE: REST Server (@Russel Bradbury)
>> We had moderate success with Virgil, a REST server we built directly on
>> top of Thrift so that one day it could easily be embedded in the C* server
>> itself. It could be deployed separately, or run with an embedded C*. More
>> often than not, we ended up running it separately to keep the layers
>> apart (just like Titan and Rexster). I've started on a rewrite of Virgil,
>> called Memnon, that rides on top of CQL (I'd love some help):
>> https://github.com/boneill42/memnon
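As an illustration of what a REST layer riding on CQL has to do, here is a minimal sketch that maps a GET on a hypothetical `/<keyspace>/<table>/<key>` route to a CQL query with a `?` bind marker, as used by prepared statements. The route shape and the `key_column` parameter are assumptions; this is not Memnon's actual routing.

```python
import re
from typing import List, Tuple

# Conservative CQL identifier pattern; CQL cannot bind identifiers as values.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def route_to_cql(method: str, path: str, key_column: str = "id") -> Tuple[str, List[str]]:
    """Translate GET /<keyspace>/<table>/<key> into (CQL with a bind marker, values)."""
    if method != "GET":
        raise ValueError(f"unsupported method: {method}")
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3:
        raise ValueError(f"expected /keyspace/table/key, got {path!r}")
    keyspace, table, key = parts
    # Identifiers are spliced into the statement text, so validate them first.
    for ident in (keyspace, table, key_column):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    # The row key itself travels as a bound value, never as statement text.
    return f"SELECT * FROM {keyspace}.{table} WHERE {key_column} = ?", [key]
```

The returned statement and values could then be prepared and executed through whichever native-protocol driver the server uses.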
>>
>> RE: CQL vs. Thrift
>> We've hitched our wagons to CQL.  CQL != Relational.
>> We've had success translating our "native" schemas into CQL, including
>> all the NoSQL goodness of wide rows, etc. You just need a good
>> understanding of how things translate into storage and underlying CFs.  If
>> anything, I think we could add some DESCRIBE information, which would help
>> users with this, along the lines of:
>> (https://issues.apache.org/jira/browse/CASSANDRA-6676)
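To make "how things translate into storage" concrete, here is a simplified model of the pre-3.0 layout: each CQL partition is one storage row, and each CQL row contributes one cell per regular column that has a value, named by the clustering values plus the column name. It deliberately ignores timestamps, TTLs, static columns and the empty row-marker cell, and the sensor table in the test data is hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence, Tuple

Cells = Dict[Tuple[Any, ...], Any]


def to_storage_rows(
    cql_rows: Iterable[Mapping[str, Any]],
    partition_key: str,
    clustering: Sequence[str],
    regular: Sequence[str],
) -> Dict[Any, Cells]:
    """Map CQL rows to {partition key: {cell name: value}} (simplified)."""
    storage: Dict[Any, Cells] = {}
    for row in cql_rows:
        cells = storage.setdefault(row[partition_key], {})
        prefix = tuple(row[c] for c in clustering)
        for col in regular:
            value = row.get(col)
            # In this model a missing/None value produces no cell at all
            # (a real explicit null would write a tombstone instead).
            if value is not None:
                cells[prefix + (col,)] = value
    # Cells inside a storage row are kept sorted by name, as on disk.
    return {pk: dict(sorted(cells.items())) for pk, cells in storage.items()}
```

For a table with PRIMARY KEY (sensor, ts), this yields one wide storage row per sensor, with cells ordered by ts and then by column name.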
>>
>> CQL does open up the *opportunity* for users to articulate more complex
>> queries using more familiar syntax.  (including future things such as
>> joins, grouping, etc.)   To me, that is exciting, and again -- one of the
>> reasons we are leaning on it.
>>
>> my two cents,
>> brian
>>
>> ---
>>
>> Brian O'Neill
>>
>> Chief Technology Officer
>>
>>
>> *Health Market Science*
>>
>> *The Science of Better Results*
>>
>> 2700 Horizon Drive * King of Prussia, PA * 19406
>>
>> M: 215.588.6024 * @boneill42 <http://www.twitter.com/boneill42>  *
>>
>> healthmarketscience.com
>>
>>
>>
>>
>>
>>
>> From: Peter Lin <wool...@gmail.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Wednesday, March 12, 2014 at 8:44 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Proposal: freeze Thrift starting with 2.1.0
>>
>>
>> Yes, I was looking at Intravert last night.
>>
>> For the kinds of reports my customers ask us to do, joins and subqueries
>> are important. Having tried to do a simple join in Pig, I can say the level
>> of pain is high. I'm a masochist, so I don't mind breaking a simple join
>> into multiple MR tasks, though I do find myself asking "why the hell does it
>> need to be so painful in Pig?" Many of my friends say "what is this crap!"
>> or "this is better than writing SQL queries to run reports?"
>>
>> Plus, using ETL techniques to extract summaries only works when the data
>> is small enough. Once it gets beyond a certain size, it's not practical,
>> which means we're back to crappy reporting languages that make life
>> painful. Lots of big healthcare companies have thousands of MOLAP cubes on
>> dozens of mainframes. The old OLTP -> DW/OLAP pipeline creates its own set
>> of management headaches.
>>
>> Being able to report directly on the raw data avoids many of these issues,
>> but that's my biased perspective.
>>
>>
>>
>>
>> On Wed, Mar 12, 2014 at 8:15 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> "I would love to see Cassandra get to the point where users can define
>>> complex queries with subqueries, like, group by and joins" --> Did you have
>>> a look at Intravert? I think it does union & intersection on the server
>>> side for you. Not sure about joins, though.
>>>
>>>
>>> On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin <wool...@gmail.com> wrote:
>>>
>>>>
>>>> Hi Ed,
>>>>
>>>> I agree Solr is deeply integrated into DSE. I've looked at Solandra in
>>>> the past and studied the code.
>>>>
>>>> My understanding is DSE uses Cassandra for storage and the user has
>>>> both APIs available. I do think it can be integrated further to make
>>>> moderate to complex queries easier and probably faster. That's why we built
>>>> our own JPA-like object query API. I would love to see Cassandra get to the
>>>> point where users can define complex queries with subqueries, LIKE, GROUP
>>>> BY and joins. Clearly lots of people want these features, and even Google
>>>> built their own tools to do these types of queries.
>>>>
>>>> I see lots of people trying to improve this with Presto, Impala, Drill,
>>>> etc. To me, it's a natural progression as NoSQL databases mature. For most
>>>> people, at some point you want to be able to report on and analyze the
>>>> data. Today some people use MapReduce to summarize the data and ETL it into
>>>> a relational database or OLAP database for reporting. Even though I don't
>>>> need CAS or atomic batches for what I do in Cassandra today, I'm sure they
>>>> will be handy in the future. From my experience in the financial and
>>>> insurance sectors, features like CAS and "select for update" are important
>>>> for the kinds of transactions they handle. I'm biased, but these kinds of
>>>> features are useful and a good addition to Cassandra.
>>>>
>>>> These are interesting times in database land!
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo <
>>>> edlinuxg...@gmail.com> wrote:
>>>>
>>>>> Peter,
>>>>> Solr is deeply integrated into DSE. Seemingly this cannot be done
>>>>> efficiently client side (CQL, Thrift, whatever), and the Solandra approach
>>>>> was to embed Solr in Cassandra instead. I think that is actually the
>>>>> future of client dev: allowing users to embed custom server-side logic
>>>>> into their own API.
>>>>>
>>>>> Things like this take a while. Back in the day no one wanted Cassandra
>>>>> to be heavyweight, and ideas like read-before-write operations were
>>>>> rejected. The common advice was "do them client side". Now, in the case of
>>>>> collections, Cassandra sometimes does read-before-write, and it is the
>>>>> "stuff users want".
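A toy model (not Cassandra's implementation) of why some collection updates need a read: list elements are stored as separate cells keyed by time-based ids, so appending is a blind write, while setting an element by position first requires reading the list to find which cell sits at that position. Cassandra's CQL documentation describes this internal read-before-write for positional list updates; the class and its read counter below are purely illustrative.

```python
import itertools
from typing import Dict, List


class ToyListColumn:
    """Toy model of a CQL list column; illustrative only."""

    def __init__(self) -> None:
        self.cells: Dict[int, str] = {}    # cell id -> element value
        self._next_id = itertools.count()  # stands in for time-based UUIDs
        self.reads = 0                     # reads performed on behalf of updates

    def append(self, value: str) -> None:
        # Blind write: a fresh cell id is generated; existing cells are not read.
        self.cells[next(self._next_id)] = value

    def _read_cell_ids(self) -> List[int]:
        self.reads += 1
        return sorted(self.cells)          # cell ids in list order

    def set_at(self, index: int, value: str) -> None:
        # The client names a position, but storage is keyed by cell id, so the
        # current list must be read to find the cell at that position.
        cell_id = self._read_cell_ids()[index]
        self.cells[cell_id] = value

    def values(self) -> List[str]:
        return [self.cells[i] for i in sorted(self.cells)]
```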
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> I'll give you a concrete example.
>>>>>>
>>>>>> One of the things we often need to do is a keyword search on
>>>>>> unstructured text. In our tooling, we combined Solr with Cassandra, but
>>>>>> put an object API in front of it. The API is inspired by JPA, but
>>>>>> designed specifically to fit our needs.
>>>>>>
>>>>>> The user can write queries with LIKE %blah%, and behind the scenes we
>>>>>> issue a query to Solr to find the keys and then query Cassandra for the
>>>>>> records.
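A minimal sketch of that two-phase flow, with an in-memory token index standing in for Solr and a dict standing in for Cassandra; `KeywordIndex`, `like_to_term` and `find_like` are hypothetical names, not Peter's API. Note the semantic shift: a keyword index matches whole tokens, so `%storm%` here will not match "stormy" the way SQL LIKE would.

```python
import re
from collections import defaultdict
from typing import Dict, List, Mapping, Set


class KeywordIndex:
    """Stand-in for Solr: maps lower-cased tokens to the row keys containing them."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, key: str, text: str) -> None:
        for token in re.findall(r"\w+", text.lower()):
            self.postings[token].add(key)

    def search(self, term: str) -> List[str]:
        return sorted(self.postings.get(term.lower(), set()))


def like_to_term(pattern: str) -> str:
    """Accept only the '%word%' form of LIKE; anything else is rejected here."""
    match = re.fullmatch(r"%(\w+)%", pattern)
    if match is None:
        raise ValueError(f"unsupported LIKE pattern: {pattern!r}")
    return match.group(1)


def find_like(index: KeywordIndex, store: Mapping[str, dict], pattern: str) -> List[dict]:
    # Phase 1: ask the search index which row keys match.
    keys = index.search(like_to_term(pattern))
    # Phase 2: fetch the full records by key from the primary store,
    # skipping keys the index knows about but the store no longer has.
    return [store[k] for k in keys if k in store]
```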
>>>>>>
>>>>>> With plain Cassandra, the developer has to do all of this manually
>>>>>> and integrate Solr. Then they have to know which system to query and in
>>>>>> what order. Our tooling lets the user define the schema in a modeler.
>>>>>> Once the model is done, it compiles the classes, configuration files,
>>>>>> data access objects and unit tests.
>>>>>>
>>>>>> When the application makes a call, our query classes handle the
>>>>>> details behind the scenes. I know lots of people would like to see Solr
>>>>>> integrated more deeply into Cassandra and CQL. I hope it happens in the
>>>>>> future. If DataStax accepts my talk, we will be showing our temporal
>>>>>> database and modeler in September.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt <
>>>>>> srobe...@stanford.edu> wrote:
>>>>>>
>>>>>>> I should add that I'm not trying to ignite a flame war. Just trying
>>>>>>> to understand your intentions.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt <
>>>>>>> srobe...@stanford.edu> wrote:
>>>>>>>
>>>>>>>> Okay, I'm officially lost on this thread. If you plan on forking
>>>>>>>> Cassandra to preserve and continue to enhance the Thrift interface, you
>>>>>>>> would also want to add a bunch of relational features to CQL as part of
>>>>>>>> that same fork?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo <
>>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> "One of the things I'd like to see happen is for Cassandra to
>>>>>>>>> support queries with disjunction, EXISTS, subqueries, joins and LIKE.
>>>>>>>>> In theory CQL could support these features in the future. Cassandra
>>>>>>>>> would need a new query compiler and query planner. I don't see how
>>>>>>>>> the current design could do these things without a significant
>>>>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>>>>> engine, so I've spent over a decade studying and implementing query
>>>>>>>>> optimizers. All of these things can be done; it's just a matter of
>>>>>>>>> people finding the time to do it."
>>>>>>>>>
>>>>>>>>> I see what you're saying. CQL started as a way to make slices easier,
>>>>>>>>> but it is not even a query language; retrofitting these things is
>>>>>>>>> going to be very hard.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have no problem maintaining my own fork :) or joining others in
>>>>>>>>>> forking Cassandra.
>>>>>>>>>>
>>>>>>>>>> I'd be happy to work with you or anyone else to add features to
>>>>>>>>>> Thrift. That's the great thing about open source. Each person can
>>>>>>>>>> scratch a technical itch and do what they love. I see lots of
>>>>>>>>>> potential for Cassandra, and much of it involves improving Thrift to
>>>>>>>>>> make it happen. Some of the features "could", in theory, be done in
>>>>>>>>>> CQL, but not with the current design.
>>>>>>>>>>
>>>>>>>>>> One of the things I'd like to see happen is for Cassandra to
>>>>>>>>>> support queries with disjunction, EXISTS, subqueries, joins and LIKE.
>>>>>>>>>> In theory CQL could support these features in the future. Cassandra
>>>>>>>>>> would need a new query compiler and query planner. I don't see how
>>>>>>>>>> the current design could do these things without a significant
>>>>>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>>>>>> engine, so I've spent over a decade studying and implementing query
>>>>>>>>>> optimizers. All of these things can be done; it's just a matter of
>>>>>>>>>> people finding the time to do it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo <
>>>>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Peter,
>>>>>>>>>>>
>>>>>>>>>>> My advice: do not bother. I have recently become very active in
>>>>>>>>>>> attempting to add features to Thrift. I had four open tickets I was
>>>>>>>>>>> actively working on (I even found two bugs in Cassandra in the
>>>>>>>>>>> process).
>>>>>>>>>>>
>>>>>>>>>>> People were aware of this and still called this vote. Several
>>>>>>>>>>> committers have voted +1, and my -1 vote is non-binding. It is a
>>>>>>>>>>> clear message: the committers are unwilling to accept new Thrift
>>>>>>>>>>> features, even if those features are contributed by others.
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> My biased opinion: even though some Cassandra developers want
>>>>>>>>>>>> to abandon Thrift, I see benefits in continuing to improve it.
>>>>>>>>>>>>
>>>>>>>>>>>> The great thing about open source is that as long as some
>>>>>>>>>>>> people want to keep working on it and improving it, it can happen.
>>>>>>>>>>>> I plan to do my best to keep Thrift going, since it gives me the
>>>>>>>>>>>> fine-grained control that I want and need. If the ultimate goal of
>>>>>>>>>>>> Cassandra is to be "as close to SQL" as practical, my biased take
>>>>>>>>>>>> is to use a NewSQL database that gives you the full power of
>>>>>>>>>>>> subqueries, LIKE, EXISTS and disjunction.
>>>>>>>>>>>>
>>>>>>>>>>>> When customers ask me which database to choose and they really
>>>>>>>>>>>> want the relational model, I tell them to use NewSQL. I love that
>>>>>>>>>>>> Cassandra sits between NoSQL and NewSQL. There are things I do in
>>>>>>>>>>>> Cassandra today that are much harder in NewSQL or NoSQL document
>>>>>>>>>>>> databases. NewSQL databases can scale to similar sizes, so the
>>>>>>>>>>>> "big" part of big data won't be a significant advantage forever.
>>>>>>>>>>>> Looking at some of the recent NewSQL performance numbers, it's
>>>>>>>>>>>> clear the gap is closing.
>>>>>>>>>>>>
>>>>>>>>>>>> peter
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs <
>>>>>>>>>>>> ty...@datastax.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang <
>>>>>>>>>>>>> shaochuan.w...@bloomreach.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, does anyone know how to do "describing the splits" and
>>>>>>>>>>>>>> "describing the local rings" using the native protocol?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> For a ring description, you would do something like "select
>>>>>>>>>>>>> peer, tokens from system.peers". I'm not sure about describe_splits().
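A minimal sketch of turning those rows into a token ring, assuming Murmur3Partitioner (tokens are signed 64-bit integers, which CQL returns as strings in the `tokens` set) and simplifying to the primary owner only. Keep in mind that system.peers omits the node you are connected to; its tokens are in the `tokens` column of system.local. The driver calls in the comments assume the DataStax Python driver.

```python
import bisect
from typing import Iterable, List, Tuple

# Fetching the rows with the DataStax Python driver might look like:
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect()
#   rows = [(r.peer, r.tokens) for r in
#           session.execute("SELECT peer, tokens FROM system.peers")]


def build_ring(rows: Iterable[Tuple[str, Iterable[str]]]) -> List[Tuple[int, str]]:
    """Flatten (node, tokens) rows into a list of (token, node) sorted by token."""
    ring = [(int(tok), node) for node, tokens in rows for tok in tokens]
    ring.sort()
    return ring


def primary_owner(ring: List[Tuple[int, str]], token: int) -> str:
    """Node owning `token`: the first node token >= it, wrapping around the ring.

    Each node owns the range (previous token, its own token].
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]
```

Replica placement beyond the primary owner depends on the keyspace's replication strategy, which this sketch ignores.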
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, cqlsh uses the Python client, which talks via the Thrift
>>>>>>>>>>>>>> protocol too. Does that mean it will be migrated to the native
>>>>>>>>>>>>>> protocol soon as well?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Tyler Hobbs
>>>>>>>>>>>>> DataStax <http://datastax.com/>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Steve Robenalt
>>>>>>>> Software Architect
>>>>>>>> HighWire | Stanford University
>>>>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>>>>
>>>>>>>> srobe...@stanford.edu
>>>>>>>> http://highwire.stanford.edu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
