Hi Julian,

Thank you for your comments. You can find the discussions about the common representation layer at:

https://issues.apache.org/jira/browse/SAMZA-483

And a draft Streaming SQL API at:

https://reviews.apache.org/r/30287/

I also prefer an extended relational algebra model (logical algebra) over a SQL-like model, because we can model the whole query using basic constructs like windowing, selection, projection, join and other set operations. I assume this logical algebra will be closer to the execution plan than the SQL model, and will make it easier to think about possible optimizations and parallelizations. Before the Samza Streaming SQL project, I was working on a CQL-based Clojure DSL and was thinking about generating a relational-algebra-like model from the DSL.

Here is a very brief (but interesting) document on SQL to relational algebra translation, if someone is interested to know more:

http://cs.ulb.ac.be/public/_media/teaching/infoh417/sql2alg_eng.pdf

I believe that CQL's 'instantaneous relation' concept, which we are using during the execution phase, allows us to reuse all (or most) of the SQL to relational algebra translation techniques, with minor extensions to accommodate windowing and partitioning.

I can volunteer to write a document demonstrating this translation process for a couple of sample queries in our Streaming SQL spec document. Then we can build our design on top of that, and it will be a good exercise for us to identify complexities.

Even if we decide to go with a logical-algebra-like model, I believe we can still use the Streaming SQL API to quickly prototype the translation step without waiting for the complete parser and validator to be implemented.
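To make the 'instantaneous relation' idea a bit more concrete, here is a minimal Python sketch of how a small logical algebra could evaluate a windowed query at a single point in time. It is only an illustration under stated assumptions: the node names (Scan, Window, Select, Project), the evaluate function, the Orders stream and the sample query are all hypothetical and are not taken from the Samza code base or the draft Streaming SQL API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Stream = List[Tuple[int, Row]]  # (timestamp in seconds, tuple)


# Hypothetical logical algebra nodes, for illustration only.
@dataclass(frozen=True)
class Scan:
    stream: str


@dataclass(frozen=True)
class Window:
    input: Scan
    range_seconds: int


@dataclass(frozen=True)
class Select:
    input: object
    predicate: Callable[[Row], bool]


@dataclass(frozen=True)
class Project:
    input: object
    columns: Tuple[str, ...]


def evaluate(node, streams: Dict[str, Stream], now: int) -> List[Row]:
    """Return the instantaneous relation of `node` at time `now`."""
    if isinstance(node, Window):
        # Stream-to-relation: keep tuples with now - range < ts <= now.
        source = streams[node.input.stream]
        return [row for ts, row in source
                if now - node.range_seconds < ts <= now]
    if isinstance(node, Select):
        # Relation-to-relation: ordinary relational selection.
        return [row for row in evaluate(node.input, streams, now)
                if node.predicate(row)]
    if isinstance(node, Project):
        # Relation-to-relation: ordinary relational projection.
        return [{c: row[c] for c in node.columns}
                for row in evaluate(node.input, streams, now)]
    # A bare Scan is a stream, not a relation, so it needs a Window first.
    raise TypeError(f"cannot evaluate {type(node).__name__} as a relation")


# SELECT productId, units FROM Orders [RANGE 60 SECONDS] WHERE units > 10
plan = Project(
    Select(Window(Scan("Orders"), 60), lambda row: row["units"] > 10),
    ("productId", "units"),
)

orders = [
    (5, {"orderId": 1, "productId": "a", "units": 50}),
    (70, {"orderId": 2, "productId": "b", "units": 5}),
    (90, {"orderId": 3, "productId": "c", "units": 20}),
]

result = evaluate(plan, {"Orders": orders}, now=100)
print(result)
```

In this sketch only the Window node knows about time. Once the window has turned the stream into a relation, selection and projection are the same operators as in plain relational algebra, which is why the existing translation techniques could mostly be reused.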
Thanks
Milinda

On Wed, Jan 28, 2015 at 1:54 AM, Jon Bringhurst <jbringhu...@linkedin.com.invalid> wrote:

> Hey Julian,
>
> Most of the general discussion surrounding a high level language for Samza
> can be found at:
>
> https://issues.apache.org/jira/browse/SAMZA-390
>
> Early mockups of Yi's work on what the lower level APIs (phases after AST
> rewriting) might look like can also be found at:
>
> https://issues.apache.org/jira/browse/SAMZA-482
>
> Much of the work is actually derived from CQL. However, CQL is a bit
> obscure, so the work is commonly being compared to SQL in casual
> discussion since it's more widely known.
>
> We can always use another set of eyes on something this complex and would
> appreciate any comments you have.
>
> -Jon
>
> On Jan 27, 2015, at 5:34 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Hi all,
> >
> > This is my first post to the Samza list. I heard from Chris and Jay
> > that you guys were looking into putting a SQL interface on Samza, so I
> > thought I'd take a look.
> >
> > My background is in the SQL world, most recently with Apache Calcite,
> > (although I have quite a lot of experience with streaming too) so
> > forgive me if I am speaking a foreign language or seem to be coming at
> > this from a completely different direction. Also forgive me if I have
> > missed preceding discussions and I am opening up areas that have been
> > settled already.
> >
> > I was surprised that one of the first goals is to create a SQL API.
> > SQL is a textual language; a lot of the nuance (e.g. scope of
> > identifiers) is lost when you convert it to a linear builder API. Now,
> > it definitely makes sense to have a SQL AST (abstract syntax tree),
> > that can be created by hand-written code or by a parser. And you can
> > create an AST builder, if you like. But there is not a simple mapping
> > between true SQL and a data-flow graph that you can execute. If you
> > imagine that there is a simple mapping, you will achieve great results
> > with simple SELECT-FROM-WHERE queries but hit the wall when you hit
> > the hard stuff. You will end up -- as so many others have -- with a
> > SQL-like language. Close but no cigar.
> >
> > Case in point: Spark (and Spark-streaming) is a SQL-like language that
> > looks similar to the proposed Samza API, and now they are building
> > SparkSQL from the ground up.
> >
> > I think the way to approach this is to have a SQL parser and a logical
> > algebra. The logical algebra looks very similar to relational algebra,
> > maybe with one or two extensions for streaming. (A lot of SQL features
> > -- such as query blocks, sub-queries, correlated variables, aliases,
> > views and the HAVING clause -- are not present in the algebra.)
> > Between the parser and the logical algebra is an AST, a validator, and
> > a translator from the AST to the algebra. And then there is a physical
> > algebra, which is Samza of course.
> >
> > Maybe the proposed SQL object model is in fact that logical algebra.
> > But I'd recommend that you not call it SQL; in fact it should be a
> > non-goal that an end-user would use that API and think that they are
> > in any way creating a "SQL query".
> >
> > Julian
>
>

--
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org