One more thing: I noticed that in the above discussion the terms "SQL API" and "Streaming SQL API" have been used frequently, but I am not sure exactly what Julian means by "SQL API". Julian, could you clarify this? Were you referring to the Streaming SQL syntax/grammar definition, the common representation layer, or the physical operator API in Samza?
Thanks!

-Yi

On Wed, Jan 28, 2015 at 10:02 AM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Julian,
>
> First, welcome to the community! Let me try to answer some of your comments, in addition to what Jon, Milinda, and others have already said.
>
> In general, I don't think our thoughts differ that much. As you mentioned, the full stack would be a SQL parser -> AST -> a logical algebra -> a physical algebra on top of the Samza operator APIs. We discussed earlier in SAMZA-390 that the logical algebra (i.e. the SQL-like object model that Milinda is working on) could potentially be used by other DSLs on top of Samza as well. Hence, what we propose is to keep this intermediate layer as close to the physical operators as possible. The purpose of this layer is to isolate the frontend language parser from the backend implementation, not to provide a programming API for users to write queries.
>
> I am trying to understand one of your comments below: "But there is not a simple mapping between true SQL and a data-flow graph that you can execute." What specifically do you mean by this statement? Could you elaborate on it a bit more?
>
> Thanks!
>
> -Yi
>
> On Tue, Jan 27, 2015 at 5:34 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> Hi all,
>>
>> This is my first post to the Samza list. I heard from Chris and Jay that you guys were looking into putting a SQL interface on Samza, so I thought I'd take a look.
>>
>> My background is in the SQL world, most recently with Apache Calcite (although I have quite a lot of experience with streaming too), so forgive me if I am speaking a foreign language or seem to be coming at this from a completely different direction. Also forgive me if I have missed preceding discussions and am reopening areas that have already been settled.
>>
>> I was surprised that one of the first goals is to create a SQL API. SQL is a textual language; a lot of the nuance (e.g. scope of identifiers) is lost when you convert it to a linear builder API. Now, it definitely makes sense to have a SQL AST (abstract syntax tree) that can be created by hand-written code or by a parser. And you can create an AST builder, if you like. But there is not a simple mapping between true SQL and a data-flow graph that you can execute. If you imagine that there is a simple mapping, you will achieve great results with simple SELECT-FROM-WHERE queries but hit the wall when you get to the hard stuff. You will end up, as so many others have, with a SQL-like language. Close but no cigar.
>>
>> Case in point: Spark (and Spark Streaming) is a SQL-like language that looks similar to the proposed Samza API, and now they are building SparkSQL from the ground up.
>>
>> I think the way to approach this is to have a SQL parser and a logical algebra. The logical algebra looks very similar to relational algebra, maybe with one or two extensions for streaming. (A lot of SQL features, such as query blocks, sub-queries, correlated variables, aliases, views and the HAVING clause, are not present in the algebra.) Between the parser and the logical algebra sit an AST, a validator, and a translator from the AST to the algebra. And then there is a physical algebra, which is Samza of course.
>>
>> Maybe the proposed SQL object model is in fact that logical algebra. But I'd recommend that you not call it SQL; in fact, it should be a non-goal that an end user would use that API and think that they are in any way creating a "SQL query".
>>
>> Julian
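To make Julian's point about the AST versus the logical algebra concrete: once a query is translated, the HAVING clause becomes an ordinary Filter above an Aggregate, and column aliases survive only as output names of the final Project. The Python sketch below is purely illustrative and assumes a toy query shape (one table, GROUP BY, SUM, and a `HAVING SUM(x) > n` predicate). The class names (`SelectStmt`, `Col`, `Sum`, `Gt`, `Scan`, `Aggregate`, `Filter`, `Project`) and the functions `validate`, `translate`, and `explain` are hypothetical and do not correspond to Calcite's or Samza's actual APIs.

```python
# Hypothetical toy pipeline: AST -> validator -> translator -> logical algebra.
# None of these names come from Calcite or Samza; they only illustrate the layering.
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# ---- AST: mirrors the SQL text, so it still has HAVING and aliases ----
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Sum:
    arg: Col

@dataclass(frozen=True)
class Gt:
    left: Union[Col, Sum]
    right: int

@dataclass
class SelectStmt:
    items: List[Tuple[Union[Col, Sum], Optional[str]]]  # (expression, alias)
    from_table: str
    group_by: List[Col]
    having: Optional[Gt] = None

# ---- Logical algebra: relational operators only, no HAVING, no aliases ----
@dataclass
class Scan:
    table: str

@dataclass
class Aggregate:
    input: object
    group_keys: List[str]
    sums: List[Tuple[str, str]]  # (output field, summed input field)

@dataclass
class Filter:
    input: object
    field: str
    threshold: int  # keeps rows where field > threshold

@dataclass
class Project:
    input: object
    fields: List[Tuple[str, str]]  # (output name, input field)

def validate(stmt: SelectStmt) -> None:
    """Reject non-aggregated SELECT columns that are not GROUP BY keys."""
    keys = {c.name for c in stmt.group_by}
    for expr, _ in stmt.items:
        if isinstance(expr, Col) and expr.name not in keys:
            raise ValueError(f"column {expr.name!r} must appear in GROUP BY")

def translate(stmt: SelectStmt) -> Project:
    # Every distinct SUM in SELECT or HAVING becomes one output field of the Aggregate.
    sums: List[Sum] = []
    candidates = [e for e, _ in stmt.items]
    if stmt.having is not None:
        candidates.append(stmt.having.left)
    for expr in candidates:
        if isinstance(expr, Sum) and expr not in sums:
            sums.append(expr)
    names = [f"$agg{i}" for i in range(len(sums))]
    rel: object = Aggregate(Scan(stmt.from_table), [c.name for c in stmt.group_by],
                            [(n, s.arg.name) for n, s in zip(names, sums)])

    def field_of(expr: Union[Col, Sum]) -> str:
        # Resolve an AST expression to a field of the Aggregate's output row.
        return names[sums.index(expr)] if isinstance(expr, Sum) else expr.name

    # HAVING has no operator of its own: it is just a Filter above the Aggregate.
    if stmt.having is not None:
        rel = Filter(rel, field_of(stmt.having.left), stmt.having.right)
    # Aliases vanish too: they survive only as output names of the final Project.
    return Project(rel, [(alias or field_of(e), field_of(e)) for e, alias in stmt.items])

def explain(rel: object, depth: int = 0) -> str:
    # Render the operator tree, one operator per line, children indented.
    pad = "  " * depth
    if isinstance(rel, Scan):
        return f"{pad}Scan({rel.table})"
    if isinstance(rel, Aggregate):
        head = f"{pad}Aggregate(keys={rel.group_keys}, sums={rel.sums})"
    elif isinstance(rel, Filter):
        head = f"{pad}Filter({rel.field} > {rel.threshold})"
    else:
        head = f"{pad}Project({rel.fields})"
    return head + "\n" + explain(rel.input, depth + 1)

if __name__ == "__main__":
    # SELECT page, SUM(clicks) AS total FROM PageViews GROUP BY page HAVING SUM(clicks) > 10
    stmt = SelectStmt(items=[(Col("page"), None), (Sum(Col("clicks")), "total")],
                      from_table="PageViews", group_by=[Col("page")],
                      having=Gt(Sum(Col("clicks")), 10))
    validate(stmt)
    print(explain(translate(stmt)))
```

Running it prints a four-level plan, Project over Filter over Aggregate over Scan, in which neither the HAVING clause nor the alias `total` appears as an operator of its own.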