One more:
I noticed in the above discussion, "SQL API", "Streaming SQL API" have been
used frequently. But I am not sure what exactly Julian means by "SQL API".
Julian, could you clarify this? Were you referring to the Streaming SQL
syntax/grammar definition, the common representation layer, or the physical
operator API in Samza?
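To make those three layers concrete, here is a minimal sketch in Java of a parser-output AST, a logical algebra, and a physical layer. Everything in it is hypothetical: `SelectAst`, `RelNode`, `Scan`, `Filter`, `Project`, `toLogical` and `execute` are illustrative names, not Samza or Calcite APIs. The "physical" layer here is just Java streams over in-memory rows, standing in for Samza operators.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the layering only; none of these types exist in Samza or Calcite.
public class LayeredQuerySketch {

    // Layer 1: AST a (hypothetical) parser might emit for
    // SELECT <column> FROM <from> WHERE <whereColumn> > <greaterThan>
    record SelectAst(String column, String from, String whereColumn, int greaterThan) {}

    // Layer 2: logical algebra, relational operators with no SQL syntax and no engine details.
    interface RelNode {}
    record Scan(String source) implements RelNode {}
    record Filter(RelNode input, String column, int greaterThan) implements RelNode {}
    record Project(RelNode input, String column) implements RelNode {}

    // Translator from AST to logical algebra. A real translator would also
    // validate the query and resolve scopes, aliases and sub-queries here.
    static RelNode toLogical(SelectAst ast) {
        RelNode plan = new Scan(ast.from());
        plan = new Filter(plan, ast.whereColumn(), ast.greaterThan());
        return new Project(plan, ast.column());
    }

    // Layer 3: "physical" execution. Java streams over in-memory rows stand in
    // for engine operators; each logical node maps to one stream stage.
    static Stream<Map<String, Integer>> execute(RelNode node,
                                                Map<String, List<Map<String, Integer>>> sources) {
        if (node instanceof Scan s) {
            return sources.get(s.source()).stream();
        }
        if (node instanceof Filter f) {
            return execute(f.input(), sources).filter(r -> r.get(f.column()) > f.greaterThan());
        }
        if (node instanceof Project p) {
            return execute(p.input(), sources).map(r -> Map.of(p.column(), r.get(p.column())));
        }
        throw new IllegalArgumentException("unknown node: " + node);
    }

    public static void main(String[] args) {
        // Stand-in for the parser output of: SELECT price FROM orders WHERE price > 10
        SelectAst ast = new SelectAst("price", "orders", "price", 10);
        RelNode plan = toLogical(ast);
        Map<String, List<Map<String, Integer>>> sources = Map.of("orders",
                List.of(Map.of("price", 5), Map.of("price", 12), Map.of("price", 30)));
        List<Map<String, Integer>> rows = execute(plan, sources).collect(Collectors.toList());
        System.out.println(plan);
        System.out.println(rows);
    }
}
```

The point of the middle layer, as discussed in SAMZA-390, is that another DSL could build the same `RelNode` tree directly, without going through SQL text, and reuse the same physical layer.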

Thanks!

-Yi

On Wed, Jan 28, 2015 at 10:02 AM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Julian,
>
> First, welcome to the community! Let me try to answer some of your
> comments, in addition to what Jon, Milinda, and others already commented on.
>
> In general, I don't think our thinking differs much. As you
> mentioned, the full stack would be a SQL parser -> AST -> a logical algebra
> -> a physical algebra on top of SAMZA operator APIs. We discussed
> earlier in SAMZA-390 that the logical algebra (i.e. the SQL-like object model
> that Milinda is working on) could potentially be used by other DSLs on top
> of SAMZA as well. Hence, what we propose to do is to have this intermediate
> layer as close to the physical operators as possible. The purpose of this
> layer is to provide an isolation between the frontend language parser and
> the backend implementation, not to provide a programming API for a user to
> write a query.
>
> I am trying to understand your comment below: "But there is not a simple
> mapping between true SQL and a data-flow graph that you can execute."
> What is the specific meaning of this statement? Could you elaborate on this
> a bit more?
>
> Thanks!
>
> -Yi
>
> On Tue, Jan 27, 2015 at 5:34 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> Hi all,
>>
>> This is my first post to the Samza list. I heard from Chris and Jay
>> that you guys were looking into putting a SQL interface on Samza, so I
>> thought I'd take a look.
>>
>> My background is in the SQL world, most recently with Apache Calcite
>> (although I have quite a lot of experience with streaming too), so
>> forgive me if I am speaking a foreign language or seem to be coming at
>> this from a completely different direction. Also forgive me if I have
>> missed preceding discussions and I am opening up areas that have been
>> settled already.
>>
>> I was surprised that one of the first goals is to create a SQL API.
>> SQL is a textual language; a lot of the nuance (e.g. scope of
>> identifiers) is lost when you convert it to a linear builder API. Now,
>> it definitely makes sense to have a SQL AST (abstract syntax tree),
>> that can be created by hand-written code or by a parser. And you can
>> create an AST builder, if you like. But there is not a simple mapping
>> between true SQL and a data-flow graph that you can execute. If you
>> imagine that there is a simple mapping, you will achieve great results
>> with simple SELECT-FROM-WHERE queries but hit the wall when you hit
>> the hard stuff. You will end up -- as so many others have -- with a
>> SQL-like language. Close but no cigar.
>>
>> Case in point: Spark (and Spark-streaming) is a SQL-like language that
>> looks similar to the proposed Samza API, and now they are building
>> SparkSQL from the ground up.
>>
>> I think the way to approach this is to have a SQL parser and a logical
>> algebra. The logical algebra looks very similar to relational algebra,
>> maybe with one or two extensions for streaming. (A lot of SQL features
>> -- such as query blocks, sub-queries, correlated variables, aliases,
>> views and the HAVING clause -- are not present in the algebra.)
>> Between the parser and the logical algebra is an AST, a validator, and
>> a translator from the AST to the algebra. And then there is a physical
>> algebra, which is Samza of course.
>>
>> Maybe the proposed SQL object model is in fact that logical algebra.
>> But I'd recommend that you not call it SQL; in fact it should be
>> a non-goal that an end-user would use that API and think that they are
>> in any way creating a "SQL query".
>>
>> Julian
>>
>
>
