Hive and Spark SQL use ANTLR as their SQL parser to translate SQL. It could be used to build a Samza job that looks like Storm Trident.
On 15-1-28 at 9:34 AM, Julian Hyde wrote:
Hi all,

This is my first post to the Samza list. I heard from Chris and Jay that you guys were looking into putting a SQL interface on Samza, so I thought I'd take a look. My background is in the SQL world, most recently with Apache Calcite (although I have quite a lot of experience with streaming too), so forgive me if I am speaking a foreign language or seem to be coming at this from a completely different direction. Also forgive me if I have missed preceding discussions and I am opening up areas that have been settled already.

I was surprised that one of the first goals is to create a SQL API. SQL is a textual language; a lot of the nuance (e.g. scope of identifiers) is lost when you convert it to a linear builder API.

Now, it definitely makes sense to have a SQL AST (abstract syntax tree) that can be created by hand-written code or by a parser. And you can create an AST builder, if you like. But there is not a simple mapping between true SQL and a data-flow graph that you can execute. If you imagine that there is a simple mapping, you will achieve great results with simple SELECT-FROM-WHERE queries but hit the wall when you reach the hard stuff. You will end up -- as so many others have -- with a SQL-like language. Close but no cigar. Case in point: Spark (and Spark-streaming) is a SQL-like language that looks similar to the proposed Samza API, and now they are building SparkSQL from the ground up.

I think the way to approach this is to have a SQL parser and a logical algebra. The logical algebra looks very similar to relational algebra, maybe with one or two extensions for streaming. (A lot of SQL features -- such as query blocks, sub-queries, correlated variables, aliases, views and the HAVING clause -- are not present in the algebra.) Between the parser and the logical algebra is an AST, a validator, and a translator from the AST to the algebra. And then there is a physical algebra, which is Samza of course.
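To make the layering concrete, here is a minimal Python sketch of an AST, a translator, and a logical algebra. It is only an illustration under stated assumptions: none of these class or function names come from Calcite or Samza, the validator step is omitted, and the in-memory `execute` function is a toy stand-in for a physical layer. The point it tries to show is that HAVING exists in the AST but not in the algebra, where it becomes an ordinary filter above the aggregate.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# --- AST: what a parser (or hand-written code) would produce. Hypothetical shape. ---
@dataclass
class SelectQuery:
    table: str
    where: Optional[Callable[[dict], bool]] = None
    group_by: Optional[str] = None
    having: Optional[Callable[[dict], bool]] = None

# --- Logical algebra: note that there is no HAVING node here. ---
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    input: object
    predicate: Callable[[dict], bool]

@dataclass
class Aggregate:
    input: object
    key: str  # groups by `key` and emits rows of {key: value, "cnt": n}

def translate(q: SelectQuery):
    """AST -> logical algebra. WHERE and HAVING both become Filter nodes."""
    node = Scan(q.table)
    if q.where is not None:
        node = Filter(node, q.where)
    if q.group_by is not None:
        node = Aggregate(node, q.group_by)
    if q.having is not None:
        node = Filter(node, q.having)  # HAVING is just a filter above the aggregate
    return node

def execute(node, tables):
    """Toy in-memory 'physical' layer; a real system would target an engine instead."""
    if isinstance(node, Scan):
        return list(tables[node.table])
    if isinstance(node, Filter):
        return [r for r in execute(node.input, tables) if node.predicate(r)]
    if isinstance(node, Aggregate):
        counts = {}
        for r in execute(node.input, tables):
            counts[r[node.key]] = counts.get(r[node.key], 0) + 1
        return [{node.key: k, "cnt": n} for k, n in counts.items()]
    raise TypeError(f"unknown node: {node!r}")

# Roughly: SELECT product, COUNT(*) AS cnt FROM orders
#          WHERE units > 0 GROUP BY product HAVING COUNT(*) > 1
query = SelectQuery(
    "orders",
    where=lambda r: r["units"] > 0,
    group_by="product",
    having=lambda r: r["cnt"] > 1,
)
plan = translate(query)
orders = [
    {"product": "a", "units": 1},
    {"product": "a", "units": 2},
    {"product": "b", "units": 0},
    {"product": "b", "units": 3},
]
result = execute(plan, {"orders": orders})
print(result)
```

The resulting plan is Filter over Aggregate over Filter over Scan. Sub-queries, correlated variables, and aliases would need real work in the validator and translator, which is where a naive one-to-one mapping from SQL text to a data-flow graph tends to break down.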
Maybe the proposed SQL object model is in fact that logical algebra. But I'd recommend that you not call it SQL; in fact it should be a non-goal that an end-user would use that API and think that they are in any way creating a "SQL query".

Julian