Re: Streaming SQL - object models, ASTs and algebras

2015-02-01 Thread Jay Kreps
Great summary Chris. I agree that we should really really strive not to have another mandatory operational dependency for out of the box usage. To clarify what I was saying, I am less concerned about the partition count which is readily discoverable and basically just part of the "ddl" for the to

Re: Streaming SQL - object models, ASTs and algebras

2015-01-31 Thread Chris Riccomini
Hey all, Trying to respond in order.
> If one wants finer-grained control over what the output topic would be like, wouldn't it make sense to use a CREATE TABLE AS statement?
Yes. CREATE TABLE AS could be used to define all three things that I mentioned (partition key, partition count, and sche

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Jay Kreps
Chris, I think the schema repository acts as the stand-in for the database catalog or hcat. People who don't have that will have to give their schema with the query. I think that can be some kind of plugin to provide the schema so it is automatically inferred for the Avro people and manually provi

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Julian Hyde
This isn't a huge deal for me. Since there isn't a metadata repository it makes sense to include that kind of metadata in the query. (The database world is all about "late schema" and "schema on read" these days, so there are plenty of precedents in modern SQL implementations.) We could allow p

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Jay Kreps
I am not convinced that the partitioning and partition key issue is quite equivalent to partitions in a relational database. In a relational database partitions have no semantic meaning and are just a way to organize data for query efficiency. That is not the case in Kafka. Let me try to argue that

RE: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Felix GV
From: Chris Riccomini [criccom...@apache.org] Sent: Friday, January 30, 2015 12:06 PM To: Chris Riccomini Cc: dev@samza.apache.org Subject: Re: Streaming SQL - object models, ASTs and algebras Hey all, I have a few more comments on the metadata issue t

RE: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Felix GV
From: Chris Riccomini [criccom...@apache.org] Sent: Friday, January 30, 2015 10:43 AM To: dev@samza.apache.org Subject: Re: Streaming SQL - object models, ASTs and algebras Hey all, Just catching up on this thread. The Calcite + Samza approach seems pretty compelling to me.

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Milinda Pathirage
Hi Chris, As a temporary solution we can use Calcite's JSON-based schema feature. https://github.com/julianhyde/incubator-calcite/blob/master/example/csv/src/test/resources/model.json We can implement our own Schema factory and use the Calcite JSON model with additional attributes as required. User

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Chris Riccomini
Hey all, I have a few more comments on the metadata issue that I brought up. The three things that we lack right now are:
1. Partition count.
2. Partition key.
3. Stream (or message) schema.
These have to be evaluated both on the ingest (first consumers) and egress (final producers) of a query.
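To make the three missing pieces of metadata concrete, here is a minimal sketch of how they might be carried alongside a stream. Everything in it is an assumption for illustration: the `StreamMetadata` record, the `partition_for` helper, the field names, and the use of CRC32 as a hash are hypothetical and are not Samza, Kafka, or Calcite APIs. In particular, CRC32 is only a stand-in and is not the hash Kafka's own partitioner uses.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamMetadata:
    """Hypothetical record holding the three items listed in the message."""
    partition_count: int   # 1. partition count
    partition_key: str     # 2. name of the field used as the partition key
    schema: dict           # 3. stream (message) schema: field name -> type name


def partition_for(message: dict, meta: StreamMetadata) -> int:
    """Pick a partition by hashing the message's partition-key field.

    CRC32 is used only because it is deterministic across runs; it is an
    assumption of this sketch, not the hash any real system is claimed to use.
    """
    key_bytes = str(message[meta.partition_key]).encode("utf-8")
    return zlib.crc32(key_bytes) % meta.partition_count


# Hypothetical metadata for an "Orders"-like stream.
orders = StreamMetadata(
    partition_count=4,
    partition_key="ticker",
    schema={"rowtime": "timestamp", "ticker": "string", "amount": "int"},
)

# Messages that share a key always map to the same partition.
p1 = partition_for({"ticker": "ORCL", "amount": 10}, orders)
p2 = partition_for({"ticker": "ORCL", "amount": 100}, orders)
```

The point of the sketch is only that a query planner needs all three values on both the ingest and egress side: without the key and the count it cannot tell whether two streams are co-partitioned.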

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30 Thread Chris Riccomini
Hey all, Just catching up on this thread. The Calcite + Samza approach seems pretty compelling to me. I think most of what Julian is arguing for makes sense. My main concern is with practicalities. One specific case of this is the discussion about the partitioning model. In an ideal world, I agre

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Julian Hyde
> On Jan 29, 2015, at 4:38 PM, Yi Pan wrote:
>
> I am wondering if I can get an average that's per 30 min window averages?
> I.e. the following is the input events in a stream:
> {10:01, ORCL, 10, 10}
> {10:02, MSFT, 30, 30}
> {10:03, ORCL, 100, 110}
> {10:17, MSFT, 45, 75}
> {10:59, ORCL,

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Yi Pan
Thanks! Posted to SAMZA-390.
On Thu, Jan 29, 2015 at 4:44 PM, Julian Hyde wrote:
> > On Jan 29, 2015, at 4:42 PM, Yi Pan wrote:
> >
> > One more, Julian, do you mind if I post your proposed SQL model to SAMZA-390? That way, more ppl can view it and we should continue discussion ther

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Julian Hyde
> On Jan 29, 2015, at 4:42 PM, Yi Pan wrote:
>
> One more, Julian, do you mind if I post your proposed SQL model to SAMZA-390? That way, more ppl can view it and we should continue discussion there.
Yes of course - feel free to post anything I post on public lists. Julian

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Yi Pan
One more, Julian, do you mind if I post your proposed SQL model to SAMZA-390? That way, more ppl can view it and we should continue discussion there. Thanks! -Yi
On Thu, Jan 29, 2015 at 4:27 PM, Julian Hyde wrote:
> The validation logic is extensible within Calcite (for example, the validato

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Julian Hyde
I got a bit carried away with the nuances of the relational algebra there, so I forgot to discuss where the code would go. Some of the work in this proposal (such as parsing and validation) would most naturally involve changes to Calcite (adding functionality that would simply be dormant for tr

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Yi Pan
Hi, Julian, Forgive me if I am slow in following your examples. {quote} You can also define a "paged" window, for example the cumulative total trades since the top of the hour: select stream rowtime, ticker, amount, sum(amount) over (order by rowtime partition by ticker, trunc(rowtime to hour)
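As a rough illustration of what the quoted "paged" window computes, the sketch below keeps a running total per ticker that resets at the top of each hour. It is plain Python, not Calcite or Samza code; the function name `paged_running_sum` and the sample events are assumptions made for this example, and rowtime is simplified to an `HH:MM` string.

```python
from collections import defaultdict


def paged_running_sum(events):
    """Yield (rowtime, ticker, amount, running_total) for each event.

    `events` is an iterable of (rowtime, ticker, amount) tuples in rowtime
    order, with rowtime as an "HH:MM" string. The running total is kept per
    (ticker, hour), mimicking a window partitioned by ticker and by the
    rowtime truncated to the hour.
    """
    totals = defaultdict(int)
    for rowtime, ticker, amount in events:
        hour = rowtime.split(":")[0]  # stands in for trunc(rowtime to hour)
        totals[(ticker, hour)] += amount
        yield rowtime, ticker, amount, totals[(ticker, hour)]


# Illustrative sample events (hypothetical data for this sketch).
sample = [
    ("10:01", "ORCL", 10),
    ("10:02", "MSFT", 30),
    ("10:03", "ORCL", 100),
    ("10:17", "MSFT", 45),
    ("11:02", "ORCL", 20),  # new hour: ORCL's total starts over
]

results = list(paged_running_sum(sample))
# Running totals, in order: 10, 30, 110, 75, 20
```

Because the hour is part of the grouping key, no explicit "close window" step is needed in this sketch; a new hour simply starts a new total.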

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Julian Hyde
The validation logic is extensible within Calcite (for example, the validator has an interface SqlValidatorNamespace that represents anything that can be used as a table alias, such as table usages in the FROM clause or sub-queries), but I think it would be very complex to abstract that logic as

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Milinda Pathirage
Hi Julian, I like your proposal. I think it saves us a lot of time. But I have a question regarding SQL validation. I'm not an SQL expert, so I may be wrong. As I understand, some SQL constructs such as NOT IN, ALL, EXCEPT will not be valid in the context of some stream queries due to their blocking na

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Julian Hyde
> On Jan 29, 2015, at 3:04 PM, Yi Pan wrote:
>
> Hi, Julian,
>
> Thanks for sharing your idea! It is interesting and well organized. Let me try to summarize the main differences between yours and the current proposal:
> - removing the '[]' used to define the window specification, using O

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Yi Pan
Hi, Julian, Thanks for sharing your idea! It is interesting and well organized. Let me try to summarize the main differences between yours and the current proposal:
- removing the '[]' used to define the window specification, using OVER on the stream/table instead
- join/select output can be ei

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Julian Hyde
Let me propose an alternative approach. The deliverables and the technology stack would be different, but I think we still fulfill the spirit of the proposal, and there are benefits from better interoperability, standards compliance, and building on existing code that already works. First, I pr

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Yi Pan
Hi, Julian, Thanks for the explanation. I got your point that the physical layer "stream-scan" can be used to get the delta(filter(..)) in the logical algebra. My question on this model is: If a window operation is implemented as filter(tuple.isInWindow(), stream-scan(Orders)) in the physical layer, i

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Julian Hyde
Consider this simple query (I'll express it in three equivalent ways):
* select stream * from Orders where state = 'CA' (in streaming SQL)
* istream [ select * from Orders where state = 'CA' ] (in CQL)
* delta(filter(state = 'CA', scan(Orders))) (in logical algebra)
In CQL there are no named streams, ju
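The logical-algebra form of the query can be pictured as a small expression tree. The sketch below is only an illustration in plain Python: the `Scan`, `Filter`, and `Delta` classes and the `render` function are hypothetical names invented here, not Calcite's relational-expression classes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: read a named relation."""
    table: str


@dataclass(frozen=True)
class Filter:
    """Keep only rows of `input` that satisfy `condition` (kept as text here)."""
    condition: str
    input: "Node"


@dataclass(frozen=True)
class Delta:
    """Relation-to-stream operator: turn changes to `input` into a stream."""
    input: "Node"


Node = Union[Scan, Filter, Delta]


def render(node: Node) -> str:
    """Print a tree in the functional notation used in the message."""
    if isinstance(node, Scan):
        return f"scan({node.table})"
    if isinstance(node, Filter):
        return f"filter({node.condition}, {render(node.input)})"
    if isinstance(node, Delta):
        return f"delta({render(node.input)})"
    raise TypeError(f"unknown node type: {type(node).__name__}")


# The example query, built bottom-up.
plan = Delta(Filter("state = 'CA'", Scan("Orders")))
text = render(plan)  # "delta(filter(state = 'CA', scan(Orders)))"
```

Keeping the tree as immutable values makes rewrites (for example, replacing a `Delta` over a `Scan` with a physical stream scan) a matter of building a new tree rather than mutating the old one.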

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Yi Pan
Hi, Julian, Thanks! I think we all agree that the SQL AST should be kept separate from the logical algebra. Focusing on your comment below: "The stream-to-relation and relation-to-stream operators are in the logical algebra but very likely have disappeared by the time you get to the physical algeb

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Julian Hyde
On Jan 28, 2015, at 10:02 AM, Yi Pan wrote:
> I try to understand your comments below: "But there is not a simple mapping between true SQL and a data-flow graph that you can execute." What is the specific meaning of this statement? Could you elaborate on this a bit more?
The structure of

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Julian Hyde
On Jan 28, 2015, at 10:05 AM, Yi Pan wrote:
> One more:
> I noticed in the above discussion, "SQL API", "Streaming SQL API" have been used frequently. But I am not sure what exactly Julian means by "SQL API". Julian, could you clarify on this? Were you referring to the Streaming SQL syntax/

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Yi Pan
One more: I noticed in the above discussion, "SQL API", "Streaming SQL API" have been used frequently. But I am not sure what exactly Julian means by "SQL API". Julian, could you clarify on this? Were you referring to the Streaming SQL syntax/grammar definition, the common representation layer, or

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Yi Pan
Hi, Julian, First, welcome to the community! Let me try to answer some of your comments, in addition to what Jon, Milinda, and others already commented on. In general, I don't think our thinking differs much. As you mentioned, the full stack would be a SQL parser -> AST -> a logic alg

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28 Thread Milinda Pathirage
Hi Julian, Thank you for your comments. You can find the discussions about the common representation layer at: https://issues.apache.org/jira/browse/SAMZA-483 And a draft Streaming SQL API at: https://reviews.apache.org/r/30287/ I also prefer an extended relational algebra model (logical algebra) ins

Re: Streaming SQL - object models, ASTs and algebras

2015-01-27 Thread Jon Bringhurst
Hey Julian, Most of the general discussion surrounding a high level language for Samza can be found at: https://issues.apache.org/jira/browse/SAMZA-390 Early mockups of Yi's work on what the lower level APIs (phases after AST rewriting) might look like can also be found at: https://issues.apa

Re: Streaming SQL - object models, ASTs and algebras

2015-01-27 Thread 王辰光
Hive and Spark SQL use ANTLR as the SQL parser to translate SQL. It could be used to build a Samza job that looks like Storm Trident. On 15-1-28 at 9:34 AM, Julian Hyde wrote: Hi all, This is my first post to the Samza list. I heard from Chris and Jay that you guys were looking into putting a SQL interfac