Great summary Chris.
I agree that we should strive hard not to introduce another mandatory
operational dependency for out-of-the-box usage.
To clarify what I was saying, I am less concerned about the partition count,
which is readily discoverable and basically just part of the "ddl" for the
topic.
Hey all,
Trying to respond in order.
> If one wants finer-grained control over what the output topic would be
like, wouldn't it make sense to use a CREATE TABLE AS statement?
Yes. CREATE TABLE AS could be used to define all three things that I
mentioned (partition key, partition count, and schema).
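For illustration, a CREATE TABLE AS along these lines could carry all three
properties. The PARTITION BY clause and WITH options below are hypothetical
syntax invented for this sketch; no such DDL exists in Samza or Calcite as of
this thread:

```sql
-- Hypothetical DDL sketch: PARTITION BY and the WITH options are
-- illustrative only, not an existing Samza/Calcite feature.
CREATE TABLE OrdersByTicker (
  rowtime TIMESTAMP,
  ticker  VARCHAR(10),
  amount  INT
)
PARTITION BY (ticker)             -- partition key
WITH (partition_count = 8,        -- partition count
      value_format    = 'avro')   -- message schema/serde
AS SELECT STREAM rowtime, ticker, amount
   FROM Orders;
```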
Chris,
I think the schema repository acts as the stand-in for the database catalog
or HCatalog. People who don't have that will have to give their schema with
the query. I think there can be some kind of plugin to provide the schema, so
it is automatically inferred for the Avro people and manually provided otherwise.
This isn't a huge deal for me. Since there isn't a metadata repository it makes
sense to include that kind of metadata in the query. (The database world is all
about "late schema" and "schema on read" these days, so there are plenty of
precedents in modern SQL implementations.)
We could allow p
I am not convinced that the partitioning and partition key issue is quite
equivalent to partitions in a relational database. In a relational database
partitions have no semantic meaning and are just a way to organize data for
query efficiency. That is not the case in Kafka. Let me try to argue that
From: Chris Riccomini [criccom...@apache.org]
Sent: Friday, January 30, 2015 12:06 PM
To: Chris Riccomini
Cc: dev@samza.apache.org
Subject: Re: Streaming SQL - object models, ASTs and algebras
Hey all,
I have a few more comments on the metadata issue t
___
From: Chris Riccomini [criccom...@apache.org]
Sent: Friday, January 30, 2015 10:43 AM
To: dev@samza.apache.org
Subject: Re: Streaming SQL - object models, ASTs and algebras
Hey all,
Just catching up on this thread. The Calcite + Samza approach seems pretty
compelling to me.
Hi Chris,
As a temporary solution, we can use the JSON-based Calcite schema feature.
https://github.com/julianhyde/incubator-calcite/blob/master/example/csv/src/test/resources/model.json
We can implement our own Schema factory and use Calcite JSON model with
additional attributes as required.
User
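As a sketch of that idea, a model along these lines might work. The factory
class name and the stream attributes in the operand are hypothetical; only the
outer keys (version, defaultSchema, schemas, name, type, factory, operand)
follow the Calcite JSON model format linked above:

```json
{
  "version": "1.0",
  "defaultSchema": "KAFKA",
  "schemas": [
    {
      "name": "KAFKA",
      "type": "custom",
      "factory": "org.apache.samza.sql.KafkaSchemaFactory",
      "operand": {
        "bootstrap.servers": "localhost:9092",
        "streams": [
          { "name": "Orders",
            "partitionKey": "ticker",
            "partitionCount": 8,
            "valueFormat": "avro" }
        ]
      }
    }
  ]
}
```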
Hey all,
I have a few more comments on the metadata issue that I brought up. The
three things that we lack right now are:
1. Partition count.
2. Partition key.
3. Stream (or message) schema.
These have to be evaluated both on the ingest (first consumers) and egress
(final producers) of a query.
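To make that concrete, here is a sketch (illustrative stream names only) of
where each piece of metadata would be needed in a single query:

```sql
-- Egress: the planner must decide OrdersByState's partition count,
-- partition key, and output schema when it creates/writes the topic.
INSERT INTO OrdersByState
-- Ingest: the planner must know Orders' schema to validate the column
-- references, and its partition count/key to plan the consumers.
SELECT STREAM rowtime, state, amount
FROM Orders
WHERE amount > 100;
```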
Hey all,
Just catching up on this thread. The Calcite + Samza approach seems pretty
compelling to me. I think most of what Julian is arguing for makes sense.
My main concern is with practicalities.
One specific case of this is the discussion about the partitioning model.
In an ideal world, I agre
> On Jan 29, 2015, at 4:38 PM, Yi Pan wrote:
>
> I am wondering if I can get per-30-minute window averages?
> I.e. the following is the input events in a stream:
> {10:01, ORCL, 10, 10}
> {10:02, MSFT, 30, 30}
> {10:03, ORCL, 100, 110}
> {10:17, MSFT, 45, 75}
> {10:59, ORCL,
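One possible way to express Yi's per-30-minute average would be to group on
the truncated rowtime. This is a sketch only: "tumble" is a hypothetical
helper that rounds rowtime down to its 30-minute boundary, and the thread has
not settled this syntax:

```sql
-- Sketch: tumble(...) is hypothetical; it stands in for "round rowtime
-- down to the start of its 30-minute window".
SELECT STREAM tumble(rowtime, INTERVAL '30' MINUTE) AS windowStart,
       ticker,
       AVG(amount) AS avgAmount
FROM Orders
GROUP BY tumble(rowtime, INTERVAL '30' MINUTE), ticker;
```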
Thanks! Posted to SAMZA-390.
On Thu, Jan 29, 2015 at 4:44 PM, Julian Hyde wrote:
>
> > On Jan 29, 2015, at 4:42 PM, Yi Pan wrote:
> >
> > One more, Julian, do you mind if I post your proposed SQL model to
> > SAMZA-390? That way, more ppl can view it and we should continue
> > discussion there.
> On Jan 29, 2015, at 4:42 PM, Yi Pan wrote:
>
> One more, Julian, do you mind if I post your proposed SQL model to
> SAMZA-390? That way, more ppl can view it and we should continue discussion
> there.
Yes of course - feel free to post anything I post on public lists.
Julian
One more, Julian, do you mind if I post your proposed SQL model to
SAMZA-390? That way, more ppl can view it and we should continue discussion
there.
Thanks!
-Yi
On Thu, Jan 29, 2015 at 4:27 PM, Julian Hyde wrote:
> The validation logic is extensible within Calcite (for example, the
> validator has an interface SqlValidatorNamespace)
I got a bit carried away with the nuances of the relational algebra there, so I
forgot to discuss where the code would go.
Some of the work in this proposal (such as parsing and validation) would most
naturally involve changes to Calcite (adding functionality that would simply be
dormant for tr
Hi, Julian,
Forgive me if I am slow in following your examples.
{quote}
You can also define a "paged" window, for example the cumulative total
trades since the top of the hour:
select stream rowtime, ticker, amount,
  sum(amount) over (partition by ticker, trunc(rowtime to hour)
                    order by rowtime)
The validation logic is extensible within Calcite (for example, the validator
has an interface SqlValidatorNamespace that represents anything that can be
used as a table alias, such as table usages in the FROM clause or sub-queries),
but I think it would be very complex to abstract that logic as
Hi Julian,
I like your proposal. I think it saves us a lot of time. But I have a
question regarding SQL validation. I'm not an SQL expert, so I may be wrong.
As I understand it, some SQL constructs such as NOT IN, ALL, and EXCEPT will
not be valid in the context of some stream queries due to their blocking nature.
> On Jan 29, 2015, at 3:04 PM, Yi Pan wrote:
>
> Hi, Julian,
>
> Thanks for sharing your idea! It is interesting and well organized. Let me
> try to summarize the main differences between yours and the current
> proposal:
> - removing the '[]' used to define the window specification, using OVER on
> the stream/table instead
Hi, Julian,
Thanks for sharing your idea! It is interesting and well organized. Let me
try to summarize the main differences between yours and the current proposal:
- removing the '[]' used to define the window specification, using OVER on
the stream/table instead
- join/select output can be either a stream or a table
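To illustrate the first difference, compare the two window styles. Both
snippets are sketches: the bracketed form follows CQL-style window syntax, and
the exact spellings are not settled anywhere in this thread:

```sql
-- CQL-style bracket window (the current proposal's '[]' form):
--   select * from Orders [range '30' minute]
-- Julian's alternative, a standard OVER clause on the stream:
select stream rowtime, ticker,
       avg(amount) over (order by rowtime
                         range interval '30' minute preceding)
from Orders;
```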
Let me propose an alternative approach. The deliverables and the technology
stack would be different, but I think we still fulfill the spirit of the
proposal, and there are benefits from better interoperability, standards
compliance, and building on existing code that already works.
First, I pr
Hi, Julian,
Thanks for the explanation. I got your point that the physical layer
"stream-scan" can be used to get the delta(filter(..)) in the logical
algebra.
My question on this model is:
If a window operation is implemented as filter(tuple.isInWindow(),
stream-scan(Orders)) in the physical layer, i
Consider this simple query (I'll express in 3 equivalent ways):
* select stream * from Orders where state = 'CA' (in streaming SQL)
* istream [ select * from Orders where state = 'CA' ] (in CQL)
* delta(filter(state = 'CA', scan(Orders))) (in logical algebra)
In CQL there are no named streams, ju
Hi, Julian,
Thanks! I think we all agreed on isolating the SQL AST from the logical
algebra.
Focusing on your comment below:
"The stream-to-relation and relation-to-stream operators are in the logical
algebra but very likely have disappeared by the time you get to the
physical algebra."
On Jan 28, 2015, at 10:02 AM, Yi Pan wrote:
> I try to understand your comments below: "But there is not a simple
> mapping between
> true SQL and a data-flow graph that you can execute." What is the specific
> meaning of this statement? Could you elaborate on this a bit more?
The structure of
On Jan 28, 2015, at 10:05 AM, Yi Pan wrote:
> One more:
> I noticed in the above discussion, "SQL API", "Streaming SQL API" have been
> used frequently. But I am not sure what exactly Julian means by "SQL API".
> Julian, could you clarify on this? Were you referring to the Streaming SQL
> syntax/
One more:
I noticed in the above discussion, "SQL API", "Streaming SQL API" have been
used frequently. But I am not sure what exactly Julian means by "SQL API".
Julian, could you clarify on this? Were you referring to the Streaming SQL
syntax/grammar definition, the common representation layer, or
Hi, Julian,
First, welcome to the community! Let me try to answer some of your
comments, in addition to what Jon, Milinda, and others already commented on.
In general, I don't think that our thoughts differ too much. As you
mentioned, the full stack would be a SQL parser -> AST -> a logical algebra
Hi Julian,
Thank you for your comments. You can find the discussions about common
representation layer at:
https://issues.apache.org/jira/browse/SAMZA-483
And a draft Streaming SQL API at:
https://reviews.apache.org/r/30287/
I also prefer an extended relational algebra model (logical algebra) instead
Hey Julian,
Most of the general discussion surrounding a high level language for Samza can
be found at:
https://issues.apache.org/jira/browse/SAMZA-390
Early mockups of Yi's work on what the lower level APIs (phases after AST
rewriting) might look like can also be found at:
https://issues.apa
Hive and Spark SQL use an ANTLR-based SQL parser to translate SQL.
It could be used to build a Samza job that looks like Storm Trident.
On 2015-01-28 9:34 AM, Julian Hyde wrote:
Hi all,
This is my first post to the Samza list. I heard from Chris and Jay
that you guys were looking into putting a SQL interface on Samza, so I
thought I'd take a look.
Hi all,
This is my first post to the Samza list. I heard from Chris and Jay
that you guys were looking into putting a SQL interface on Samza, so I
thought I'd take a look.
My background is in the SQL world, most recently with Apache Calcite
(although I have quite a lot of experience with streaming, too).