Re: [DISCUSS] End of Stream Feature in Samza

2016-08-30, Julian Hyde
> On Aug 30, 2016, at 4:44 PM, Xinyu Liu wrote: > > It's very exciting that Samza is adding support of bounded input streams. +1!

Re: [DISCUSS] [VOTE] Apache Samza 0.10.1 RC0

2016-08-01, Julian Hyde
> On Aug 1, 2016, at 5:24 PM, Navina Ramesh wrote: > > Are you referring to the Gemfile.lock in docs/ ? I do see that in the > source release. I was mistaken. Apparently it is OK to check in Gemfile.lock; see http://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/

Re: [DISCUSS] [VOTE] Apache Samza 0.10.1 RC0

2016-08-01, Julian Hyde
+1 (non-binding) Downloaded, built, ran tests, checked L&N. Instructions on installing gradle were nice and clear. I ran rat (I presume that ‘./gradlew rat’ is the correct command) and it passed. I believe that best practice is to put the release bits under https://dist.apache.org/repos/dist/d

Re: SamzaSQL document required

2016-07-27, Julian Hyde
Thanks Yi, that was a very helpful overview! > On Jul 27, 2016, at 12:27 PM, Yi Pan wrote: > > Hi, Ankita, > > There is no official release documentation for SamzaSQL yet. If you are > referring to the paper in HPBDC this year by Milinda, it is based on > several patches under development. I wi

Re: About planner questions!

2016-06-15, Julian Hyde
I agree with what Milinda said. I’ll add that real, complex systems often use multiple phases of planning, some of which are rule-based, some cost-based, using different sets of rules or cost models at each stage. The nice thing about Calcite’s model is that you can re-use the same transformation rul

Re: [DISCUSS] Moving to github/pull-request for code review and check-in

2016-02-19, Julian Hyde
PRs have worked well for us in Calcite. We still accept patches, if contributors are adamant, but it’s unusual. We don’t use RB. We (or I) haven’t managed to fully automate submission. I pull down to my sandbox, rebase, and merge --ff-only, because in Calcite (as I think in Samza) our policy i

Re: How to do aggregation in Samza?

2015-10-21, Julian Hyde
I am helping with the SQL support. I don’t know timelines, but I wanted to chime in on the different aggregate operations. There are several ways to aggregate streams: tumbling, hopping, and sliding windows. For example, if you want to periodically emit totals that collapse many rows into one total,
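As a minimal illustration of the first kind of aggregation (a tumbling window that collapses many rows into one total), here is an in-memory Python sketch. It is not Samza or Calcite code; the row layout `(rowtime, key, amount)` and the name `tumbling_totals` are assumptions made for the example.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Row = Tuple[int, str, int]  # hypothetical layout: (rowtime, key, amount)


def tumbling_totals(rows: Iterable[Row], size: int) -> Dict[Tuple[int, str], int]:
    """Sum `amount` per (window_start, key) over fixed, non-overlapping windows.

    `rowtime` and `size` must use the same unit (for example, seconds).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    totals: DefaultDict[Tuple[int, str], int] = defaultdict(int)
    for rowtime, key, amount in rows:
        window_start = rowtime - rowtime % size  # floor rowtime to its window boundary
        totals[(window_start, key)] += amount
    return dict(totals)


rows = [(0, "a", 1), (30, "a", 2), (60, "a", 5), (61, "b", 7)]
print(tumbling_totals(rows, 60))  # {(0, 'a'): 3, (60, 'a'): 5, (60, 'b'): 7}
```

A streaming operator would emit each window's total once rowtime passes the window's end, instead of returning every total when the input is exhausted as this batch-style sketch does.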

Re: Thoughts and observations on Samza

2015-07-10, Julian Hyde
I broadly support it, with one big proviso. One of the attractive things about Kafka has been its minimalism -- the fact that it solves one part of the problem, simply, and very well. It is very important that it continues to do that, and that people continue to perceive it that way. Make Kafka in

Re: Thoughts and observations on Samza

2015-07-09, Julian Hyde
Wow, what a great discussion. A brave discussion, since no project wants to reduce its scope. And important, because "right-sizing" technology components can help them win in the long run. I have a couple of let-me-play-devil's-advocate questions. 1. Community over code Let's look at this in ter

Re: Hopping and tumbling windows in streaming SQL

2015-06-25, Julian Hyde
ased model we have to perform date/time arithmetic to have tumbling > windows such as 5 minutes tumbling windows (May be there is a better way > that I don't know). But TUMBLE function that can specify the parameters > such as window size would be nice. I am +1 for other extensions as

Hopping and tumbling windows in streaming SQL

2015-06-24, Julian Hyde
Hi all, Forgive the cross-post. This is for Calcite devs interested in streaming and Samza devs interested in SQL. I've been thinking some more about how to implement hopping and tumbling windows in streaming SQL. I was previously at a loss to find a concise syntax that is consistent with how SQL
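The difference between hopping and tumbling windows can be sketched as window assignment: a hopping window of length `size` that advances every `hop` time units assigns each row to several overlapping windows. This is a hypothetical helper, not the SQL syntax proposed in the thread, and it assumes windows are aligned to multiples of `hop`.

```python
from typing import List


def hopping_windows(rowtime: int, size: int, hop: int) -> List[int]:
    """Start times of every hopping window [start, start + size) that contains rowtime.

    Windows are assumed to start at multiples of `hop`; hop == size gives tumbling windows.
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    start = rowtime - rowtime % hop  # latest window start at or before rowtime
    starts = []
    while start > rowtime - size:  # this window still covers rowtime
        starts.append(start)
        start -= hop
    return sorted(starts)


print(hopping_windows(65, size=60, hop=20))  # [20, 40, 60]
print(hopping_windows(65, size=60, hop=60))  # [60]
```

When `size` is a multiple of `hop`, each row lands in exactly `size / hop` windows; when `hop` is larger than `size`, some rows fall into no window at all.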

Re: What next for streaming SQL?

2015-05-05, Julian Hyde
answer my question > regarding to the sliding windows in the previous email? > > Thanks a lot! > > -Yi > > On Tue, May 5, 2015 at 10:46 AM, Julian Hyde wrote: > >> >> On May 4, 2015, at 10:52 AM, Yi Pan wrote: >> >>> Just one observation

Re: What next for streaming SQL?

2015-05-05, Julian Hyde
On May 4, 2015, at 10:52 AM, Yi Pan wrote: > Just one observation that I wanted to add in: I noted that actually any > range-based query clause on an ordered stream essentially means the need > for a windowing method in the ordered stream scan. Is it possible to > identify a common syntax expres

Re: What next for streaming SQL?

2015-04-30, Julian Hyde
ata and flags to handle these scenarios. I think Yi may have a better > idea about this because he was working on the window operator design. You can > find his design document here [1]. > > CALCITE-704 looks interesting. I'll have a look at it. > > Thanks > Milinda

Re: What next for streaming SQL?

2015-04-29, Julian Hyde
a window closing policy where we will not handle tuples > arriving after the window timeout. Yi's window operator design document > contains most of the details required. What do you think about this approach > to implement tumbling windows? We highly appreciate your feedback on thi

What next for streaming SQL?

2015-04-27, Julian Hyde
Milinda, I have seen your work adding initial streaming SQL to Samza. Good stuff. Which types of query are you thinking of doing next? As of calcite-1.2, the streaming extensions are in Calcite’s master branch. (See https://github.com/apache/incubator-calcite/blob/master/doc/STREAM.md.) We are

Re: Review Request 33142: [SAMZA-561] Review in progress

2015-04-20, Julian Hyde
know whether nulls are possible. - Julian Hyde On April 13, 2015, 9:04 p.m., Yi Pan (Data Infrastructure) wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.a

Re: Review Request 33142: [SAMZA-561] Review in progress

2015-04-20, Julian Hyde
> On April 14, 2015, 10:14 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-sql/src/main/java/org/apache/samza/sql/metadata/Stream.java, line 37 > > > > > > What's the meaning of this parent? > > Milinda Pathirage w

Re: Review Request 33142: [SAMZA-561] Review in progress

2015-04-20, Julian Hyde
> On April 14, 2015, 10:14 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-sql/src/main/java/org/apache/samza/sql/expressions/RexToJavaCompiler.java, > > line 153 > > > > > > Is "Buzz" the intended real class name

Re: Review Request 33142: [SAMZA-561] Review in progress

2015-04-20, Julian Hyde
> On April 14, 2015, 10:14 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-sql/src/main/java/org/apache/samza/sql/expressions/RexToJavaCompiler.java, > > line 102 > > > > > > Just a question, is it supposed to thr

Re: Joining Avro records

2015-04-13, Julian Hyde
, "type" : "record", "fields": [{"name": "a", "type" : > "int"}]}, >{"name": "B", "type" : "record", "fields": [{"name": "b", "type" : "

Re: Joining Avro records

2015-04-09, Julian Hyde
Much of this is about mapping from logical fields (i.e. the fields you can reference in SQL) down to the Avro representation; I’m no expert on that mapping, so I’ll focus on the SQL stuff. First, SQL doesn’t allow a record to have two fields of the same name, so you wouldn’t be allowed to have

Temporal support in SQL:2011

2015-03-24, Julian Hyde
(Apologies for the cross-post. It seemed to be of interest to both lists.) The latest incarnation of the SQL standard, SQL:2011, adds temporal support. Time is an important concept in streams, so my hunch is that temporal database features will also be useful in streaming SQL. While we’re thinkin

Re: A question regarding to the default semantic meaning of join

2015-03-10, Julian Hyde
See answers inline. Proving, yet again, that there's no such thing as a short semantics email. :) > On Mar 9, 2015, at 12:48 PM, Yi Pan wrote: > > Hi, Julian, > > Thanks for the reply. I want to make sure that I understand your > explanation on windows in JOIN more explicitly. > For the follow

Re: A question regarding to the default semantic meaning of join

2015-03-07, Julian Hyde
First of all, if you want a stream output, you should add the 'STREAM' keyword after 'SELECT'. There isn't quite enough information to make the query well-defined. I need to assume that each stream has an increasing rowtime column, and that the "OVER (ROWS 3 PRECEDING)" windows on each stream are
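Assuming, as the reply does, that each stream has an increasing rowtime column, a per-stream window `OVER (ROWS 3 PRECEDING)` covers the current row and the three rows before it. The sketch below computes a running `SUM` over that frame for a single stream only; it does not attempt the join itself, and `rows_preceding_sum` is an illustrative name.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rows_preceding_sum(values: Iterable[int], preceding: int = 3) -> Iterator[Tuple[int, int]]:
    """For each value, yield (value, sum of it and up to `preceding` earlier values).

    With preceding=3 the frame is three earlier rows plus the current row,
    assuming `values` is already ordered by rowtime.
    """
    if preceding < 0:
        raise ValueError("preceding must be non-negative")
    window: Deque[int] = deque(maxlen=preceding + 1)  # oldest row drops out automatically
    for value in values:
        window.append(value)
        yield value, sum(window)


print(list(rows_preceding_sum([1, 2, 3, 4, 5])))
# [(1, 1), (2, 3), (3, 6), (4, 10), (5, 14)]
```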

Re: Handling defaults and windowed aggregates in stream queries

2015-03-04, Julian Hyde
ing whether enforcing CQL > semantics as we have in our current operator layer limits the flexibility > and increase the complexity of query plan to operator router generation. > Anyway, I am going to take a step back and think more about Julian's > comments. I'll put my t

Re: Handling defaults and windowed aggregates in stream queries

2015-03-03, Julian Hyde
Sorry to show up late to this party. I've had my head down writing a description of streaming SQL which I hoped would answer questions like this. Here is the latest draft: https://github.com/julianhyde/incubator-calcite/blob/chi/doc/STREAM.md I've been avoiding windows for now. They are not nee

Re: Re-processing a la Kappa/Liquid

2015-02-22, Julian Hyde
Can I quibble with semantics? This problem seems to be more naturally a stream-to-stream join, not a stream-to-table join. It seems unreasonable to expect the system to be able to give you the state of a table at a given moment in the past, but it is reasonable to ask for the stream up to that po
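The following sketch illustrates the point under stated assumptions: if the stream is a changelog, the state of a table at any past moment can be recomputed by replaying the stream up to that moment. It assumes a hypothetical changelog of `(rowtime, key, value)` records ordered by rowtime, where a `None` value means the key was deleted; `table_as_of` is an illustrative name.

```python
from typing import Dict, Iterable, Optional, Tuple

Change = Tuple[int, str, Optional[str]]  # hypothetical: (rowtime, key, new value or None = delete)


def table_as_of(changelog: Iterable[Change], as_of: int) -> Dict[str, str]:
    """Rebuild the key -> value table by replaying changes with rowtime <= as_of.

    Assumes the changelog stream is ordered by rowtime.
    """
    state: Dict[str, str] = {}
    for rowtime, key, value in changelog:
        if rowtime > as_of:
            break  # ordered input: no later change can affect this snapshot
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


log = [(1, "a", "x"), (2, "b", "y"), (3, "a", None), (4, "b", "z")]
print(table_as_of(log, 2))  # {'a': 'x', 'b': 'y'}
print(table_as_of(log, 4))  # {'b': 'z'}
```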

Re: [DISCUSS] JDK7

2015-02-18, Julian Hyde
Another data point. Calcite just dropped support for JDK 1.6. Calcite 1.0 supports 1.6, 1.7, 1.8, but Calcite 1.1 will only support 1.7, 1.8. We could be persuaded to reconsider. Julian > On Feb 18, 2015, at 09:40, Chris Riccomini wrote: > > Hey all, > > Ruslan has been working on upgradin

Terminology: Tumbling and sliding windows

2015-02-16, Julian Hyde
I’d like to check that we’re using the same terminology. The Azure Stream Analytics documentation has concise definitions for tumbling, hopping and sliding windows, including some diagrams that I think are helpful. Tumbling: https://msdn.microsoft.com/en-us/library/azure/dn835055.aspx
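The linked pages are the authority for Azure's exact definitions. As a rough illustration of one common reading of a sliding window, in which every row gets its own window covering the preceding `range_` time units, here is a count over the half-open interval `(rowtime - range_, rowtime]`. The interval bounds and the name `sliding_count` are assumptions for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_count(rowtimes: Iterable[int], range_: int) -> Iterator[Tuple[int, int]]:
    """For each row, yield (rowtime, count of rows with rowtime in (rowtime - range_, rowtime]).

    Assumes rowtimes arrive in non-decreasing order.
    """
    if range_ <= 0:
        raise ValueError("range_ must be positive")
    window: Deque[int] = deque()
    for t in rowtimes:
        window.append(t)
        while window[0] <= t - range_:  # evict rows that have slid out of the window
            window.popleft()
        yield t, len(window)


print(list(sliding_count([0, 10, 25, 30, 70], range_=30)))
# [(0, 1), (10, 2), (25, 3), (30, 3), (70, 1)]
```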

Re: Windowing Guarantees in samza

2015-02-15, Julian Hyde
+1 As far as possible, behavior should be deterministic, that is, determined by the data rather than when the query was started or the arrival time of the data. Of course, for the query to make progress, there should be ways to discard late data and to indicate that a producer is alive but do
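A minimal sketch of the deterministic behavior described here, assuming a simple policy that is not taken from Samza: the watermark is the largest rowtime seen so far minus a fixed `slack`, a tumbling window is emitted once the watermark passes its end, and rows arriving after their window has closed are set aside as late. Because the watermark is computed from rowtimes, the output depends on the data and its order in the stream, not on wall-clock arrival time. `windowed_totals` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def windowed_totals(
    rows: Iterable[Tuple[int, int]], size: int, slack: int
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Tumbling-window totals closed by a data-driven watermark.

    rows are (rowtime, amount). The watermark is the largest rowtime seen minus `slack`;
    a window [start, start + size) is emitted once the watermark reaches its end, and
    rows whose window has already closed are returned as late instead of counted.
    """
    open_windows: Dict[int, int] = {}
    emitted: List[Tuple[int, int]] = []
    late: List[Tuple[int, int]] = []
    watermark: Optional[int] = None  # undefined until the first row arrives
    for rowtime, amount in rows:
        start = rowtime - rowtime % size
        if watermark is not None and start + size <= watermark:
            late.append((rowtime, amount))  # its window has already closed
            continue
        open_windows[start] = open_windows.get(start, 0) + amount
        if watermark is None or rowtime - slack > watermark:
            watermark = rowtime - slack
        for s in sorted(open_windows):  # sorted() copies the keys, so popping is safe
            if s + size <= watermark:
                emitted.append((s, open_windows.pop(s)))
    return emitted, late


rows = [(1, 1), (5, 2), (12, 3), (3, 4), (25, 5), (9, 6)]
print(windowed_totals(rows, size=10, slack=5))
# ([(0, 7), (10, 3)], [(9, 6)])
```

In the example, the row at rowtime 3 arrives out of order but within the slack and is still counted, while the row at rowtime 9 arrives after the watermark has reached 20 and is discarded.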

Re: [DISCUSS] SQL workflow

2015-02-12, Julian Hyde
blication. >>> 2) Create a samza-sql branch. >>> 3) Migrate all existing samza-sql commits from master into the samza-sql >>> branch. >>> >>> Sorry for the churn on this all. I hadn't considered the binary dependency >>> issue. Is everyone O

Re: [DISCUSS] SQL workflow

2015-02-11, Julian Hyde
This seems more like a branch than a classifier. Calcite is developing in a branch, and would produce snapshots from that branch. The rule of thumb I’ve learned integrating Calcite into Hive is that only a branch should depend on snapshot versions of another component. (Hive broke this rule and

Re: Window spec in SQL language vs Samza system details

2015-02-10, Julian Hyde
The answer depends on your design philosophy. We need to strike a balance between making it possible and making it easy. Because SQL is a powerful closed language, we can achieve a lot by combining the elements. For example, I think that your example can be solved by joining a "heartbeat" stream
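The heartbeat idea can be sketched outside SQL: heartbeat rows carry a rowtime but no data, so they let time advance and windows close even when the data stream is quiet. This sketch interleaves heartbeats into a single ordered stream (marked by a `None` payload) instead of expressing them as a join, which is a simplifying assumption; `counts_with_heartbeats` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Tuple


def counts_with_heartbeats(
    events: Iterable[Tuple[int, Optional[str]]], size: int
) -> List[Tuple[int, int]]:
    """Count data rows per tumbling window; a row whose payload is None is a heartbeat.

    Heartbeats carry no data but advance time, so a window is emitted (possibly with
    count 0) as soon as any later row, data or heartbeat, moves past its end.
    Assumes events are ordered by rowtime.
    """
    emitted: List[Tuple[int, int]] = []
    current_start: Optional[int] = None
    count = 0
    for rowtime, payload in events:
        start = rowtime - rowtime % size
        if current_start is None:
            current_start = start
        while current_start < start:  # close the current window and any empty ones skipped
            emitted.append((current_start, count))
            count = 0
            current_start += size
        if payload is not None:
            count += 1
    return emitted


events = [(1, "a"), (4, "b"), (12, None), (35, None)]
print(counts_with_heartbeats(events, size=10))  # [(0, 2), (10, 0), (20, 0)]
```

If the two heartbeat rows are removed, the function returns an empty list: no row ever moves time past the end of the first window.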

Re: Review Request 30667: Samza Calcite Integration Prototype (SAMZA-483)

2015-02-05, Julian Hyde
sharing values. - Julian Hyde On Feb. 5, 2015, 4:07 p.m., Milinda Pathirage wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https:

Re: Streaming SQL - object models, ASTs and algebras

2015-01-30, Julian Hyde
Hive metadata store. > > In person, one thing you mentioned, Julian, was using hints, rather than > stuff baked into the syntax. If that's stomach-able, we could support > partitioning through hints, until we have a full blown metadata store. > > Thoughts? > > Cheers,

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29, Julian Hyde
> On Jan 29, 2015, at 4:38 PM, Yi Pan wrote: > > I am wondering if I can get an average that's per 30 min window averages? > I.e. the following is the input events in a stream: > {10:01, ORCL, 10, 10} > {10:02, MSFT, 30, 30} > {10:03, ORCL, 100, 110} > {10:17, MSFT, 45, 75} > {10:59, ORCL,
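For the per-30-minute averages in Yi's question, here is a minimal sketch using only the four events that are fully visible in this excerpt (the fifth is cut off). The excerpt does not say what the third and fourth fields mean; the fourth appears to be a running total per ticker, so this sketch assumes the third field is the value to average. `half_hour_averages` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def half_hour_averages(events: List[Tuple[str, str, int]]) -> Dict[Tuple[str, str], float]:
    """Average the value per (30-minute window start, ticker).

    Each event is ("HH:MM", ticker, value); times are assumed to fall within one day.
    """
    sums: DefaultDict[Tuple[str, str], int] = defaultdict(int)
    counts: DefaultDict[Tuple[str, str], int] = defaultdict(int)
    for hhmm, ticker, value in events:
        hours, minutes = map(int, hhmm.split(":"))
        start_minutes = (hours * 60 + minutes) // 30 * 30  # floor to the half hour
        window = f"{start_minutes // 60:02d}:{start_minutes % 60:02d}"
        sums[(window, ticker)] += value
        counts[(window, ticker)] += 1
    return {key: sums[key] / counts[key] for key in sums}


events = [("10:01", "ORCL", 10), ("10:02", "MSFT", 30), ("10:03", "ORCL", 100), ("10:17", "MSFT", 45)]
print(half_hour_averages(events))  # {('10:00', 'ORCL'): 55.0, ('10:00', 'MSFT'): 37.5}
```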

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29, Julian Hyde
> On Jan 29, 2015, at 4:42 PM, Yi Pan wrote: > > One more, Julian, do you mind if I post your proposed SQL model to > SAMZA-390? That way, more ppl can view it and we should continue discussion > there. Yes, of course; feel free to post anything I post on public lists. Julian

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29, Julian Hyde
s Calcite as a library. Similarly, if Samza wants a JDBC driver, they could distribute or sub-class Calcite's skeleton driver. Julian > On Jan 29, 2015, at 4:27 PM, Julian Hyde wrote: > > The validation logic is extensible within Calcite (for example, the validator > has an inter

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29, Julian Hyde
ephrase it, is validation logic extensible? > > Thanks > Milinda > > On Thu, Jan 29, 2015 at 6:32 PM, Julian Hyde wrote: > >> >>> On Jan 29, 2015, at 3:04 PM, Yi Pan wrote: >>> >>> Hi, Julian, >>> >>> Thanks for sharing

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29, Julian Hyde
> On Jan 29, 2015, at 3:04 PM, Yi Pan wrote: > > Hi, Julian, > > Thanks for sharing your idea! It is interesting and well organized. Let me > try to summarize the main difference between yours and the current proposal > are: > - removing the '[]' used to define the window specification, using O

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29, Julian Hyde
d by scanning the regular relational table during the operation. > Then, I agree that essentially the physical operators for non-stream and > stream queries may be merged in one model. Am I interpreting your idea > correctly? > > On Wed, Jan 28, 2015 at 4:52 PM, Julian Hyde wrote:

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28, Julian Hyde
s of new rows in a time-varying relation and > output to a stream of tuples. I agree on your comments on rstream, which > seems just have academic meanings. But I am not sure w/o the physical > operators performing the relation/stream conversions, how do we implement > the window operator? >

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28, Julian Hyde
On Jan 28, 2015, at 10:02 AM, Yi Pan wrote: > I try to understand your comments below: "But there is not a simple > mapping between > true SQL and a data-flow graph that you can execute." What is the specific > meaning of this statement? Could you elaborate on this a bit more? The structure of

Re: Streaming SQL - object models, ASTs and algebras

2015-01-28, Julian Hyde
On Jan 28, 2015, at 10:05 AM, Yi Pan wrote: > One more: > I noticed in the above discussion, "SQL API", "Streaming SQL API" have been > used frequently. But I am not sure what exactly Julian means by "SQL API". > Julian, could you clarify on this? Were you referring to the Streaming SQL > syntax/

Streaming SQL - object models, ASTs and algebras

2015-01-27, Julian Hyde
Hi all, This is my first post to the Samza list. I heard from Chris and Jay that you guys were looking into putting a SQL interface on Samza, so I thought I'd take a look. My background is in the SQL world, most recently with Apache Calcite (although I have quite a lot of experience with streami