Re: Towards a spec for robust streaming SQL, Part 1

Tyler Akidau Mon, 08 May 2017 15:34:37 -0700

Any thoughts here Fabian? I'm planning to start sending out some more
emails towards the end of the week.


-Tyler


On Wed, Apr 26, 2017 at 8:18 AM Tyler Akidau <taki...@google.com> wrote:

> No worries, thanks for the heads up. Good luck wrapping all that stuff up.
>
> -Tyler
>
> On Tue, Apr 25, 2017 at 12:07 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Tyler,
>>
>> thanks for pushing this effort and including the Flink list.
>> I haven't managed to read the doc yet, but just wanted to thank you for the
>> write-up and let you know that I'm very interested in this discussion.
>>
>> We are very close to the feature freeze of Flink 1.3 and I'm quite busy
>> getting as many contributions as possible merged before the release is
>> forked off. Once that has happened, I'll have more time to read and comment.
>>
>> Thanks,
>> Fabian
>>
>>
>> 2017-04-22 0:16 GMT+02:00 Tyler Akidau <taki...@google.com.invalid>:
>>
>> > Good point, when you start talking about anything less than a full join,
>> > triggers get involved to describe how one actually achieves the desired
>> > semantics, and they may end up being tied to just one of the inputs (e.g.,
>> > you may only care about the watermark for one side of the join). Am
>> > expecting us to address these sorts of details more precisely in doc #2.
>> >
>> > -Tyler
>> >
>> > On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles <k...@google.com.invalid>
>> > wrote:
>> >
>> > > There's something to be said about having different triggering depending
>> > > on which side of a join data comes from, perhaps?
>> > >
>> > > (delightful doc, as usual)
>> > >
>> > > Kenn
>> > >
>> > > On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau <taki...@google.com.invalid>
>> > > wrote:
>> > >
>> > > > Thanks for reading, Luke. The simple answer is that CoGBK is basically
>> > > > flatten + GBK. Flatten is a non-grouping operation that merges the input
>> > > > streams into a single output stream. GBK then groups the data within that
>> > > > single union stream as you might otherwise expect, yielding a single table.
>> > > > So I think it doesn't really impact things much. Grouping, aggregation,
>> > > > window merging etc all just act upon the merged stream and generate what is
>> > > > effectively a merged table.
>> > > >
>> > > > -Tyler
>> > > >
>> > > > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <lc...@google.com.invalid>
>> > > > wrote:
>> > > >
>> > > > > The doc is a good read.
>> > > > >
>> > > > > I think you do a great job of explaining table -> stream, stream ->
>> > > > > stream, and stream -> table when there is only one stream.
>> > > > > But when there are multiple streams reading/writing to a table, how does
>> > > > > that impact what occurs?
>> > > > > For example, with CoGBK you have multiple streams writing to a table, how
>> > > > > does that impact window merging?
>> > > > >
>> > > > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <taki...@google.com.invalid>
>> > > > > wrote:
>> > > > >
>> > > > > > Hello Beam, Calcite, and Flink dev lists!
>> > > > > >
>> > > > > > Apologies for the big cross post, but I thought this might be something all
>> > > > > > three communities would find relevant.
>> > > > > >
>> > > > > > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
>> > > > > > Mingmin Xu. As you can imagine, we need to come to some conclusion about
>> > > > > > how to elegantly support the full suite of streaming functionality in the
>> > > > > > Beam model via Calcite SQL. You folks in the Flink community have been
>> > > > > > pushing on this (e.g., adding windowing constructs, amongst others, thank
>> > > > > > you! :-), but from my understanding we still don't have a full spec for how
>> > > > > > to support robust streaming in SQL (including but not limited to, e.g., a
>> > > > > > triggers analogue such as EMIT).
>> > > > > >
>> > > > > > I've been spending a lot of time thinking about this and have some opinions
>> > > > > > about how I think it should look that I've already written down, so I
>> > > > > > volunteered to try to drive forward agreement on a general streaming SQL
>> > > > > > spec between our three communities (well, technically I volunteered to do
>> > > > > > that w/ Beam and Calcite, but I figured you Flink folks might want to join
>> > > > > > in since you're going that direction already anyway and will have useful
>> > > > > > insights :-).
>> > > > > >
>> > > > > > My plan was to do this by sharing two docs:
>> > > > > >
>> > > > > >    1. The Beam Model : Streams & Tables - This one is for context, and
>> > > > > >    really only mentions SQL in passing. But it describes the relationship
>> > > > > >    between the Beam Model and the "streams & tables" way of thinking, which
>> > > > > >    turns out to be useful in understanding what robust streaming in SQL might
>> > > > > >    look like. Many of you probably already know some or all of what's in here,
>> > > > > >    but I felt it was necessary to have it all written down in order to justify
>> > > > > >    some of the proposals I wanted to make in the second doc.
>> > > > > >
>> > > > > >    2. A streaming SQL spec for Calcite - The goal for this doc is that it
>> > > > > >    would become a general specification for what robust streaming SQL in
>> > > > > >    Calcite should look like. It would start out as a basic proposal of what
>> > > > > >    things *could* look like (combining both what things look like now as well
>> > > > > >    as a set of proposed changes for the future), and we could all iterate on
>> > > > > >    it together until we get to something we're happy with.
>> > > > > >
>> > > > > > At this point, I have doc #1 ready, and it's a bit of a monster, so I
>> > > > > > figured I'd share it and let folks hack at it with comments if they have
>> > > > > > any, while I try to get the second doc ready in the meantime. As part of
>> > > > > > getting doc #2 ready, I'll be starting a separate thread to try to gather
>> > > > > > input on what things are already in flight for streaming SQL across the
>> > > > > > various communities, to make sure the proposal captures everything that's
>> > > > > > going on as accurately as it can.
>> > > > > >
>> > > > > > If you have any questions or comments, I'm interested to hear them.
>> > > > > > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
>> > > > > >
>> > > > > >   http://s.apache.org/beam-streams-tables
>> > > > > >
>> > > > > > -Tyler
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
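
For readers following the "CoGBK is basically flatten + GBK" explanation quoted
above, here is a rough illustrative sketch in plain Python. It is not Beam code
and does not use any Beam API: the function names (flatten, group_by_key,
co_group_by_key) and the sample data are made up for this illustration, and
windowing, triggers, and watermarks are left out entirely. It only shows the
idea that several tagged inputs are merged into one stream, which is then
grouped into a single table.

```python
from collections import defaultdict


def flatten(*tagged_streams):
    """Merge several (tag, stream) inputs into one stream of (key, (tag, value)).

    Hypothetical helper: this is the non-grouping "flatten" step.
    """
    for tag, stream in tagged_streams:
        for key, value in stream:
            yield key, (tag, value)


def group_by_key(stream):
    """Group one stream of (key, value) pairs into a table keyed by key."""
    table = defaultdict(list)
    for key, value in stream:
        table[key].append(value)
    return dict(table)


def co_group_by_key(**inputs):
    """CoGBK sketched as flatten + GBK, then splitting values by input tag."""
    merged = flatten(*inputs.items())   # one union stream
    grouped = group_by_key(merged)      # one table over the union stream
    result = {}
    for key, tagged_values in grouped.items():
        per_input = {tag: [] for tag in inputs}
        for tag, value in tagged_values:
            per_input[tag].append(value)
        result[key] = per_input
    return result


# Made-up sample inputs: two keyed streams feeding one table.
emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "111-222"), ("james", "333-444")]
table = co_group_by_key(emails=emails, phones=phones)
```

In this sketch the grouping step never needs to know how many inputs there
were; it only sees the merged stream, which is the point made in the reply
quoted above.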
