Re: Towards a spec for robust streaming SQL, Part 1

Kenneth Knowles Fri, 21 Apr 2017 14:27:06 -0700

There's something to be said for having different triggering depending on
which side of a join data comes from, perhaps?


(delightful doc, as usual)

Kenn

On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau <[email protected]>
wrote:

> Thanks for reading, Luke. The simple answer is that CoGBK is basically
> flatten + GBK. Flatten is a non-grouping operation that merges the input
> streams into a single output stream. GBK then groups the data within that
> single union stream as you might otherwise expect, yielding a single table.
> So I think it doesn't really impact things much. Grouping, aggregation,
> window merging, etc. all just act upon the merged stream and generate what
> is effectively a merged table.
>
> -Tyler
>
> On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <[email protected]>
> wrote:
>
> > The doc is a good read.
> >
> > I think you do a great job of explaining table -> stream, stream -> stream,
> > and stream -> table when there is only one stream.
> > But when there are multiple streams reading/writing to a table, how does
> > that impact what occurs?
> > For example, with CoGBK you have multiple streams writing to a table, how
> > does that impact window merging?
> >
> > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <[email protected]>
> > wrote:
> >
> > > Hello Beam, Calcite, and Flink dev lists!
> > >
> > > Apologies for the big cross post, but I thought this might be something
> > > all three communities would find relevant.
> > >
> > > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
> > > Mingmin Xu. As you can imagine, we need to come to some conclusion about
> > > how to elegantly support the full suite of streaming functionality in the
> > > Beam model via Calcite SQL. You folks in the Flink community have been
> > > pushing on this (e.g., adding windowing constructs, amongst others, thank
> > > you! :-), but from my understanding we still don't have a full spec for
> > > how to support robust streaming in SQL (including but not limited to,
> > > e.g., a triggers analogue such as EMIT).
> > >
> > > I've been spending a lot of time thinking about this and have some
> > > opinions about how I think it should look that I've already written
> > > down, so I volunteered to try to drive forward agreement on a general
> > > streaming SQL spec between our three communities (well, technically I
> > > volunteered to do that w/ Beam and Calcite, but I figured you Flink folks
> > > might want to join in since you're going that direction already anyway
> > > and will have useful insights :-).
> > >
> > > My plan was to do this by sharing two docs:
> > >
> > >    1. The Beam Model : Streams & Tables - This one is for context, and
> > >    really only mentions SQL in passing. But it describes the
> > >    relationship between the Beam Model and the "streams & tables" way of
> > >    thinking, which turns out to be useful in understanding what robust
> > >    streaming in SQL might look like. Many of you probably already know
> > >    some or all of what's in here, but I felt it was necessary to have it
> > >    all written down in order to justify some of the proposals I wanted
> > >    to make in the second doc.
> > >
> > >    2. A streaming SQL spec for Calcite - The goal for this doc is that it
> > >    would become a general specification for what robust streaming SQL in
> > >    Calcite should look like. It would start out as a basic proposal of
> > >    what things *could* look like (combining both what things look like
> > >    now as well as a set of proposed changes for the future), and we could
> > >    all iterate on it together until we get to something we're happy with.
> > >
> > > At this point, I have doc #1 ready, and it's a bit of a monster, so I
> > > figured I'd share it and let folks hack at it with comments if they have
> > > any, while I try to get the second doc ready in the meantime. As part of
> > > getting doc #2 ready, I'll be starting a separate thread to try to gather
> > > input on what things are already in flight for streaming SQL across the
> > > various communities, to make sure the proposal captures everything that's
> > > going on as accurately as it can.
> > >
> > > If you have any questions or comments, I'm interested to hear them.
> > > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> > >
> > >   http://s.apache.org/beam-streams-tables
> > >
> > > -Tyler
> > >
> >
>
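
Tyler's description of CoGBK as "flatten + GBK" can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not Beam's API: the helper names `flatten`, `group_by_key`, and `co_group_by_key` are hypothetical, a "stream" is modeled as a plain list of (key, value) records, and windowing and triggering are ignored entirely. It shows the point made in the thread: Flatten is a non-grouping merge that tags each record with its input and concatenates everything into one union stream, and a single GroupByKey over that union then yields one merged table.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Toy model (assumption, not Beam's API): a stream is a list of
# (key, value) records; windowing and triggering are ignored.
Record = Tuple[Hashable, object]
Tagged = Tuple[int, Hashable, object]
Table = Dict[Hashable, List[List[object]]]


def flatten(*streams: Sequence[Record]) -> List[Tagged]:
    """Non-grouping merge: tag each record with the index of the input
    it came from and concatenate all inputs into one union stream."""
    union: List[Tagged] = []
    for tag, stream in enumerate(streams):
        for key, value in stream:
            union.append((tag, key, value))
    return union


def group_by_key(union: Sequence[Tagged], num_inputs: int) -> Table:
    """Group the single union stream by key into one table. Each row maps
    a key to one list of values per original input, using the tags."""
    table: Table = defaultdict(lambda: [[] for _ in range(num_inputs)])
    for tag, key, value in union:
        table[key][tag].append(value)
    return dict(table)


def co_group_by_key(*streams: Sequence[Record]) -> Table:
    """CoGBK as described in the thread: one Flatten, then one GBK."""
    return group_by_key(flatten(*streams), len(streams))


if __name__ == "__main__":
    # Hypothetical example inputs.
    clicks = [("alice", "c1"), ("bob", "c2"), ("alice", "c3")]
    purchases = [("alice", "p1"), ("carol", "p2")]
    print(co_group_by_key(clicks, purchases))
    # {'alice': [['c1', 'c3'], ['p1']], 'bob': [['c2'], []], 'carol': [[], ['p2']]}
```

The per-input tags are what let the single merged table still be split back into one value list per input after grouping. In real windowed pipelines, per Tyler's reply, window merging would likewise act on the union stream before grouping; this sketch leaves that out.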
