Re: Towards a spec for robust streaming SQL, Part 1

Tyler Akidau Fri, 21 Apr 2017 13:33:59 -0700

Thanks for reading, Luke. The simple answer is that CoGBK is basically
flatten + GBK. Flatten is a non-grouping operation that merges the input
streams into a single output stream. GBK then groups the data within that
single union stream as you might otherwise expect, yielding a single table.
So I think it doesn't really impact things much. Grouping, aggregation,
window merging, etc. all just act upon the merged stream and generate what
is effectively a merged table.
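
As a rough illustration of the "flatten + GBK" framing, here is a minimal
plain-Python sketch. It does not use the Beam SDK. The helper names
(flatten, group_by_key, co_group_by_key) and the sample data are
hypothetical, and windowing and triggers are left out entirely.

```python
from collections import defaultdict


def flatten(tagged_streams):
    """Non-grouping merge: tag each element with its source stream and
    emit everything as one union stream of (key, (tag, value))."""
    for tag, stream in tagged_streams.items():
        for key, value in stream:
            yield key, (tag, value)


def group_by_key(stream):
    """Group the single union stream by key, yielding a single table."""
    table = defaultdict(list)
    for key, value in stream:
        table[key].append(value)
    return dict(table)


def co_group_by_key(**streams):
    """CoGBK sketched as flatten + GBK, then a per-tag split of each row."""
    grouped = group_by_key(flatten(streams))
    return {
        key: {tag: [v for t, v in values if t == tag] for tag in streams}
        for key, values in grouped.items()
    }


# Hypothetical inputs: two keyed streams writing into one table.
emails = [("amy", "amy@example.com"), ("bob", "bob@example.com")]
phones = [("amy", "555-0100"), ("carl", "555-0199")]

result = co_group_by_key(emails=emails, phones=phones)
```

Every key ends up as one row of the merged table no matter which input
stream it arrived on, which is why the grouping step doesn't need to know
there were multiple inputs.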


-Tyler

On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <[email protected]>
wrote:

> The doc is a good read.
>
> I think you do a great job of explaining table -> stream, stream -> stream,
> and stream -> table when there is only one stream.
> But when there are multiple streams reading/writing to a table, how does
> that impact what occurs?
> For example, with CoGBK you have multiple streams writing to a table, how
> does that impact window merging?
>
> On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <[email protected]>
> wrote:
>
> > Hello Beam, Calcite, and Flink dev lists!
> >
> > Apologies for the big cross post, but I thought this might be something all
> > three communities would find relevant.
> >
> > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
> > Mingmin Xu. As you can imagine, we need to come to some conclusion about
> > how to elegantly support the full suite of streaming functionality in the
> > Beam model via Calcite SQL. You folks in the Flink community have been
> > pushing on this (e.g., adding windowing constructs, amongst others, thank
> > you! :-), but from my understanding we still don't have a full spec for how
> > to support robust streaming in SQL (including but not limited to, e.g., a
> > triggers analogue such as EMIT).
> >
> > I've been spending a lot of time thinking about this and have some opinions
> > about how I think it should look that I've already written down, so I
> > volunteered to try to drive forward agreement on a general streaming SQL
> > spec between our three communities (well, technically I volunteered to do
> > that w/ Beam and Calcite, but I figured you Flink folks might want to join
> > in since you're going that direction already anyway and will have useful
> > insights :-).
> >
> > My plan was to do this by sharing two docs:
> >
> >    1. The Beam Model : Streams & Tables - This one is for context, and
> >    really only mentions SQL in passing. But it describes the relationship
> >    between the Beam Model and the "streams & tables" way of thinking, which
> >    turns out to be useful in understanding what robust streaming in SQL might
> >    look like. Many of you probably already know some or all of what's in here,
> >    but I felt it was necessary to have it all written down in order to justify
> >    some of the proposals I wanted to make in the second doc.
> >
> >    2. A streaming SQL spec for Calcite - The goal for this doc is that it
> >    would become a general specification for what robust streaming SQL in
> >    Calcite should look like. It would start out as a basic proposal of what
> >    things *could* look like (combining both what things look like now as well
> >    as a set of proposed changes for the future), and we could all iterate on
> >    it together until we get to something we're happy with.
> >
> > At this point, I have doc #1 ready, and it's a bit of a monster, so I
> > figured I'd share it and let folks hack at it with comments if they have
> > any, while I try to get the second doc ready in the meantime. As part of
> > getting doc #2 ready, I'll be starting a separate thread to try to gather
> > input on what things are already in flight for streaming SQL across the
> > various communities, to make sure the proposal captures everything that's
> > going on as accurately as it can.
> >
> > If you have any questions or comments, I'm interested to hear them.
> > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> >
> >   http://s.apache.org/beam-streams-tables
> >
> > -Tyler
> >
>
