Thanks for reading, Luke. The simple answer is that CoGBK is basically flatten + GBK. Flatten is a non-grouping operation that merges the input streams into a single output stream. GBK then groups the data within that single union stream as you might otherwise expect, yielding a single table. So I think it doesn't really impact things much. Grouping, aggregation, window merging, etc. all just act upon the merged stream and generate what is effectively a merged table.
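To make the "flatten + GBK" reading concrete, here is a rough sketch in plain Python. It is not the Beam API: the function names (`flatten`, `group_by_key`, `co_group_by_key`) and the sample data are hypothetical, streams are modeled as finite lists of (key, value) pairs, and windowing and triggers are left out entirely. The one assumption beyond what is said above is that each value gets tagged with its source stream before the merge, so the grouped table can still tell the inputs apart.

```python
from collections import defaultdict
from itertools import chain


def flatten(*streams):
    """Non-grouping merge: concatenate the input streams into one stream."""
    return chain(*streams)


def group_by_key(stream):
    """Group a stream of (key, value) pairs into a single table (a dict)."""
    table = defaultdict(list)
    for key, value in stream:
        table[key].append(value)
    return dict(table)


def co_group_by_key(**tagged_streams):
    """Sketch of CoGBK as tag + flatten + GBK (hypothetical helper)."""
    # Tag every value with the name of its source stream (an assumption of
    # this sketch) so the origin survives the merge into one stream.
    tagged = [
        [(key, (tag, value)) for key, value in stream]
        for tag, stream in tagged_streams.items()
    ]
    # One merged stream, one grouping step, one resulting table.
    grouped = group_by_key(flatten(*tagged))
    # Split each key's values back out by source tag.
    result = {}
    for key, tagged_values in grouped.items():
        per_tag = {tag: [] for tag in tagged_streams}
        for tag, value in tagged_values:
            per_tag[tag].append(value)
        result[key] = per_tag
    return result


# Hypothetical sample inputs: two streams keyed by user name.
emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "111"), ("amy", "222"), ("james", "333")]

result = co_group_by_key(emails=emails, phones=phones)
# result["amy"] == {"emails": ["amy@example.com"], "phones": ["111", "222"]}
```

The grouping step only ever sees the single merged stream, so anything layered on top of it (aggregation, window merging) would operate on that merged stream as well.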

-Tyler

On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik <lc...@google.com.invalid> wrote:

> The doc is a good read.
>
> I think you do a great job of explaining table -> stream, stream -> stream,
> and stream -> table when there is only one stream.
> But when there are multiple streams reading/writing to a table, how does
> that impact what occurs?
> For example, with CoGBK you have multiple streams writing to a table, how
> does that impact window merging?
>
> On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau <taki...@google.com.invalid>
> wrote:
>
> > Hello Beam, Calcite, and Flink dev lists!
> >
> > Apologies for the big cross post, but I thought this might be something
> > all three communities would find relevant.
> >
> > Beam is finally making progress on a SQL DSL utilizing Calcite, thanks to
> > Mingmin Xu. As you can imagine, we need to come to some conclusion about
> > how to elegantly support the full suite of streaming functionality in the
> > Beam model via Calcite SQL. You folks in the Flink community have been
> > pushing on this (e.g., adding windowing constructs, amongst others, thank
> > you! :-), but from my understanding we still don't have a full spec for
> > how to support robust streaming in SQL (including but not limited to,
> > e.g., a triggers analogue such as EMIT).
> >
> > I've been spending a lot of time thinking about this and have some
> > opinions about how I think it should look that I've already written down,
> > so I volunteered to try to drive forward agreement on a general streaming
> > SQL spec between our three communities (well, technically I volunteered
> > to do that w/ Beam and Calcite, but I figured you Flink folks might want
> > to join in since you're going that direction already anyway and will have
> > useful insights :-).
> >
> > My plan was to do this by sharing two docs:
> >
> >    1. The Beam Model : Streams & Tables - This one is for context, and
> >    really only mentions SQL in passing. But it describes the relationship
> >    between the Beam Model and the "streams & tables" way of thinking,
> >    which turns out to be useful in understanding what robust streaming in
> >    SQL might look like. Many of you probably already know some or all of
> >    what's in here, but I felt it was necessary to have it all written
> >    down in order to justify some of the proposals I wanted to make in the
> >    second doc.
> >
> >    2. A streaming SQL spec for Calcite - The goal for this doc is that it
> >    would become a general specification for what robust streaming SQL in
> >    Calcite should look like. It would start out as a basic proposal of
> >    what things *could* look like (combining both what things look like
> >    now as well as a set of proposed changes for the future), and we could
> >    all iterate on it together until we get to something we're happy with.
> >
> > At this point, I have doc #1 ready, and it's a bit of a monster, so I
> > figured I'd share it and let folks hack at it with comments if they have
> > any, while I try to get the second doc ready in the meantime. As part of
> > getting doc #2 ready, I'll be starting a separate thread to try to gather
> > input on what things are already in flight for streaming SQL across the
> > various communities, to make sure the proposal captures everything that's
> > going on as accurately as it can.
> >
> > If you have any questions or comments, I'm interested to hear them.
> > Otherwise, here's doc #1, "The Beam Model : Streams & Tables":
> >
> > http://s.apache.org/beam-streams-tables
> >
> > -Tyler