Tyler, Yes, dynamic table changes over time. You can find more details about dynamic table from this Flink blog ( https://flink.apache.org/news/2017/04/04/dynamic-tables.html). Fabian, me and Xiaowei posted it a week before the flink-forward@SF. "A dynamic table is a table that is continuously updated and can be queried like a regular, static table. However, in contrast to a query on a batch table which terminates and returns a static table as result, a query on a dynamic table runs continuously and produces a table that is continuously updated depending on the modification on the input table. Hence, the resulting table is a dynamic table as well."
Regards, Shaoxuan On Sat, May 13, 2017 at 1:05 AM, Tyler Akidau <taki...@google.com.invalid> wrote: > Being able to support an EMIT config independent of the query itself sounds > great for compatible use cases (which should be many :-). > > Shaoxuan, can you please refresh my memory what a dynamic table means in > Flink? It's basically just a state table, right? The "dynamic" part of the > name is to simply to imply it can evolve and change over time? > > -Tyler > > > On Fri, May 12, 2017 at 1:59 AM Shaoxuan Wang <wshaox...@gmail.com> wrote: > > > Thanks to Tyler and Fabian for sharing your thoughts. > > > > Regarding to the early/late update control of FLINK. IMO, each dynamic > > table can have an EMIT config. For FLINK table-API, this can be easily > > implemented in different manners, case by case. For instance, in window > > aggregate, we could define "when EMIT a result" via a windowConf per each > > window when we create windows. Unfortunately we do not have such > > flexibility (as compared with TableAPI) in SQL query, we need find a way > to > > embed this EMIT config. > > > > Regards, > > Shaoxuan > > > > > > On Fri, May 12, 2017 at 4:28 PM, Fabian Hueske <fhue...@gmail.com> > wrote: > > > > > 2017-05-11 7:14 GMT+02:00 Tyler Akidau <taki...@google.com.invalid>: > > > > > > > On Tue, May 9, 2017 at 3:06 PM Fabian Hueske <fhue...@gmail.com> > > wrote: > > > > > > > > > Hi Tyler, > > > > > > > > > > thank you very much for this excellent write-up and the super nice > > > > > visualizations! > > > > > You are discussing a lot of the things that we have been thinking > > about > > > > as > > > > > well from a different perspective. > > > > > IMO, yours and the Flink model are pretty much aligned although we > > use > > > a > > > > > different terminology (which is not yet completely established). So > > > there > > > > > should be room for unification ;-) > > > > > > > > > > > > > Good to hear, thanks for giving it a look. :-) > > > > > > > > > > > > > Allow me a few words on the current state in Flink. In the upcoming > > > 1.3.0 > > > > > release, we will have support for group window (TUMBLE, HOP, > > SESSION), > > > > OVER > > > > > RANGE/ROW window (without FOLLOWING), and non-windowed GROUP BY > > > > > aggregations. The group windows are triggered by watermark and the > > over > > > > > window and non-windowed aggregations emit for each input record > > > > > (AtCount(1)). The window aggregations do not support early or late > > > firing > > > > > (late records are dropped), so no updates here. However, the > > > non-windowed > > > > > aggregates produce updates (in acc and acc/retract mode). Based on > > this > > > > we > > > > > will work on better control for late updates and early firing as > well > > > as > > > > > joins in the next release. > > > > > > > > > > > > > Impressive. At this rate there's a good chance we'll just be doing > > > catchup > > > > and thanking you for building everything. ;-) Do you have ideas for > > what > > > > you want your early/late updates control to look like? That's one of > > the > > > > areas I'd like to see better defined for us. And how deep are you > going > > > > with joins? > > > > > > > > > > Right now (well actually I merged the change 1h ago) we are using a > > > QueryConfig object to specify state retention intervals to be able to > > clean > > > up state for inactive keys. > > > A running a query looks like this: > > > > > > // ------- > > > val env = StreamExecutionEnvironment.getExecutionEnvironment > > > val tEnv = TableEnvironment.getTableEnvironment(env) > > > > > > val qConf = tEnv.queryConfig.withIdleStateRetentionTime( > Time.hours(12)) > > // > > > state of inactive keys is kept for 12 hours > > > > > > val t: Table = tEnv.sql("SELECT a, COUNT(*) AS cnt FROM MyTable GROUP > BY > > > a") > > > val stream: DataStream[(Boolean, Row)] = t.toRetractStream[Row](qConf) > // > > > boolean flag for acc/retract > > > > > > env.execute() > > > // ------- > > > > > > We plan to use the QueryConfig also to specify early/late updates. Our > > main > > > motivation is to have a uniform and standard SQL for batch and > streaming. > > > Hence, we have to move the configuration out of the query. But I agree, > > > that it would be very nice to be able to include it in the query. I > think > > > it should not be difficult to optionally support an EMIT clause as > well. > > > > > > > > > > > > > > > Reading the document, I did not find any major difference in our > > > > concepts. > > > > > In fact, we are aiming to support the cases you are describing as > > well. > > > > > I have a question though. Would you classify an OVER aggregation > as a > > > > > stream -> stream or stream -> table operation? It collects records > to > > > > > aggregate them, but emits one record for each input row. Depending > on > > > the > > > > > window definition (i.e., with FOLLOWING CURRENT ROW), you can > compute > > > and > > > > > emit the result record when the input record is received. > > > > > > > > > > > > > I would call it a composite stream → stream operation (since SQL, > like > > > the > > > > standard Beam/Flink APIs, is a higher level set of constructs than > raw > > > > streams/tables operations) consisting of a stream → table windowed > > > grouping > > > > followed by a table → stream triggering on every element, basically > as > > > you > > > > described in the previous paragraph. > > > > > > > > > > > That makes sense. Thanks :-) > > > > > > > > > > -Tyler > > > > > > > > > > > > > > > > > > I'm looking forward to the second part. > > > > > > > > > > Cheers, Fabian > > > > > > > > > > > > > > > > > > > > 2017-05-09 0:34 GMT+02:00 Tyler Akidau <taki...@google.com.invalid > >: > > > > > > > > > > > Any thoughts here Fabian? I'm planning to start sending out some > > more > > > > > > emails towards the end of the week. > > > > > > > > > > > > -Tyler > > > > > > > > > > > > > > > > > > On Wed, Apr 26, 2017 at 8:18 AM Tyler Akidau <taki...@google.com > > > > > > wrote: > > > > > > > > > > > > > No worries, thanks for the heads up. Good luck wrapping all > that > > > > stuff > > > > > > up. > > > > > > > > > > > > > > -Tyler > > > > > > > > > > > > > > On Tue, Apr 25, 2017 at 12:07 AM Fabian Hueske < > > fhue...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > >> Hi Tyler, > > > > > > >> > > > > > > >> thanks for pushing this effort and including the Flink list. > > > > > > >> I haven't managed to read the doc yet, but just wanted to > thank > > > you > > > > > for > > > > > > >> the > > > > > > >> write-up and let you know that I'm very interested in this > > > > discussion. > > > > > > >> > > > > > > >> We are very close to the feature freeze of Flink 1.3 and I'm > > quite > > > > > busy > > > > > > >> getting as many contributions merged before the release is > > forked > > > > off. > > > > > > >> When that happened, I'll have more time to read and comment. > > > > > > >> > > > > > > >> Thanks, > > > > > > >> Fabian > > > > > > >> > > > > > > >> > > > > > > >> 2017-04-22 0:16 GMT+02:00 Tyler Akidau > > <taki...@google.com.invalid > > > > >: > > > > > > >> > > > > > > >> > Good point, when you start talking about anything less than > a > > > full > > > > > > join, > > > > > > >> > triggers get involved to describe how one actually achieves > > the > > > > > > desired > > > > > > >> > semantics, and they may end up being tied to just one of the > > > > inputs > > > > > > >> (e.g., > > > > > > >> > you may only care about the watermark for one side of the > > join). > > > > Am > > > > > > >> > expecting us to address these sorts of details more > precisely > > in > > > > doc > > > > > > #2. > > > > > > >> > > > > > > > >> > -Tyler > > > > > > >> > > > > > > > >> > On Fri, Apr 21, 2017 at 2:26 PM Kenneth Knowles > > > > > > <k...@google.com.invalid > > > > > > >> > > > > > > > >> > wrote: > > > > > > >> > > > > > > > >> > > There's something to be said about having different > > triggering > > > > > > >> depending > > > > > > >> > on > > > > > > >> > > which side of a join data comes from, perhaps? > > > > > > >> > > > > > > > > >> > > (delightful doc, as usual) > > > > > > >> > > > > > > > > >> > > Kenn > > > > > > >> > > > > > > > > >> > > On Fri, Apr 21, 2017 at 1:33 PM, Tyler Akidau > > > > > > >> <taki...@google.com.invalid > > > > > > >> > > > > > > > > >> > > wrote: > > > > > > >> > > > > > > > > >> > > > Thanks for reading, Luke. The simple answer is that > CoGBK > > is > > > > > > >> basically > > > > > > >> > > > flatten + GBK. Flatten is a non-grouping operation that > > > merges > > > > > the > > > > > > >> > input > > > > > > >> > > > streams into a single output stream. GBK then groups the > > > data > > > > > > within > > > > > > >> > that > > > > > > >> > > > single union stream as you might otherwise expect, > > yielding > > > a > > > > > > single > > > > > > >> > > table. > > > > > > >> > > > So I think it doesn't really impact things much. > Grouping, > > > > > > >> aggregation, > > > > > > >> > > > window merging etc all just act upon the merged stream > and > > > > > > generate > > > > > > >> > what > > > > > > >> > > is > > > > > > >> > > > effectively a merged table. > > > > > > >> > > > > > > > > > >> > > > -Tyler > > > > > > >> > > > > > > > > > >> > > > On Fri, Apr 21, 2017 at 12:36 PM Lukasz Cwik > > > > > > >> <lc...@google.com.invalid > > > > > > >> > > > > > > > > >> > > > wrote: > > > > > > >> > > > > > > > > > >> > > > > The doc is a good read. > > > > > > >> > > > > > > > > > > >> > > > > I think you do a great job of explaining table -> > > stream, > > > > > stream > > > > > > >> -> > > > > > > >> > > > stream, > > > > > > >> > > > > and stream -> table when there is only one stream. > > > > > > >> > > > > But when there are multiple streams reading/writing > to a > > > > > table, > > > > > > >> how > > > > > > >> > > does > > > > > > >> > > > > that impact what occurs? > > > > > > >> > > > > For example, with CoGBK you have multiple streams > > writing > > > > to a > > > > > > >> table, > > > > > > >> > > how > > > > > > >> > > > > does that impact window merging? > > > > > > >> > > > > > > > > > > >> > > > > On Thu, Apr 20, 2017 at 5:57 PM, Tyler Akidau > > > > > > >> > > <taki...@google.com.invalid > > > > > > >> > > > > > > > > > > >> > > > > wrote: > > > > > > >> > > > > > > > > > > >> > > > > > Hello Beam, Calcite, and Flink dev lists! > > > > > > >> > > > > > > > > > > > >> > > > > > Apologies for the big cross post, but I thought this > > > might > > > > > be > > > > > > >> > > something > > > > > > >> > > > > all > > > > > > >> > > > > > three communities would find relevant. > > > > > > >> > > > > > > > > > > > >> > > > > > Beam is finally making progress on a SQL DSL > utilizing > > > > > > Calcite, > > > > > > >> > > thanks > > > > > > >> > > > to > > > > > > >> > > > > > Mingmin Xu. As you can imagine, we need to come to > > some > > > > > > >> conclusion > > > > > > >> > > > about > > > > > > >> > > > > > how to elegantly support the full suite of streaming > > > > > > >> functionality > > > > > > >> > in > > > > > > >> > > > the > > > > > > >> > > > > > Beam model in via Calcite SQL. You folks in the > Flink > > > > > > community > > > > > > >> > have > > > > > > >> > > > been > > > > > > >> > > > > > pushing on this (e.g., adding windowing constructs, > > > > amongst > > > > > > >> others, > > > > > > >> > > > thank > > > > > > >> > > > > > you! :-), but from my understanding we still don't > > have > > > a > > > > > full > > > > > > >> spec > > > > > > >> > > for > > > > > > >> > > > > how > > > > > > >> > > > > > to support robust streaming in SQL (including but > not > > > > > limited > > > > > > >> to, > > > > > > >> > > > e.g., a > > > > > > >> > > > > > triggers analogue such as EMIT). > > > > > > >> > > > > > > > > > > > >> > > > > > I've been spending a lot of time thinking about this > > and > > > > > have > > > > > > >> some > > > > > > >> > > > > opinions > > > > > > >> > > > > > about how I think it should look that I've already > > > written > > > > > > down, > > > > > > >> > so I > > > > > > >> > > > > > volunteered to try to drive forward agreement on a > > > general > > > > > > >> > streaming > > > > > > >> > > > SQL > > > > > > >> > > > > > spec between our three communities (well, > technically > > I > > > > > > >> volunteered > > > > > > >> > > to > > > > > > >> > > > do > > > > > > >> > > > > > that w/ Beam and Calcite, but I figured you Flink > > folks > > > > > might > > > > > > >> want > > > > > > >> > to > > > > > > >> > > > > join > > > > > > >> > > > > > in since you're going that direction already anyway > > and > > > > will > > > > > > >> have > > > > > > >> > > > useful > > > > > > >> > > > > > insights :-). > > > > > > >> > > > > > > > > > > > >> > > > > > My plan was to do this by sharing two docs: > > > > > > >> > > > > > > > > > > > >> > > > > > 1. The Beam Model : Streams & Tables - This one > is > > > for > > > > > > >> context, > > > > > > >> > > and > > > > > > >> > > > > > really only mentions SQL in passing. But it > > describes > > > > the > > > > > > >> > > > relationship > > > > > > >> > > > > > between the Beam Model and the "streams & tables" > > way > > > > of > > > > > > >> > thinking, > > > > > > >> > > > > which > > > > > > >> > > > > > turns out to be useful in understanding what > robust > > > > > > >> streaming in > > > > > > >> > > SQL > > > > > > >> > > > > > might > > > > > > >> > > > > > look like. Many of you probably already know some > > or > > > > all > > > > > of > > > > > > >> > what's > > > > > > >> > > > in > > > > > > >> > > > > > here, > > > > > > >> > > > > > but I felt it was necessary to have it all > written > > > down > > > > > in > > > > > > >> order > > > > > > >> > > to > > > > > > >> > > > > > justify > > > > > > >> > > > > > some of the proposals I wanted to make in the > > second > > > > doc. > > > > > > >> > > > > > > > > > > > >> > > > > > 2. A streaming SQL spec for Calcite - The goal > for > > > this > > > > > doc > > > > > > >> is > > > > > > >> > > that > > > > > > >> > > > it > > > > > > >> > > > > > would become a general specification for what > > robust > > > > > > >> streaming > > > > > > >> > SQL > > > > > > >> > > > in > > > > > > >> > > > > > Calcite should look like. It would start out as a > > > basic > > > > > > >> proposal > > > > > > >> > > of > > > > > > >> > > > > what > > > > > > >> > > > > > things *could* look like (combining both what > > things > > > > look > > > > > > >> like > > > > > > >> > now > > > > > > >> > > > as > > > > > > >> > > > > > well > > > > > > >> > > > > > as a set of proposed changes for the future), and > > we > > > > > could > > > > > > >> all > > > > > > >> > > > iterate > > > > > > >> > > > > > on > > > > > > >> > > > > > it together until we get to something we're happy > > > with. > > > > > > >> > > > > > > > > > > > >> > > > > > At this point, I have doc #1 ready, and it's a bit > of > > a > > > > > > monster, > > > > > > >> > so I > > > > > > >> > > > > > figured I'd share it and let folks hack at it with > > > > comments > > > > > if > > > > > > >> they > > > > > > >> > > > have > > > > > > >> > > > > > any, while I try to get the second doc ready in the > > > > > meantime. > > > > > > As > > > > > > >> > part > > > > > > >> > > > of > > > > > > >> > > > > > getting doc #2 ready, I'll be starting a separate > > thread > > > > to > > > > > > try > > > > > > >> to > > > > > > >> > > > gather > > > > > > >> > > > > > input on what things are already in flight for > > streaming > > > > SQL > > > > > > >> across > > > > > > >> > > the > > > > > > >> > > > > > various communities, to make sure the proposal > > captures > > > > > > >> everything > > > > > > >> > > > that's > > > > > > >> > > > > > going on as accurately as it can. > > > > > > >> > > > > > > > > > > > >> > > > > > If you have any questions or comments, I'm > interested > > to > > > > > hear > > > > > > >> them. > > > > > > >> > > > > > Otherwise, here's doc #1, "The Beam Model : Streams > & > > > > > Tables": > > > > > > >> > > > > > > > > > > > >> > > > > > http://s.apache.org/beam-streams-tables > > > > > > >> > > > > > > > > > > > >> > > > > > -Tyler > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >