Re: [ANNOUNCE] Apache Flink 1.10.3 released
Yes, thanks for taking over the release!

Best,
Matthias

On Mon, Feb 1, 2021 at 5:04 AM Zhu Zhu wrote:
> Thanks Xintong for being the release manager and everyone who helped with
> the release!
>
> Cheers,
> Zhu
>
> On Fri, Jan 29, 2021 at 5:56 PM, Dian Fu wrote:
>
>> Thanks Xintong for driving this release!
>>
>> Regards,
>> Dian
>>
>> On Jan 29, 2021, at 5:24 PM, Till Rohrmann wrote:
>>
>> Thanks Xintong for being our release manager. Well done!
>>
>> Cheers,
>> Till
>>
>> On Fri, Jan 29, 2021 at 9:50 AM Yang Wang wrote:
>>
>>> Thanks Xintong for driving this release.
>>>
>>> Best,
>>> Yang
>>>
>>> On Fri, Jan 29, 2021 at 3:52 PM, Yu Li wrote:
>>>
>>>> Thanks Xintong for being our release manager and everyone else who made
>>>> the release possible!
>>>>
>>>> Best Regards,
>>>> Yu
>>>>
>>>> On Fri, 29 Jan 2021 at 15:05, Xintong Song wrote:
>>>>
>>>>> The Apache Flink community is very happy to announce the release of Apache
>>>>> Flink 1.10.3, which is the third bugfix release for the Apache Flink 1.10
>>>>> series.
>>>>>
>>>>> Apache Flink® is an open-source stream processing framework for
>>>>> distributed, high-performing, always-available, and accurate data streaming
>>>>> applications.
>>>>>
>>>>> The release is available for download at:
>>>>> https://flink.apache.org/downloads.html
>>>>>
>>>>> Please check out the release blog post for an overview of the improvements
>>>>> for this bugfix release:
>>>>> https://flink.apache.org/news/2021/01/29/release-1.10.3.html
>>>>>
>>>>> The full release notes are available in Jira:
>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348668
>>>>>
>>>>> We would like to thank all contributors of the Apache Flink community who
>>>>> made this release possible!
>>>>>
>>>>> Regards,
>>>>> Xintong Song
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
Thanks Jane for starting the discussion.

Regarding #1, I also prefer the `USE MODULES` syntax. It can be interpreted as "setting the current order of modules", which is similar to "setting the current catalog" for `USE CATALOG`.

Regarding #3, I'm fine with mapping modules purely by name because I think it satisfies all the use cases we have at hand. But I guess we need to make sure we're backward compatible, i.e. users don't need to change their YAML files to configure the modules.

On Mon, Feb 1, 2021 at 3:10 PM Jark Wu wrote:
> Thanks Jane for the summary and for starting the discussion on the mailing
> list.
>
> Here are my thoughts:
>
> 1) Syntax to reorder modules
> I agree with Rui Li that it would be quite useful to have some syntax to
> reorder modules.
> I slightly prefer `USE MODULES x, y, z` over `RELOAD MODULES x, y, z`,
> because USE conveys more of a sense of taking effect and specifying an
> ordering than RELOAD does.
> To me, RELOAD just means we unregister and register the x, y, z modules
> again; it sounds like the other registered modules are still in use and
> keep their order.
>
> 3) Mapping modules purely by name
> This can definitely improve the usability of loading modules, because
> the 'type=' property looks really redundant. We can think of this as
> syntax sugar where the default type value is the module name.
> And we can support specifying the 'type=' property in the future to allow
> multiple modules of one module type.
>
> Besides, I would like to mention one more change: the module name
> proposed in FLIP-68 is a string literal.
> But I think we are all on the same page about changing it into a simple
> (non-compound) identifier.
>
> LOAD/UNLOAD MODULE 'core'
> ==>
> LOAD/UNLOAD MODULE core
>
> Best,
> Jark
>
> On Sat, 30 Jan 2021 at 04:00, Jane Chan wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion on FLINK-21045 [1] about supporting
> > `LOAD MODULE` and `UNLOAD MODULE` SQL syntax. It was first proposed in
> > FLIP-68 [2] as follows:
> >
> > -- load a module with the given name and append it to the end of the module list
> > LOAD MODULE 'name' [WITH ('type'='xxx', 'prop'='myProp', ...)]
> >
> > -- unload a module by name from the module list; other modules remain in the same relative positions
> > UNLOAD MODULE 'name'
> >
> > After a round of discussion on the Jira ticket, it seems some unanswered
> > questions need more opinions and suggestions.
> >
> > 1. The way to redefine resolution order easily
> >
> > Rui Li suggested introducing `USE MODULES` and adding similar
> > functionality to the API because
> >
> > > 1) It's very tedious to unload old modules just to reorder them.
> > > 2) Users may not even know how to "re-load" an old module if it was not
> > > initially loaded by the user, e.g. they don't know which type to use.
> >
> > Jane Chan pointed out that a module, unlike a catalog, has no namespace
> > concept that could be specified, and that `USE` sounds like a mutually
> > exclusive concept.
> > Maybe `RELOAD MODULES` can express raising the priority of the loaded
> > module(s).
> >
> > 2. `LOAD/UNLOAD MODULE` vs. `CREATE/DROP MODULE` syntax
> >
> > Jark Wu and Nicholas Jiang proposed using `CREATE/DROP MODULE` instead
> > of `LOAD/UNLOAD MODULE` because
> >
> > > 1) From a pure SQL user's perspective, maybe `CREATE MODULE + USE MODULE`
> > > is easier to use than `LOAD/UNLOAD`.
> > > 2) This would be very similar to how catalogs are used now.
> >
> > Timo Walther would rather stick to the agreed design because
> > loading/unloading modules is a concept known from kernels etc.
> >
> > 3. Simplify the module design by mapping modules purely by name
> >
> > LOAD MODULE geo_utils
> > LOAD MODULE hive WITH ('version'='2.1') -- no dedicated 'type='/'module=' property; only one parameterized instance of a module can be loaded
> > UNLOAD MODULE hive
> > USE MODULES hive, core
> >
> > Please find more details in the reference links. Looking forward to your
> > feedback.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21045
> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
> >
> > Best,
> > Jane

--
Best regards!

Rui Li
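The module-list semantics quoted from FLIP-68 above (LOAD appends to the end of the list, UNLOAD removes one module while the rest keep their relative positions, and the list order is the resolution order) can be illustrated with a minimal Java sketch. All class and method names here are hypothetical and this is not Flink's `ModuleManager`; rejecting a second `load` of the same name is an assumption added so that LOAD always means "append", and the function names are illustrative data only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the FLIP-68 module list; not Flink's ModuleManager. */
public class ModuleListSketch {

    // LinkedHashMap keeps insertion order, which doubles as the resolution order.
    private final LinkedHashMap<String, Set<String>> modules = new LinkedHashMap<>();

    /** LOAD MODULE name: append the module to the end of the list. */
    void load(String name, Set<String> functions) {
        if (modules.containsKey(name)) {
            // Assumption: re-loading is rejected, otherwise put() would keep the old position.
            throw new IllegalArgumentException("Module already loaded: " + name);
        }
        modules.put(name, functions);
    }

    /** UNLOAD MODULE name: remove it; the remaining modules keep their relative order. */
    void unload(String name) {
        if (modules.remove(name) == null) {
            throw new IllegalArgumentException("No such module: " + name);
        }
    }

    /** Returns the first module, in list order, that provides the function. */
    Optional<String> resolve(String function) {
        for (Map.Entry<String, Set<String>> entry : modules.entrySet()) {
            if (entry.getValue().contains(function)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    List<String> order() {
        return new ArrayList<>(modules.keySet());
    }

    public static void main(String[] args) {
        ModuleListSketch m = new ModuleListSketch();
        m.load("core", Set.of("upper", "concat"));
        m.load("hive", Set.of("concat", "get_json_object"));
        m.load("geo_utils", Set.of("st_distance"));
        System.out.println(m.order());           // [core, hive, geo_utils]
        System.out.println(m.resolve("concat")); // Optional[core]
        m.unload("core");
        System.out.println(m.order());           // [hive, geo_utils]
        System.out.println(m.resolve("concat")); // Optional[hive]
    }
}
```

The sketch also shows Rui's motivation for a reordering statement: with only LOAD/UNLOAD, the only way to let `hive` win over `core` for `concat` is to unload `core` (and re-load it at the end).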
[jira] [Created] (FLINK-21225) OverConvertRule does not consider distinct
Timo Walther created FLINK-21225:

Summary: OverConvertRule does not consider distinct
Key: FLINK-21225
URL: https://issues.apache.org/jira/browse/FLINK-21225
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Reporter: Timo Walther

We don't support OVER window distinct aggregates in the Table API, even though this is explicitly documented:
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#aggregations

{code}
// Distinct aggregation on over window
Table result = orders
    .window(Over
        .partitionBy($("a"))
        .orderBy($("rowtime"))
        .preceding(UNBOUNDED_RANGE)
        .as("w"))
    .select(
        $("a"), $("b").avg().distinct().over($("w")),
        $("b").max().over($("w")),
        $("b").min().over($("w"))
    );
{code}

The distinct flag is set to false in {{OverConvertRule}}. See also http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unknown-call-expression-avg-amount-when-use-distinct-in-Flink-Thanks-td40905.html

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
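To make concrete what the dropped distinct flag changes, here is a minimal, self-contained Java sketch (plain Java, not Flink's planner or runtime) of the per-row result of `AVG(b)` versus `AVG(DISTINCT b)` over a window partitioned by `a` with unbounded preceding rows. It assumes strictly increasing `rowtime` values within a partition, so RANGE and ROWS frames coincide, and it computes averages as `double` for readability, whereas an SQL `AVG` over an `INT` column may return an integer type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of OVER-window AVG with and without DISTINCT. */
public class OverDistinctSketch {

    /** One input row: partition key a, event time rowtime, value b. */
    static final class Row {
        final String a;
        final long rowtime; // rows are assumed to arrive in rowtime order
        final int b;

        Row(String a, long rowtime, int b) {
            this.a = a;
            this.rowtime = rowtime;
            this.b = b;
        }
    }

    /** Emits one running average per input row, per partition key. */
    static List<Double> runningAvg(List<Row> rows, boolean distinct) {
        Map<String, long[]> sumAndCount = new HashMap<>();
        Map<String, Set<Integer>> seen = new HashMap<>();
        List<Double> out = new ArrayList<>();
        for (Row r : rows) {
            long[] acc = sumAndCount.computeIfAbsent(r.a, k -> new long[2]);
            Set<Integer> values = seen.computeIfAbsent(r.a, k -> new HashSet<>());
            // Set.add returns false for a repeated value, which a DISTINCT aggregate skips.
            boolean firstOccurrence = values.add(r.b);
            if (!distinct || firstOccurrence) {
                acc[0] += r.b;
                acc[1] += 1;
            }
            out.add((double) acc[0] / acc[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> orders = List.of(
                new Row("x", 1L, 10),
                new Row("x", 2L, 10),
                new Row("x", 3L, 40));
        System.out.println("avg:          " + runningAvg(orders, false)); // [10.0, 10.0, 20.0]
        System.out.println("avg distinct: " + runningAvg(orders, true));  // [10.0, 10.0, 25.0]
    }
}
```

If the distinct flag is silently ignored, the third row would yield 20.0 where the documented query expects 25.0.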
Re: [VOTE] FLIP-159: Reactive Mode
Thanks Robert and congratulations on your first FLIP.

+1 (non-binding)

Matthias

On Mon, Feb 1, 2021 at 4:22 AM Zhu Zhu wrote:
> +1 (binding)
>
> Thanks,
> Zhu
>
> On Fri, Jan 29, 2021 at 10:23 PM, Till Rohrmann wrote:
> > LGTM. Thanks for the work Robert!
> >
> > +1 (binding)
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 28, 2021 at 11:27 AM Yang Wang wrote:
> > > Thanks Robert for your great work on this FLIP. This is really a big step
> > > to make Flink auto scalable.
> > >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yang
> > >
> > > On Thu, Jan 28, 2021 at 4:32 PM, Robert Metzger wrote:
> > > > @Yangze: That's something I overlooked. I should have waited. If FLIP-160
> > > > is rejected or undergoes fundamental changes, I'll cancel this vote and
> > > > rewrite FLIP-159.
> > > > But I have the impression that there were no major concerns regarding
> > > > FLIP-160 so far.
> > > >
> > > > On Thu, Jan 28, 2021 at 8:46 AM Yangze Guo wrote:
> > > > > Thanks for driving this, Robert! LGTM.
> > > > >
> > > > > +1
> > > > >
> > > > > Minor: I'm just a little confused about the process. It seems this
> > > > > proposal relies on FLIP-160, which is still under discussion.
> > > > > Should we always vote on the prerequisite first?
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Thu, Jan 28, 2021 at 3:27 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > Thanks Robert. LGTM.
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 2:50 PM Robert Metzger <rmetz...@apache.org> wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > since the discussion [1] about FLIP-159 [2] seems to have reached a
> > > > > > > consensus, I'd like to start a formal vote for the FLIP.
> > > > > > >
> > > > > > > Please vote +1 to approve the FLIP, or -1 with a comment. The vote will be
> > > > > > > open at least until Tuesday, Feb 2nd.
> > > > > > >
> > > > > > > Best,
> > > > > > > Robert
> > > > > > >
> > > > > > > [1] https://lists.apache.org/thread.html/ra688faf9dca036500f0445c55671e70ba96c70f942afe650e9db8374%40%3Cdev.flink.apache.org%3E
> > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode
Re: [VOTE] FLIP-160: Declarative scheduler
+1 (non-binding)

Thanks,
Matthias

On Mon, Feb 1, 2021 at 4:22 AM Zhu Zhu wrote:
> +1 (binding)
>
> Thanks,
> Zhu
>
> On Mon, Feb 1, 2021 at 11:04 AM, Yang Wang wrote:
> > +1 (non-binding)
> >
> > Best,
> > Yang
> >
> > On Mon, Feb 1, 2021 at 9:50 AM, Yangze Guo wrote:
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Sat, Jan 30, 2021 at 8:40 AM Xintong Song wrote:
> > > > +1 (binding)
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Fri, Jan 29, 2021 at 10:41 PM Robert Metzger wrote:
> > > > > ... and thanks a lot for your work :) I'm really excited about finally
> > > > > adding this feature to Flink!
> > > > >
> > > > > On Fri, Jan 29, 2021 at 3:40 PM Robert Metzger <rmetz...@apache.org> wrote:
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Fri, Jan 29, 2021 at 3:23 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > since the discussion [1] about FLIP-160 [2] seems to have reached a
> > > > > > > consensus, I'd like to start a formal vote for the FLIP.
> > > > > > >
> > > > > > > Please vote +1 to approve the FLIP, or -1 with a comment. The vote will be
> > > > > > > open at least until Wednesday, Feb 3rd.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > [1] https://lists.apache.org/thread.html/r604a01f739639e2a5f093fbe7894c172125530332747ecf6990a6ce4%40%3Cdev.flink.apache.org%3E
> > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Declarative+Scheduler
[jira] [Created] (FLINK-21226) Reintroduce TableColumn.of for backwards compatibility
Timo Walther created FLINK-21226:

Summary: Reintroduce TableColumn.of for backwards compatibility
Key: FLINK-21226
URL: https://issues.apache.org/jira/browse/FLINK-21226
Project: Flink
Issue Type: Bug
Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther

FLINK-19341 accidentally dropped the {{TableColumn.of}} method, which might be used frequently by downstream projects. We should reintroduce it for one or two releases.
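The compatibility approach described in the ticket (bring back a removed static factory for a release or two) is commonly done by adding the old method back as a deprecated alias that delegates to its replacement. The Java sketch below shows that pattern on a hypothetical, simplified stand-in class, not on Flink's `TableColumn`; the replacement name `physical` and the use of a plain `String` for the data type are assumptions made to keep the example self-contained.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for a table column class (not Flink's TableColumn),
 * showing a removed factory reintroduced as a deprecated, delegating alias.
 */
public final class ColumnSketch {

    private final String name;
    private final String dataType; // plain String instead of a real DataType

    private ColumnSketch(String name, String dataType) {
        this.name = Objects.requireNonNull(name, "name");
        this.dataType = Objects.requireNonNull(dataType, "dataType");
    }

    /** Current factory for a physical column (name assumed for this sketch). */
    public static ColumnSketch physical(String name, String dataType) {
        return new ColumnSketch(name, dataType);
    }

    /**
     * Old factory, reintroduced so downstream code keeps compiling.
     *
     * @deprecated use {@link #physical(String, String)}; intended for removal in a later release.
     */
    @Deprecated
    public static ColumnSketch of(String name, String dataType) {
        return physical(name, dataType); // delegate so both paths stay identical
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ColumnSketch)) {
            return false;
        }
        ColumnSketch other = (ColumnSketch) o;
        return name.equals(other.name) && dataType.equals(other.dataType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, dataType);
    }

    @Override
    public String toString() {
        return name + " " + dataType;
    }
}
```

Delegating instead of duplicating the constructor call keeps the deprecated path behaviorally identical to the new one, so removing it later only breaks compilation, not semantics.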
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
Thanks for starting the discussion, Jane.

I'm fine with using `USE` for reordering the modules. I agree with Jark to use an identifier instead of a string literal for the module name. However, to simplify the design I would completely remove the `type=` property, because having multiple ways of defining the same thing might be confusing without providing additional benefits. I also think that users should not be able to load the same module multiple times.

Regarding Rui's comment, the YAML file should not be affected by this change and we can leave this part of the API untouched. We need to update the `ModuleFactory` anyway because it still uses the deprecated `TableFactory` class.

Regards,
Timo
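Timo's two rules (the module name alone selects the factory, with no `type=` property, and a module name can be loaded at most once) can be sketched as a small registry in Java. Everything here is hypothetical and simplified: factories are plain functions keyed by identifier, and none of the names correspond to Flink's actual `ModuleFactory` API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of mapping modules purely by name; not Flink's ModuleFactory API. */
public class NameMappedModules {

    /** Stand-in for a loaded module: its name plus the options it was created with. */
    static final class Module {
        final String name;
        final Map<String, String> options;

        Module(String name, Map<String, String> options) {
            this.name = name;
            this.options = options;
        }
    }

    private final Map<String, Function<Map<String, String>, Module>> factories = new HashMap<>();
    private final Map<String, Module> loaded = new LinkedHashMap<>();

    void registerFactory(String identifier) {
        factories.put(identifier, options -> new Module(identifier, options));
    }

    /** LOAD MODULE name [WITH (...)]: the name itself selects the factory. */
    Module load(String name, Map<String, String> options) {
        if (loaded.containsKey(name)) {
            throw new IllegalStateException("Module '" + name + "' is already loaded");
        }
        Function<Map<String, String>, Module> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No module factory found for '" + name + "'");
        }
        Module module = factory.apply(options);
        loaded.put(name, module);
        return module;
    }

    public static void main(String[] args) {
        NameMappedModules registry = new NameMappedModules();
        registry.registerFactory("core");
        registry.registerFactory("hive");
        registry.load("core", Map.of());
        Module hive = registry.load("hive", Map.of("version", "2.1"));
        System.out.println(hive.name + " " + hive.options); // hive {version=2.1}
        try {
            registry.load("hive", Map.of("version", "3.1"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Module 'hive' is already loaded
        }
    }
}
```

Because each name maps to exactly one factory and one loaded instance, a second parameterization of the same module (here `hive` with another version) is rejected rather than silently replacing the first.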
Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior
Parts of the FLIP can already be implemented before the vote has completed, e.g. there is no doubt that we should support TIME(9). However, I don't see the benefit of reworking the time functions only to rework them again later. If we lock the time at query start, the implementation of the previously mentioned functions will be completely different.

Regards,
Timo

On 01.02.21 02:37, Kurt Young wrote:
I also prefer not to expand this FLIP further, but we could open a discussion thread right after this FLIP is accepted and start coding & reviewing. Making technical discussion and coding more pipelined will improve efficiency.

Best,
Kurt

On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu wrote:
Hi, Timo

I do think that this topic must be part of the FLIP as well. Esp. if the FLIP has the title "time function behavior" and this is clearly a behavioral aspect. We are performing a heavy refactoring of the SQL query semantics in Flink here which will affect a lot of users. We cannot rework the time functions a third time after this. I checked a couple of other vendors. It seems that they all lock the timestamp when the query is started. And as you said, in this case both mature (Oracle) and less mature systems (Hive, MySQL) have the same behavior.

FLIP-162> "These problems come from the fact that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone."

The motivation of FLIP-162 is to correct the wrong values of time-related functions that are caused by the time zone. After our earlier discussion, we found that this is related to the functions' return types compared to the SQL standard and other vendors, and thus we proposed to make the return types consistent as well. This is exactly what the FLIP title means and what the FLIP plans to do.
But we have not yet considered the function materialization mechanism as part of our plan, because we need to fix the time zone and function type issues regardless of whether we modify the materialization mechanism in the future. So I think it does not belong to the scope of this FLIP. It will already be a great achievement if we can implement the current FLIP's 7 proposals well; we don't want to expand the scope again, esp. since this is not part of our plan. What do you think? @Timo And what are others' thoughts? @Jark @Kurt

Best,
Leonard

Flink should not differ. I fear that we have to adopt this behavior as well to call ourselves standard compliant. Otherwise it will also not be possible to have Hive compatibility with proper semantics. It could lead to unintended behavior. I see two options for this topic:

1) Clearly distinguish between query-start and processing time
MySQL offers NOW() and SYSDATE() to distinguish the two semantics. We could run all the previously discussed functions that have a meaning in other systems in query-start time and use a different name for processing time: `SYS_TIMESTAMP`, `SYS_DATE`, `SYS_TIME`, `SYS_LOCALTIMESTAMP`, `SYS_LOCALDATE`, `SYS_LOCALTIME`?

2) Introduce a config option
We are non-compliant by default and allow typical batch behavior if needed via a config option. But batch/stream unification should not mean that we disable certain unification aspects by default.

What do you think?

Regards,
Timo

On 28.01.21 16:51, Leonard Xu wrote:
Hi, Timo

I'm sorry that I need to open another discussion thread before voting, but I think we should also discuss this in this FLIP before it pops up at a later stage. How do we want our time functions to behave in long-running queries?

It's okay to open this thread. Although I don't want to consider function value materialization within this FLIP's scope, I can try to explain a few things.

See also: https://stackoverflow.com/questions/5522656/sql-now-in-long-running-query I think this was never discussed thoroughly.
Actually CURRENT_TIMESTAMP/NOW/LOCALTIMESTAMP should have slightly different semantics than PROCTIME(). What is our current behavior? Are we materializing those time values during planning?

Currently CURRENT_TIMESTAMP/NOW/LOCALTIMESTAMP behave the same in both the batch and the stream world: the function value is materialized per record, not at query start (planning phase). PROCTIME() also behaves the same in batch and streaming; in fact, we just added support for PROCTIME() in batch last week [1]. In short, we keep the same semantics/behavior for batch and streaming.

Esp. long-running batch queries might suffer from inconsistencies here, e.g. when a timestamp is produced by one operator using CURRENT_TIMESTAMP and a different operator filters relative to CURRENT_TIMESTAMP.

It's a good question, and I've found that some users have asked similar questions on the user/user-zh mailing lists, given the fact that many batch systems like Hive/Presto use the query-start value. But that is not suitable for a streaming engine; for example, users will use CURRENT_TIMESTAMP to define event time. As a unified Batch/Stream SQL engine, keep sa
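The two semantics debated in this thread can be contrasted in a few lines of Java: materializing the time value once when the query starts (the behavior Timo reports for other vendors) versus evaluating it again for every record (Flink's current behavior as Leonard describes it). This is a plain-Java sketch, not Flink code; the stepping clock is a hypothetical test double that advances one second per read so the output is deterministic.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comparison of query-start vs. per-record time function evaluation. */
public class TimeFunctionSemantics {

    /** Deterministic clock that advances by a fixed step on every read. */
    static final class SteppingClock extends Clock {
        private Instant next;
        private final Duration step;

        SteppingClock(Instant start, Duration step) {
            this.next = start;
            this.step = step;
        }

        @Override
        public ZoneId getZone() {
            return ZoneOffset.UTC;
        }

        @Override
        public Clock withZone(ZoneId zone) {
            throw new UnsupportedOperationException("not needed for this sketch");
        }

        @Override
        public Instant instant() {
            Instant now = next;
            next = next.plus(step);
            return now;
        }
    }

    /** Query-start semantics: read the clock once and reuse the value for every record. */
    static List<Instant> materializeAtQueryStart(Clock clock, int records) {
        Instant queryStart = clock.instant();
        List<Instant> values = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            values.add(queryStart);
        }
        return values;
    }

    /** Per-record semantics: read the clock again for every record. */
    static List<Instant> evaluatePerRecord(Clock clock, int records) {
        List<Instant> values = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            values.add(clock.instant());
        }
        return values;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2021-01-28T00:00:00Z");
        Duration oneSecond = Duration.ofSeconds(1);
        // [2021-01-28T00:00:00Z, 2021-01-28T00:00:00Z, 2021-01-28T00:00:00Z]
        System.out.println(materializeAtQueryStart(new SteppingClock(start, oneSecond), 3));
        // [2021-01-28T00:00:00Z, 2021-01-28T00:00:01Z, 2021-01-28T00:00:02Z]
        System.out.println(evaluatePerRecord(new SteppingClock(start, oneSecond), 3));
    }
}
```

Under per-record evaluation, a value written by one operator and compared against a fresh `CURRENT_TIMESTAMP` in a later operator can differ, which is the inconsistency Timo raises for long-running batch queries.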
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
Hi Jark and Rui,

Thanks for the discussions.

Regarding #1, I'm fine with the `USE MODULES` syntax, and

> It can be interpreted as "setting the current order of modules", which is
> similar to "setting the current catalog" for `USE CATALOG`.

I would like to confirm that the unmentioned modules remain in the same relative order. E.g., if there are three loaded modules `X`, `Y`, `Z`, does `USE MODULES Y, Z` mean shifting the order to `Y`, `Z`, `X`?

Regarding #3, I'm fine with mapping modules purely by name, and I think Jark raised a good point on making the module name a simple identifier instead of a string literal. For backward compatibility: since we haven't supported this syntax yet, the affected users are those who defined modules in the YAML configuration file. Maybe we can remove 'type' from the 'requiredContext' to make it optional. The proposed mapping mechanism could then use the module name to look up the suitable factory, and in the meantime we update the documentation to encourage users to simplify their YAML configuration. In the long run, we can deprecate 'type'.

Best,
Jane
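Jane's backward-compatibility idea (keep honoring an explicit `type` in existing YAML module entries, but fall back to the module name when `type` is absent) amounts to a one-line lookup rule. The Java sketch below assumes module entries arrive as simple key/value maps with `name` and optional `type` keys, mirroring the SQL Client YAML entries; the class and method names are hypothetical and not part of Flink.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve a module factory identifier with 'type' optional. */
public class FactoryIdentifierSketch {

    static String factoryIdentifier(Map<String, String> moduleEntry) {
        String name = moduleEntry.get("name");
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Module entry must define a name");
        }
        // Existing YAML files keep working: an explicit 'type' still wins.
        // New-style entries omit 'type' and are mapped purely by name.
        return moduleEntry.getOrDefault("type", name);
    }

    public static void main(String[] args) {
        List<Map<String, String>> entries = List.of(
                Map.of("name", "core", "type", "core"),   // legacy style
                Map.of("name", "myhive", "type", "hive"), // legacy style, name differs from type
                Map.of("name", "hive"));                  // simplified style without 'type'
        for (Map<String, String> entry : entries) {
            System.out.println(entry.get("name") + " -> " + factoryIdentifier(entry));
        }
        // core -> core
        // myhive -> hive
        // hive -> hive
    }
}
```

The `myhive` entry shows why an explicit `type` has to keep precedence during the deprecation period: dropping it immediately would change which factory an existing configuration resolves to.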
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
IMHO I would rather unload the modules that are not mentioned. The `USE` statement implicitly implies that the other modules are "not used". What do others think?

Regards,
Timo
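The two readings of `USE MODULES` in this thread can be compared side by side for Jane's example (loaded modules `X`, `Y`, `Z` and the statement `USE MODULES Y, Z`). In the Java sketch below, reading 1 moves the listed modules to the front and keeps the others in their relative order; reading 2 (Timo's preference) keeps only the listed modules in use. The method names and the validation rules (every listed module must be loaded and listed once) are assumptions for illustration, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical comparison of the two proposed USE MODULES semantics. */
public class UseModulesSketch {

    /** Rejects duplicates and modules that are not currently loaded. */
    static void validate(List<String> loaded, List<String> used) {
        Set<String> unique = new HashSet<>(used);
        if (unique.size() != used.size()) {
            throw new IllegalArgumentException("Duplicate module in USE MODULES: " + used);
        }
        if (!loaded.containsAll(used)) {
            throw new IllegalArgumentException("Unknown module in USE MODULES: " + used);
        }
    }

    /** Reading 1: listed modules move to the front; the rest keep their relative order. */
    static List<String> reorderKeepingRest(List<String> loaded, List<String> used) {
        validate(loaded, used);
        List<String> result = new ArrayList<>(used);
        for (String name : loaded) {
            if (!used.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    /** Reading 2: only the listed modules stay in use, in the given order. */
    static List<String> useOnlyListed(List<String> loaded, List<String> used) {
        validate(loaded, used);
        return new ArrayList<>(used);
    }

    public static void main(String[] args) {
        List<String> loaded = List.of("X", "Y", "Z");
        List<String> used = List.of("Y", "Z");
        System.out.println(reorderKeepingRest(loaded, used)); // [Y, Z, X]
        System.out.println(useOnlyListed(loaded, used));      // [Y, Z]
    }
}
```

With reading 2, a user who wants `X` back must name it explicitly, which makes the statement's effect fully determined by its own argument list.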
[jira] [Created] (FLINK-21227) Fixed: Upgrade Version com.google.protobuf:protoc:3.5.1:exe to 3.7.0 for (power)ppc64le support
Bivas created FLINK-21227:

Summary: Fixed: Upgrade Version com.google.protobuf:protoc:3.5.1:exe to 3.7.0 for (power)ppc64le support
Key: FLINK-21227
URL: https://issues.apache.org/jira/browse/FLINK-21227
Project: Flink
Issue Type: Improvement
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Bivas

com.google.protobuf:*protoc:3.5.1:exe* does not support Power (ppc64le). Later versions added multi-architecture support, including Power (ppc64le). With *protoc:3.7.0:exe*, the build succeeded and the E2E tests passed.

https://github.com/bivasda1/flink/blob/master/flink-formats/flink-parquet/pom.xml#L253

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-21228) [Kinesis][Producer] Deadlock in KinesisProducer
Danny Cranmer created FLINK-21228:

Summary: [Kinesis][Producer] Deadlock in KinesisProducer
Key: FLINK-21228
URL: https://issues.apache.org/jira/browse/FLINK-21228
Project: Flink
Issue Type: Bug
Components: Connectors / Kinesis
Affects Versions: 1.12.1
Reporter: Danny Cranmer

*Background*
The application sink failed, which resulted in:
- Indefinite backpressure being applied
- The exception that should have failed the job never being thrown

The application was running with:
{code:java}
flinkKinesisProducer.setQueueLimit(1);
flinkKinesisProducer.setFailOnError(true);
{code}

- {{KinesisProducer}} waits for the queue to empty before sending the next record ([code|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L303])
- The KPL ran out of memory, which raised an error; however, this error is processed asynchronously ([code|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L275])
- {{KinesisProducer}} would have rethrown the error and restarted the job; however, the operator was stuck in an infinite loop enforcing the queue limit, which never clears ([code|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L306])

*Proposal*
- Call {{checkAndPropagateAsyncError()}} while enforcing the queue limit in {{enforceQueueLimit()}} to break the deadlock
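The proposal above can be illustrated with a simplified, hypothetical model. This is not the actual `FlinkKinesisProducer` code: there is no KPL, only a counter of outstanding records and an error slot that a simulated callback thread fills. The method names `checkAndPropagateAsyncError()` and `enforceQueueLimit()` follow the ticket, but their bodies here are assumptions. The key point is that the async-error check runs inside the wait loop, so a queue that never drains can no longer hang the operator.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Simplified, hypothetical model of the proposed fix for FLINK-21228 (not FlinkKinesisProducer).
 */
public class QueueLimitSketch {
    private final int queueLimit;
    // Records handed to the simulated producer that have not been acknowledged yet.
    private final AtomicInteger outstanding = new AtomicInteger();
    // First error reported by the simulated asynchronous callback thread.
    private final AtomicReference<Throwable> asyncError = new AtomicReference<>();

    public QueueLimitSketch(int queueLimit) {
        this.queueLimit = queueLimit;
    }

    /** Called from a callback thread when an asynchronous send fails. */
    public void onAsyncFailure(Throwable t) {
        asyncError.compareAndSet(null, t);
    }

    /** Rethrows a pending asynchronous error on the calling (task) thread. */
    private void checkAndPropagateAsyncError() {
        Throwable t = asyncError.get();
        if (t != null) {
            throw new RuntimeException("Asynchronous send failed", t);
        }
    }

    /** Waits while the queue is full, but fails fast if an async error is pending. */
    private void enforceQueueLimit() {
        while (outstanding.get() >= queueLimit) {
            // The proposed change: check for async errors inside the wait loop.
            checkAndPropagateAsyncError();
            try {
                Thread.sleep(10); // stand-in for flushing / waiting on the producer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("Interrupted while enforcing queue limit", e);
            }
        }
    }

    /** Sends one record; nothing ever acknowledges it in this model. */
    public void invoke(String value) {
        checkAndPropagateAsyncError();
        enforceQueueLimit();
        outstanding.incrementAndGet();
    }

    public static void main(String[] args) {
        QueueLimitSketch producer = new QueueLimitSketch(1);
        producer.invoke("first"); // fills the queue (limit = 1)

        // Simulate the KPL running out of memory shortly afterwards on another thread.
        new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            producer.onAsyncFailure(new IllegalStateException("KPL out of memory"));
        }).start();

        try {
            producer.invoke("second"); // without the in-loop check, this would spin forever
        } catch (RuntimeException e) {
            System.out.println("Failed fast: " + e.getCause().getMessage());
        }
    }
}
```

If the `checkAndPropagateAsyncError()` call inside the `while` loop is removed, the second `invoke` never returns. That is the deadlock the ticket describes.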
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
Hi Timo, thanks for the discussion.

It seems we have reached an agreement regarding #3:
<1> The module name should be a simple identifier rather than a string literal.
<2> The `type` property is redundant and should be removed. The mapping will rely on the module name, because loading a module multiple times under different module names doesn't make much sense.
<3> We should migrate to the newer API rather than the deprecated `TableFactory` class.

Regarding #1, I think the key point is whether changing the resolution order implies an `unload` operation explicitly (i.e., whether users would notice it). What do others think?

Best,
Jane

On Mon, Feb 1, 2021 at 6:41 PM Timo Walther wrote:
> IMHO I would rather unload the modules that are not mentioned. The `USE` statement implicitly expresses that the other modules are "not used". What do others think?
>
> Regards,
> Timo
>
> On 01.02.21 11:28, Jane Chan wrote:
>> Hi Jark and Rui,
>>
>> Thanks for the discussions.
>>
>> Regarding #1, I'm fine with the `USE MODULES` syntax, and
>>
>>> It can be interpreted as "setting the current order of modules", which is similar to "setting the current catalog" for `USE CATALOG`.
>>
>> I would like to confirm that the unmentioned modules remain in the same relative order. E.g., if there are three loaded modules `X`, `Y`, `Z`, then `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.
>>
>> Regarding #3, I'm fine with mapping modules purely by name, and I think Jark raised a good point about making the module name a simple identifier instead of a string literal. For backward compatibility, since we don't support this syntax yet, the affected users are those who defined modules in the YAML configuration file. Maybe we can remove 'type' from the 'requiredContext' to make it optional. The proposed mapping mechanism could then use the module name to look up the suitable factory, while we update the documentation to encourage users to simplify their YAML configuration. In the long run, we can deprecate 'type'.
>>
>> Best,
>> Jane
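Jane asks to confirm one reading of `USE MODULES`: with `X`, `Y`, `Z` loaded, `USE MODULES Y, Z` yields `Y`, `Z`, `X`. Under that reading, the mentioned modules move to the front and the unmentioned ones keep their relative order behind them. The following is a minimal sketch of that reading only. The method name `useModules` is hypothetical, and rejecting unknown or duplicate names is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the "unmentioned modules keep their relative order" reading of USE MODULES. */
public class UseModulesReorderSketch {

    /**
     * Returns the new resolution order: the mentioned modules first, in the given order,
     * followed by the unmentioned loaded modules in their previous relative order.
     */
    static List<String> useModules(List<String> loaded, List<String> mentioned) {
        Set<String> front = new LinkedHashSet<>();
        for (String name : mentioned) {
            if (!loaded.contains(name)) {
                throw new IllegalArgumentException("Module not loaded: " + name);
            }
            if (!front.add(name)) {
                throw new IllegalArgumentException("Module listed twice: " + name);
            }
        }
        List<String> result = new ArrayList<>(front);
        for (String name : loaded) {
            if (!front.contains(name)) {
                result.add(name); // unmentioned: appended, relative order preserved
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(useModules(List.of("X", "Y", "Z"), List.of("Y", "Z"))); // [Y, Z, X]
    }
}
```

Timo's alternative, which unloads unmentioned modules, would instead return only `[Y, Z]` for the same input.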
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
I agree with Timo that `USE` implies the specified modules are used in the specified order and the others are not used. This makes it easier to know the resulting module list and order after the `USE` statement. That means: if the current modules, in order, are x, y, z, then after `USE MODULES z, y` the current modules, in order, are z, y.

But I would prefer not to unload the unmentioned modules in the `USE` statement, because it seems strange for `USE` to implicitly remove modules. In the above example, the user may type the wrong module list in `USE` by mistake and want to declare the list again. The user would then have to load the module again with properties they may not know. Therefore, I propose that the `USE` statement only specifies the current module list and doesn't unload modules.

Besides that, we may need a new syntax to list all modules, including those that are loaded but not used. We can introduce `SHOW FULL MODULES` for this purpose, with an additional `used` column.

For example:

Flink SQL> SHOW MODULES;
+---------+
| modules |
+---------+
| x       |
| y       |
| z       |
+---------+

Flink SQL> USE MODULES z, y;

Flink SQL> SHOW MODULES;
+---------+
| modules |
+---------+
| z       |
| y       |
+---------+

Flink SQL> SHOW FULL MODULES;
+---------+-------+
| modules | used  |
+---------+-------+
| z       | true  |
| y       | true  |
| x       | false |
+---------+-------+

Flink SQL> USE MODULES z, y, x;

Flink SQL> SHOW MODULES;
+---------+
| modules |
+---------+
| z       |
| y       |
| x       |
+---------+

What do you think?

Best,
Jark

On Mon, 1 Feb 2021 at 19:02, Jane Chan wrote:
> Regarding #1, I think the key point is whether changing the resolution order implies an `unload` operation explicitly (i.e., whether users would notice it). What do others think?
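Jark's proposal separates *loaded* modules from *used* modules. `USE MODULES` replaces only the used list, in order, and never unloads anything. `SHOW FULL MODULES` lists the used modules first and then the loaded-but-unused ones. The following is a minimal sketch that reproduces his example session. `UsedModulesSketch` is hypothetical. Two behaviors are assumptions here: that `LOAD` also appends the module to the used list (still an open question in this thread) and that `USE` rejects modules that were never loaded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of Jark's proposal: USE MODULES only changes which loaded modules are
 * used, and in what order; it never unloads anything.
 */
public class UsedModulesSketch {
    private final List<String> loaded = new ArrayList<>(); // in load order
    private final List<String> used = new ArrayList<>();   // in resolution order

    /** LOAD MODULE; assumption: a newly loaded module is also appended to the used list. */
    public void load(String name) {
        if (loaded.contains(name)) {
            throw new IllegalArgumentException("Module already loaded: " + name);
        }
        loaded.add(name);
        used.add(name);
    }

    /** USE MODULES: replaces the used list; only already-loaded modules are accepted. */
    public void useModules(String... names) {
        List<String> next = new ArrayList<>();
        for (String name : names) {
            if (!loaded.contains(name)) {
                throw new IllegalArgumentException("Module not loaded: " + name);
            }
            if (next.contains(name)) {
                throw new IllegalArgumentException("Module listed twice: " + name);
            }
            next.add(name);
        }
        used.clear();
        used.addAll(next); // unmentioned modules stay loaded, just not used
    }

    /** SHOW MODULES: used modules in resolution order. */
    public List<String> showModules() {
        return List.copyOf(used);
    }

    /** SHOW FULL MODULES: used modules first, then loaded-but-unused ones, with a "used" flag. */
    public Map<String, Boolean> showFullModules() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : used) {
            result.put(name, true);
        }
        for (String name : loaded) {
            result.putIfAbsent(name, false);
        }
        return result;
    }

    public static void main(String[] args) {
        UsedModulesSketch s = new UsedModulesSketch();
        s.load("x");
        s.load("y");
        s.load("z");
        s.useModules("z", "y");
        System.out.println(s.showModules());     // [z, y]
        System.out.println(s.showFullModules()); // {z=true, y=true, x=false}
        s.useModules("z", "y", "x");             // x was never unloaded, so it can be used again
        System.out.println(s.showModules());     // [z, y, x]
    }
}
```

Because `x` stays loaded, a mistyped `USE` list can be corrected without re-supplying any module properties. That is the motivation Jark gives for not unloading.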
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
If `USE MODULES` implies unloading the modules that are not listed, does it also imply loading modules that have not been loaded before, especially since we're mapping modules by name now?

On Mon, Feb 1, 2021 at 8:20 PM Jark Wu wrote:
> I agree with Timo that `USE` implies the specified modules are used in the specified order and the others are not used.
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
+1 to Jark's proposal.

I like the distinction between just loading and actually enabling these modules.

@Rui: I would use the same behavior as catalogs here. You cannot `USE` a catalog without creating it first.

Another open question is whether a LOAD operation also adds the module to the enabled list by default.

Regards,
Timo

On 01.02.21 13:52, Rui Li wrote:
> If `USE MODULES` implies unloading the modules that are not listed, does it also imply loading modules that have not been loaded before, especially since we're mapping modules by name now?
Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior
Hi, all

I've discussed the time function evaluation further with @Timo and @Jark. We reached a consensus that we should address the time function evaluation (function value materialization) in this FLIP as well.

We're fine with introducing an option table.exec.time-function-evaluation to control the point in time at which a time function value is materialized. The time functions include:
LOCALTIME
LOCALTIMESTAMP
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
NOW()

The default value of table.exec.time-function-evaluation is 'per-record', which means Flink evaluates the function value per record. We recommend this value for streaming pipelines.
Another valid value is 'query-start', which means Flink evaluates the function value at query start. We recommend this value for batch pipelines.
In the future, more values such as 'auto' may be supported if new requirements arise. For example, an 'auto' option could evaluate the time function value per record in streaming mode and at query start in batch mode.

Alternative 1: Introduce a function like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW that evaluates the function value at query start. This may confuse users a bit, since we would provide two similar functions with different return values.

Alternative 2: Do not introduce any configuration or function, and control the function evaluation by the pipeline execution mode. This may produce different results when users run their streaming pipeline SQL as a batch pipeline (e.g. backfilling), and users also cannot control the behavior of these functions.

What do you think?

Thanks,
Leonard

> On Feb 1, 2021, at 18:23, Timo Walther wrote:
>
> Parts of the FLIP can already be implemented without a completed vote, e.g. there is no doubt that we should support TIME(9).
>
> However, I don't see a benefit in reworking the time functions only to rework them again later. If we lock the time at query start, the implementation of the previously mentioned functions will be completely different.
>
> Regards,
> Timo
>
> On 01.02.21 02:37, Kurt Young wrote:
>> I also prefer not to expand this FLIP further, but we could open a discussion thread right after this FLIP is accepted and start coding & reviewing. Pipelining the technical discussion and the coding will improve efficiency.
>> Best,
>> Kurt
>> On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu wrote:
>>> Hi, Timo
>>>
>>>> I do think that this topic must be part of the FLIP as well. Esp. if the FLIP has the title "time function behavior" and this is clearly a behavioral aspect. We are performing a heavy refactoring of the SQL query semantics in Flink here which will affect a lot of users. We cannot rework the time functions a third time after this.
>>>>
>>>> I checked a couple of other vendors. It seems that they all lock the timestamp when the query is started. And as you said, in this case both mature (Oracle) and less mature systems (Hive, MySQL) have the same behavior.
>>>
>>> FLIP-162> "These problems come from the fact that lots of time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone."
>>> The motivation of FLIP-162 is to correct the wrong time-related function values caused by the time zone. In our earlier discussion, we found this is related to the function return types compared to the SQL standard and other vendors, so we proposed making the function return types consistent as well. This is exactly what the FLIP title means and what the FLIP plans to do.
>>>
>>> But we haven't considered the function materialization mechanism as part of our plan yet, because we need to fix the time zone and function type issues regardless of whether we modify the function materialization mechanism in the future.
>>> So I think it doesn't belong in this FLIP's scope.
>>>
>>> It would already be a great achievement to fix the current FLIP's 7 proposals well. We don't want to expand the scope again, especially since it's not part of our plan.
>>>
>>> What do you think? @Timo
>>>
>>> And what are others' thoughts? @Jark @Kurt
>>>
>>> Best,
>>> Leonard
>>>
>>>> Flink should not differ. I fear that we have to adopt this behavior as well to call ourselves standard compliant. Otherwise it will also not be possible to have Hive compatibility with proper semantics. It could lead to unintended behavior.
>>>>
>>>> I see two options for this topic:
>>>>
>>>> 1) Clearly distinguish between query-start and processing time
>>>>
>>>> MySQL offers NOW() and SYSDATE() to distinguish the two semantics. We could run all the previously discussed functions that have a meaning in other systems in query-start time and use a different name for processing time. `SYS_TIMESTAMP`, `SYS_DATE`, `SYS_TIME`, `SYS_LOC
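Leonard's proposal gives table.exec.time-function-evaluation two values. 'per-record' evaluates the function for every record, and 'query-start' materializes it once when the query starts. The following is a minimal sketch of that difference, not Flink's implementation. The enum, the class, and the injected `Supplier<Instant>` clock are all hypothetical, and a deterministic fake clock stands in for the real system time.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the two proposed values of table.exec.time-function-evaluation.
 * Not Flink code: the enum, class, and clock supplier are illustrative only.
 */
public class TimeFunctionEvaluationSketch {
    enum Mode { PER_RECORD, QUERY_START } // 'per-record' and 'query-start' in the proposal

    private final Mode mode;
    private final Supplier<Instant> clock;
    private final Instant queryStart; // only materialized in QUERY_START mode

    TimeFunctionEvaluationSketch(Mode mode, Supplier<Instant> clock) {
        this.mode = mode;
        this.clock = clock;
        this.queryStart = (mode == Mode.QUERY_START) ? clock.get() : null;
    }

    /** The value CURRENT_TIMESTAMP (or NOW(), etc.) would yield for the next record. */
    Instant currentTimestamp() {
        return (mode == Mode.QUERY_START) ? queryStart : clock.get();
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            // Deterministic fake clock: every read advances by one second.
            AtomicLong seconds = new AtomicLong();
            Supplier<Instant> fakeClock = () -> Instant.ofEpochSecond(seconds.getAndIncrement());
            TimeFunctionEvaluationSketch query = new TimeFunctionEvaluationSketch(mode, fakeClock);
            System.out.println(mode + ": " + query.currentTimestamp() + ", " + query.currentTimestamp());
        }
    }
}
```

In 'per-record' mode, two consecutive records see different timestamps. In 'query-start' mode, every record sees the same value. This is why the thread recommends the former for streaming and the latter for batch.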
[jira] [Created] (FLINK-21229) Support ssl connection with schema registry format
Dawid Wysakowicz created FLINK-21229:

Summary: Support ssl connection with schema registry format
Key: FLINK-21229
URL: https://issues.apache.org/jira/browse/FLINK-21229
Project: Flink
Issue Type: Improvement
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
Reporter: Dawid Wysakowicz

There is no way to pass an SSL configuration to the Confluent schema registry format. We should be able to pass:
{code}
- schema.registry.ssl.truststore.location
- schema.registry.ssl.truststore.password
- schema.registry.ssl.keystore.location
- schema.registry.ssl.keystore.password
{code}
[jira] [Created] (FLINK-21230) Add protobuf wrapper types for the StateFun SDK types.
Igal Shilman created FLINK-21230:

Summary: Add protobuf wrapper types for the StateFun SDK types.
Key: FLINK-21230
URL: https://issues.apache.org/jira/browse/FLINK-21230
Project: Flink
Issue Type: Task
Components: Stateful Functions
Reporter: Igal Shilman

Add primitive wrapper types to be used for messaging and state as part of the new type system.
Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior
Hi Leonard,

thanks for considering this issue as well. +1 for the proposed config option. If there are no other concerns, let's start a voting thread once the FLIP document has been updated.

Thanks,
Timo

On 01.02.21 15:07, Leonard Xu wrote:
> Hi, all
> I've discussed the time function evaluation further with @Timo and @Jark. We reached a consensus that we should address the time function evaluation (function value materialization) in this FLIP as well.
[jira] [Created] (FLINK-21231) add "SHOW VIEWS" to SQL client
tim yu created FLINK-21231: -- Summary: add "SHOW VIEWS" to SQL client Key: FLINK-21231 URL: https://issues.apache.org/jira/browse/FLINK-21231 Project: Flink Issue Type: New Feature Reporter: tim yu The SQL client cannot run the "SHOW VIEWS" statement yet. We should add support for "SHOW VIEWS" to it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
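As a rough illustration of what the ticket asks for, the sketch below routes `SHOW VIEWS` to a catalog listing in a toy statement dispatcher. Everything here is hypothetical (`InMemoryCatalog`, `execute_statement`, and the rule that `SHOW TABLES` also lists views); it is not the SQL client's actual code path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryCatalog:
    """Hypothetical catalog: object name -> kind ("TABLE" or "VIEW")."""
    objects: Dict[str, str] = field(default_factory=dict)

    def list_tables(self) -> List[str]:
        # Assumption for this toy: SHOW TABLES lists tables and views alike.
        return sorted(self.objects)

    def list_views(self) -> List[str]:
        return sorted(name for name, kind in self.objects.items() if kind == "VIEW")


def execute_statement(catalog: InMemoryCatalog, statement: str) -> List[str]:
    """Normalize whitespace, case, and a trailing ';' before dispatching."""
    normalized = " ".join(statement.strip().rstrip(";").split()).upper()
    if normalized == "SHOW TABLES":
        return catalog.list_tables()
    if normalized == "SHOW VIEWS":
        return catalog.list_views()
    raise ValueError(f"unsupported statement: {statement!r}")


catalog = InMemoryCatalog({"orders": "TABLE", "big_orders": "VIEW", "users": "TABLE"})
print(execute_statement(catalog, "show views;"))  # ['big_orders']
```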
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
+1 to Jark's proposal. To clarify: will `module#getFunctionDefinition()` return empty if the module is loaded but not enabled? Best, Jane On Mon, Feb 1, 2021 at 10:02 PM Timo Walther wrote: > +1 to Jark's proposal > > I like the distinction between just loading and actually enabling these > modules. > > @Rui: I would use the same behavior as catalogs here. You cannot `USE` a > catalog without creating it first. > > Another question is whether a LOAD operation also adds the module to the > enabled list by default? > > Regards, > Timo > > On 01.02.21 13:52, Rui Li wrote: > > If `USE MODULES` implies unloading modules that are not listed, does it > > also imply loading modules that were not previously loaded, especially since > > we're mapping modules by name now? > > > > On Mon, Feb 1, 2021 at 8:20 PM Jark Wu wrote: > > > >> I agree with Timo that USE implies the specified modules are in use in > >> the specified order and the others are not used. > >> This makes it easier to know the resulting list and order after the USE > >> statement. > >> That means: if the current modules in order are x, y, z, then `USE MODULES z, y` > >> means the current modules in order are z, y. > >> > >> But I would prefer not to unload the unmentioned modules in the USE > >> statement, because it seems strange that USE > >> will implicitly remove modules. In the above example, the user may type the > >> wrong module list in USE by mistake > >> and want to declare the list again; the user would then have to create the > >> module again with properties they may not know. Therefore, I propose > >> that the USE statement just specifies the current module list and doesn't > >> unload modules. > >> Besides that, we may need a new syntax to list all the modules, including > >> those that are loaded but not used. > >> We can introduce SHOW FULL MODULES for this purpose with an additional > >> `used` column.
> >> > >> For example: > >> > >> Flink SQL> show modules; > >> --- > >> | modules | > >> --- > >> | x | > >> | y | > >> | z | > >> --- > >> Flink SQL> USE MODULES z, y; > >> Flink SQL> show modules; > >> --- > >> | modules | > >> --- > >> | z | > >> | y | > >> --- > >> Flink SQL> show FULL modules; > >> --- > >> | modules | used | > >> --- > >> | z | true | > >> | y | true | > >> | x | false | > >> --- > >> Flink SQL> USE MODULES z, y, x; > >> Flink SQL> show modules; > >> --- > >> | modules | > >> --- > >> | z | > >> | y | > >> | x | > >> --- > >> > >> What do you think? > >> > >> Best, > >> Jark > >> > >> On Mon, 1 Feb 2021 at 19:02, Jane Chan wrote: > >> > >>> Hi Timo, thanks for the discussion. > >>> > >>> It seems we have reached an agreement regarding #3: <1> The module name should > >>> be a simple identifier rather than a string literal. <2> The property > >>> `type` is redundant and should be removed, and mapping will rely on the > >>> module name, because loading a module multiple times under different > >>> module names doesn't make much sense. <3> We should migrate to the newer API > >>> rather than the deprecated `TableFactory` class. > >>> > >>> Regarding #1, I think the point is whether changing the resolution > >>> order explicitly implies an `unload` operation (i.e., whether users would notice > >>> it). What do others think? > >>> > >>> Best, > >>> Jane > >>> > >>> On Mon, Feb 1, 2021 at 6:41 PM Timo Walther wrote: > >>> > IMHO I would rather unload the modules that are not mentioned. The `USE` statement > implicitly implies that the other modules are "not > used". What do others think? > > Regards, > Timo > > > On 01.02.21 11:28, Jane Chan wrote: > > Hi Jark and Rui, > > > > Thanks for the discussions. > > > > Regarding #1, I'm fine with `USE MODULES` syntax, and > > > >> It can be interpreted as "setting the current order of modules", which > >> is similar to "setting the current catalog" for `USE CATALOG`.
> >> > > I would like to confirm: do the unmentioned modules remain in the same > > relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then > > `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`. > > > > Regarding #3, I'm fine with mapping modules purely by name, and I think > > Jark raised a good point about making the module name a simple identifier > > instead of a string literal. For backward compatibility, since we haven't > > supported this syntax yet, the affected users are those who defined modules > > in the YAML configuration file. Maybe we can remove 'type' from the > > 'requiredContext' to make it optional. Thus the proposed mapping mechanism > > could use th
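To make the competing readings of `USE MODULES` concrete, here is a hedged Python model of Jark's proposal (USE selects and orders the modules in use but never unloads the others; SHOW FULL MODULES adds a `used` flag), plus a helper for the alternative reading Jane asks about, in which unmentioned modules keep their relative order. The class and method names are hypothetical and only mirror the SQL statements; this is not Flink's `ModuleManager`. That LOAD also enables a module is an assumption here.

```python
from typing import List, Sequence, Tuple


class ToyModuleManager:
    """Hypothetical model: 'loaded' modules vs. the ordered list of 'used' ones."""

    def __init__(self) -> None:
        self.loaded: List[str] = []  # in load order
        self.used: List[str] = []    # in resolution order

    def load(self, name: str) -> None:
        if name in self.loaded:
            raise ValueError(f"module {name!r} is already loaded")
        self.loaded.append(name)
        self.used.append(name)  # assumption: LOAD also enables the module

    def use(self, names: Sequence[str]) -> None:
        missing = [n for n in names if n not in self.loaded]
        if missing:
            # Like USE CATALOG: a module must be loaded before it can be used.
            raise ValueError(f"modules not loaded: {missing}")
        if len(set(names)) != len(names):
            raise ValueError("duplicate module names in USE MODULES")
        self.used = list(names)  # unmentioned modules stay loaded, just unused

    def show_modules(self) -> List[str]:
        return list(self.used)

    def show_full_modules(self) -> List[Tuple[str, bool]]:
        unused = [n for n in self.loaded if n not in self.used]
        return [(n, True) for n in self.used] + [(n, False) for n in unused]


def reorder_keep_rest(current: Sequence[str], names: Sequence[str]) -> List[str]:
    """Alternative reading: listed modules move to the front, the rest keep their order."""
    return list(names) + [n for n in current if n not in names]


mm = ToyModuleManager()
for module in ("x", "y", "z"):
    mm.load(module)
mm.use(["z", "y"])
print(mm.show_modules())       # ['z', 'y']
print(mm.show_full_modules())  # [('z', True), ('y', True), ('x', False)]
print(reorder_keep_rest(["X", "Y", "Z"], ["Y", "Z"]))  # ['Y', 'Z', 'X']
```

Because `use` only rewrites `used` and leaves `loaded` untouched, a mistyped USE can be corrected with another USE instead of re-creating a module, which is the usability argument made above.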
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
Not the module itself but the ModuleManager should handle this case, yes. Regards, Timo On 01.02.21 17:35, Jane Chan wrote: +1 to Jark's proposal. To clarify: will `module#getFunctionDefinition()` return empty if the module is loaded but not enabled? Best, Jane On Mon, Feb 1, 2021 at 10:02 PM Timo Walther wrote: +1 to Jark's proposal. I like the distinction between just loading and actually enabling these modules. @Rui: I would use the same behavior as catalogs here. You cannot `USE` a catalog without creating it first. Another question is whether a LOAD operation also adds the module to the enabled list by default? Regards, Timo On 01.02.21 13:52, Rui Li wrote: If `USE MODULES` implies unloading modules that are not listed, does it also imply loading modules that were not previously loaded, especially since we're mapping modules by name now? On Mon, Feb 1, 2021 at 8:20 PM Jark Wu wrote: I agree with Timo that USE implies the specified modules are in use in the specified order and the others are not used. This makes it easier to know the resulting list and order after the USE statement. That means: if the current modules in order are x, y, z, then `USE MODULES z, y` means the current modules in order are z, y. But I would prefer not to unload the unmentioned modules in the USE statement, because it seems strange that USE will implicitly remove modules. In the above example, the user may type the wrong module list in USE by mistake and want to declare the list again; the user would then have to create the module again with properties they may not know. Therefore, I propose that the USE statement just specifies the current module list and doesn't unload modules. Besides that, we may need a new syntax to list all the modules, including those that are loaded but not used. We can introduce SHOW FULL MODULES for this purpose with an additional `used` column.
For example:

Flink SQL> show modules;
-----------
| modules |
-----------
| x       |
| y       |
| z       |
-----------
Flink SQL> USE MODULES z, y;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
-----------
Flink SQL> show FULL modules;
-------------------
| modules | used  |
-------------------
| z       | true  |
| y       | true  |
| x       | false |
-------------------
Flink SQL> USE MODULES z, y, x;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
| x       |
-----------

What do you think? Best, Jark On Mon, 1 Feb 2021 at 19:02, Jane Chan wrote: Hi Timo, thanks for the discussion. It seems we have reached an agreement regarding #3: <1> The module name should be a simple identifier rather than a string literal. <2> The property `type` is redundant and should be removed, and mapping will rely on the module name, because loading a module multiple times under different module names doesn't make much sense. <3> We should migrate to the newer API rather than the deprecated `TableFactory` class. Regarding #1, I think the point is whether changing the resolution order explicitly implies an `unload` operation (i.e., whether users would notice it). What do others think? Best, Jane On Mon, Feb 1, 2021 at 6:41 PM Timo Walther wrote: IMHO I would rather unload the modules that are not mentioned. The `USE` statement implicitly implies that the other modules are "not used". What do others think? Regards, Timo On 01.02.21 11:28, Jane Chan wrote: Hi Jark and Rui, Thanks for the discussions. Regarding #1, I'm fine with `USE MODULES` syntax, and It can be interpreted as "setting the current order of modules", which is similar to "setting the current catalog" for `USE CATALOG`. I would like to confirm: do the unmentioned modules remain in the same relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`. Regarding #3, I'm fine with mapping modules purely by name, and I think Jark raised a good point about making the module name a simple identifier instead of a string literal.
For backward compatibility, since we haven't supported this syntax yet, the affected users are those who defined modules in the YAML configuration file. Maybe we can remove 'type' from the 'requiredContext' to make it optional. The proposed mapping mechanism could then use the module name to look up the suitable factory, while we update the documentation to encourage users to simplify their YAML configuration. In the long run, we can deprecate 'type'. Best, Jane On Mon, Feb 1, 2021 at 4:19 PM Rui Li wrote: Thanks Jane for starting the discussion. Regarding #1, I also prefer `USE MODULES` syntax. It can be interpreted as "setting the current order of modules", which is similar to "setting the current catalog" for `USE CATALOG`. Regarding #3, I'm fine with mapping modules purely by name because I think it satisfies all the use case
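Jane's backward-compatibility idea (make 'type' optional and otherwise look up the factory by module name) can be sketched as follows. The registry, the factory functions, and the `example-option` key are hypothetical stand-ins for Flink's factory discovery, chosen only to show the lookup rule.

```python
from typing import Callable, Dict, Optional, Tuple

ModuleFactory = Callable[[Dict[str, str]], Tuple[str, Dict[str, str]]]

# Hypothetical registry standing in for factory discovery: identifier -> factory.
FACTORIES: Dict[str, ModuleFactory] = {}


def register(identifier: str) -> Callable[[ModuleFactory], ModuleFactory]:
    def decorator(factory: ModuleFactory) -> ModuleFactory:
        FACTORIES[identifier] = factory
        return factory
    return decorator


@register("core")
def create_core(options: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    return ("CoreModule", options)


@register("hive")
def create_hive(options: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    return ("HiveModule", options)


def create_module(name: str, properties: Optional[Dict[str, str]] = None) -> Tuple[str, Dict[str, str]]:
    """Resolve a factory: a legacy explicit 'type' wins, otherwise the module name is the key."""
    options = dict(properties or {})
    identifier = options.pop("type", name)  # 'type' is optional and never passed on
    factory = FACTORIES.get(identifier)
    if factory is None:
        raise ValueError(f"no module factory found for {identifier!r}")
    return factory(options)


print(create_module("hive", {"example-option": "v"}))  # ('HiveModule', {'example-option': 'v'})
```

A legacy YAML entry that names a module `myhive` but still declares `type: hive` keeps resolving under this rule, while new configurations can simply omit `type`.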
[jira] [Created] (FLINK-21232) Introduce pluggable Hadoop delegation token providers
jackwangcs created FLINK-21232: -- Summary: Introduce pluggable Hadoop delegation token providers Key: FLINK-21232 URL: https://issues.apache.org/jira/browse/FLINK-21232 Project: Flink Issue Type: New Feature Components: Deployment / YARN Reporter: jackwangcs Introduce a pluggable delegation token provider via SPI. Delegation token providers could be placed in connector-related code, which is more extensible than obtaining DTs via reflection. Email discussion thread: [https://lists.apache.org/thread.html/rbedb6e769358a10c6426c4c42b3b51cdbed48a3b6537e4ebde912bc0%40%3Cdev.flink.apache.org%3E]
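The SPI idea boils down to "discover every provider, ask each whether it is needed, and collect tokens". In Java this discovery would typically use `java.util.ServiceLoader`; the hedged Python sketch below substitutes a subclass registry for it. The provider classes, the `<service>.tokens.enabled` key, and the fake token strings are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class DelegationTokenProvider(ABC):
    """Hypothetical provider interface; each connector would ship its own implementation."""

    registry: List[Type["DelegationTokenProvider"]] = []
    service_name: str = ""

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        # Stand-in for ServiceLoader discovery: every subclass registers itself.
        DelegationTokenProvider.registry.append(cls)

    def delegation_tokens_required(self, conf: Dict[str, str]) -> bool:
        # Hypothetical switch so users can opt out per service.
        return conf.get(f"{self.service_name}.tokens.enabled", "true") == "true"

    @abstractmethod
    def obtain_token(self, conf: Dict[str, str]) -> str: ...


class HiveTokenProvider(DelegationTokenProvider):
    service_name = "hive"

    def obtain_token(self, conf: Dict[str, str]) -> str:
        return f"hive-token-for-{conf.get('user', 'anonymous')}"  # fake token


class HBaseTokenProvider(DelegationTokenProvider):
    service_name = "hbase"

    def obtain_token(self, conf: Dict[str, str]) -> str:
        return f"hbase-token-for-{conf.get('user', 'anonymous')}"  # fake token


def obtain_all_tokens(conf: Dict[str, str]) -> Dict[str, str]:
    """Ask every discovered provider for a token; no reflection on connector classes needed."""
    tokens: Dict[str, str] = {}
    for provider_cls in DelegationTokenProvider.registry:
        provider = provider_cls()
        if provider.delegation_tokens_required(conf):
            tokens[provider.service_name] = provider.obtain_token(conf)
    return tokens


print(obtain_all_tokens({"user": "alice", "hbase.tokens.enabled": "false"}))  # {'hive': 'hive-token-for-alice'}
```

The core code only knows the interface; adding a new service means adding a provider next to its connector, which is the extensibility argument in the ticket.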
Re: [Announce] Documentation Freeze Feb 2nd
Reminder On Thu, Jan 28, 2021 at 9:07 AM Seth Wiesman wrote: > Hi Everyone, > > As part of migrating the Flink documentation to Hugo, I need to ask the > community for a short documentation freeze. This will keep us from losing > any contributions during the migration. I am proposing that the freeze begin > next week, on February 2nd, with the goal of getting the change merged that week. > I have been working to have everything ready to go to keep this as > unobtrusive as possible. > > If you have a pending documentation PR, please do not rush it. If it is not > merged before next Tuesday, you will simply need to rebase after the > migration is completed. > > Please let me know if you have any questions. > > Seth >
Re: [DISCUSS] Support obtaining Hive delegation tokens when submitting application to Yarn
Hi Rui, I agree with you that we can implement pluggable DT providers first; I have created a new ticket to track it: https://issues.apache.org/jira/browse/FLINK-21232. Spark's HadoopDelegationTokenManager can run on both the client and the driver (application master) side. On the client side, HadoopDelegationTokenManager is used to obtain tokens when users use a keytab or `kinit` (credential cache); on the driver side, it is used to obtain and renew DTs. Some background to explain this: currently, Flink distributes the keytab to the JobManager and TaskManagers, and the Kerberos credentials are renewed with the keytab on the JobManager and TaskManagers. However, Spark adopts a different solution: it only ships the keytab to the driver, and the driver uses this keytab to renew all delegation tokens periodically and then distributes the renewed tokens to the executors. In this way, Spark can reduce the load on the KDC. You could refer to this doc for details: https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit Thanks, Jie On 2021/01/27 03:33:37, Rui Li wrote: > Hi Jie, > > Thanks for the investigation. I think we can first implement pluggable DT > providers, and add renewal abilities incrementally. I'm also curious where > Spark runs its HadoopDelegationTokenManager when renewal is enabled, > because it seems HadoopDelegationTokenManager needs access to the keytab to > create new tokens. Does that mean it can only run on the client side? > > On Mon, Jan 25, 2021 at 10:32 AM 王 杰 wrote: > > > Hi Till, > > > > Sorry for the late response; I just did some investigation into Spark. Spark > > adopted the SPI approach to obtain delegation tokens for different components. It has > > a HadoopDelegationTokenManager.scala< > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala> > > to manage all Hadoop delegation tokens, including obtaining and renewing them.
> > > > When the HadoopDelegationTokenManager is initializing, it uses > > ServiceLoader to load all HadoopDelegationTokenProviders in different > > connectors. As for Hive, the provider implementation is > > HadoopDelegationTokenProvider< > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala > > >. > > > > Thanks, > > Jie > > > > > > On 2021/01/13 08:51:29, Till Rohrmann > trohrm...@apache.org>> wrote: > > > Hi Jie Wang, > > > > > > thanks for starting this discussion. To me the SPI approach sounds better > > > because it is not as brittle as using reflection. Concerning the > > > configuration, we could think about introducing some Hive-specific > > > configuration options which allow us to specify these paths. How are other > > > projects that integrate with Hive solving this problem? > > > > > > Cheers, > > > Till > > > > > > On Tue, Jan 12, 2021 at 4:13 PM 王 杰 > jackwan...@outlook.com>> wrote: > > > > > > > Hi everyone, > > > > > > > > Currently, the Hive delegation token is not obtained when Flink submits the > > > > application in Yarn mode using kinit. The ticket is > > > > https://issues.apache.org/jira/browse/FLINK-20714. I'd like to start a > > > > discussion about how to support this feature. > > > > > > > > Maybe we have two options: > > > > 1. Use reflection to construct a Hive client to obtain the token, > > > > just like the org.apache.flink.yarn.Utils.obtainTokenForHBase > > > > implementation. > > > > 2. Introduce a pluggable delegation token provider via SPI. Providers > > > > could be placed in connector-related code, so reflection is not needed and > > > > the approach is more extensible. > > > > > > > > > > > > > > > > Both options have to handle how to specify the HiveConf to use. In the Hive > > > > connector, users can specify both hiveConfDir and hadoopConfDir when > > > > creating HiveCatalog.
The hadoopConfDir may not be the same as the Hadoop > > > > configuration in HadoopModule. > > > > > > > > Looking forward to your suggestions. > > > > > > > > -- > > > > Best regards! > > > > Jie Wang > > > > > > > > > > > > > > > > -- > Best regards! > Rui Li >
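The Spark-style flow Jie describes (only the driver/JobManager holds the keytab, renews tokens periodically, and pushes them to the workers) can be sketched as one renewal round plus a schedule for the next one. This is a hedged model, not Spark's or Flink's code: `Token`, `Worker`, `renew_and_distribute`, and the 0.75 renewal ratio are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Token:
    service: str
    value: str
    expires_at: float  # seconds, same clock as `now`


RENEWAL_RATIO = 0.75  # assumption: renew once 75% of the shortest remaining lifetime has passed


def next_renewal_time(tokens: List[Token], now: float, ratio: float = RENEWAL_RATIO) -> float:
    """Schedule the next round relative to the token that expires first."""
    earliest = min(token.expires_at for token in tokens)
    return now + (earliest - now) * ratio


class Worker:
    """Stands in for a TaskManager/executor: it receives tokens but never contacts the KDC."""

    def __init__(self) -> None:
        self.tokens: Dict[str, str] = {}

    def update_tokens(self, tokens: List[Token]) -> None:
        self.tokens = {token.service: token.value for token in tokens}


def renew_and_distribute(obtain: Callable[[float], List[Token]], workers: List[Worker], now: float) -> float:
    """One renewal round on the master: obtain fresh tokens, push them out, return next run time."""
    tokens = obtain(now)  # only the master uses the keytab here
    for worker in workers:
        worker.update_tokens(tokens)
    return next_renewal_time(tokens, now)


def fake_obtain(now: float) -> List[Token]:
    return [Token("hdfs", f"hdfs@{now}", now + 100.0), Token("hive", f"hive@{now}", now + 40.0)]


workers = [Worker(), Worker()]
print(renew_and_distribute(fake_obtain, workers, now=0.0))  # 30.0
```

Only one process authenticates per round, no matter how many workers there are, which is where the reduced KDC load comes from.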
[jira] [Created] (FLINK-21233) Race condition in CheckpointCoordinator in finishing sync savepoint
Roman Khachatryan created FLINK-21233: - Summary: Race condition in CheckpointCoordinator in finishing sync savepoint Key: FLINK-21233 URL: https://issues.apache.org/jira/browse/FLINK-21233 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.12.1, 1.11.3, 1.13.0 Reporter: Roman Khachatryan I'm writing an integration test and see a failure from time to time (about 1 in 100 runs on my machine):
{code:java}
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: CheckpointCoordinator shutdown.
{code}
Consider the final stage of the synchronous savepoint (started by the stop-with-savepoint command):
# The last subtask ACKs the checkpoint
# CheckpointCoordinator finalizes the checkpoint and sends out confirmations
# EndOfPartition is generated on sources and flows through the graph
# Each subtask notifies the Scheduler about its completion
# Upon receiving the last notification, the Scheduler shuts down the CheckpointCoordinator
# CheckpointCoordinator aborts all pending checkpoints
Note that the Scheduler and the CheckpointCoordinator run in different threads. So if savepoint finalization takes long enough, the savepoint can be aborted before it completes.
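The interleaving in the steps above can be reproduced deterministically by running the two threads' actions in a fixed order. The hedged sketch below uses `concurrent.futures.Future` for the savepoint result; `ToyCheckpointCoordinator` is a hypothetical stand-in, not Flink's `CheckpointCoordinator`, and it collapses finalization into a single call.

```python
from concurrent.futures import Future
from typing import Dict


class CheckpointException(Exception):
    pass


class ToyCheckpointCoordinator:
    """Hypothetical model: pending savepoints complete on finalize or fail on shutdown."""

    def __init__(self) -> None:
        self.pending: Dict[int, Future] = {}

    def trigger(self, checkpoint_id: int) -> Future:
        future: Future = Future()
        self.pending[checkpoint_id] = future
        return future

    def finalize(self, checkpoint_id: int) -> None:
        # Completing finalization resolves the future, if nothing aborted it first.
        future = self.pending.pop(checkpoint_id, None)
        if future is not None and not future.done():
            future.set_result(f"savepoint-{checkpoint_id}")

    def shutdown(self) -> None:
        # Steps 5-6: the scheduler shuts the coordinator down; everything pending is aborted.
        for future in self.pending.values():
            if not future.done():
                future.set_exception(CheckpointException("CheckpointCoordinator shutdown."))
        self.pending.clear()


def run(shutdown_first: bool) -> str:
    coordinator = ToyCheckpointCoordinator()
    savepoint = coordinator.trigger(42)
    if shutdown_first:
        coordinator.shutdown()    # scheduler thread wins the race ...
        coordinator.finalize(42)  # ... and slow finalization finds nothing pending
    else:
        coordinator.finalize(42)
        coordinator.shutdown()
    try:
        return savepoint.result(timeout=0)
    except CheckpointException as error:
        return f"failed: {error}"


print(run(shutdown_first=False))  # savepoint-42
print(run(shutdown_first=True))   # failed: CheckpointCoordinator shutdown.
```

The model only shows why the ordering matters; it does not say which fix Flink should adopt (for example, delaying the shutdown until the savepoint future completes).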
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
Hi Timo, > Another question is whether a LOAD operation also adds the module to the enabled list by default? I would like LOAD to add the module to the enabled list by default. The main reasons are: 1) Reordering is an advanced requirement; having to follow every LOAD with an additional USE statement that includes the "core" module sounds too burdensome. Most users should be satisfied with LOAD statements alone. 2) We should stay compatible with TableEnvironment#loadModule(). 3) We are using the LOAD statement instead of CREATE, so I think it's fine for it to do some things implicitly. Best, Jark On Tue, 2 Feb 2021 at 00:48, Timo Walther wrote: > Not the module itself but the ModuleManager should handle this case, yes. > > Regards, > Timo > > > On 01.02.21 17:35, Jane Chan wrote: > > +1 to Jark's proposal. > > > > To clarify: will `module#getFunctionDefinition()` return empty > > if the module is loaded but not enabled? > > > > Best, > > Jane > > > > On Mon, Feb 1, 2021 at 10:02 PM Timo Walther wrote: > > > >> +1 to Jark's proposal > >> > >> I like the distinction between just loading and actually enabling these > >> modules. > >> > >> @Rui: I would use the same behavior as catalogs here. You cannot `USE` a > >> catalog without creating it first. > >> > >> Another question is whether a LOAD operation also adds the module to the > >> enabled list by default? > >> > >> Regards, > >> Timo > >> > >> On 01.02.21 13:52, Rui Li wrote: > >>> If `USE MODULES` implies unloading modules that are not listed, does it > >>> also imply loading modules that were not previously loaded, especially since > >>> we're mapping modules by name now? > >>> > >>> On Mon, Feb 1, 2021 at 8:20 PM Jark Wu wrote: > >>> > I agree with Timo that USE implies the specified modules are in use in > the specified order and the others are not used. > This makes it easier to know the resulting list and order after the USE > statement. > That means: if the current modules in order are x, y, z,
then `USE MODULES z, y` > means the current modules in order are z, y. > > But I would prefer not to unload the unmentioned modules in the USE > statement, because it seems strange that USE > will implicitly remove modules. In the above example, the user may type the > wrong module list in USE by mistake > and want to declare the list again; the user would then have to create the > module again with properties they may not know. Therefore, I propose > that the USE statement just specifies the current module list and doesn't > unload modules. > Besides that, we may need a new syntax to list all the modules, including > those that are loaded but not used. > We can introduce SHOW FULL MODULES for this purpose with an additional > `used` column. > > For example: > > Flink SQL> show modules; > --- > | modules | > --- > | x | > | y | > | z | > --- > Flink SQL> USE MODULES z, y; > Flink SQL> show modules; > --- > | modules | > --- > | z | > | y | > --- > Flink SQL> show FULL modules; > --- > | modules | used | > --- > | z | true | > | y | true | > | x | false | > --- > Flink SQL> USE MODULES z, y, x; > Flink SQL> show modules; > --- > | modules | > --- > | z | > | y | > | x | > --- > > What do you think? > > Best, > Jark > > On Mon, 1 Feb 2021 at 19:02, Jane Chan wrote: > > > Hi Timo, thanks for the discussion. > > > > It seems we have reached an agreement regarding #3: <1> The module name should > > be a simple identifier rather than a string literal. <2> The property > > `type` is redundant and should be removed, and mapping will rely on the > > module name, because loading a module multiple times under different > > module names doesn't make much sense. <3> We should migrate to the newer API > > rather than the deprecated `TableFactory` class. > > > > Regarding #1, I think the point is whether changing the resolution > > order explicitly implies an `unload` operation (i.e., whether users would notice > > it). What do others think?
> > > > Best, > > Jane > > > > On Mon, Feb 1, 2021 at 6:41 PM Timo Walther > >> wrote: > > > >> IMHO I would rather unload the modules that are not mentioned. The `USE` statement > >> implicitly implies that the other modules are "not > >> used". What do others think? > >> > >> Regards, > >> Timo > >> > >> > >> On 01.02.21 11:28, Jane Chan wrote: > >>> Hi Jark and Rui, > >>> > >
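Jark's reasons for LOAD enabling by default, together with Timo's answer to Jane (the ModuleManager, not the module, decides whether a loaded-but-disabled module is consulted), amount to a simple lookup rule. The hedged sketch below models it; `Module`, `ToyModuleManager`, and the string "definitions" are hypothetical, and the function names are only examples.

```python
from typing import Dict, List, Optional


class Module:
    """Hypothetical module: it only knows its own functions, not whether it is enabled."""

    def __init__(self, name: str, functions: Dict[str, str]) -> None:
        self.name = name
        self.functions = functions

    def get_function_definition(self, function_name: str) -> Optional[str]:
        return self.functions.get(function_name)


class ToyModuleManager:
    def __init__(self) -> None:
        self.loaded: Dict[str, Module] = {}
        self.used: List[str] = []

    def load_module(self, module: Module) -> None:
        if module.name in self.loaded:
            raise ValueError(f"module {module.name!r} is already loaded")
        self.loaded[module.name] = module
        self.used.append(module.name)  # proposal: LOAD also enables the module

    def use_modules(self, *names: str) -> None:
        unknown = [n for n in names if n not in self.loaded]
        if unknown:
            raise ValueError(f"modules not loaded: {unknown}")
        self.used = list(names)

    def resolve(self, function_name: str) -> Optional[str]:
        # First enabled module that defines the function wins. Disabled modules are
        # skipped here, in the manager, so Module.get_function_definition is unchanged.
        for name in self.used:
            definition = self.loaded[name].get_function_definition(function_name)
            if definition is not None:
                return definition
        return None


manager = ToyModuleManager()
manager.load_module(Module("core", {"concat": "core.concat"}))
manager.load_module(Module("hive", {"concat": "hive.concat", "get_json_object": "hive.get_json_object"}))
print(manager.resolve("concat"))  # core.concat
manager.use_modules("hive", "core")
print(manager.resolve("concat"))  # hive.concat
```

With LOAD enabling by default, the common case needs no USE statement at all; USE is only needed when the resolution order or the enabled set should differ from load order.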
Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior
Hi Leonard, Timo, I just did some investigation and found that all the other batch processing systems evaluate the time functions at query start, including Snowflake, Hive, Spark, and Trino. I'm wondering whether the default 'per-record' mode will still be weird for batch users. I know we proposed the option for batch users to change the behavior. However, if 90% of users need to set this config before submitting batch jobs, why not use this mode for batch by default? The other 10% of users can still set the config to per-record before submitting batch jobs. I believe this can greatly improve usability for batch cases. Therefore, what do you think about using "auto" as the default option value? It evaluates time functions per record in streaming mode and at query start in batch mode. I think this can make both streaming users and batch users happy. IIUC, the reason we proposed "per-record" as the default is consistency between batch and streaming. However, I think time functions are special cases because they are naturally non-deterministic. Even if streaming jobs and batch jobs both use "per-record" mode, they still can't provide consistent results. Thus, I think we may need to think more from the users' perspective. Best, Jark On Mon, 1 Feb 2021 at 23:06, Timo Walther wrote: > Hi Leonard, > > thanks for considering this issue as well. +1 for the proposed config > option. Let's start a voting thread once the FLIP document has been > updated, if there are no other concerns? > > Thanks, > Timo > > > On 01.02.21 15:07, Leonard Xu wrote: > > Hi, all > > > > I've further discussed the time function evaluation with @Timo and @Jark. We reached a consensus that we'd better address the time function > evaluation (function value materialization) in this FLIP as well. > > > > We're fine with introducing an option > table.exec.time-function-evaluation to control the point in time at which > the time function value is materialized.
The time functions include > > LOCALTIME > > LOCALTIMESTAMP > > CURRENT_DATE > > CURRENT_TIME > > CURRENT_TIMESTAMP > > NOW() > > The default value of table.exec.time-function-evaluation is > 'per-record', which means Flink evaluates the function value per record; we > recommend users configure this value for their streaming pipelines. > > Another valid option value is 'query-start', which means Flink evaluates > the function value at query start; we recommend users configure this > value for their batch pipelines. > > In the future, more option values like 'auto' may be > supported if there are new requirements, e.g. an 'auto' option which > evaluates the time function value per record in streaming mode and evaluates > > the time function value at query start in batch mode. > > > > Alternative 1: > > Introduce a function like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW > which evaluates the function value at query start. This may confuse users a bit, > since we would provide two similar functions with different return values. > > > > > Alternative 2: > > Do not introduce any configuration/function; control the > function evaluation by the pipeline execution mode. This may produce different > results when users run their streaming pipeline SQL as a batch > pipeline (e.g. backfilling), and users also > > cannot control the behavior of these functions. > > > > > > What do you think? > > > > Thanks, > > Leonard > > > > > >> On Feb 1, 2021, at 18:23, Timo Walther wrote: > >> > >> Parts of the FLIP can already be implemented before the vote is > completed, e.g. there is no doubt that we should support TIME(9). > >> > >> However, I don't see the benefit of reworking the time functions only to > rework them again later. If we lock the time at query start, the > implementation of the previously mentioned functions will be completely > different.
> >> > >> Regards, > >> Timo > >> > >> > >> On 01.02.21 02:37, Kurt Young wrote: > >>> I also prefer not to expand this FLIP further, but we could open a > >>> discussion thread > >>> right after this FLIP is accepted and start coding & reviewing. Pipelining the > >>> technical > >>> discussion and the coding will improve efficiency. > >>> Best, > >>> Kurt > >>> On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu wrote: > Hi, Timo > > > I do think that this topic must be part of the FLIP as well. Esp. if > the > FLIP has the title "time function behavior" and this is clearly a > behavioral aspect. We are performing a heavy refactoring of the SQL > query > semantics in Flink here which will affect a lot of users. We cannot > rework > the time functions a third time after this. > > I checked a couple of other vendors. It seems that they all lock the > timestamp when the query is started. And as you said, in this case > both > mature (Oracle) and less mature systems (Hive, MySQL) have the same > behavior. > > FLIP-162> "These problems come from the fact that lots of time-related > functions like
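The proposed option and the "auto" value Jark suggests can be summarized as a small resolution function. This is a hedged sketch: `table.exec.time-function-evaluation` and its values come from the proposal under discussion, not from a released Flink version, and `ExecutionMode`/`effective_evaluation` are hypothetical names.

```python
from enum import Enum

# The functions covered by the proposed option, as listed in the thread.
TIME_FUNCTIONS = ("LOCALTIME", "LOCALTIMESTAMP", "CURRENT_DATE", "CURRENT_TIME", "CURRENT_TIMESTAMP", "NOW()")
VALID_VALUES = {"per-record", "query-start", "auto"}


class ExecutionMode(Enum):
    STREAMING = "streaming"
    BATCH = "batch"


def effective_evaluation(option_value: str, mode: ExecutionMode) -> str:
    """Resolve the proposed option to the strategy applied to TIME_FUNCTIONS in one pipeline."""
    value = option_value.strip().lower()
    if value not in VALID_VALUES:
        raise ValueError(f"invalid table.exec.time-function-evaluation value: {option_value!r}")
    if value != "auto":
        return value  # an explicit choice wins regardless of the execution mode
    # 'auto' as proposed by Jark: per-record when streaming, query-start when batch.
    return "per-record" if mode is ExecutionMode.STREAMING else "query-start"


for mode in ExecutionMode:
    print(mode.value, effective_evaluation("auto", mode))
# streaming per-record
# batch query-start
```

Whichever value becomes the default, the explicit values still let users override it, which is the control that Alternative 2 would take away.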
Re: [DISCUSS]FLIP-163: SQL Client Improvements
Thanks for the proposal. Yes, the SQL client is quite outdated; +1 for improving it. About "SET" and "RESET": why not "SET" and "UNSET"? Best, Jingsong On Mon, Feb 1, 2021 at 2:46 PM Rui Li wrote: > Thanks Shengkai for the update! The proposed changes look good to me. > > On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang wrote: > > Hi, Rui. > > You are right. I have already modified the FLIP. > > > > The main changes: > > > > # The -f parameter has no restriction on the statement type. > > Sometimes users redirect query results through a pipe for debugging > > when submitting jobs with the -f parameter. This is much more convenient than > > writing INSERT INTO statements. > > > > # Add a new SQL client option `sql-client.job.detach`. > > Users prefer to execute jobs one by one in batch mode. Users can set > > this option to false and the client will process the next job until the > > current job finishes. The default value of this option is false, which > > means the client will execute the next job when the current job is > > submitted. > > > > Best, > > Shengkai > > > > > > > > On Fri, Jan 29, 2021 at 4:52 PM Rui Li wrote: > > > >> Hi Shengkai, > >> > >> Regarding #2, maybe the -f options in Flink and Hive have different > >> implications, and we should clarify the behavior. For example, if the > >> client just submits the job and exits, what happens if the file contains > >> two INSERT statements? I don't think we should treat them as a statement > >> set, because users should explicitly write BEGIN STATEMENT SET in that > >> case. And the client shouldn't asynchronously submit the two jobs, because > >> the 2nd may depend on the 1st, right? > >> > >> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang wrote: > >> > >>> Hi Rui, > >>> Thanks for your feedback. I agree with your suggestions. > >>> > >>> For suggestion 1: Yes, we plan to strengthen the SET command.
> In > >>> the implementation, it will just put the key-value pair into the > >>> `Configuration`, which will be used to generate the table config. If Hive > >>> supports reading the setting from the table config, users will be able to set > >>> the Hive-related settings. > >>> > >>> For suggestion 2: The -f parameter will submit the job and exit. If > >>> the queries never end, users have to cancel the jobs themselves, which is > >>> not reliable (people may forget their jobs). In most cases, queries are used > >>> to analyze the data. Users should use queries in the interactive mode. > >>> > >>> Best, > >>> Shengkai > >>> > >>> On Fri, Jan 29, 2021 at 3:18 PM Rui Li wrote: > >>> > Thanks Shengkai for bringing up this discussion. I think it covers a > lot of useful features which will dramatically improve the usability > of our > SQL Client. I have two questions regarding the FLIP. > > 1. Do you think we can let users set arbitrary configurations via the > SET command? A connector may have its own configurations and we don't have > a way to dynamically change such configurations in SQL Client. For example, > users may want to be able to change the Hive conf when using the Hive > connector [1]. > 2. Is there any reason why we have to forbid queries in SQL files specified with > the -f option? Hive supports a similar -f option but allows queries in the > file. And a common use case is to run some query and redirect the results > to a file. So I think Flink users may want to do the same, > especially in batch scenarios. > > [1] https://issues.apache.org/jira/browse/FLINK-20590 > > On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu > > wrote: > > > Hi Shengkai, > > > > Glad to see this improvement. And I have some additional suggestions: > > > > #1. Unify the TableEnvironment in ExecutionContext to > > StreamTableEnvironment for both streaming and batch SQL. > > #2.
Improve the way results are retrieved: at present, the SQL client collects the results > > locally all at once using accumulators, > > which may cause memory issues in the JM or locally for big query > > results. > > Accumulators are only suitable for testing purposes. > > We could switch to SelectTableSink, which is based > > on CollectSinkOperatorCoordinator. > > #3. Do we need to consider the Flink SQL gateway proposed in FLIP-91? It seems > > that this FLIP has not moved forward for a long time. > > Providing a long-running service out of the box to facilitate SQL > > submission is necessary. > > > > What do you think of these? > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway > > > > > > On Thu, Jan 28, 2021 at 8:54 PM Shengkai Fang wrote: > > > > > Hi devs, > > > > > > Jark and I want to start a discussion about FLIP-163: SQL Client > > > Improvements. > > > > > > Many users have complained about the problems of t
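The SET/RESET semantics discussed for FLIP-163 (arbitrary keys forwarded into the session `Configuration`, RESET reverting them) can be modeled as a layer of session overrides over defaults. The hedged sketch below is not the SQL client's implementation; `SessionConfig` and the `example.*` keys are hypothetical.

```python
from typing import Dict, Optional


class SessionConfig:
    """Hypothetical session configuration: SET overrides a key, RESET reverts it."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        self._defaults = dict(defaults)
        self._overrides: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        # Arbitrary keys are accepted, so connector-specific options pass straight through.
        self._overrides[key] = value

    def reset(self, key: Optional[str] = None) -> None:
        if key is None:
            self._overrides.clear()  # bare RESET: drop every session override
        else:
            self._overrides.pop(key, None)  # RESET key: fall back to the default, if any

    def get(self, key: str) -> Optional[str]:
        return self._overrides.get(key, self._defaults.get(key))

    def effective(self) -> Dict[str, str]:
        """The merged view that would be handed to the table config."""
        return {**self._defaults, **self._overrides}


session = SessionConfig({"example.parallelism": "1"})
session.set("example.parallelism", "4")
session.set("example.connector.option", "x")  # a key with no default at all
print(session.get("example.parallelism"))  # 4
session.reset("example.parallelism")
print(session.get("example.parallelism"))  # 1
```

In this model RESET restores a default where one exists and simply removes the override otherwise, which is the behavior the SET/RESET versus SET/UNSET naming question touches on.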
Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior
+1 for the default "auto" to the "table.exec.time-function-evaluation". >From the definition of these functions, in my opinion: - Batch is the instant execution of all records, which is the meaning of the word "BATCH", so there is only one time at query-start. - Stream only executes a single record in a moment, so time is generated by each record. On the other hand, we should be more careful about consistency with other systems. Best, Jingsong On Tue, Feb 2, 2021 at 11:24 AM Jark Wu wrote: > Hi Leonard, Timo, > > I just did some investigation and found all the other batch processing > systems > evaluate the time functions at query-start, including Snowflake, Hive, > Spark, Trino. > I'm wondering whether the default 'per-record' mode will still be weird for > batch users. > I know we proposed the option for batch users to change the behavior. > However if 90% users need to set this config before submitting batch jobs, > why not > use this mode for batch by default? For the other 10% special users, they > can still > set the config to per-record before submitting batch jobs. I believe this > can greatly > improve the usability for batch cases. > > Therefore, what do you think about using "auto" as the default option > value? > > It evaluates time functions per-record in streaming mode and evaluates at > query start in batch mode. > I think this can make both streaming users and batch users happy. IIUC, the > reason why we > proposing the default "per-record" mode is for the batch streaming > consistent. > However, I think time functions are special cases because they are > naturally non-deterministic. > Even if streaming jobs and batch jobs all use "per-record" mode, they still > can't provide consistent > results. Thus, I think we may need to think more from the users' > perspective. > > Best, > Jark > > > On Mon, 1 Feb 2021 at 23:06, Timo Walther wrote: > > > Hi Leonard, > > > > thanks for considering this issue as well. +1 for the proposed config > > option. 
Let's start a voting thread once the FLIP document has been > > updated, if there are no other concerns? > > > > Thanks, > > Timo > > > > > > On 01.02.21 15:07, Leonard Xu wrote: > > > Hi, all > > > > > > I've discussed the time function evaluation further with @Timo and @Jark. We reached a consensus that we had better address the time function > > evaluation (function value materialization) in this FLIP as well. > > > > > > We're fine with introducing an option > > table.exec.time-function-evaluation to control the point in time at which > > the time function value is materialized. The time functions include > > > LOCALTIME > > > LOCALTIMESTAMP > > > CURRENT_DATE > > > CURRENT_TIME > > > CURRENT_TIMESTAMP > > > NOW() > > > The default value of table.exec.time-function-evaluation is > > 'per-record', which means Flink evaluates the function value per record; > we > > recommend users configure this value for their streaming pipelines. > > > Another valid value is 'query-start', which means Flink > evaluates > > the function value at query start; we recommend users configure this > > value for their batch pipelines. > > > In the future, more evaluation values like 'auto' may be > > supported if there are new requirements, e.g. an 'auto' value which > > evaluates the time function value per record in streaming mode and > > > at query start in batch mode. > > > > > > Alternative1: > > > Introduce a function like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW > > which evaluates the function value at query start. This may confuse users a > bit > > because we would provide two similar functions with different return values. > > > > > > > > Alternative2: > > > Do not introduce any configuration/function, and control the > > function evaluation by the pipeline execution mode.
This may produce > different > > results when users run their streaming pipeline SQL as a batch > > pipeline (e.g. backfilling), and users also > > > cannot control the behavior of these functions. > > > > > > > > > What do you think? > > > > > > Thanks, > > > Leonard > > > > > > > > >> On Feb 1, 2021, at 18:23, Timo Walther wrote: > > >> > > >> Parts of the FLIP can already be implemented without a completed > > vote, e.g. there is no doubt that we should support TIME(9). > > >> > > >> However, I don't see a benefit in reworking the time functions only to > > rework them again later. If we lock the time at query start, the > > implementation of the previously mentioned functions will be completely > > different. > > >> > > >> Regards, > > >> Timo > > >> > > >> > > >> On 01.02.21 02:37, Kurt Young wrote: > > >>> I also prefer not to expand this FLIP further, but we could open a > > >>> discussion thread > > >>> right after this FLIP is accepted and start coding & reviewing. > Making > > >>> technical > > >>> discussion and coding more pipelined will improve efficiency. > > >>> Best, > > >>> Kurt > > >>> On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu > wrote: > > H
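The option values discussed in this thread can be modeled with a small Java sketch. It is a hypothetical illustration, not Flink planner code: the enum constants mirror the proposed 'per-record', 'query-start', and 'auto' values of table.exec.time-function-evaluation, while `TimeFunctionEvaluator`, `ExecutionMode`, and the injected `LongSupplier` clock are assumptions made for the example. With 'auto', streaming resolves to per-record evaluation and batch resolves to query-start evaluation, as Jark proposed.

```java
import java.util.function.LongSupplier;

// Hypothetical mirror of the proposed values of 'table.exec.time-function-evaluation'.
enum TimeFunctionEvaluation { PER_RECORD, QUERY_START, AUTO }

// Hypothetical stand-in for the pipeline's execution mode.
enum ExecutionMode { STREAMING, BATCH }

// Illustrative model of when CURRENT_TIMESTAMP-like functions are materialized.
final class TimeFunctionEvaluator {
    private final LongSupplier clock;     // source of "now" in epoch millis (assumed)
    private final boolean perRecord;      // true: read the clock for every record
    private final long queryStartMillis;  // captured once, when the query starts

    TimeFunctionEvaluator(TimeFunctionEvaluation option, ExecutionMode mode, LongSupplier clock) {
        this.clock = clock;
        this.perRecord = resolve(option, mode) == TimeFunctionEvaluation.PER_RECORD;
        this.queryStartMillis = clock.getAsLong();
    }

    // An explicit value wins; 'auto' means per-record for streaming and query-start for batch.
    static TimeFunctionEvaluation resolve(TimeFunctionEvaluation option, ExecutionMode mode) {
        if (option != TimeFunctionEvaluation.AUTO) {
            return option;
        }
        return mode == ExecutionMode.STREAMING
                ? TimeFunctionEvaluation.PER_RECORD
                : TimeFunctionEvaluation.QUERY_START;
    }

    // Value a time function would return for the record currently being processed.
    long currentTimestampMillis() {
        return perRecord ? clock.getAsLong() : queryStartMillis;
    }
}
```

Injecting the clock keeps the sketch deterministic: in batch mode every record sees the same materialized value, while in streaming mode each record reads the clock anew.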
[jira] [Created] (FLINK-21234) testKafkaSourceSinkWithKeyAndPartialValue[legacy = false, format = csv] hang
Guowei Ma created FLINK-21234: - Summary: testKafkaSourceSinkWithKeyAndPartialValue[legacy = false, format = csv] hang Key: FLINK-21234 URL: https://issues.apache.org/jira/browse/FLINK-21234 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.13.0 Reporter: Guowei Ma https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12758&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-21235) leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader hang
Guowei Ma created FLINK-21235: - Summary: leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader hang Key: FLINK-21235 URL: https://issues.apache.org/jira/browse/FLINK-21235 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.11.3 Reporter: Guowei Ma [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12759&view=logs&j=3b6ec2fd-a816-5e75-c775-06fb87cb6670&t=b33fdd4f-3de5-542e-2624-5d53167bb672]
{code:java}
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36)
	at org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunnerITCase.leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader(DefaultDispatcherRunnerITCase.java:211)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithReru
{code}
[jira] [Created] (FLINK-21236) Don't explicitly use HeapMemorySegment in raw format serde
Kurt Young created FLINK-21236: -- Summary: Don't explicitly use HeapMemorySegment in raw format serde Key: FLINK-21236 URL: https://issues.apache.org/jira/browse/FLINK-21236 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Affects Versions: 1.12.0 Reporter: Kurt Young Fix For: 1.13.0 `RawFormatDeserializationSchema` and `RawFormatSerializationSchema` explicitly use `HeapMemorySegment`, while in a typical batch job `HybridMemorySegment` is also loaded and used for managed memory. With more than one `MemorySegment` implementation loaded, Class Hierarchy Analysis (CHA) can no longer optimize (devirtualize) calls on `MemorySegment`. More details can be found here: [https://flink.apache.org/news/2015/09/16/off-heap-memory.html] We can use `ByteBuffer` instead of `HeapMemorySegment`.
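A minimal sketch of the suggested direction, assuming the raw format is reading or writing a single fixed-width BIGINT value: going through `java.nio.ByteBuffer` means this serde path never touches a `MemorySegment` subclass, so it does not add `HeapMemorySegment` to the loaded class hierarchy. `RawBigIntCodec` is a hypothetical name, not the actual `RawFormatSerializationSchema`/`RawFormatDeserializationSchema`, and the `ByteOrder` constructor argument is an assumed stand-in for the format's endianness setting.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical codec for a raw BIGINT value. It uses plain ByteBuffers only,
// so no MemorySegment implementation is loaded on this code path.
final class RawBigIntCodec {
    private final ByteOrder order; // assumed configurable endianness

    RawBigIntCodec(ByteOrder order) {
        this.order = order;
    }

    byte[] serialize(long value) {
        // The buffer's byte order determines the layout of the 8 written bytes.
        return ByteBuffer.allocate(Long.BYTES).order(order).putLong(value).array();
    }

    long deserialize(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("Expected 8 bytes but got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).order(order).getLong();
    }
}
```

`ByteBuffer.wrap` does not copy the input array, so deserialization stays allocation-light apart from the small wrapper object.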
[jira] [Created] (FLINK-21237) Reflects the actual running state of the job
liuzhuo created FLINK-21237: --- Summary: Reflects the actual running state of the job Key: FLINK-21237 URL: https://issues.apache.org/jira/browse/FLINK-21237 Project: Flink Issue Type: Improvement Components: Runtime / Task Reporter: liuzhuo
{code:java}
public enum JobStatus {
    ...
    /** Some tasks are scheduled or running, some may be pending, some may be finished. */
    RUNNING(TerminalState.NON_TERMINAL),
    ...
}
{code}
According to the comment on RUNNING, some tasks may not actually be RUNNING yet: they may take a while to reach RUNNING, or may even fail due to errors. Why not provide a state that truly reflects that all tasks are RUNNING, i.e., that in this state the job can process data correctly?
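The requested semantics could, in principle, be derived from the per-task states. The sketch below is hypothetical: `TaskState` is a simplified stand-in for Flink's per-task execution states, and `JobRunningCheck.allTasksRunning` is not an existing Flink API. It only shows the stricter condition the issue asks for (every task has reached RUNNING), which the current `JobStatus.RUNNING` does not guarantee.

```java
import java.util.Collection;

// Simplified, hypothetical subset of per-task states (not Flink's ExecutionState).
enum TaskState { CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED, FAILED }

final class JobRunningCheck {
    // True only when the job has at least one task and every task is RUNNING.
    static boolean allTasksRunning(Collection<TaskState> taskStates) {
        return !taskStates.isEmpty()
                && taskStates.stream().allMatch(state -> state == TaskState.RUNNING);
    }
}
```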
Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax
+1 @Jane Can you summarize our discussion in the JIRA issue? Thanks, Timo On 02.02.21 03:50, Jark Wu wrote: Hi Timo, Another question is whether a LOAD operation also adds the module to the enabled list by default? I would like to add the module to the enabled list by default; the main reasons are: 1) Reordering is an advanced requirement; requiring an additional USE statement (which must also list the "core" module) every time a module is added sounds too burdensome. Most users should be satisfied with only LOAD statements. 2) We should stay compatible with TableEnvironment#loadModule(). 3) We are using the LOAD statement instead of CREATE, so I think it's fine that it does some implicit things. Best, Jark On Tue, 2 Feb 2021 at 00:48, Timo Walther wrote: Not the module itself but the ModuleManager should handle this case, yes. Regards, Timo On 01.02.21 17:35, Jane Chan wrote: +1 to Jark's proposal To make it clearer, will `module#getFunctionDefinition()` return empty if the module is loaded but not enabled? Best, Jane On Mon, Feb 1, 2021 at 10:02 PM Timo Walther wrote: +1 to Jark's proposal I like the difference between just loading and actually enabling these modules. @Rui: I would use the same behavior as catalogs here. You cannot `USE` a catalog without creating it first. Another question is whether a LOAD operation also adds the module to the enabled list by default? Regards, Timo On 01.02.21 13:52, Rui Li wrote: If `USE MODULES` implies unloading modules that are not listed, does it also imply loading modules that were not previously loaded, especially since we're mapping modules by name now? On Mon, Feb 1, 2021 at 8:20 PM Jark Wu wrote: I agree with Timo that USE implies the specified modules are used in the specified order and the others are not used. This makes it easier to know the resulting module list and order after the USE statement. That means: if the current modules, in order, are x, y, z, then after `USE MODULES z, y` the current modules, in order, are z, y.
But I would prefer not to unload the modules that are not mentioned in the USE statement, because it seems strange for USE to implicitly remove modules. In the above example, if the user types the wrong module list in a USE statement by mistake and wants to declare the list again, they would have to create the module again with properties they may not know. Therefore, I propose that the USE statement only specifies the current module list and doesn't unload modules. Besides that, we may need a new syntax to list all the modules, including those that are loaded but not used. We can introduce SHOW FULL MODULES for this purpose with an additional `used` column. For example:

Flink SQL> SHOW MODULES;
+---------+
| modules |
+---------+
| x       |
| y       |
| z       |
+---------+

Flink SQL> USE MODULES z, y;

Flink SQL> SHOW MODULES;
+---------+
| modules |
+---------+
| z       |
| y       |
+---------+

Flink SQL> SHOW FULL MODULES;
+---------+-------+
| modules | used  |
+---------+-------+
| z       | true  |
| y       | true  |
| x       | false |
+---------+-------+

Flink SQL> USE MODULES z, y, x;

Flink SQL> SHOW MODULES;
+---------+
| modules |
+---------+
| z       |
| y       |
| x       |
+---------+

What do you think? Best, Jark On Mon, 1 Feb 2021 at 19:02, Jane Chan wrote: Hi Timo, thanks for the discussion. It seems we have reached an agreement regarding #3: <1> The module name should be a simple identifier rather than a string literal. <2> The property `type` is redundant and should be removed; mapping will rely on the module name, because loading a module multiple times just under a different module name doesn't make much sense. <3> We should migrate to the newer API rather than the deprecated `TableFactory` class. Regarding #1, I think the point lies in whether changing the resolution order implies an explicit `unload` operation (i.e., one that users can observe). What do others think? Best, Jane On Mon, Feb 1, 2021 at 6:41 PM Timo Walther wrote: IMHO I would rather unload the modules that are not mentioned. The `USE` statement implicitly implies that the other modules are "not used". What do others think?
Regards, Timo On 01.02.21 11:28, Jane Chan wrote: Hi Jark and Rui, Thanks for the discussions. Regarding #1, I'm fine with the `USE MODULES` syntax, and it can be interpreted as "setting the current order of modules", which is similar to "setting the current catalog" for `USE CATALOG`. I would like to confirm: do the unmentioned modules remain in the same relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`. Regarding #3, I'm fine with mapping modules purely by name, and I think Jark raised a good point on making the module name a simple identifier instead of a string literal. For backward compatibility, since we haven't supported this syntax yet, the affected users are those who defined modules in
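The semantics Jark proposes in this thread (LOAD also enables a module, `USE MODULES` sets the used modules and their order without unloading the others, and SHOW FULL MODULES lists every loaded module with a `used` flag) can be modeled in a few lines. This is a hypothetical sketch, not Flink's `ModuleManager`: the class and method names are illustrative, modules are keyed purely by name, and, following Timo's catalog analogy, `USE` rejects a module that was never loaded.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed LOAD/UNLOAD/USE MODULES semantics (not Flink's ModuleManager).
final class ModuleRegistrySketch {
    private final Set<String> loaded = new LinkedHashSet<>(); // all loaded modules, by name
    private final List<String> used = new ArrayList<>();      // enabled modules, in resolution order

    // LOAD MODULE name: loads the module and also appends it to the used list.
    void load(String name) {
        if (!loaded.add(name)) {
            throw new IllegalArgumentException("Module already loaded: " + name);
        }
        used.add(name);
    }

    // UNLOAD MODULE name: removes the module entirely.
    void unload(String name) {
        if (!loaded.remove(name)) {
            throw new IllegalArgumentException("Module not loaded: " + name);
        }
        used.remove(name);
    }

    // USE MODULES a, b, ...: sets the used modules and their order.
    // Unmentioned modules stay loaded but unused; unknown names are rejected.
    void use(String... names) {
        List<String> newOrder = new ArrayList<>();
        for (String name : names) {
            if (!loaded.contains(name)) {
                throw new IllegalArgumentException("Module not loaded: " + name);
            }
            if (newOrder.contains(name)) {
                throw new IllegalArgumentException("Duplicate module: " + name);
            }
            newOrder.add(name);
        }
        used.clear();
        used.addAll(newOrder);
    }

    // SHOW MODULES: only the used modules, in resolution order.
    List<String> showModules() {
        return new ArrayList<>(used);
    }

    // SHOW FULL MODULES: used modules first (in order), then loaded-but-unused ones.
    List<String> showFullModules() {
        List<String> rows = new ArrayList<>();
        for (String name : used) {
            rows.add(name + " | true");
        }
        for (String name : loaded) {
            if (!used.contains(name)) {
                rows.add(name + " | false");
            }
        }
        return rows;
    }
}
```

Replaying Jark's example against this model (load x, y, z; then `use("z", "y")`) leaves x loaded but unused, so a later `use("z", "y", "x")` re-enables it without having to recreate it.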
[jira] [Created] (FLINK-21238) Support to close PythonFunctionFactory manually
Dian Fu created FLINK-21238: --- Summary: Support to close PythonFunctionFactory manually Key: FLINK-21238 URL: https://issues.apache.org/jira/browse/FLINK-21238 Project: Flink Issue Type: Improvement Components: API / Python Reporter: Dian Fu Assignee: Dian Fu Fix For: 1.13.0 PythonFunctionFactory is used to convert a Python class to a Java PythonFunction representation, which can then be used as a user-defined function. Under the hood, PythonFunctionFactory relies on a Python process that performs the actual conversion work. Currently, the Python process is registered with a JVM shutdown hook and is only closed when the JVM exits. The aim of this JIRA is to give users more flexibility by introducing a close method to PythonFunctionFactory so that it can be closed manually.
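One way to offer a manual close while keeping the existing shutdown-hook safety net is sketched below. It is hypothetical: `ManuallyClosableFactory` is not Flink's `PythonFunctionFactory`, and the `Runnable` passed to it stands in for whatever actually terminates the underlying Python process. `close()` is idempotent and deregisters the hook, so the resource is released exactly once, whether the user closes it or the JVM exits first.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical owner of an external resource, e.g. a helper Python process.
final class ManuallyClosableFactory implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable releaseResource; // assumed: terminates the helper process
    private final Thread shutdownHook;

    ManuallyClosableFactory(Runnable releaseResource) {
        this.releaseResource = releaseResource;
        // Safety net: release the resource at JVM exit if nobody closed it earlier.
        this.shutdownHook = new Thread(this::releaseOnce);
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    private void releaseOnce() {
        // compareAndSet guarantees the resource is released at most once.
        if (closed.compareAndSet(false, true)) {
            releaseResource.run();
        }
    }

    // Manual close: release now and drop the shutdown hook, which is no longer needed.
    @Override
    public void close() {
        releaseOnce();
        try {
            Runtime.getRuntime().removeShutdownHook(shutdownHook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down; the hook's releaseOnce() is now a no-op.
        }
    }

    boolean isClosed() {
        return closed.get();
    }
}
```

Because the class implements `AutoCloseable`, callers could use try-with-resources to release the process deterministically instead of waiting for JVM exit.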