Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources
Hi Jiabao, Please see the reply inline. > The MySQL connector is currently in the flink-connector-jdbc repository > and is not a standalone connector. > Is it too unique to use "mysql" as the configuration option prefix? If the intended behavior makes sense to all the supported JDBC drivers, we can make this a JDBC connector configuration. Also, I would like to ask about the difference in behavior between AUTO and > ALWAYS. > It seems that we cannot guarantee the pushing down of all filters to the > external system under the ALWAYS > mode because not all filters in Flink SQL are supported by the external > system. > Should we throw an error when encountering a filter that cannot be pushed > down in the ALWAYS mode? The idea of AUTO is to do efficiency-aware pushdowns. The source will query the external system (MySQL, Oracle, SQL Server, etc) first to retrieve the information of the table. With that information, the source will decide whether to further push a filter to the external system based on the efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will just always push the supported filters to the external system, regardless of the efficiency. In case there are filters that are not supported, according to the current contract of SupportsFilterPushdown, these unsupported filters should just be returned by the *SupportsFilterPushdown.applyFilters()* method as remaining filters. Therefore, there is no need to throw exceptions here. This is likely the desired behavior for most users, IMO. If there are cases that users really want to get alerted when a filter cannot be pushed to the external system, we can add another value like "ENFORCED_ALWAYS", which behaves like ALWAYS, but throws exceptions when a filter cannot be applied to the external system. But personally I don't see much value in doing this. Thanks, Jiangjie (Becket) Qin On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun wrote: > Hi Becket, > > The MySQL connector is currently in the flink-connector-jdbc repository > and is not a standalone connector. > Is it too unique to use "mysql" as the configuration option prefix? > > Also, I would like to ask about the difference in behavior between AUTO > and ALWAYS. > It seems that we cannot guarantee the pushing down of all filters to the > external system under the ALWAYS > mode because not all filters in Flink SQL are supported by the external > system. > Should we throw an error when encountering a filter that cannot be pushed > down in the ALWAYS mode? > > Thanks, > Jiabao > > > 2023年12月18日 15:34,Becket Qin 写道: > > > > Hi JIabao, > > > > Thanks for updating the FLIP. Maybe I did not explain it clearly enough. > My > > point is that given there are various good flavors of behaviors handling > > filters pushed down, we should not have a common config of > > "ignore.filter.pushdown", because the behavior is not *common*. > > > > It looks like the original motivation of this FLIP is just for MySql. > Let's > > focus on what is the best solution for MySql connector here first. After > > that, if people think the best behavior for MySql happens to be a common > > one, we can then discuss whether that is worth being added to the base > > implementation of source. For MySQL, if we are going to introduce a > config > > to MySql, why not have something like "mysql.filter.handling.policy" with > > value of AUTO / NEVER / ALWAYS? Isn't that better than > > "ignore.filter.pushdown"? 
> > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > > > > > On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun .invalid> > > wrote: > > > >> Hi Becket, > >> > >> The FLIP document has been updated as well. > >> Please take a look when you have time. > >> > >> Thanks, > >> Jiabao > >> > >> > >>> 2023年12月17日 22:54,Jiabao Sun 写道: > >>> > >>> Thanks Becket, > >>> > >>> I apologize for not being able to continue with this proposal due to > >> being too busy during this period. > >>> > >>> The viewpoints you shared about the design of Flink Source make sense > to > >> me > >>> The native configuration ‘ignore.filter.pushdown’ is good to me. > >>> Having a unified name or naming style can indeed prevent confusion for > >> users regarding > >>> the inconsistent naming of this configuration across different > >> connectors. > >>> > >>> Currently, there are not many external connectors that support filter > >> pushdown. > >>> I propose that we first introduce it in flink-connector-jdbc and > >> flink-connector-mongodb. > >>> Do you think this is feasible? > >>> > >>> Best, > >>> Jiabao > >>> > >>> > 2023年11月16日 17:45,Becket Qin 写道: > > Hi Jiabao, > > Arguments like "because Spark has it so Flink should also have it" > does > >> not > make sense. Different projects have different API flavors and styles. > >> What > is really important is the rationale and the design principle behind > the > API. They should conform to the convention of the project. > > First of all, Spark Sou
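For reference, the SupportsFilterPushDown contract described above separates accepted and remaining filters explicitly. A minimal sketch of what a connector-side implementation might look like follows; the class name and the canPushToDatabase decision are illustrative assumptions, not actual connector code:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

// Sketch only: shows how unsupported filters are returned as "remaining"
// instead of triggering an exception, as described in the mail above.
public class SketchFilterPushDownSource implements SupportsFilterPushDown {

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        List<ResolvedExpression> accepted = new ArrayList<>();
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            if (canPushToDatabase(filter)) {
                accepted.add(filter);
            } else {
                // Handed back to the planner, which evaluates it in Flink.
                remaining.add(filter);
            }
        }
        return Result.of(accepted, remaining);
    }

    // Illustrative placeholder: a real connector would inspect the expression.
    private boolean canPushToDatabase(ResolvedExpression filter) {
        return false;
    }
}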
[jira] [Created] (FLINK-33871) Reduce getTable call for hive client and optimize graph generation time
hehuiyuan created FLINK-33871: - Summary: Reduce getTable call for hive client and optimize graph generation time Key: FLINK-33871 URL: https://issues.apache.org/jira/browse/FLINK-33871 Project: Flink Issue Type: Improvement Reporter: hehuiyuan The HiveCatalog.getHiveTable method wastes a lot of time during graph generation because it is called very frequently. I have a SQL job of over 2000 lines in which HiveCatalog.getHiveTable is called 4879 times, even though only six Hive tables are used. Each client.getTable call is expensive, and the JobManager interacts with Hive for every call while generating the graph. If one call takes approximately 50 milliseconds, the total time spent is 4879 * 50 = 243950 ms ≈ 244 s ≈ 4 minutes. If we cache the results, client.getTable only needs to be called six times. -- This message was sent by Atlassian Jira (v8.20.10#820010)
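A fix along the lines the ticket proposes would memoize the lookups, so client.getTable runs once per distinct table rather than once per reference. A minimal sketch, with a String key and Object value standing in for the real catalog types (illustrative, not the actual HiveCatalog code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: cache table lookups for the duration of graph generation.
public class CachingTableLookup {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // computeIfAbsent invokes the expensive loader (client.getTable)
    // at most once per distinct table path, e.g. 6 calls instead of 4879.
    public Object getTable(String tablePath, Function<String, Object> loader) {
        return cache.computeIfAbsent(tablePath, loader);
    }
}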
[VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable
Hi everyone, Since there were no further comments on the discussion thread [1], I would like to start the vote for FLIP-372 [2]. The FLIP started as a small new feature, but in the discussion thread and in a similar parallel thread [3] we opted for a somewhat bigger change in the Sink V2 API. Please read the FLIP and cast your vote. The vote will remain open for at least 72 hours and will only be concluded if there are no objections and enough (i.e. at least 3) binding votes have been cast. Thanks, Peter [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd [2] - https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable [3] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57
Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources
Thanks Becket, The jdbc.filter.handling.policy sounds good to me, as it provides sufficient extensibility for future filter pushdown optimizations. However, we currently don't have an implementation for the AUTO mode, and it seems that the AUTO mode can easily be confused with the ALWAYS mode because users don't have the opportunity to MANUALLY decide which filters to push down. I suggest that we only introduce the ALWAYS and NEVER modes for now, and we can consider introducing more flexible policies in the future, such as INDEX_ONLY, NUMERIC_ONLY and so on. WDYT? Best, Jiabao > On Dec 18, 2023, at 16:27, Becket Qin wrote: > > Hi Jiabao, > > Please see the reply inline. > > > >> The MySQL connector is currently in the flink-connector-jdbc repository >> and is not a standalone connector. >> Is it too unique to use "mysql" as the configuration option prefix? > > If the intended behavior makes sense to all the supported JDBC drivers, we > can make this a JDBC connector configuration. > > >> Also, I would like to ask about the difference in behavior between AUTO >> and ALWAYS. >> It seems that we cannot guarantee the pushing down of all filters to the >> external system under the ALWAYS >> mode because not all filters in Flink SQL are supported by the external >> system. >> Should we throw an error when encountering a filter that cannot be pushed >> down in the ALWAYS mode? > > The idea of AUTO is to do efficiency-aware pushdowns. The source will query > the external system (MySQL, Oracle, SQL Server, etc.) first to retrieve the > information of the table. With that information, the source will decide > whether to further push a filter to the external system based on the > efficiency, e.g. only push the indexed fields. In contrast, ALWAYS will > just always push the supported filters to the external system, regardless > of the efficiency. In case there are filters that are not supported, > according to the current contract of SupportsFilterPushdown, these > unsupported filters should just be returned by the > *SupportsFilterPushdown.applyFilters()* method as remaining filters. > Therefore, there is no need to throw exceptions here. This is likely the > desired behavior for most users, IMO. If there are cases where users really > want to get alerted when a filter cannot be pushed to the external system, > we can add another value like "ENFORCED_ALWAYS", which behaves like ALWAYS, > but throws exceptions when a filter cannot be applied to the external > system. But personally I don't see much value in doing this. > > Thanks, > > Jiangjie (Becket) Qin > > > > On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun > wrote: > >> Hi Becket, >> >> The MySQL connector is currently in the flink-connector-jdbc repository >> and is not a standalone connector. >> Is it too unique to use "mysql" as the configuration option prefix? >> >> Also, I would like to ask about the difference in behavior between AUTO >> and ALWAYS. >> It seems that we cannot guarantee the pushing down of all filters to the >> external system under the ALWAYS >> mode because not all filters in Flink SQL are supported by the external >> system. >> Should we throw an error when encountering a filter that cannot be pushed >> down in the ALWAYS mode? >> >> Thanks, >> Jiabao >> >>> On Dec 18, 2023, at 15:34, Becket Qin wrote: >>> >>> Hi Jiabao, >>> >>> Thanks for updating the FLIP. Maybe I did not explain it clearly enough.
>> My >>> point is that given there are various good flavors of behaviors handling >>> filters pushed down, we should not have a common config of >>> "ignore.filter.pushdown", because the behavior is not *common*. >>> >>> It looks like the original motivation of this FLIP is just for MySql. >> Let's >>> focus on what is the best solution for MySql connector here first. After >>> that, if people think the best behavior for MySql happens to be a common >>> one, we can then discuss whether that is worth being added to the base >>> implementation of source. For MySQL, if we are going to introduce a >> config >>> to MySql, why not have something like "mysql.filter.handling.policy" with >>> value of AUTO / NEVER / ALWAYS? Isn't that better than >>> "ignore.filter.pushdown"? >>> >>> Thanks, >>> >>> Jiangjie (Becket) Qin >>> >>> >>> >>> On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun > .invalid> >>> wrote: >>> Hi Becket, The FLIP document has been updated as well. Please take a look when you have time. Thanks, Jiabao > 2023年12月17日 22:54,Jiabao Sun 写道: > > Thanks Becket, > > I apologize for not being able to continue with this proposal due to being too busy during this period. > > The viewpoints you shared about the design of Flink Source make sense >> to me > The native configuration ‘ignore.filter.pushdown’ is good to me. > Having a unified name or naming style can indeed prevent confusion for users regarding > the inconsistent naming of this
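If the option lands as discussed, the connector-side definition could be declared along these lines. This is a sketch only, assuming the names from this thread; the default value here is an assumption, not something the FLIP has decided:

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class FilterHandlingOptions {

    // Start with the two agreed modes; more policies (e.g. INDEX_ONLY)
    // could be added later, as suggested above.
    public enum FilterHandlingPolicy {
        ALWAYS,
        NEVER
    }

    public static final ConfigOption<FilterHandlingPolicy> FILTER_HANDLING_POLICY =
            ConfigOptions.key("filter.handling.policy")
                    .enumType(FilterHandlingPolicy.class)
                    .defaultValue(FilterHandlingPolicy.ALWAYS) // assumed default
                    .withDescription(
                            "Controls whether supported filters are pushed down to the external system.");
}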
Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources
Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER, and add more policies as needed. Thanks, Jiangjie (Becket) Qin On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun wrote: > Thanks Bucket, > > The jdbc.filter.handling.policy is good to me as it provides sufficient > extensibility for future filter pushdown optimizations. > However, currently, we don't have an implementation for the AUTO mode, and > it seems that the AUTO mode can easily be confused with the ALWAYS mode > because users don't have the opportunity to MANUALLY decide which filters > to push down. > > I suggest that we only introduce the ALWAYS and NEVER modes for now, and > we can consider introducing more flexible policies in the future, > such as INDEX_ONLY, NUMBERIC_ONLY and so on. > > WDYT? > > Best, > Jiabao > > > > > 2023年12月18日 16:27,Becket Qin 写道: > > > > Hi Jiabao, > > > > Please see the reply inline. > > > > > >> The MySQL connector is currently in the flink-connector-jdbc repository > >> and is not a standalone connector. > >> Is it too unique to use "mysql" as the configuration option prefix? > > > > If the intended behavior makes sense to all the supported JDBC drivers, > we > > can make this a JDBC connector configuration. > > > > Also, I would like to ask about the difference in behavior between AUTO > and > >> ALWAYS. > >> It seems that we cannot guarantee the pushing down of all filters to the > >> external system under the ALWAYS > >> mode because not all filters in Flink SQL are supported by the external > >> system. > >> Should we throw an error when encountering a filter that cannot be > pushed > >> down in the ALWAYS mode? > > > > The idea of AUTO is to do efficiency-aware pushdowns. The source will > query > > the external system (MySQL, Oracle, SQL Server, etc) first to retrieve > the > > information of the table. With that information, the source will decide > > whether to further push a filter to the external system based on the > > efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will > > just always push the supported filters to the external system, regardless > > of the efficiency. In case there are filters that are not supported, > > according to the current contract of SupportsFilterPushdown, these > > unsupported filters should just be returned by the > > *SupportsFilterPushdown.applyFilters()* method as remaining filters. > > Therefore, there is no need to throw exceptions here. This is likely the > > desired behavior for most users, IMO. If there are cases that users > really > > want to get alerted when a filter cannot be pushed to the external > system, > > we can add another value like "ENFORCED_ALWAYS", which behaves like > ALWAYS, > > but throws exceptions when a filter cannot be applied to the external > > system. But personally I don't see much value in doing this. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > > > > > On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun .invalid> > > wrote: > > > >> Hi Becket, > >> > >> The MySQL connector is currently in the flink-connector-jdbc repository > >> and is not a standalone connector. > >> Is it too unique to use "mysql" as the configuration option prefix? > >> > >> Also, I would like to ask about the difference in behavior between AUTO > >> and ALWAYS. > >> It seems that we cannot guarantee the pushing down of all filters to the > >> external system under the ALWAYS > >> mode because not all filters in Flink SQL are supported by the external > >> system. 
> >> Should we throw an error when encountering a filter that cannot be > pushed > >> down in the ALWAYS mode? > >> > >> Thanks, > >> Jiabao > >> > >>> 2023年12月18日 15:34,Becket Qin 写道: > >>> > >>> Hi JIabao, > >>> > >>> Thanks for updating the FLIP. Maybe I did not explain it clearly > enough. > >> My > >>> point is that given there are various good flavors of behaviors > handling > >>> filters pushed down, we should not have a common config of > >>> "ignore.filter.pushdown", because the behavior is not *common*. > >>> > >>> It looks like the original motivation of this FLIP is just for MySql. > >> Let's > >>> focus on what is the best solution for MySql connector here first. > After > >>> that, if people think the best behavior for MySql happens to be a > common > >>> one, we can then discuss whether that is worth being added to the base > >>> implementation of source. For MySQL, if we are going to introduce a > >> config > >>> to MySql, why not have something like "mysql.filter.handling.policy" > with > >>> value of AUTO / NEVER / ALWAYS? Isn't that better than > >>> "ignore.filter.pushdown"? > >>> > >>> Thanks, > >>> > >>> Jiangjie (Becket) Qin > >>> > >>> > >>> > >>> On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun >> .invalid> > >>> wrote: > >>> > Hi Becket, > > The FLIP document has been updated as well. > Please take a look when you have time. > > Thanks, > Jiabao > > > > 2023年12月17日 22:54,Jiabao Sun 写道: > > > >
Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources
Hi Becket, The FLIP document[1] has been updated. Could you help take a look again? Thanks, Jiabao [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 > 2023年12月18日 16:53,Becket Qin 写道: > > Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER, and > add more policies as needed. > > Thanks, > > Jiangjie (Becket) Qin > > On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun > wrote: > >> Thanks Bucket, >> >> The jdbc.filter.handling.policy is good to me as it provides sufficient >> extensibility for future filter pushdown optimizations. >> However, currently, we don't have an implementation for the AUTO mode, and >> it seems that the AUTO mode can easily be confused with the ALWAYS mode >> because users don't have the opportunity to MANUALLY decide which filters >> to push down. >> >> I suggest that we only introduce the ALWAYS and NEVER modes for now, and >> we can consider introducing more flexible policies in the future, >> such as INDEX_ONLY, NUMBERIC_ONLY and so on. >> >> WDYT? >> >> Best, >> Jiabao >> >> >> >>> 2023年12月18日 16:27,Becket Qin 写道: >>> >>> Hi Jiabao, >>> >>> Please see the reply inline. >>> >>> The MySQL connector is currently in the flink-connector-jdbc repository and is not a standalone connector. Is it too unique to use "mysql" as the configuration option prefix? >>> >>> If the intended behavior makes sense to all the supported JDBC drivers, >> we >>> can make this a JDBC connector configuration. >>> >>> Also, I would like to ask about the difference in behavior between AUTO >> and ALWAYS. It seems that we cannot guarantee the pushing down of all filters to the external system under the ALWAYS mode because not all filters in Flink SQL are supported by the external system. Should we throw an error when encountering a filter that cannot be >> pushed down in the ALWAYS mode? >>> >>> The idea of AUTO is to do efficiency-aware pushdowns. The source will >> query >>> the external system (MySQL, Oracle, SQL Server, etc) first to retrieve >> the >>> information of the table. With that information, the source will decide >>> whether to further push a filter to the external system based on the >>> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will >>> just always push the supported filters to the external system, regardless >>> of the efficiency. In case there are filters that are not supported, >>> according to the current contract of SupportsFilterPushdown, these >>> unsupported filters should just be returned by the >>> *SupportsFilterPushdown.applyFilters()* method as remaining filters. >>> Therefore, there is no need to throw exceptions here. This is likely the >>> desired behavior for most users, IMO. If there are cases that users >> really >>> want to get alerted when a filter cannot be pushed to the external >> system, >>> we can add another value like "ENFORCED_ALWAYS", which behaves like >> ALWAYS, >>> but throws exceptions when a filter cannot be applied to the external >>> system. But personally I don't see much value in doing this. >>> >>> Thanks, >>> >>> Jiangjie (Becket) Qin >>> >>> >>> >>> On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun > .invalid> >>> wrote: >>> Hi Becket, The MySQL connector is currently in the flink-connector-jdbc repository and is not a standalone connector. Is it too unique to use "mysql" as the configuration option prefix? Also, I would like to ask about the difference in behavior between AUTO and ALWAYS. 
It seems that we cannot guarantee the pushing down of all filters to the external system under the ALWAYS mode because not all filters in Flink SQL are supported by the external system. Should we throw an error when encountering a filter that cannot be >> pushed down in the ALWAYS mode? Thanks, Jiabao > 2023年12月18日 15:34,Becket Qin 写道: > > Hi JIabao, > > Thanks for updating the FLIP. Maybe I did not explain it clearly >> enough. My > point is that given there are various good flavors of behaviors >> handling > filters pushed down, we should not have a common config of > "ignore.filter.pushdown", because the behavior is not *common*. > > It looks like the original motivation of this FLIP is just for MySql. Let's > focus on what is the best solution for MySql connector here first. >> After > that, if people think the best behavior for MySql happens to be a >> common > one, we can then discuss whether that is worth being added to the base > implementation of source. For MySQL, if we are going to introduce a config > to MySql, why not have something like "mysql.filter.handling.policy" >> with > value of AUTO / NEVER / ALWAYS? Isn't that better than > "ignore.filter.pushdown"? > > Thanks, > > Jiangjie (Becket) Qin > > > >>>
Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support
Hi Xuyang and Alan, thanks for this productive discussion. > Would it make a difference if it were exposed by the explain @Alan: I think this is a great idea. +1 on exposing the sync/async behavior through EXPLAIN. > Is there an easy way to determine if the output of an async function > would be problematic or not? A clear "no" on this. Changelog semantics make the planner complex and we need to be careful. Therefore I would strongly suggest we introduce ORDERED and slowly enable UNORDERED whenever we see a good fit for it in plans, with appropriate planner rules that guard it. > If the input to the operator is append-only, it seems fine, because > this implies that each row is effectively independent and ordering is > unimportant. As @Xuyang pointed out, it's not only the input that decides whether append-only is safe. It's also the subsequent operators in the pipeline. The example of Xuyang is a good one, when the sink operates in upsert mode. An append-only source, append-only operators, and an append-only sink are safer. However, even in this combination, a row is not fully "independent": there are still watermarks flowing between rows: R(5), W(4), R(3), R(4), R(2), R(1), W(0) So unordering should be fine *within* watermarks. This is also what watermarks are good for, a trade-off between strict ordering and making progress. The async operator from the DataStream API also supports this, if I remember correctly. However, it assumes a timestamp is present in the StreamRecord on which it can work. But this is not the case within the SQL engine. TL;DR: Let's focus on ORDERED first. If we want to use UNORDERED, I would suggest checking the input operator for exactly 1 time attribute column. If there is exactly 1 time attribute column, we could insert it into the StreamRecord and allow UNORDERED mode. If this condition is not met, we go with ORDERED. Regards, Timo On 18.12.23 07:05, Xuyang wrote: Hi, Alan and Timo. Thanks for your reply. Would it make a difference if it were exposed by the explain method (the operator having "syncMode" vs not)? @Alan: I think this is a good way to tell the user what mode these async UDXs are currently in. A regular SQL user doesn't care whether the function is sync or async. @Timo: I agree that the planner should throw as few exceptions as possible rather than confusing users. So I think it is a good way to expose syncMode through the explain syntax. If the input to the operator is append-only, it seems fine, because this implies that each row is effectively independent and ordering is unimportant. For example, if the query is > an append-only `SELECT FUNC(c) FROM t`, I don't see a reason why the > operator is not allowed to produce unordered results. @Timo, @Alan: IIUC, there seems to be something wrong here. Take Kafka as the source and MySQL as the sink as an example. Although Kafka is an append-only source, one of its fields is used as the primary key when writing to MySQL. If the async UDX is executed in unordered mode, there may be problems with the data in MySQL in the end. In this case, we actually need to ensure ordering per sink primary key. -- Best! Xuyang At 2023-12-16 03:33:55, "Alan Sheinberg" wrote: Thanks for the replies everyone. My responses are inline: About the configs, what do you think about using hints as mentioned in [1]? @Aitozi: I think hints could be a good way to do this, similar to lookup joins or the proposal in FLIP-313.
One benefit of hints is that they allow for the highest granularity of configuration because you can decide at each and every call site just what parameters to use. The downside of hints is that there's more syntax to learn and more verbosity. I'm somewhat partial to a configuration like this with a class definition level of granularity (similar to how metrics reporters are defined [1]): table.exec.async-scalar.myfunc.class: org.apache.flink.MyAsyncScalarFunction table.exec.async-scalar.myfunc.buffer-capacity: 10 ... As Timo mentioned, the downside to this is that there's not a nice static way to do this at the moment, unless you extend ConfigOption. It would be good ultimately if Lookup joins, async scalar functions, and other future configurable UDFs shared the same methodology, but maybe a unified approach is a followup discussion. I’m just curious why you don’t use conf(global) and query hint(individual async udx) to mark the output mode 'order' or 'unorder' like async look join [1] and async udtf[2], but chose to introduce a new enum in AsyncScalarFunction. @Xuyang: I'm open to adding hints. I think the important part is that we have some method for the user to have a class definition level way to define whether ORDERED or ALLOW_UNORDERED is most appropriate. I don't have a strong sense yet for what would be most appropriately exposed as a FunctionRequirement vs a simple configuration/hint. What about throwing an exception to make it clear to users that using async scalar
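For readers following the thread, the kind of function under discussion looks roughly like the sketch below under FLIP-400's proposed API (subject to change while the FLIP is being discussed; the remote call here is simulated):

import java.util.concurrent.CompletableFuture;

import org.apache.flink.table.functions.AsyncScalarFunction;

// Sketch of an asynchronous scalar UDF per the FLIP-400 proposal:
// eval completes a future instead of returning a value, so the operator
// can keep many calls in flight, ORDERED or UNORDERED as discussed above.
public class RemoteLookupFunction extends AsyncScalarFunction {

    public void eval(CompletableFuture<String> result, String key) {
        // Simulated async client call; a real UDF would use a non-blocking client.
        CompletableFuture.supplyAsync(() -> "value-for-" + key)
                .whenComplete(
                        (value, error) -> {
                            if (error != null) {
                                result.completeExceptionally(error);
                            } else {
                                result.complete(value);
                            }
                        });
    }
}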
Re: [DISCUSS] Release Flink 1.18.1
Hi folks, Thanks Martijn for driving the https://issues.apache.org/jira/browse/FLINK-33704, which will solve the issue of https://issues.apache.org/jira/browse/FLINK-33793. I will move forward with the 1.18.1 release today. Best regards, Jing On Thu, Dec 14, 2023 at 11:17 AM Jing Ge wrote: > Hi folks, > > What Martijn said makes sense. We should pay more attention to > https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to > contribute and share my thoughts in the PR > https://github.com/apache/flink/pull/23489. > > For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with > the PR, I will move forward with the release, unless we upgrade the ticket > to be a Blocker. Look forward to your feedback. > > Best regards, > Jing > > On Wed, Dec 13, 2023 at 9:57 AM Jing Ge wrote: > >> Hi Martijn, >> >> Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please feel >> free to ping me. I will either redo rc1 or start rc2. Thanks! >> >> Best regards, >> Jing >> >> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/ >> >> On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser >> wrote: >> >>> Hi Jing, >>> >>> The only thing that was brought to my attention today was >>> https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a >>> Critical (I think we should have working checkpointing with GCS), but >>> could also be considered a Blocker. There is a PR open for it, but I >>> don't think it's the right fix so I'm testing >>> >>> https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192 >>> at this moment. If you haven't started the 1.18.1 release yet, perhaps >>> we could include it (hopefully we can merge it tomorrow), else it >>> would have to wait for the next release. >>> >>> Thanks, >>> >>> Martijn >>> >>> On Tue, Dec 12, 2023 at 9:30 AM Jing Ge >>> wrote: >>> > >>> > Hi All, >>> > >>> > Thank you for your feedback! >>> > >>> > Since FLINK-33523[1] is done(thanks Martijn for driving it) and there >>> are >>> > no other concerns or objections. Currently I am not aware of any >>> unresolved >>> > blockers. There is one critical task[2] whose PR still has some >>> checkstyle >>> > issues. I will try to include it into 1.18.1 with best effort, since it >>> > depends on how quickly the author can finish it and cp to 1.18. If >>> anyone >>> > considers FLINK-33588 as a must-have fix in 1.18.1, please let me know, >>> > thanks! >>> > >>> > I will start working on the RC1 release today. >>> > >>> > @benchao: gotcha wrt FLINK-33313 >>> > >>> > Best regards, >>> > Jing >>> > >>> > [1] https://issues.apache.org/jira/browse/FLINK-33523 >>> > [2] https://issues.apache.org/jira/browse/FLINK-33588 >>> > >>> > On Tue, Dec 12, 2023 at 7:03 AM weijie guo >>> > wrote: >>> > >>> > > Thanks Jing for driving this bug-fix release. >>> > > >>> > > +1 from my side. >>> > > >>> > > Best regards, >>> > > >>> > > Weijie >>> > > >>> > > >>> > > Jark Wu 于2023年12月12日周二 12:17写道: >>> > > >>> > > > Thanks Jing for driving 1.18.1. >>> > > > +1 for this. >>> > > > >>> > > > Best, >>> > > > Jark >>> > > > >>> > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang >>> wrote: >>> > > > >>> > > > > +1. Thanks Jing for driving this. >>> > > > > >>> > > > > Hong >>> > > > > >>> > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang >>> wrote: >>> > > > > >>> > > > > > Thanks Jing for driving 1.18.1 release, +1 for this. 
>>> > > > > > >>> > > > > > >>> > > > > > Best >>> > > > > > Yun Tang >>> > > > > > >>> > > > > > From: Rui Fan <1996fan...@gmail.com> >>> > > > > > Sent: Saturday, December 9, 2023 21:46 >>> > > > > > To: dev@flink.apache.org >>> > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1 >>> > > > > > >>> > > > > > Thanks Jing for driving this release, +1 >>> > > > > > >>> > > > > > Best, >>> > > > > > Rui >>> > > > > > >>> > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu >>> wrote: >>> > > > > > >>> > > > > > > Thanks Jing for driving this release, +1 >>> > > > > > > >>> > > > > > > Best, >>> > > > > > > Leonard >>> > > > > > > >>> > > > > > > > 2023年12月9日 上午1:23,Danny Cranmer >>> 写道: >>> > > > > > > > >>> > > > > > > > +1 >>> > > > > > > > >>> > > > > > > > Thanks for driving this >>> > > > > > > > >>> > > > > > > > On Fri, 8 Dec 2023, 12:05 Timo Walther, < >>> twal...@apache.org> >>> > > > wrote: >>> > > > > > > > >>> > > > > > > >> Thanks for taking care of this Jing. >>> > > > > > > >> >>> > > > > > > >> +1 to release 1.18.1 for this. >>> > > > > > > >> >>> > > > > > > >> Cheers, >>> > > > > > > >> Timo >>> > > > > > > >> >>> > > > > > > >> >>> > > > > > > >> On 08.12.23 10:00, Benchao Li wrote: >>> > > > > > > >>> I've merged FLINK-33313 to release-1.18 branch. >>> > > > > > > >>> >>> > > > > > > >>> Péter Váry 于2023年12月8日周五 >>> > > 16:56写道: >>> > > > > > > >>> > > > > > > Hi Jing, >>> > > > > > > Thanks for taking care of this! >>> > > > > > > +1 (non-binding) >>> > > > > > > Peter >>> > > >
Re: Question on lookup joins
I don't see a problem in the result. Since you are using LEFT JOIN, the NULLs are expected where there is no matching result in the right table. Hang Ruan 于2023年12月18日周一 09:39写道: > > Hi, David. > > The FLIP-377[1] is about this part. You could take a look at it. > > Best, > Hang > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 > > > Hang Ruan 于2023年12月17日周日 20:56写道: > > > Hi, David. > > > > I think you are right that the value with NULL should not be returned if > > the filter push down is closed. > > > > Maybe you should explain this sql to make sure this filter not be pushed > > down to the lookup source. > > > > I see the configuration > > 'table.optimizer.source.predicate-pushdown-enabled' relies on the class > > FilterableTableSource, which is deprecated. > > I am not sure whether this configuration is still useful for jdbc > > connector, which is using the SupportsFilterPushDown. > > > > Maybe the jdbc connector should read this configuration and return an > > empty 'acceptedFilters' in the method 'applyFilters'. > > > > Best, > > Hang > > > > David Radley 于2023年12月16日周六 01:47写道: > > > >> Hi , > >> I am working on FLINK-33365 which related to JDBC predicate pushdown. I > >> want to ensure that the same results occur with predicate pushdown as > >> without. So I am asking this question outside the pr / issue. > >> > >> I notice the following behaviour for lookup joins without predicate > >> pushdown. I was not expecting all the s , when there is not a > >> matching join key. ’a’ is a table in paimon and ‘db’ is a relational > >> database. > >> > >> > >> > >> Flink SQL> select * from a; > >> > >> +++-+ > >> > >> | op | ip |proctime | > >> > >> +++-+ > >> > >> | +I |10.10.10.10 | 2023-12-15 17:36:10.028 | > >> > >> | +I |20.20.20.20 | 2023-12-15 17:36:10.030 | > >> > >> | +I |30.30.30.30 | 2023-12-15 17:36:10.031 | > >> > >> ^CQuery terminated, received a total of 3 rows > >> > >> > >> > >> Flink SQL> select * from db_catalog.menagerie.e; > >> > >> > >> +++-+-+-+-+ > >> > >> | op | ip |type | age | > >> height | weight | > >> > >> > >> +++-+-+-+-+ > >> > >> | +I |10.10.10.10 | 1 | 30 | > >>100 | 100 | > >> > >> | +I |10.10.10.10 | 2 | 40 | > >> 90 | 110 | > >> > >> | +I |10.10.10.10 | 2 | 50 | > >> 80 | 120 | > >> > >> | +I |10.10.10.10 | 3 | 50 | > >> 70 | 40 | > >> > >> | +I |20.20.20.20 | 3 | 30 | > >> 80 | 90 | > >> > >> > >> +++-+-+-+-+ > >> > >> Received a total of 5 rows > >> > >> > >> > >> Flink SQL> set table.optimizer.source.predicate-pushdown-enabled=false; > >> > >> [INFO] Execute statement succeed. > >> > >> > >> > >> Flink SQL> SELECT * FROM a left join mariadb_catalog.menagerie.e FOR > >> SYSTEM_TIME AS OF a.proctime on e.type = 2 and a.ip = e.ip; > >> > >> > >> +++-++-+-+-+-+ > >> > >> | op | ip |proctime | > >> ip0 |type | age | height | > >> weight | > >> > >> > >> +++-++-+-+-+-+ > >> > >> | +I |10.10.10.10 | 2023-12-15 17:38:05.169 | > >> 10.10.10.10 | 2 | 40 | 90 | > >> 110 | > >> > >> | +I |10.10.10.10 | 2023-12-15 17:38:05.169 | > >> 10.10.10.10 | 2 | 50 | 80 | > >> 120 | > >> > >> | +I |20.20.20.20 | 2023-12-15 17:38:05.170 | > >> | | | | > >> | > >> > >> | +I |30.30.30.30 | 2023-12-15 17:38:05.172 | > >> | | | | > >> | > >> > >> Unless otherwise stated above: > >> > >> IBM United Kingdom Limited > >> Registered in England and Wales with number 741598 > >> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > >> > > -- Best, Benchao Li
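To illustrate Benchao's point: the NULL-padded rows are the defined semantics of LEFT JOIN; if rows without a match should be dropped instead, an inner lookup join does exactly that. A minimal sketch reusing the tables from the thread (assumes `a` and `e` are registered as above):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // An inner lookup join (JOIN instead of LEFT JOIN) emits only rows
        // with a match, so the NULL-padded rows for 20.20.20.20 and
        // 30.30.30.30 from the example above disappear.
        tEnv.executeSql(
                        "SELECT * FROM a "
                                + "JOIN mariadb_catalog.menagerie.e FOR SYSTEM_TIME AS OF a.proctime "
                                + "ON e.type = 2 AND a.ip = e.ip")
                .print();
    }
}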
[jira] [Created] (FLINK-33872) Checkpoint history does not display for completed jobs
Hong Liang Teoh created FLINK-33872: --- Summary: Checkpoint history does not display for completed jobs Key: FLINK-33872 URL: https://issues.apache.org/jira/browse/FLINK-33872 Project: Flink Issue Type: Bug Components: Runtime / REST Affects Versions: 1.18.0 Reporter: Hong Liang Teoh Fix For: 1.19.0, 1.18.2 Attachments: image-2023-12-18-11-37-11-914.png, image-2023-12-18-11-37-29-596.png Prior to https://issues.apache.org/jira/browse/FLINK-32469, we could see the checkpoint history for completed jobs (CANCELED, FAILED, FINISHED). After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history no longer shows up for completed jobs. *Reproduction steps:* # Start a Flink cluster. # Submit a job with checkpointing enabled. # Wait until at least 1 checkpoint completes. # Cancel the job. # Open the Flink dashboard > Job > Checkpoints > History. We will see a log line in the JobManager saying "FlinkJobNotFoundException: Could not find Flink job ( )" *Snapshot of failure:* While the job is running, we can see the checkpoints. !image-2023-12-18-11-37-11-914.png! After the job has been CANCELED, we no longer see the checkpoint data. !image-2023-12-18-11-37-29-596.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33873) Create a Redis HyperLogLog Connector for Flink
Jinsui Chen created FLINK-33873: --- Summary: Create a Redis HyperLogLog Connector for Flink Key: FLINK-33873 URL: https://issues.apache.org/jira/browse/FLINK-33873 Project: Flink Issue Type: New Feature Components: Connectors / Redis Streams Reporter: Jinsui Chen Redis HyperLogLog is a probabilistic data structure used for estimating the cardinality of a dataset, i.e. the number of unique elements in a set. I think it is possible to create a sink connector for HyperLogLog. FLINK-15571 covers the Redis Streams connector. Since there is no component for the Redis connector as a whole, this issue is filed under that component. -- This message was sent by Atlassian Jira (v8.20.10#820010)
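To make the proposal concrete, the core of such a sink would essentially wrap Redis's PFADD command. A hypothetical sketch using the Jedis client follows; nothing here exists in Flink yet, and the class name, key, and address are all assumptions:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import redis.clients.jedis.JedisPooled;

// Hypothetical sketch: PFADD every element into one HyperLogLog key,
// whose cardinality can later be estimated with PFCOUNT.
public class RedisHyperLogLogSink extends RichSinkFunction<String> {

    private final String key;
    private transient JedisPooled jedis;

    public RedisHyperLogLogSink(String key) {
        this.key = key;
    }

    @Override
    public void open(Configuration parameters) {
        jedis = new JedisPooled("localhost", 6379); // assumed address
    }

    @Override
    public void invoke(String value, Context context) {
        jedis.pfadd(key, value);
    }

    @Override
    public void close() {
        if (jedis != null) {
            jedis.close();
        }
    }
}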
Re: [VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable
+1 (binding) On Mon 18. Dec 2023 at 09:34, Péter Váry wrote: > Hi everyone, > > Since there were no further comments on the discussion thread [1], I would > like to start the vote for FLIP-372 [2]. > > The FLIP started as a small new feature, but in the discussion thread and > in a similar parallel thread [3] we opted for a somewhat bigger change in > the Sink V2 API. > > Please read the FLIP and cast your vote. > > The vote will remain open for at least 72 hours and only concluded if there > are no objections and enough (i.e. at least 3) binding votes. > > Thanks, > Peter > > [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd > [2] - > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable > [3] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57 >
Re: [VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable
+1 (binding) Gyula On Mon, 18 Dec 2023 at 13:04, Márton Balassi wrote: > +1 (binding) > > On Mon 18. Dec 2023 at 09:34, Péter Váry > wrote: > > > Hi everyone, > > > > Since there were no further comments on the discussion thread [1], I > would > > like to start the vote for FLIP-372 [2]. > > > > The FLIP started as a small new feature, but in the discussion thread and > > in a similar parallel thread [3] we opted for a somewhat bigger change in > > the Sink V2 API. > > > > Please read the FLIP and cast your vote. > > > > The vote will remain open for at least 72 hours and only concluded if > there > > are no objections and enough (i.e. at least 3) binding votes. > > > > Thanks, > > Peter > > > > [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd > > [2] - > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable > > [3] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57 > > >
Request permission for creating a new FLIP page
Hi all, I want to add a new connector for Flink (see FLINK-33873). For this level of change, a new FLIP page needs to be created, but I don't have the corresponding permissions. Following the instructions in the FLIP document, I am sending this email. Could anyone help me with this? Thanks, Jinsui
[jira] [Created] (FLINK-33874) Introduce resource request wait mechanism at DefaultDeclarativeSlotPool side for Default Scheduler
RocMarshal created FLINK-33874: -- Summary: Introduce resource request wait mechanism at DefaultDeclarativeSlotPool side for Default Scheduler Key: FLINK-33874 URL: https://issues.apache.org/jira/browse/FLINK-33874 Project: Flink Issue Type: New Feature Components: Runtime / Task Reporter: RocMarshal -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33875) Support slots wait mechanism at DeclarativeSlotPoolBridge side for Default Scheduler
RocMarshal created FLINK-33875: -- Summary: Support slots wait mechanism at DeclarativeSlotPoolBridge side for Default Scheduler Key: FLINK-33875 URL: https://issues.apache.org/jira/browse/FLINK-33875 Project: Flink Issue Type: Sub-task Components: Runtime / Task Reporter: RocMarshal -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Request permission for creating a new FLIP page
Hi Jinsui, You should now have permissions. Best regards, Martijn On Mon, Dec 18, 2023 at 1:56 PM Jinsui Chen wrote: > > hi, all > > I want to add a new connector for Flink (see FLINK-33873). For this level > of change a new FLIP page needs to be created, but I don't have the > corresponding permissions. > > Following the instructions in the FLIP document, I sent this email. Could > anyone help me with this? > > Thanks, > Jinsui
Unsubscribe
Unsubscribe
[jira] [Created] (FLINK-33876) [JUnit5 Migration] Introduce testName method in TableTestBase
Jiabao Sun created FLINK-33876: -- Summary: [JUnit5 Migration] Introduce testName method in TableTestBase Key: FLINK-33876 URL: https://issues.apache.org/jira/browse/FLINK-33876 Project: Flink Issue Type: Sub-task Components: Table SQL / Planner, Tests Affects Versions: 1.19.0 Reporter: Jiabao Sun After completing the JUnit 5 migration in the table planner, there is an incompatibility issue between JUnit 4's TestName and JUnit 5's TestInfo. Therefore, we are considering introducing a methodName method in TableTestBase. External connectors' TablePlanTest can override this method when performing the JUnit 5 migration of TableTestBase to avoid compilation issues. -- This message was sent by Atlassian Jira (v8.20.10#820010)
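In JUnit 5, the TestName rule's job is typically done by injecting TestInfo; a sketch of what the proposed accessor might look like (illustrative, not the actual TableTestBase change):

import java.lang.reflect.Method;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.TestInfo;

// Sketch: capture the current test method name via TestInfo injection
// and expose it through an overridable accessor, as the ticket proposes.
public abstract class SketchTableTestBase {

    private String methodName;

    @BeforeEach
    void captureMethodName(TestInfo testInfo) {
        methodName = testInfo.getTestMethod().map(Method::getName).orElse("unknown");
    }

    protected String methodName() {
        return methodName;
    }
}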
Re: [DISCUSS] Release Flink 1.18.1
Hi, I am waiting for the CI of https://issues.apache.org/jira/browse/FLINK-33872 @Hong. It would be great if anyone would like to review the PR. Thanks. 1.18.1-rc1 release will be canceled because of https://issues.apache.org/jira/browse/FLINK-33704, I will start with rc2 once FLINK-33872 has been merged into 1.18. Best regards, Jing On Mon, Dec 18, 2023 at 11:52 AM Jing Ge wrote: > Hi folks, > > Thanks Martijn for driving the > https://issues.apache.org/jira/browse/FLINK-33704, which will solve the > issue of https://issues.apache.org/jira/browse/FLINK-33793. I will move > forward with the 1.18.1 release today. > > Best regards, > Jing > > On Thu, Dec 14, 2023 at 11:17 AM Jing Ge wrote: > >> Hi folks, >> >> What Martijn said makes sense. We should pay more attention to >> https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to >> contribute and share my thoughts in the PR >> https://github.com/apache/flink/pull/23489. >> >> For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with >> the PR, I will move forward with the release, unless we upgrade the ticket >> to be a Blocker. Look forward to your feedback. >> >> Best regards, >> Jing >> >> On Wed, Dec 13, 2023 at 9:57 AM Jing Ge wrote: >> >>> Hi Martijn, >>> >>> Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please >>> feel free to ping me. I will either redo rc1 or start rc2. Thanks! >>> >>> Best regards, >>> Jing >>> >>> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/ >>> >>> On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser >>> wrote: >>> Hi Jing, The only thing that was brought to my attention today was https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a Critical (I think we should have working checkpointing with GCS), but could also be considered a Blocker. There is a PR open for it, but I don't think it's the right fix so I'm testing https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192 at this moment. If you haven't started the 1.18.1 release yet, perhaps we could include it (hopefully we can merge it tomorrow), else it would have to wait for the next release. Thanks, Martijn On Tue, Dec 12, 2023 at 9:30 AM Jing Ge wrote: > > Hi All, > > Thank you for your feedback! > > Since FLINK-33523[1] is done(thanks Martijn for driving it) and there are > no other concerns or objections. Currently I am not aware of any unresolved > blockers. There is one critical task[2] whose PR still has some checkstyle > issues. I will try to include it into 1.18.1 with best effort, since it > depends on how quickly the author can finish it and cp to 1.18. If anyone > considers FLINK-33588 as a must-have fix in 1.18.1, please let me know, > thanks! > > I will start working on the RC1 release today. > > @benchao: gotcha wrt FLINK-33313 > > Best regards, > Jing > > [1] https://issues.apache.org/jira/browse/FLINK-33523 > [2] https://issues.apache.org/jira/browse/FLINK-33588 > > On Tue, Dec 12, 2023 at 7:03 AM weijie guo >>> > > wrote: > > > Thanks Jing for driving this bug-fix release. > > > > +1 from my side. > > > > Best regards, > > > > Weijie > > > > > > Jark Wu 于2023年12月12日周二 12:17写道: > > > > > Thanks Jing for driving 1.18.1. > > > +1 for this. > > > > > > Best, > > > Jark > > > > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang wrote: > > > > > > > +1. Thanks Jing for driving this. > > > > > > > > Hong > > > > > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang wrote: > > > > > > > > > Thanks Jing for driving 1.18.1 release, +1 for this. 
> > > > > > > > > > > > > > > Best > > > > > Yun Tang > > > > > > > > > > From: Rui Fan <1996fan...@gmail.com> > > > > > Sent: Saturday, December 9, 2023 21:46 > > > > > To: dev@flink.apache.org > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1 > > > > > > > > > > Thanks Jing for driving this release, +1 > > > > > > > > > > Best, > > > > > Rui > > > > > > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu wrote: > > > > > > > > > > > Thanks Jing for driving this release, +1 > > > > > > > > > > > > Best, > > > > > > Leonard > > > > > > > > > > > > > 2023年12月9日 上午1:23,Danny Cranmer 写道: > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > Thanks for driving this > > > > > > > > > > > > > > On Fri, 8 Dec 2023, 12:05 Timo Walther, < twal...@apache.org> > > > wrote: > > > > > > > > > > > > > >> Thanks for taking care of this Ji
Re: [DISCUSS] Release Flink 1.18.1
Hi folks, Found another blocker issue for 1.18: https://github.com/apache/flink/pull/23950. Thanks Sergey for the heads-up. Best regards, Jing On Mon, Dec 18, 2023 at 5:02 PM Jing Ge wrote: > Hi, > > I am waiting for the CI of > https://issues.apache.org/jira/browse/FLINK-33872 @Hong. It would be > great if anyone would like to review the PR. Thanks. > > 1.18.1-rc1 release will be canceled because of > https://issues.apache.org/jira/browse/FLINK-33704, I will start with rc2 > once FLINK-33872 has been merged into 1.18. > > Best regards, > Jing > > On Mon, Dec 18, 2023 at 11:52 AM Jing Ge wrote: > >> Hi folks, >> >> Thanks Martijn for driving the >> https://issues.apache.org/jira/browse/FLINK-33704, which will solve the >> issue of https://issues.apache.org/jira/browse/FLINK-33793. I will move >> forward with the 1.18.1 release today. >> >> Best regards, >> Jing >> >> On Thu, Dec 14, 2023 at 11:17 AM Jing Ge wrote: >> >>> Hi folks, >>> >>> What Martijn said makes sense. We should pay more attention to >>> https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to >>> contribute and share my thoughts in the PR >>> https://github.com/apache/flink/pull/23489. >>> >>> For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with >>> the PR, I will move forward with the release, unless we upgrade the ticket >>> to be a Blocker. Look forward to your feedback. >>> >>> Best regards, >>> Jing >>> >>> On Wed, Dec 13, 2023 at 9:57 AM Jing Ge wrote: >>> Hi Martijn, Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please feel free to ping me. I will either redo rc1 or start rc2. Thanks! Best regards, Jing [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/ On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser < martijnvis...@apache.org> wrote: > Hi Jing, > > The only thing that was brought to my attention today was > https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a > Critical (I think we should have working checkpointing with GCS), but > could also be considered a Blocker. There is a PR open for it, but I > don't think it's the right fix so I'm testing > > https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192 > at this moment. If you haven't started the 1.18.1 release yet, perhaps > we could include it (hopefully we can merge it tomorrow), else it > would have to wait for the next release. > > Thanks, > > Martijn > > On Tue, Dec 12, 2023 at 9:30 AM Jing Ge > wrote: > > > > Hi All, > > > > Thank you for your feedback! > > > > Since FLINK-33523[1] is done(thanks Martijn for driving it) and > there are > > no other concerns or objections. Currently I am not aware of any > unresolved > > blockers. There is one critical task[2] whose PR still has some > checkstyle > > issues. I will try to include it into 1.18.1 with best effort, since > it > > depends on how quickly the author can finish it and cp to 1.18. If > anyone > > considers FLINK-33588 as a must-have fix in 1.18.1, please let me > know, > > thanks! > > > > I will start working on the RC1 release today. > > > > @benchao: gotcha wrt FLINK-33313 > > > > Best regards, > > Jing > > > > [1] https://issues.apache.org/jira/browse/FLINK-33523 > > [2] https://issues.apache.org/jira/browse/FLINK-33588 > > > > On Tue, Dec 12, 2023 at 7:03 AM weijie guo < > guoweijieres...@gmail.com> > > wrote: > > > > > Thanks Jing for driving this bug-fix release. > > > > > > +1 from my side. 
> > > > > > Best regards, > > > > > > Weijie > > > > > > > > > Jark Wu 于2023年12月12日周二 12:17写道: > > > > > > > Thanks Jing for driving 1.18.1. > > > > +1 for this. > > > > > > > > Best, > > > > Jark > > > > > > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang > wrote: > > > > > > > > > +1. Thanks Jing for driving this. > > > > > > > > > > Hong > > > > > > > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang > wrote: > > > > > > > > > > > Thanks Jing for driving 1.18.1 release, +1 for this. > > > > > > > > > > > > > > > > > > Best > > > > > > Yun Tang > > > > > > > > > > > > From: Rui Fan <1996fan...@gmail.com> > > > > > > Sent: Saturday, December 9, 2023 21:46 > > > > > > To: dev@flink.apache.org > > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1 > > > > > > > > > > > > Thanks Jing for driving this release, +1 > > > > > > > > > > > > Best, > > > > > > Rui > > > > > > > > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu > wrote: > > > > > > > > > > > > > Thanks Jing for driving this release, +1 > > > > > > > > >
Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources
Hi JIabao, Thanks for updating the FLIP. It looks better. Some suggestions / questions: 1. In the motivation section: > *Currently, Flink Table/SQL does not expose fine-grained control for users > to control filter pushdown. **However, filter pushdown has some side > effects, such as additional computational pressure on external > systems. Moreover, Improper queries can lead to issues such as full table > scans, which in turn can impact the stability of external systems.* This statement sounds like the side effects are there for all the systems, which is inaccurate. Maybe we can say: *Currently, Flink Table/SQL does not expose fine-grained control for users to control filter pushdown. **However, filter pushdown may have side effects in some cases, **such as additional computational pressure on external systems. The JDBC source is a typical example of that. If a filter is pushed down to the database, an expensive full table scan may happen if the filter involves unindexed columns.* 2. Regarding the prefix, usually a prefix is not required for the top level connector options. This is because the *connector* option is already there. So 'connector' = 'jdbc', 'filter.handling.policy' = 'ALWAYS' is sufficient. The prefix is needed when the option is for a 2nd+ level. For example, 'connector' = 'jdbc', 'format' = 'orc', 'orc.some.option' = 'some_value' In this case, the prefix of "orc" is needed to make it clear this option is for the format. I am guessing that the reason that currently the connector prefix is there is because the values of this configuration may be different depending on the connectors. For example, jdbc may have INDEXED_ONLY while MongoDB may have something else. Personally speaking, I am fine if we do not have a prefix in this case because users have already specified the connector type and it is intuitive enough that the option value is for that connector, not others. 3. can we clarify on the following statement: > *Introduce the native configuration [prefix].filter.handling.policy in the > connector.* What do you mean by "native configuration"? From what I understand, the FLIP does the following: - introduces a new configuration to the JDBC and MongoDB connector. - Suggests a convention option name if other connectors are going to add an option for the same purpose. Thanks, Jiangjie (Becket) Qin On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun wrote: > Hi Becket, > > The FLIP document[1] has been updated. > Could you help take a look again? > > Thanks, > Jiabao > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 > > > > 2023年12月18日 16:53,Becket Qin 写道: > > > > Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER, > and > > add more policies as needed. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun .invalid> > > wrote: > > > >> Thanks Bucket, > >> > >> The jdbc.filter.handling.policy is good to me as it provides sufficient > >> extensibility for future filter pushdown optimizations. > >> However, currently, we don't have an implementation for the AUTO mode, > and > >> it seems that the AUTO mode can easily be confused with the ALWAYS mode > >> because users don't have the opportunity to MANUALLY decide which > filters > >> to push down. > >> > >> I suggest that we only introduce the ALWAYS and NEVER modes for now, and > >> we can consider introducing more flexible policies in the future, > >> such as INDEX_ONLY, NUMBERIC_ONLY and so on. > >> > >> WDYT? 
> >> > >> Best, > >> Jiabao > >> > >> > >> > >>> 2023年12月18日 16:27,Becket Qin 写道: > >>> > >>> Hi Jiabao, > >>> > >>> Please see the reply inline. > >>> > >>> > The MySQL connector is currently in the flink-connector-jdbc > repository > and is not a standalone connector. > Is it too unique to use "mysql" as the configuration option prefix? > >>> > >>> If the intended behavior makes sense to all the supported JDBC drivers, > >> we > >>> can make this a JDBC connector configuration. > >>> > >>> Also, I would like to ask about the difference in behavior between AUTO > >> and > ALWAYS. > It seems that we cannot guarantee the pushing down of all filters to > the > external system under the ALWAYS > mode because not all filters in Flink SQL are supported by the > external > system. > Should we throw an error when encountering a filter that cannot be > >> pushed > down in the ALWAYS mode? > >>> > >>> The idea of AUTO is to do efficiency-aware pushdowns. The source will > >> query > >>> the external system (MySQL, Oracle, SQL Server, etc) first to retrieve > >> the > >>> information of the table. With that information, the source will decide > >>> whether to further push a filter to the external system based on the > >>> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will > >>> just always push the supported filters to the external system, > regardless >
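Concretely, point 2 above means the user-facing DDL would carry the option without a connector prefix, since 'connector' = 'jdbc' already scopes it. A sketch follows; the option name is taken from this thread's discussion and is not released anywhere yet, and the table and connection parameters are illustrative:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FilterPolicyDdlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // The proposed policy option needs no 'jdbc.' prefix because
        // 'connector' = 'jdbc' already identifies the connector.
        tEnv.executeSql(
                "CREATE TABLE orders ("
                        + "  id BIGINT,"
                        + "  amount DECIMAL(10, 2)"
                        + ") WITH ("
                        + "  'connector' = 'jdbc',"
                        + "  'url' = 'jdbc:mysql://localhost:3306/shop',"
                        + "  'table-name' = 'orders',"
                        + "  'filter.handling.policy' = 'NEVER'"
                        + ")");
    }
}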
[jira] [Created] (FLINK-33877) CollectSinkFunctionTest.testConfiguredPortIsUsed fails due to BindException
Jiabao Sun created FLINK-33877: -- Summary: CollectSinkFunctionTest.testConfiguredPortIsUsed fails due to BindException Key: FLINK-33877 URL: https://issues.apache.org/jira/browse/FLINK-33877 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.19.0 Reporter: Jiabao Sun https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55646&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9482 {noformat} Dec 18 17:49:57 17:49:57.241 [ERROR] org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.testConfiguredPortIsUsed -- Time elapsed: 0.021 s <<< ERROR! Dec 18 17:49:57 java.net.BindException: Address already in use (Bind failed) Dec 18 17:49:57 at java.net.PlainSocketImpl.socketBind(Native Method) Dec 18 17:49:57 at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) Dec 18 17:49:57 at java.net.ServerSocket.bind(ServerSocket.java:390) Dec 18 17:49:57 at java.net.ServerSocket.(ServerSocket.java:252) Dec 18 17:49:57 at org.apache.flink.streaming.api.operators.collect.CollectSinkFunction$ServerThread.(CollectSinkFunction.java:375) Dec 18 17:49:57 at org.apache.flink.streaming.api.operators.collect.CollectSinkFunction$ServerThread.(CollectSinkFunction.java:362) Dec 18 17:49:57 at org.apache.flink.streaming.api.operators.collect.CollectSinkFunction.open(CollectSinkFunction.java:252) Dec 18 17:49:57 at org.apache.flink.streaming.api.operators.collect.utils.CollectSinkFunctionTestWrapper.openFunction(CollectSinkFunctionTestWrapper.java:103) Dec 18 17:49:57 at org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.testConfiguredPortIsUsed(CollectSinkFunctionTest.java:138) Dec 18 17:49:57 at java.lang.reflect.Method.invoke(Method.java:498) Dec 18 17:49:57 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Dec 18 17:49:57 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Dec 18 17:49:57 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Dec 18 17:49:57 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Dec 18 17:49:57 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
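For tests that do not strictly need a fixed port, the usual remedy for this kind of flakiness is to bind to an ephemeral port and read back the port actually assigned. A sketch of the pattern (not the actual test code, which deliberately exercises a configured port):

import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortExample {
    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, avoiding
        // "Address already in use" races between concurrent tests.
        try (ServerSocket socket = new ServerSocket(0)) {
            System.out.println("Bound to free port " + socket.getLocalPort());
        }
    }
}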
Re:Re:Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation
Hi, Timo. Sorry for the noise. What do you think about splitting the FLIP like this? -- Best! Xuyang At 2023-12-15 10:05:32, "Xuyang" wrote: >Hi, Timo, thanks for your advice. > > >I am considering splitting the existing FLIP into two, while keeping the >existing FLIP (or not). >One of them would cover completing CDC support in the window TVF operators (there are several >small work items, such as window agg, window rank, window join, etc.; due to time constraints, >window agg takes priority for the 1.19 release). The other would cover making the HOP window TVF >support a size that is a non-integer multiple of the step. Once these two >FLIPs are basically completed >in 1.19, we can consider officially deprecating the old group window agg >syntax in the release notes. > > >WDYT? > > > > >-- > >Best! >Xuyang > > > > > >At 2023-12-14 17:51:01, "Timo Walther" wrote: >>Hi Xuyang, >> >> > I'm not splitting this flip is that all of these subtasks like session >>window tvf and cdc support do not change the public interface and the >>public syntax >> >>Given the length of this mailing list discussion and the number of involved >>people, I would strongly suggest simplifying the FLIP and giving it a >>better title to make quicker progress. In general, we all seem to be on >>the same page in what we want. And both session TVF support and the >>deprecation of the legacy group windows have been voted on already and >>discussed thoroughly. The FLIP can purely focus on the CDC topic. >> >>Cheers, >>Timo >> >> >>On 14.12.23 08:35, Xuyang wrote: >>> Hi, Timo, Sergey and Lincoln Lee. Thanks for your feedback. >>> >>> In my opinion the FLIP touches too many topics at the same time and should be split into multiple FLIPs. We > should stay focused on what is needed for Flink 2.0. >>> The main goal and topic of this FLIP is to align the abilities between the >>> legacy group window agg syntax and the new window TVF syntax, >>> so that we can then say the legacy window syntax is deprecated. IMO, >>> although there are many misalignments between these two >>> syntaxes, such as session window TVF, CDC support and so on, they are all >>> subtasks we need to do in this FLIP. Another reason I'm not >>> splitting this FLIP is that all of these subtasks, like session window TVF >>> and CDC support, do not change the public interface or the public >>> syntax; their implementations will only live in the table-planner and >>> table-runtime modules. >>> >>> Can we postpone this discussion? Currently we should focus on users switching to Window TVFs before Flink 2.0. Early fire, late fire and > allow lateness have not been exposed through public configuration. They can be > introduced after Flink 2.0 is released. >>> >>> >>> Agree with you. This FLIP will not and should not expose these experimental >>> Flink configs to users. I listed them in this FLIP just to show the >>> misalignments between these two window syntaxes. >>> >>> >>> Looking forward to your thoughts. >>> >>> >>> >>> >>> -- >>> >>> Best! >>> Xuyang >>> >>> >>> >>> >>> >>> At 2023-12-13 15:40:16, "Lincoln Lee" wrote: Thanks Xuyang for driving this work! It's great that everyone agrees with the work itself in this FLIP[1]! Regarding whether to split the FLIP or adjust its scope, I'd like to share some thoughts: 1. About the title of this FLIP, what I want to say is that FLIP-145[2] had marked the legacy group window deprecated in the documentation, but the functionality of the new syntax is not aligned with the legacy one. 
This is not a user-friendly deprecation, so the initiation of this FLIP, as I understand it, is for the formal deprecation of the legacy window, which requires us to complete the functionality alignment. 2. Agree with Timo that we can handle the late-fire/early-fire features separately. These experimental parameters have never been officially exposed to users, and, considering the workload, deferring them keeps the scope of this version focused. 3. I have no objection to splitting this FLIP if everyone feels that the work included is too much. Regarding session TVF support, it seems the main problem is that this part of the description occupies a large part of the FLIP, causing some misunderstandings. It is actually a predetermined task from FLIP-145, just with more explanation added about its semantics. In addition, I saw the discussion history in FLINK-24024[3]; thanks Sergey for being willing to help drive this work together. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function [3] https://issues.apache.org/jira/browse/FLINK-24024 B
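To make the syntax gap under discussion concrete, here is a minimal illustrative sketch contrasting the legacy group window aggregation with the window TVF that replaces it. The Orders table and its time attribute ts are assumptions made up for the example; this is not code from the FLIP itself.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WindowSyntaxExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Assumes a table `Orders` with a time attribute column `ts` has
        // been registered beforehand.

        // Legacy group window aggregation (the syntax being deprecated):
        tEnv.executeSql(
                "SELECT user_id, TUMBLE_START(ts, INTERVAL '5' MINUTE) AS wstart, COUNT(*) AS cnt "
                        + "FROM Orders "
                        + "GROUP BY user_id, TUMBLE(ts, INTERVAL '5' MINUTE)");

        // Equivalent window TVF aggregation (FLIP-145 syntax):
        tEnv.executeSql(
                "SELECT user_id, window_start AS wstart, COUNT(*) AS cnt "
                        + "FROM TABLE(TUMBLE(TABLE Orders, DESCRIPTOR(ts), INTERVAL '5' MINUTE)) "
                        + "GROUP BY user_id, window_start, window_end");
    }
}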
[jira] [Created] (FLINK-33878) Many keyed operators extend `TableStreamOperator`, which is marked as non-keyed
xuyang created FLINK-33878:
--
Summary: Many keyed operators extend `TableStreamOperator`, which is marked as non-keyed
Key: FLINK-33878
URL: https://issues.apache.org/jira/browse/FLINK-33878
Project: Flink
Issue Type: Technical Debt
Components: Table SQL / Runtime
Reporter: xuyang

Many keyed operators, such as `WindowJoinOperator` and `SlicingWindowOperator`, extend `TableStreamOperator`, which is marked as a non-keyed operator.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources
Thanks Becket for the suggestions. Updated. Please help review it again when you have time. Best, Jiabao > On Dec 19, 2023, at 09:06, Becket Qin wrote: > > Hi Jiabao, > > Thanks for updating the FLIP. It looks better. Some suggestions / questions: > > 1. In the motivation section: > >> *Currently, Flink Table/SQL does not expose fine-grained control for users >> to control filter pushdown. **However, filter pushdown has some side >> effects, such as additional computational pressure on external >> systems. Moreover, improper queries can lead to issues such as full table >> scans, which in turn can impact the stability of external systems.* > > This statement sounds like the side effects are there for all the systems, > which is inaccurate. Maybe we can say: > *Currently, Flink Table/SQL does not expose fine-grained control for users > to control filter pushdown. **However, filter pushdown may have side > effects in some cases, **such as additional computational pressure on > external systems. The JDBC source is a typical example of that. If a filter > is pushed down to the database, an expensive full table scan may happen if > the filter involves unindexed columns.* > > 2. Regarding the prefix, usually a prefix is not required for the top-level > connector options. This is because the *connector* option is already there. > So >'connector' = 'jdbc', > 'filter.handling.policy' = 'ALWAYS' > is sufficient. > > The prefix is needed when the option is for a 2nd+ level. For example, >'connector' = 'jdbc', >'format' = 'orc', >'orc.some.option' = 'some_value' > In this case, the "orc" prefix is needed to make it clear this option is > for the format. > > I am guessing that the connector prefix is currently there because the values > of this configuration may differ depending on > the connectors. For example, JDBC may have INDEXED_ONLY while MongoDB may > have something else. Personally speaking, I am fine if we do not have a > prefix in this case because users have already specified the connector type > and it is intuitive enough that the option value is for that connector, not > others. > > 3. Can we clarify the following statement: > >> *Introduce the native configuration [prefix].filter.handling.policy in the >> connector.* > > What do you mean by "native configuration"? From what I understand, the > FLIP does the following: > - introduces a new configuration to the JDBC and MongoDB connectors. > - suggests a conventional option name for other connectors that are going to add an > option for the same purpose. > > Thanks, > > Jiangjie (Becket) Qin > > > > On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun > wrote: > >> Hi Becket, >> >> The FLIP document[1] has been updated. >> Could you help take a look again? >> >> Thanks, >> Jiabao >> >> [1] >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 >> >> >>> On Dec 18, 2023, at 16:53, Becket Qin wrote: >>> >>> Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER, >> and >>> add more policies as needed. >>> >>> Thanks, >>> >>> Jiangjie (Becket) Qin >>> >>> On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun > .invalid> >>> wrote: Thanks Becket, The jdbc.filter.handling.policy is good to me as it provides sufficient extensibility for future filter pushdown optimizations. However, we currently don't have an implementation for the AUTO mode, >> and it seems that the AUTO mode can easily be confused with the ALWAYS mode because users don't have the opportunity to MANUALLY decide which >> filters to push down. 
I suggest that we only introduce the ALWAYS and NEVER modes for now, and we can consider introducing more flexible policies in the future, such as INDEX_ONLY, NUMERIC_ONLY and so on. WDYT? Best, Jiabao > On Dec 18, 2023, at 16:27, Becket Qin wrote: > > Hi Jiabao, > > Please see the reply inline. > > >> The MySQL connector is currently in the flink-connector-jdbc >> repository >> and is not a standalone connector. >> Is it too unique to use "mysql" as the configuration option prefix? > > If the intended behavior makes sense to all the supported JDBC drivers, we > can make this a JDBC connector configuration. > > Also, I would like to ask about the difference in behavior between AUTO and >> ALWAYS. >> It seems that we cannot guarantee the pushing down of all filters to >> the >> external system under the ALWAYS >> mode because not all filters in Flink SQL are supported by the >> external >> system. >> Should we throw an error when encountering a filter that cannot be pushed >> down in the ALWAYS mode? > > The idea of AUTO is to do efficiency-aware pushdowns. The source will query > the external system (MySQL, Oracle, SQL Server, etc.) first to retrieve the > info
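For readers following along, the contract under discussion is SupportsFilterPushDown.applyFilters(), which returns both the accepted and the remaining filters. Below is a minimal sketch of how a source honoring the proposed option might implement it; the policy enum, field names, and capability-check helper are assumptions for illustration, and the rest of the DynamicTableSource plumbing is omitted.

import java.util.Collections;
import java.util.List;

import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

class PolicyAwareSource implements SupportsFilterPushDown {

    // Hypothetical policy parsed from the proposed 'filter.handling.policy' option.
    enum FilterHandlingPolicy { ALWAYS, NEVER }

    private final FilterHandlingPolicy policy;

    PolicyAwareSource(FilterHandlingPolicy policy) {
        this.policy = policy;
    }

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        if (policy == FilterHandlingPolicy.NEVER) {
            // Accept nothing; Flink evaluates every filter itself.
            return Result.of(Collections.emptyList(), filters);
        }
        // ALWAYS: accept whatever the external system supports. Per the
        // contract, unsupported filters are simply returned as remaining
        // filters instead of triggering an exception. Returning all filters
        // as remaining is the conservative choice that lets Flink re-check them.
        List<ResolvedExpression> accepted = externallySupported(filters);
        return Result.of(accepted, filters);
    }

    // Assumed connector-specific capability check.
    private List<ResolvedExpression> externallySupported(List<ResolvedExpression> filters) {
        return filters;
    }
}

With 'connector' = 'jdbc' and 'filter.handling.policy' = 'NEVER', the planner would then keep all filters in the Flink job rather than forwarding them to the database.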
Re: Unsubscribe
Please send an email with any content to user-unsubscr...@flink.apache.org if you want to unsubscribe from the u...@flink.apache.org mailing list, and you can refer to [1][2] for more details on managing your subscription. Best, Hang [1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8 [2] https://flink.apache.org/community.html#mailing-lists On Mon, Dec 18, 2023 at 23:44, 唐大彪 wrote: > Unsubscribe >
Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support
Thanks for the helpful comments, Xuyang and Timo. @Timo, @Alan: IIUC, there seems to be something wrong here. Take Kafka as > source and MySQL as sink as an example. > Although Kafka is an append-only source, one of its fields is used as the pk > when writing to MySQL. If an async udx is executed > in unordered mode, there may be problems with the data in MySQL in the > end. In this case, we actually need to ensure > ordering by the sink's pk. @Xuyang: That's a great point. If some node downstream of my operator cares about ordering, there's no way for it to reconstruct the original ordering of the rows as they were input to my operator. So even if it wants to preserve ordering by key, the order in which it sees the rows may already be incorrect. Somehow I thought that maybe the analysis of the changelog mode at a given operator was aware of downstream operations, but it seems not. Clear "no" on this. Changelog semantics make the planner complex and we > need to be careful. Therefore I would strongly suggest we introduce > ORDERED and slowly enable UNORDERED whenever we see a good fit for it in > plans with appropriate planner rules that guard it. @Timo: The better I understand the complexity, the more I agree with this. I would be totally fine with the first version only having ORDERED mode. For a v2, we could attempt the next most conservative thing and only allow UNORDERED when the whole graph is in *INSERT* changelog mode. The next best type of optimization might understand which key is required downstream, and allow breaking the original order only between unrelated keys while maintaining it between rows of the same key. Of course, if the key used downstream is computed in some manner, that makes it all the harder to know this beforehand. So unordering should be fine *within* watermarks. This is also what > watermarks are good for, a trade-off between strict ordering and making > progress. The async operator from the DataStream API also supports this if I > remember correctly. However, it assumes a timestamp is present in > the StreamRecord on which it can work. But this is not the case within the > SQL engine. *AsyncWaitOperator* and *UnorderedStreamElementQueue* (the implementations I plan on using) seem to support exactly this behavior. I don't think they make assumptions about the record's timestamp, but just preserve whatever the input order is w.r.t. watermarks. I'd be curious to understand the timestamp use in more detail and see if it's required with the mentioned classes. TLDR: Let's focus on ORDERED first. I'm more than happy to start here and we can consider UNORDERED as a followup. Then maybe we consider only INSERT-mode graphs and ones where we can solve the watermark constraints. Thanks, Alan On Mon, Dec 18, 2023 at 2:36 AM Timo Walther wrote: > Hi Xuyang and Alan, > > thanks for this productive discussion. > > > Would it make a difference if it were exposed by the explain > > @Alan: I think this is a great idea. +1 on exposing the sync/async > behavior through EXPLAIN. > > > > Is there an easy way to determine if the output of an async function > > would be problematic or not? > > Clear "no" on this. Changelog semantics make the planner complex and we > need to be careful. Therefore I would strongly suggest we introduce > ORDERED and slowly enable UNORDERED whenever we see a good fit for it in > plans with appropriate planner rules that guard it. 
> > > If the input to the operator is append-only, it seems fine, because > > this implies that each row is effectively independent and ordering is > > unimportant. > > As @Xuyang pointed out, it's not only the input that decides whether > append-only is safe; it's also the subsequent operators in the pipeline. > Xuyang's example is a good one: when the sink operates in upsert > mode. Append-only source, append-only operators, and append-only sink > are safer. > > However, even in this combination, a row is not fully "independent"; > there are still watermarks flowing between rows: > > R(5), W(4), R(3), R(4), R(2), R(1), W(0) > > So unordering should be fine *within* watermarks. This is also what > watermarks are good for, a trade-off between strict ordering and making > progress. The async operator from the DataStream API also supports this if I > remember correctly. However, it assumes a timestamp is present in > the StreamRecord on which it can work. But this is not the case within the > SQL engine. > > TLDR: Let's focus on ORDERED first. > > If we want to use UNORDERED, I would suggest checking the input operator > for exactly 1 time attribute column. If there is exactly 1 time > attribute column, we could insert it into the StreamRecord and allow > UNORDERED mode. If this condition is not met, we go with ORDERED. > > Regards, > Timo > > > > > On 18.12.23 07:05, Xuyang wrote: > > Hi, Alan and Timo. Thanks for your reply. > >> Would it make a difference if it were exposed by the explain > >> method (the operator havin
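For readers less familiar with the FLIP-400 proposal itself, here is a rough sketch of what an asynchronous scalar function could look like, following the FLIP's direction of an eval method that completes a CompletableFuture. The class name and body are illustrative assumptions, not the finalized API.

import java.util.concurrent.CompletableFuture;

import org.apache.flink.table.functions.AsyncScalarFunction;

// Illustrative only: an async scalar UDF whose results the runtime would
// collect in ORDERED mode, i.e. emitted in the original row order, per the
// discussion above.
public class EnrichFunction extends AsyncScalarFunction {

    // First parameter is the future to complete with the result.
    public void eval(CompletableFuture<String> result, String key) {
        // Stand-in for a real async client call (e.g. an HTTP or DB lookup).
        CompletableFuture
                .supplyAsync(() -> "enriched:" + key)
                .whenComplete((value, error) -> {
                    if (error != null) {
                        result.completeExceptionally(error);
                    } else {
                        result.complete(value);
                    }
                });
    }
}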