[RESULT][VOTE] FLIP-415: Introduce a new join operator to support minibatch
Hi devs, I'm glad to announce that FLIP-415[1] has been accepted. The voting thread is here[2]. The proposal received 5 approving votes, as follows: - Lincoln Lee (binding) - liu ron (binding) - Jane Chan (binding) - Benchao Li (binding) - Xuyang (non-binding) There were no disapproving votes. Thanks to all participants for the discussion and voting! [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch [2] https://lists.apache.org/thread/xc0c4zsq4qlyot4vcbdxznr9ocnd0pm0
[RESULT][VOTE] FLIP-415: Introduce a new join operator to support minibatch
Hi devs, I'm glad to announce that FLIP-415[1] has been accepted. The voting thread is here[2]. The proposal received five approving votes, four of which are binding: - Lincoln Lee (binding) - Jane Chan (binding) - Ron liu (binding) - Benchao Li (binding) - Xuyang (non-binding) There were no disapproving votes. Thanks to all participants for the discussion and voting! [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch [2] https://lists.apache.org/thread/xc0c4zsq4qlyot4vcbdxznr9ocnd0pm0
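For context, minibatch execution in the Table/SQL API is controlled by a set of existing, documented configuration options; a minibatch join would typically be exercised with these enabled. A minimal SQL-client sketch with illustrative values (whether FLIP-415 adds further options is not covered here):

```sql
-- Enable mini-batch processing; the values below are illustrative.
SET 'table.exec.mini-batch.enabled' = 'true';
SET 'table.exec.mini-batch.allow-latency' = '5 s';
SET 'table.exec.mini-batch.size' = '5000';
```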
[jira] [Created] (FLINK-34232) Config file unexpectedly lacks support for env.java.home
Junrui Li created FLINK-34232: - Summary: Config file unexpectedly lacks support for env.java.home Key: FLINK-34232 URL: https://issues.apache.org/jira/browse/FLINK-34232 Project: Flink Issue Type: Bug Components: API / Core Reporter: Junrui Li Fix For: 1.19.0 We removed the option to set JAVA_HOME in the config file with commit [24091|https://github.com/apache/flink/pull/24091] to improve how we handle standard YAML with BashJavaUtils. But since setting JAVA_HOME is a publicly documented feature, we need to keep it available for users. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34233) HybridShuffleITCase.testHybridSelectiveExchangesRestart failed due to an IllegalStateException
Matthias Pohl created FLINK-34233: - Summary: HybridShuffleITCase.testHybridSelectiveExchangesRestart failed due to an IllegalStateException Key: FLINK-34233 URL: https://issues.apache.org/jira/browse/FLINK-34233 Project: Flink Issue Type: Bug Components: Runtime / Network Affects Versions: 1.19.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56791&view=logs&j=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3&t=712ade8c-ca16-5b76-3acd-14df33bc1cb1&l=8357
{code}
Jan 24 02:10:03 02:10:03.582 [ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 34.74 s <<< FAILURE! -- in org.apache.flink.test.runtime.HybridShuffleITCase
Jan 24 02:10:03 02:10:03.582 [ERROR] org.apache.flink.test.runtime.HybridShuffleITCase.testHybridSelectiveExchangesRestart -- Time elapsed: 3.347 s <<< FAILURE!
Jan 24 02:10:03 java.lang.AssertionError: org.apache.flink.runtime.JobException: org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2, backoffTimeMS=0)
Jan 24 02:10:03 at org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:59)
Jan 24 02:10:03 at org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:137)
Jan 24 02:10:03 at org.apache.flink.test.runtime.HybridShuffleITCase.testHybridSelectiveExchangesRestart(HybridShuffleITCase.java:91)
Jan 24 02:10:03 at java.base/java.lang.reflect.Method.invoke(Method.java:568)
Jan 24 02:10:03 at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
Jan 24 02:10:03 at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
Jan 24 02:10:03 at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
Jan 24 02:10:03 at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
Jan 24 02:10:03 at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
Jan 24 02:10:03 at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
Jan 24 02:10:03 at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
Jan 24 02:10:03 at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
Jan 24 02:10:03 at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
Jan 24 02:10:03 at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
Jan 24 02:10:03 at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
Jan 24 02:10:03 at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
Jan 24 02:10:03 at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
Jan 24 02:10:03 at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
Jan 24 02:10:03 at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
Jan 24 02:10:03 at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
Jan 24 02:10:03 at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
Jan 24 02:10:03 at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
Jan 24 02:10:03 at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
Jan 24 02:10:03 at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
Jan 24 02:10:03 at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Jan 24 02:10:03 Caused by: org.apache.flink.runtime.JobException: org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRest
{code}
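For readers following the trace: the FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2, backoffTimeMS=0) shown above corresponds to Flink's fixed-delay restart strategy. As a hedged sketch (key names have varied across Flink versions; values here match the test's parameters), the equivalent flink-conf.yaml settings would look like:

```yaml
# Fixed-delay restart strategy matching the parameters in the stack trace:
# 2 restart attempts with no backoff between them.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2
restart-strategy.fixed-delay.delay: 0 s
```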
[jira] [Created] (FLINK-34234) Apply ShadeOptionalChecker for flink-shaded
Sergey Nuyanzin created FLINK-34234: --- Summary: Apply ShadeOptionalChecker for flink-shaded Key: FLINK-34234 URL: https://issues.apache.org/jira/browse/FLINK-34234 Project: Flink Issue Type: Bug Components: BuildSystem / Shaded Reporter: Sergey Nuyanzin Assignee: Sergey Nuyanzin As found in FLINK-34148, a newer version of the shade plugin breaks the previous behavior, and non-shaded artifacts started being added to the flink-shaded dependencies. The task is to apply the same check to flink-shaded with the help of {{ShadeOptionalChecker}}, which is already applied to Flink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34235) Not using Optional as input arguments in QueryHintsResolver
xuyang created FLINK-34235: -- Summary: Not using Optional as input arguments in QueryHintsResolver Key: FLINK-34235 URL: https://issues.apache.org/jira/browse/FLINK-34235 Project: Flink Issue Type: Technical Debt Components: Table SQL / Planner Reporter: xuyang -- This message was sent by Atlassian Jira (v8.20.10#820010)
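For context, the usual argument against Optional-typed input arguments can be sketched with a toy example (the names below are invented for illustration and are not the actual QueryHintsResolver code): Optional is designed as a return type, and Optional parameters force every caller to wrap values while the Optional reference itself can still be null.

```java
import java.util.Optional;

public class OptionalParamExample {

    // Discouraged: Optional as an input argument. Callers must wrap,
    // and the Optional reference itself may still be null.
    public static String resolveHintDiscouraged(Optional<String> hintName) {
        return hintName.orElse("no-hint");
    }

    // Preferred: a plainly nullable parameter (or a separate overload),
    // with the null handling inside the method.
    public static String resolveHint(String hintNameOrNull) {
        return hintNameOrNull != null ? hintNameOrNull : "no-hint";
    }

    public static void main(String[] args) {
        System.out.println(resolveHintDiscouraged(Optional.empty())); // no-hint
        System.out.println(resolveHint("BROADCAST"));                 // BROADCAST
    }
}
```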
Re: [SUMMARY] Flink 1.19 Release Sync 01/23/2024
Hi everyone, I am currently working on FLIP-415[1], which aims to support minibatch join. This will bring higher performance for regular joins. There is still a task to be merged which has undergone multiple rounds of review. I expect it could be merged on Jan 26th. Therefore, I am +1 with Feng Jin to delay the deadline. [1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch Best regards, Xu Shuai Lincoln Lee 于2024年1月24日周三 22:00写道: > > Hi devs, > > I'd like to share some highlights from the release sync on 01/23/2024 > > > *- Feature freeze* *We plan to freeze the feature on Jan 26th. If there's > specific need for an extension, please confirm with RMs by replying this > mail.* > > > *- Features & issues tracking* So far we've had 15 flips been marked > done(some documentation is still in progress), we also ask responsible > contributors to help update the status of the remaining items on the 1.19 > wiki page [1], including *documentation and cross-team testing requirements*, > this will help the release process. > > > *- Blockers* There're performance regression and blocker issues are being > worked on: > https://issues.apache.org/jira/browse/FLINK-34148 > https://issues.apache.org/jira/browse/FLINK-34007 > https://issues.apache.org/jira/browse/FLINK-34225 > Note that test instabilities will be upgraded to blocker if it is newly > introduced. > > *- Sync meeting* (https://meet.google.com/vcx-arzs-trv) > The next release sync is *Jan 30th, 2024*. We'll switch to weekly release > sync. > > [1] https://cwiki.apache.org/confluence/display/FLINK/1.19+Release > > Best, > Yun, Jing, Martijn and Lincoln
[jira] [Created] (FLINK-34236) Evaluate strange unstable build after cleaning up CI machines
Jing Ge created FLINK-34236: --- Summary: Evaluate strange unstable build after cleaning up CI machines Key: FLINK-34236 URL: https://issues.apache.org/jira/browse/FLINK-34236 Project: Flink Issue Type: Improvement Components: Test Infrastructure Reporter: Jing Ge To check whether this is a one-time issue caused by the infra change. https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56601&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=5d91035e-8022-55f2-2d4f-ab121508bf7e -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [SUMMARY] Flink 1.19 Release Sync 01/23/2024
Hi Feng, Thanks for letting us know, it is a great feature. +1 to extend the code freeze deadline. Best regards, Jing On Thu, Jan 25, 2024 at 8:08 AM Feng Jin wrote: > Hi everyone, > > Xuyang and I are currently working on FLIP-387[1], which aims to support > named parameters for functions and procedures. > This will make it more convenient for users to utilize functions and > procedures with multiple parameters. > > We have divided the task into four sub-tasks, and we are currently working > on them: > https://issues.apache.org/jira/browse/FLINK-34055 > https://issues.apache.org/jira/browse/FLINK-34056 > https://issues.apache.org/jira/browse/FLINK-34057 > > These tasks have already been developed and reviewed, and we expect them > to be merged today(Jan 25th). > > However, there is still one remaining task: > https://issues.apache.org/jira/browse/FLINK-34058. > I have already completed the necessary development work for this task. It > may still require 2-3 rounds of review before it is finalized. > I anticipate that it will take another 2-3 days to complete. > > > Therefore, I kindly request that we merge the pull request next Monday(Jan > 29th). I apologize if this affects your related schedule. > > [1]. https://issues.apache.org/jira/browse/FLINK-34054 > > > Best regards, > Feng Jin > > On Wed, Jan 24, 2024 at 10:00 PM Lincoln Lee > wrote: > >> Hi devs, >> >> I'd like to share some highlights from the release sync on 01/23/2024 >> >> >> *- Feature freeze* *We plan to freeze the feature on Jan 26th. If there's >> specific need for an extension, please confirm with RMs by replying this >> mail.* >> >> >> *- Features & issues tracking* So far we've had 15 flips been marked >> done(some documentation is still in progress), we also ask responsible >> contributors to help update the status of the remaining items on the 1.19 >> wiki page [1], including *documentation and cross-team testing >> requirements*, >> this will help the release process. 
>> >> >> *- Blockers* There're performance regression and blocker issues are being >> worked on: >> https://issues.apache.org/jira/browse/FLINK-34148 >> https://issues.apache.org/jira/browse/FLINK-34007 >> https://issues.apache.org/jira/browse/FLINK-34225 >> Note that test instabilities will be upgraded to blocker if it is newly >> introduced. >> >> *- Sync meeting* (https://meet.google.com/vcx-arzs-trv) >> The next release sync is *Jan 30th, 2024*. We'll switch to weekly >> release >> sync. >> >> [1] https://cwiki.apache.org/confluence/display/FLINK/1.19+Release >> >> Best, >> Yun, Jing, Martijn and Lincoln >> >
[jira] [Created] (FLINK-34237) MongoDB connector compile failed with Flink 1.19-SNAPSHOT
Leonard Xu created FLINK-34237: -- Summary: MongoDB connector compile failed with Flink 1.19-SNAPSHOT Key: FLINK-34237 URL: https://issues.apache.org/jira/browse/FLINK-34237 Project: Flink Issue Type: Bug Components: API / Core, Connectors / MongoDB Reporter: Leonard Xu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[VOTE] FLIP-417: Expose JobManagerOperatorMetrics via REST API
Hi Devs, I would like to start a vote on FLIP-417: Expose JobManagerOperatorMetrics via REST API [1] which has been discussed in this thread [2]. The vote will be open for at least 72 hours unless there is an objection or not enough votes. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API [2] https://lists.apache.org/thread/tt0hf6kf5lcxd7g62v9dhpn3z978pxw0 Best, Mason
[DISCUSS] Release new version of Flink's Kafka connector
Hi everyone, The latest version of the Flink Kafka connector that's available is currently v3.0.2, which is compatible with both Flink 1.17 and Flink 1.18. I would like to propose to create a release which is either v3.1, or v4.0 (see below), with compatibility for Flink 1.17 and Flink 1.18. This newer version would contain many improvements [1] [2] like: * FLIP-246 Dynamic Kafka Source * FLIP-288 Dynamic Partition Discovery * Rack Awareness support * Kafka Record support for KafkaSink * Misc bug fixes and CVE issues If there are no objections, I would like to volunteer as release manager. The only reason I'm not sure whether this should be a v3.1 or a v4.0 is that I'm not 100% sure if FLIP-246 introduces incompatible API changes (requiring a new major version), or if the functionality was added in a backwards compatible manner (meaning a new minor version would be sufficient). I'm looping in Hongshun Wang and Leonard Xu to help clarify this. There's also a discussion happening in an open PR [3] on dropping support for Flink 1.18 afterwards (since this PR would add support for RecordEvaluator, which only exists in Flink 1.19). My proposal would be that after either v3.1 or v4.0 is released, we would indeed drop support for Flink 1.18 with that PR and the next Flink Kafka connector would be either v4.0 (if v3.1 is the next release) or v5.0 (if v4.0 is the next release). Best regards, Martijn [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353135 [2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352917 [3] https://github.com/apache/flink-connector-kafka/pull/76#pullrequestreview-1844645464
Re: [ANNOUNCE] Apache Flink 1.18.1 released
Hi folks, The bug has been fixed and PR at docker-library/official-images has been merged. The official images are available now. Best regards, Jing On Mon, Jan 22, 2024 at 11:39 AM Jing Ge wrote: > Hi folks, > > I am still working on the official images because of the issue > https://issues.apache.org/jira/browse/FLINK-34165. Images under > apache/flink are available. > > Best regards, > Jing > > On Sun, Jan 21, 2024 at 11:06 PM Jing Ge wrote: > >> Thanks Leonard for the feedback! Also thanks @Jark Wu >> @Chesnay >> Schepler and each and everyone who worked closely >> with me for this release. We made it together! >> >> Best regards, >> Jing >> >> On Sun, Jan 21, 2024 at 9:25 AM Leonard Xu wrote: >> >>> Thanks Jing for driving the release, nice work! >>> >>> Thanks all who involved this release! >>> >>> Best, >>> Leonard >>> >>> > 2024年1月20日 上午12:01,Jing Ge 写道: >>> > >>> > The Apache Flink community is very happy to announce the release of >>> Apache >>> > Flink 1.18.1, which is the first bugfix release for the Apache Flink >>> 1.18 >>> > series. >>> > >>> > Apache Flink® is an open-source stream processing framework for >>> > distributed, high-performing, always-available, and accurate data >>> streaming >>> > applications. >>> > >>> > The release is available for download at: >>> > https://flink.apache.org/downloads.html >>> > >>> > Please check out the release blog post for an overview of the >>> improvements >>> > for this bugfix release: >>> > >>> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/ >>> > >>> > Please note: Users that have state compression should not migrate to >>> 1.18.1 >>> > (nor 1.18.0) due to a critical bug that could lead to data loss. Please >>> > refer to FLINK-34063 for more information. 
>>> > >>> > The full release notes are available in Jira: >>> > >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640 >>> > >>> > We would like to thank all contributors of the Apache Flink community >>> who >>> > made this release possible! Special thanks to @Qingsheng Ren @Leonard >>> Xu >>> > @Xintong Song @Matthias Pohl @Martijn Visser for the support during >>> this >>> > release. >>> > >>> > A Jira task series based on the Flink release wiki has been created for >>> > 1.18.1 release. Tasks that need to be done by PMC have been explicitly >>> > created separately. It will be convenient for the release manager to >>> reach >>> > out to PMC for those tasks. Any future patch release could consider >>> cloning >>> > it and follow the standard release process. >>> > https://issues.apache.org/jira/browse/FLINK-33824 >>> > >>> > Feel free to reach out to the release managers (or respond to this >>> thread) >>> > with feedback on the release process. Our goal is to constantly >>> improve the >>> > release process. Feedback on what could be improved or things that >>> didn't >>> > go so well are appreciated. >>> > >>> > Regards, >>> > Jing >>> >>>
[jira] [Created] (FLINK-34238) In streaming mode, redundant exchange nodes can be optimally deleted in some cases
xuyang created FLINK-34238: -- Summary: In streaming mode, redundant exchange nodes can be optimally deleted in some cases Key: FLINK-34238 URL: https://issues.apache.org/jira/browse/FLINK-34238 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Reporter: xuyang -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] FLIP-417: Expose JobManagerOperatorMetrics via REST API
Thanks for the FLIP. +1 (non-binding) Best, Hang Mason Chen 于2024年1月26日周五 04:51写道: > Hi Devs, > > I would like to start a vote on FLIP-417: Expose JobManagerOperatorMetrics > via REST API [1] which has been discussed in this thread [2]. > > The vote will be open for at least 72 hours unless there is an objection or > not enough votes. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API > [2] https://lists.apache.org/thread/tt0hf6kf5lcxd7g62v9dhpn3z978pxw0 > > Best, > Mason >
[jira] [Created] (FLINK-34239) Introduce a deep copy method of SerializerConfig for merging with Table configs in org.apache.flink.table.catalog.DataTypeFactoryImpl
Zhanghao Chen created FLINK-34239: - Summary: Introduce a deep copy method of SerializerConfig for merging with Table configs in org.apache.flink.table.catalog.DataTypeFactoryImpl Key: FLINK-34239 URL: https://issues.apache.org/jira/browse/FLINK-34239 Project: Flink Issue Type: Sub-task Components: API / Core Affects Versions: 1.19.0 Reporter: Zhanghao Chen *Problem* Currently, org.apache.flink.table.catalog.DataTypeFactoryImpl#createSerializerExecutionConfig creates a deep copy of the SerializerConfig and merges the Table config into it. However, the deep copy is done by manually calling the getter and setter methods of SerializerConfig, which is prone to human error, e.g. forgetting to copy a newly added field of SerializerConfig. *Proposal* Introduce a deep copy method for SerializerConfig and replace the current implementation in org.apache.flink.table.catalog.DataTypeFactoryImpl#createSerializerExecutionConfig. -- This message was sent by Atlassian Jira (v8.20.10#820010)
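The fragility described above, and the proposed fix, can be sketched with a toy config class (class and field names are invented for illustration; this is not Flink's actual SerializerConfig): moving the deep copy into the class itself means a newly added field only needs to be handled in one place, instead of at every call site that wires getters to setters.

```java
public class CopyableConfig {
    private boolean forceKryo;
    private int registeredTypes;

    public void setForceKryo(boolean forceKryo) { this.forceKryo = forceKryo; }
    public void setRegisteredTypes(int n) { this.registeredTypes = n; }
    public boolean isForceKryo() { return forceKryo; }
    public int getRegisteredTypes() { return registeredTypes; }

    // The single place that must be kept in sync with the fields above.
    // Callers no longer hand-copy fields and cannot miss a new one.
    public CopyableConfig copy() {
        CopyableConfig c = new CopyableConfig();
        c.forceKryo = this.forceKryo;
        c.registeredTypes = this.registeredTypes;
        return c;
    }

    public static void main(String[] args) {
        CopyableConfig original = new CopyableConfig();
        original.setForceKryo(true);
        original.setRegisteredTypes(7);
        CopyableConfig merged = original.copy(); // deep copy first...
        merged.setRegisteredTypes(9);            // ...then merge overrides
        System.out.println(original.getRegisteredTypes()); // 7, unchanged
    }
}
```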
[NOTICE] master branch cannot compile for now
Hi devs, I merged FLINK-33263[1] this morning (10:16 +8:00), and it was based on an old commit that uses an older Guava version, so currently the master branch cannot compile. Zhanghao discovered this in FLINK-33264[2], and the hotfix commit has been proposed in the same PR; hopefully we can merge it after CI passes (it may take a few hours). Sorry for the inconvenience. [1] https://github.com/apache/flink/pull/24128 [2] https://github.com/apache/flink/pull/24133 -- Best, Benchao Li
Re:Re: [VOTE] FLIP-417: Expose JobManagerOperatorMetrics via REST API
+1 (non-binding) -- Best! Xuyang 在 2024-01-26 10:12:34,"Hang Ruan" 写道: >Thanks for the FLIP. > >+1 (non-binding) > >Best, >Hang > >Mason Chen 于2024年1月26日周五 04:51写道: > >> Hi Devs, >> >> I would like to start a vote on FLIP-417: Expose JobManagerOperatorMetrics >> via REST API [1] which has been discussed in this thread [2]. >> >> The vote will be open for at least 72 hours unless there is an objection or >> not enough votes. >> >> [1] >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API >> [2] https://lists.apache.org/thread/tt0hf6kf5lcxd7g62v9dhpn3z978pxw0 >> >> Best, >> Mason >>
Re: Re: [VOTE] FLIP-417: Expose JobManagerOperatorMetrics via REST API
+1(binding) Best, Rui On Fri, Jan 26, 2024 at 11:55 AM Xuyang wrote: > +1 (non-binding) > > > -- > > Best! > Xuyang > > > > > > 在 2024-01-26 10:12:34,"Hang Ruan" 写道: > >Thanks for the FLIP. > > > >+1 (non-binding) > > > >Best, > >Hang > > > >Mason Chen 于2024年1月26日周五 04:51写道: > > > >> Hi Devs, > >> > >> I would like to start a vote on FLIP-417: Expose > JobManagerOperatorMetrics > >> via REST API [1] which has been discussed in this thread [2]. > >> > >> The vote will be open for at least 72 hours unless there is an > objection or > >> not enough votes. > >> > >> [1] > >> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API > >> [2] https://lists.apache.org/thread/tt0hf6kf5lcxd7g62v9dhpn3z978pxw0 > >> > >> Best, > >> Mason > >> >
[jira] [Created] (FLINK-34240) The example of sliding windows with offset in documentation is not correct
Weijie Guo created FLINK-34240: -- Summary: The example of sliding windows with offset in documentation is not correct Key: FLINK-34240 URL: https://issues.apache.org/jira/browse/FLINK-34240 Project: Flink Issue Type: Bug Components: Documentation Reporter: Weijie Guo Assignee: Weijie Guo In the documentation of windows, we have the following example code:
{code:java}
// sliding processing-time windows offset by -8 hours
input
    .keyBy(<key selector>)
    .window(SlidingProcessingTimeWindows.of(Time.hours(12), Time.hours(1), Time.hours(-8)))
    .<windowed transformation>(<window function>);
{code}
Unfortunately, it will raise an exception, as the absolute value of the offset must be less than the slide. -- This message was sent by Atlassian Jira (v8.20.10#820010)
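For context, the offset constraint can be seen in the window-start arithmetic. The sketch below mirrors the formula used in Flink's TimeWindow#getWindowStartWithOffset (shown here as an illustration with made-up timestamps, in milliseconds): with a 1-hour slide, a valid offset must satisfy abs(offset) < slide, which is why a -8 hour offset is rejected, while e.g. -30 minutes is fine.

```java
public class WindowOffsetExample {

    // Window start for a given timestamp, offset, and slide/size,
    // mirroring Flink's TimeWindow#getWindowStartWithOffset formula.
    public static long windowStartWithOffset(long timestamp, long offset, long slide) {
        return timestamp - (timestamp - offset + slide) % slide;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long minute = 60_000L;
        // 1-hour slide with a -30 minute offset (valid: abs(offset) < slide).
        long ts = 10 * hour + 45 * minute; // an event at 10:45
        long start = windowStartWithOffset(ts, -30 * minute, hour);
        System.out.println(start == 10 * hour + 30 * minute); // window starts at 10:30
    }
}
```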
Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction
Hi Weijie, Thanks for raising discussions about the new DataStream API. I have a few questions about the content of the FLIP. 1. Will we provide any API to support choosing which input to consume between the two inputs of TwoInputStreamProcessFunction? It would be helpful in online machine learning cases, where a process function needs to receive the first machine learning model before it can start predictions on input data. Similar requirements might also exist in Flink CEP, where a rule set needs to be consumed by the process function first before it can start matching the event stream against CEP patterns. 2. A typo might exist in the current FLIP describing the API to generate a global stream, as I can see either global() or coalesce() in different places of the FLIP. These two methods might need to be unified into one method. 3. The order of parameters in the current ProcessFunction is (record, context, output), while this FLIP proposes to change the order into (record, output, context). Is there any reason to make this change? 4. Why does this FLIP propose to use connectAndProcess() instead of connect() (+ keyBy()) + process()? The latter looks simpler to me. Looking forward to discussing these questions with you. Best regards, Yunfeng Zhou On Tue, Dec 26, 2023 at 2:44 PM weijie guo wrote: > > Hi devs, > > > I'd like to start a discussion about FLIP-409: DataStream V2 Building > Blocks: DataStream, Partitioning and ProcessFunction [1]. > > > As the first sub-FLIP for DataStream API V2, we'd like to discuss and > try to answer some of the most fundamental questions in stream > processing: > >1. What kinds of data streams do we have? >2. How to partition data over the streams? >3. How to define a processing on the data stream? > > The answer to these questions involve three core concepts: DataStream, > Partitioning and ProcessFunction. In this FLIP, we will discuss the > definitions and related API primitives of these concepts in detail. 
> > > You can find more details in FLIP-409 [1]. This sub-FLIP is at the > heart of the entire DataStream API V2, and its relationship with other > sub-FLIPs can be found in the umbrella FLIP [2]. > > > Looking forward to hearing from you, thanks! > > > Best regards, > > Weijie > > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction > > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
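Question 1 above (seeing the full model input before processing data records) can be illustrated with a small, framework-free sketch. All class and method names below are invented for illustration and are not part of FLIP-409; the point is only the buffering pattern an input-selection API would make unnecessary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ModelFirstJoin {
    private String model;                                  // set from input 1
    private final Queue<String> buffered = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();

    public void processModelRecord(String m) { this.model = m; }

    // Called when the model input ends: flush data buffered so far.
    public void endOfModelInput() {
        while (!buffered.isEmpty()) {
            output.add(predict(buffered.poll()));
        }
    }

    public void processDataRecord(String record) {
        if (model == null) {
            buffered.add(record); // model not ready yet, hold the record
        } else {
            output.add(predict(record));
        }
    }

    private String predict(String record) { return model + "->" + record; }

    public List<String> results() { return output; }
}
```

With an API that lets the function choose which input to consume first, this manual buffering (and its memory cost) would disappear.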
Re: [DISCUSS] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2
Hi Weijie, Thanks for introducing this FLIP! I have a few questions about the designs proposed. 1. Would it be better to have all XXXPartitionStream classes implement ProcessConfigurable, instead of defining both XXXPartitionStream and ProcessConfigurableAndXXXPartitionStream? I wonder whether users would need to operate on a non-configurable PartitionStream. 2. The name "ProcessConfigurable" seems a little ambiguous to me. Will there be classes other than XXXPartitionStream that implement this interface? Will "Process" be accurate enough to describe PartitionStream and those classes? 3. Apart from the detailed withConfigFoo(foo)/withConfigBar(bar) methods, would it be better to also add a general withConfig(configKey, configValue) method to the ProcessConfigurable interface? Adding a method for each configuration might harm the readability and compatibility of configurations. Looking forward to your response. Best regards, Yunfeng Zhou On Tue, Dec 26, 2023 at 2:47 PM weijie guo wrote: > > Hi devs, > > > I'd like to start a discussion about FLIP-410: Config, Context and > Processing Timer Service of DataStream API V2 [1]. This is the second > sub-FLIP of DataStream API V2. > > > In FLIP-409 [2], we have defined the most basic primitive of > DataStream V2. On this basis, this FLIP will further answer several > important questions closely related to it: > >1. >How to configure the processing over the datastreams, such as > setting the parallelism. >2. >How to get access to the runtime contextual information and > services from inside the process functions. >3. How to work with processing-time timers. > > You can find more details in this FLIP. Its relationship with other > sub-FLIPs can be found in the umbrella FLIP > [3]. > > > Looking forward to hearing from you, thanks! 
> > > Best regards, > > Weijie > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2 > > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction > > [3] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
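The generic withConfig(key, value) idea from question 3 can be sketched in a framework-free way. All names below are hypothetical and not part of FLIP-410; the sketch only shows how a typed setter and a generic escape hatch can coexist in one fluent interface.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableProcess {
    private final Map<String, String> config = new HashMap<>();

    // Typed setter, in the style of the FLIP's withConfigFoo(foo) methods.
    public ConfigurableProcess withParallelism(int parallelism) {
        return withConfig("parallelism", Integer.toString(parallelism));
    }

    // Generic escape hatch, as suggested in the question: new options
    // do not require new interface methods.
    public ConfigurableProcess withConfig(String key, String value) {
        config.put(key, value);
        return this; // fluent style, so calls can be chained
    }

    public String get(String key) { return config.get(key); }
}
```

The trade-off is the usual one: typed setters are discoverable and validated at compile time, while the generic method keeps the interface stable as options evolve.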
Re: Re: [DISCUSS] FLIP-331: Support EndOfStreamWindows and isOutputOnEOF operator attribute to optimize task deployment
Thanks Xuannan for the update! +1 (binding)

On Wed, Jan 10, 2024 at 5:54 PM Xuannan Su wrote:
> Hi all,
>
> After several rounds of offline discussions with Xintong and Jinhao,
> we have decided to narrow the scope of the FLIP. It will now focus on
> introducing OperatorAttributes that indicate whether an operator emits
> records only after its inputs have ended. We will also use the attribute
> to optimize task scheduling for better resource utilization. Setting
> the backlog status and optimizing the operator implementation during
> the backlog will be deferred to future work.
>
> In addition to the change above, we have also made the following changes to
> the FLIP to address the problems mentioned by Dong:
> - Public interfaces are updated to reuse GlobalWindows.
> - Instead of making all outputs of the upstream operators of the
> "isOutputOnlyAfterEndOfStream=true" operator blocking, we only make
> the output of the operator with "isOutputOnlyAfterEndOfStream=true"
> blocking. This prevents the second problem Dong mentioned. In the
> future, we may introduce an extra OperatorAttribute to indicate whether an
> operator has any side output.
>
> I would greatly appreciate any comments or feedback you may have on the
> updated FLIP.
>
> Best regards,
> Xuannan
>
> On Tue, Sep 26, 2023 at 11:24 AM Dong Lin wrote:
> >
> > Hi all,
> >
> > Thanks for the review!
> >
> > Becket and I discussed this FLIP offline and we agreed on several things
> > that need to be improved with this FLIP. I will summarize our discussion
> > with the problems and TODOs. We will update the FLIP and let you know once
> > the FLIP is ready for review again.
> >
> > 1) Investigate whether it is possible to update the existing GlobalWindows
> > in a backward-compatible way and re-use it for the same purpose
> > as EndOfStreamWindows, without introducing EndOfStreamWindows as a new
> > class.
> >
> > Note that GlobalWindows#getDefaultTrigger returns a NeverTrigger instance,
> > which will not trigger the window's computation even on end-of-input. We will
> > need to investigate its existing usage and see whether we can re-use it in a
> > backward-compatible way.
> >
> > 2) Let the JM know whether any operator upstream of the operator with
> > "isOutputOnEOF=true" will emit output via any side channel. The FLIP should
> > update the execution mode of those operators *only if* all outputs from
> > those operators are emitted only at the end of input.
> >
> > More specifically, the upstream operator might involve a user-defined
> > operator that emits output directly to an external service, where the
> > emission operation is not explicitly expressed as an operator's output edge
> > and is thus not visible to the JM. Similarly, it is also possible for a
> > user-defined operator to register a timer
> > via InternalTimerService#registerEventTimeTimer and emit output to an
> > external service inside Triggerable#onEventTime. There is a chance that
> > users still need such logic to output data in real time, even if the
> > downstream operators have isOutputOnEOF=true.
> >
> > One possible solution to address this problem is to add an extra
> > OperatorAttribute that specifies whether the operator might output records
> > in a way that does not go through the operator's output (e.g. side output).
> > The JM can then safely enable the runtime optimization currently described
> > in the FLIP when there is no such operator.
> >
> > 3) Create a follow-up FLIP that allows users to specify whether a source
> > with Boundedness=bounded should have isProcessingBacklog=true.
> >
> > This capability would effectively introduce a third strategy for setting
> > backlog status (in addition to FLIP-309 and FLIP-328). It is worth noting
> > that, even though the data in bounded sources is backlog data in most
> > practical use cases, this is not necessarily true. For example, users might
> > want to start a Flink job to consume real-time data from a Kafka topic and
> > specify that the job stops after 24 hours, which means the source is
> > technically bounded while the data is fresh/real-time.
> >
> > This capability is more generic and can cover more use cases than
> > EndOfStreamWindows. On the other hand, EndOfStreamWindows will still be
> > useful in cases where users already need to specify this window assigner in
> > a DataStream program, without requiring them to decide whether it is safe
> > to treat data in a bounded source as backlog data.
> >
> > Regards,
> > Dong
> >
> > On Mon, Sep 18, 2023 at 2:56 PM Yuxin Tan wrote:
> > >
> > > Hi, Dong,
> > > Thanks for your efforts.
> > >
> > > +1 to this proposal.
> > > I believe this will improve performance in mixed workloads
> > > of bounded and unbounded data.
> > >
> > > Best,
> > > Yuxin
> > >
> > > Xintong Song wrote on Mon, Sep 18, 2023 at 10:56:
> > > >
> > > > Thanks for addressing my comments, Dong.
> > > >
> > > > LGTM.
> > > >
> > > > Bes
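[Editorial note] The scheduling rule discussed in this thread — make an operator's output blocking only when it emits records solely after end of input *and* nothing escapes through a side channel invisible to the JM — can be sketched as follows. All names below are hypothetical illustrations, not the FLIP's final API:

```java
// Illustrative model only: class and method names are hypothetical,
// not Flink's actual OperatorAttributes API.
public final class OperatorAttributesSketch {

    // Attributes an operator could declare to the scheduler.
    record OperatorAttributes(boolean outputOnlyAfterEndOfStream,
                              boolean mayEmitViaSideChannel) {}

    // The JM may safely make the operator's output edge blocking (and
    // defer scheduling of downstream tasks) only when the operator emits
    // records exclusively after its inputs end AND it never emits data
    // through a side channel the JM cannot see.
    static boolean useBlockingOutput(OperatorAttributes attrs) {
        return attrs.outputOnlyAfterEndOfStream() && !attrs.mayEmitViaSideChannel();
    }

    public static void main(String[] args) {
        System.out.println(useBlockingOutput(new OperatorAttributes(true, false)));  // true
        System.out.println(useBlockingOutput(new OperatorAttributes(true, true)));   // false
        System.out.println(useBlockingOutput(new OperatorAttributes(false, false))); // false
    }
}
```

This mirrors Dong's point 2: without the second attribute, the JM cannot distinguish a pure end-of-input aggregator from a user-defined operator that also writes to an external service in real time.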
Re: [DISCUSS] Release new version of Flink's Kafka connector
Hi Martijn,

Thank you for your invitation. The idea of adding new improvements to either version v3.1 or v4.0 sounds appealing to me.

> if the functionality was added in a
> backwards compatible manner (meaning a new minor version would be
> sufficient).

It seems there is no backwards compatibility between the new DynamicKafkaSource interface and the previous KafkaSource interface. As the FLIP shows, the source state is incompatible between KafkaSource and DynamicKafkaSource, so it is recommended to reset all state, or to reset partial state by setting a different uid and starting the application from non-restored state.[1] However, this will not affect current jobs on the previous version. For DataStream jobs, there appears to be no impact because they will not call the new interface unless the code is changed. For Table jobs, the new FLIP-246 DynamicKafkaSource is not yet being used. We should pay closer attention if we later decide to migrate the Table API to the new DynamicKafkaSource.

Yours,
Hongshun

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217389320

On Fri, Jan 26, 2024 at 6:16 AM Martijn Visser wrote:
> Hi everyone,
>
> The latest version of the Flink Kafka connector that's available is
> currently v3.0.2, which is compatible with both Flink 1.17 and Flink 1.18.
>
> I would like to propose to create a release which is either v3.1 or v4.0
> (see below), with compatibility for Flink 1.17 and Flink 1.18. This newer
> version would contain many improvements [1] [2] like:
>
> * FLIP-246 Dynamic Kafka Source
> * FLIP-288 Dynamic Partition Discovery
> * Rack Awareness support
> * Kafka Record support for KafkaSink
> * Misc bug fixes and CVE issues
>
> If there are no objections, I would like to volunteer as release manager.
>
> The only reason I'm not sure whether this should be a v3.1 or a v4.0 is
> that I'm not 100% sure if FLIP-246 introduces incompatible API changes
> (requiring a new major version), or if the functionality was added in a
> backwards compatible manner (meaning a new minor version would be
> sufficient). I'm looping in Hongshun Wang and Leonard Xu to help clarify
> this.
>
> There's also a discussion happening in an open PR [3] on dropping support
> for Flink 1.18 afterwards (since this PR would add support for
> RecordEvaluator, which only exists in Flink 1.19). My proposal would be
> that after either v3.1 or v4.0 is released, we would indeed drop support
> for Flink 1.18 with that PR, and the next Flink Kafka connector would be
> either v4.0 (if v3.1 is the next release) or v5.0 (if v4.0 is the next
> release).
>
> Best regards,
>
> Martijn
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353135
> [2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352917
> [3] https://github.com/apache/flink-connector-kafka/pull/76#pullrequestreview-1844645464
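[Editorial note] To illustrate why setting a different uid forces a fresh start, recall that savepoint state is matched to operators by uid. The sketch below is a deliberately simplified model (real Flink savepoints are not a `Map`, and the method names here are invented for illustration): a new uid finds no matching state, and the leftover old state must then be explicitly allowed to go unrestored, analogous to Flink's `--allowNonRestoredState` flag:

```java
import java.util.Map;

// Simplified illustration only: real savepoint handling is far more
// involved, but state is matched to operators by uid in essentially
// this way.
public final class UidRestoreSketch {

    // Returns the restored state for the operator, or null when the uid
    // has no state in the savepoint (the operator starts empty).
    // Throws if the savepoint holds state that no current uid claims and
    // dropping it was not explicitly allowed.
    static byte[] restore(Map<String, byte[]> savepoint, String operatorUid,
                          boolean allowNonRestoredState) {
        byte[] state = savepoint.get(operatorUid);
        boolean leftover =
                savepoint.keySet().stream().anyMatch(uid -> !uid.equals(operatorUid));
        if (leftover && !allowNonRestoredState) {
            throw new IllegalStateException("Savepoint contains unclaimed state");
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, byte[]> savepoint = Map.of("kafka-source-v1", new byte[] {1});
        // Same uid: the old KafkaSource state is found and restored.
        System.out.println(restore(savepoint, "kafka-source-v1", false) != null); // true
        // New uid for DynamicKafkaSource: no state matches, and the old
        // state must be explicitly dropped.
        System.out.println(restore(savepoint, "kafka-source-v2", true) == null);  // true
    }
}
```

This is why the FLIP recommends a different uid plus starting from non-restored state: the incompatible KafkaSource state is simply left unclaimed rather than misinterpreted by the new source.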
Re: [DISCUSS] Release new version of Flink's Kafka connector
Hi Martijn,

+1, no objections, and thanks for volunteering. I'll definitely help verify the RC when it becomes available.

I think FLIP-288 (I assume you meant this) doesn't introduce incompatible changes, since the implementation should be state compatible, and the default changes should be transparent to the user and actually correct possibly erroneous behavior. Also, the RecordEvaluator was released with Flink 1.18 (I assume you meant this).

Given the above, I'm +1 for a v3.1 release that only supports 1.18, while we support patches on v3.0, which supports 1.17. This logic is also in line with what was agreed upon for external connector versioning [1].

[1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development

Best,
Mason

On Thu, Jan 25, 2024 at 2:16 PM Martijn Visser wrote:
> Hi everyone,
>
> The latest version of the Flink Kafka connector that's available is
> currently v3.0.2, which is compatible with both Flink 1.17 and Flink 1.18.
>
> I would like to propose to create a release which is either v3.1 or v4.0
> (see below), with compatibility for Flink 1.17 and Flink 1.18. This newer
> version would contain many improvements [1] [2] like:
>
> * FLIP-246 Dynamic Kafka Source
> * FLIP-288 Dynamic Partition Discovery
> * Rack Awareness support
> * Kafka Record support for KafkaSink
> * Misc bug fixes and CVE issues
>
> If there are no objections, I would like to volunteer as release manager.
>
> The only reason I'm not sure whether this should be a v3.1 or a v4.0 is
> that I'm not 100% sure if FLIP-246 introduces incompatible API changes
> (requiring a new major version), or if the functionality was added in a
> backwards compatible manner (meaning a new minor version would be
> sufficient). I'm looping in Hongshun Wang and Leonard Xu to help clarify
> this.
>
> There's also a discussion happening in an open PR [3] on dropping support
> for Flink 1.18 afterwards (since this PR would add support for
> RecordEvaluator, which only exists in Flink 1.19). My proposal would be
> that after either v3.1 or v4.0 is released, we would indeed drop support
> for Flink 1.18 with that PR, and the next Flink Kafka connector would be
> either v4.0 (if v3.1 is the next release) or v5.0 (if v4.0 is the next
> release).
>
> Best regards,
>
> Martijn
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353135
> [2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352917
> [3] https://github.com/apache/flink-connector-kafka/pull/76#pullrequestreview-1844645464
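[Editorial note] The v3.1-vs-v4.0 question in this thread reduces to a standard semantic-versioning rule: a release with API- or state-breaking changes needs a major bump; a purely additive, compatible release only needs a minor bump. A sketch of that reasoning (the helper below is invented for illustration, not a real tool):

```java
public final class ConnectorVersionSketch {

    // Hypothetical helper: decide the next connector version from the
    // compatibility of the changes going into the release.
    static String nextVersion(int major, int minor,
                              boolean apiCompatible, boolean stateCompatible) {
        if (apiCompatible && stateCompatible) {
            return major + "." + (minor + 1);   // e.g. v3.0 -> v3.1
        }
        return (major + 1) + ".0";              // e.g. v3.0 -> v4.0
    }

    public static void main(String[] args) {
        // Mason's assessment: the changes are compatible, so v3.0 -> v3.1.
        System.out.println(nextVersion(3, 0, true, true));   // 3.1
        // Had FLIP-246 broken compatibility, it would be v3.0 -> v4.0.
        System.out.println(nextVersion(3, 0, false, true));  // 4.0
    }
}
```

Note that DynamicKafkaSource being a new, separate source (its state incompatibility with KafkaSource only matters if a job migrates to it) is what makes the additive, minor-bump reading defensible here.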