[jira] [Created] (FLINK-36202) Remove all deprecated datastream api
Weijie Guo created FLINK-36202: -- Summary: Remove all deprecated datastream api Key: FLINK-36202 URL: https://issues.apache.org/jira/browse/FLINK-36202 Project: Flink Issue Type: Sub-task Components: API / DataStream Affects Versions: 2.0-preview Reporter: Weijie Guo Fix For: 2.0-preview We should remove all deprecated DataStream APIs in the 2.0 release cycle. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] FLIP-477: Amazon SQS Source Connector
Hi Flink Devs, Gentle Reminder for voting on FLIP-477: Amazon SQS Source Connector [1]. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-477+Amazon+SQS+Source+Connector Regards Saurabh & Abhi On Thu, Aug 22, 2024 at 1:17 AM Aleksandr Pilipenko wrote: > Thank you for driving this! > +1 (non-binding) > > Best, > Aleksandr > > On Wed, 21 Aug 2024 at 15:21, Ahmed Hamdy wrote: > > > Thanks for driving Abhisagar, > > +1 (non-binding) > > > > Best Regards > > Ahmed Hamdy > > > > > > On Wed, 21 Aug 2024 at 09:11, Abhisagar Khatri < > > khatri.abhisaga...@gmail.com> > > wrote: > > > > > Hi Flink Devs, > > > > > > Thank you for all the feedback about FLIP-477: Amazon SQS Source > > Connector > > > [1]. The discussion thread can be found here [2]. > > > > > > The vote will be open for at least 72 hours unless there are any > > objections > > > or insufficient votes. > > > > > > Regards, > > > Abhi & Saurabh > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-477+Amazon+SQS+Source+Connector > > > [2] https://lists.apache.org/thread/p27tj3kzyln1fjqyx2xmg4tt7thoh0sh > > > > > >
Re: [DISCUSS] FLIP-XXX Amazon SQS Source Connector
Hello Saurabh, Thanks for contributing this, I have seen multiple custom implementations of SQS sources, so a community supported version will be a great addition. // .setFailOnError(false) A general callout that this is not ideal, I know we use it elsewhere but we should consider some pluggable error handling at some point. Not as part of this FLIP, but if we can avoid adding this flag here, then I would. // The code registers a checkpoint completion notification callback via *notifyCheckpointComplete* Flink does not guarantee that notifications are actually invoked and successful [1]. Therefore there is a chance this method will not run, and hence we could violate exactly once semantics here. Suggest that we instead/additionally track the SQS read messages within the checkpoint state, we can evict from state once deleted, and possibly prune on startup or skip duplicates. Thoughts? Thanks, Danny [1] https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/CheckpointListener.html On Tue, Aug 20, 2024 at 10:24 AM Ahmed Hamdy wrote: > Hi Abhisagar and Saurabh > I have created the FLIP page and assigned it FLIP-477[1]. Feel free to > resume with the next steps. > > 1- > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-+477+Amazon+SQS+Source+Connector > Best Regards > Ahmed Hamdy > > > On Tue, 20 Aug 2024 at 06:05, Abhisagar Khatri < > khatri.abhisaga...@gmail.com> > wrote: > > > Hi Flink Devs, > > > > Gentle Reminder for the request. We'd like to ask the PMC/Committers to > > transfer the content from the Amazon SQS Source Connector Google Doc [1] > > and assign a FLIP Number for us, which we can use further for voting. > > We are following the procedure outlined on the Flink Improvement Proposal > > Confluence page [2]. 
> > > > [1] > > > https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit?usp=sharing > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process > > > > Regards, > > Abhi & Saurabh > > > > > > On Tue, Aug 13, 2024 at 12:50 PM Saurabh Singh < > saurabhsingh9...@gmail.com> > > wrote: > > > >> Hi Flink Devs, > >> > >> Thanks for all the feedback flink devs. > >> > >> Following the procedure outlined on the Flink Improvement Proposal > >> Confluence page [1], we kindly ask the PMC/Committers to transfer the > >> content from the Amazon SQS Source Connector Google Doc [2] and assign a > >> FLIP Number for us, which we will use for voting. > >> > >> [1] > >> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process > >> [2] > >> > https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit?usp=sharing > >> > >> Regards > >> Saurabh & Abhi > >> > >> > >> On Thu, Aug 8, 2024 at 5:12 PM Saurabh Singh < > saurabhsingh9...@gmail.com> > >> wrote: > >> > >>> Hi Ahmed, > >>> > >>> Yes, you're correct. Currently, we're utilizing the "record emitter" to > >>> send messages into the queue for deletion. However, for the actual > deletion > >>> process, which is dependent on the checkpoints, we've been using the > source > >>> reader class because it allows us to override the > notifyCheckpointComplete > >>> method. > >>> > >>> Regards > >>> Saurabh & Abhi > >>> > >>> On Wed, Aug 7, 2024 at 2:18 PM Ahmed Hamdy > wrote: > >>> > Hi Saurabh > Thanks for addressing, I see the FLIP is in much better state. > Could we specify where we queue messages for deletion, In my opinion > the record emitter is a good place for that where we delete messages > that > are forwarded to the next operator. > Other than that I don't have further comments. > Thanks again for the effort. 
> > Best Regards > Ahmed Hamdy > > > On Wed, 31 Jul 2024 at 10:34, Saurabh Singh < > saurabhsingh9...@gmail.com> > wrote: > > > Hi Ahmed, > > > > Thank you very much for the detailed, valuable review. Please find > our > > responses below: > > > > > >- In the FLIP you mention the split is going to be 1 sqs Queue, > >does this mean we would support reading from multiple queues? > This is also > >not clear in the implementation of `addSplitsBack` whether we are > planning > >to support multiple sqs topics or not. > > > > *Our current implementation assumes that each source reads from a > > single SQS queue. If you need to read from multiple SQS queues, you > can > > define multiple sources accordingly. We believe this approach is > clearer > > and more organized compared to having a single source switch between > > multiple queues. This design choice is based on weighing the > benefits, but > > we can support multiple queues per source if the need arises.* > > > >- Regarding Client creation, there has been some effort in the > >common `aws-util
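The checkpoint-coupled deletion flow discussed in this thread (queue receipt handles for deletion when records are emitted, delete from SQS only once the checkpoint containing them completes) can be sketched roughly as below. This is a simplified illustration under assumed names, not the connector's actual code; a real implementation would plug into Flink's SourceReader and issue DeleteMessageBatch calls through the SQS client:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Sketch (illustrative only): track receipt handles of emitted messages per
 * checkpoint, and delete them from SQS only after that checkpoint completes.
 */
class DeletionTracker {
    // receipt handles emitted since the last checkpoint barrier
    private final List<String> pending = new ArrayList<>();
    // checkpointId -> receipt handles frozen into that checkpoint
    private final NavigableMap<Long, List<String>> perCheckpoint = new TreeMap<>();
    // stand-in for the actual SQS DeleteMessageBatch call
    final List<String> deleted = new ArrayList<>();

    void onEmit(String receiptHandle) {
        pending.add(receiptHandle);
    }

    // called when state is snapshotted for a checkpoint: freeze the pending set
    void snapshotState(long checkpointId) {
        perCheckpoint.put(checkpointId, new ArrayList<>(pending));
        pending.clear();
    }

    // called from notifyCheckpointComplete: everything belonging to this
    // checkpoint (and earlier ones) is now safe to delete from the queue.
    // Note Danny's caveat: Flink does not guarantee this callback ever runs,
    // so handles may remain here and the messages become visible again in SQS
    // after the visibility timeout, i.e. duplicates unless deduplicated.
    void notifyCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<String>> done = perCheckpoint.headMap(checkpointId, true);
        done.values().forEach(deleted::addAll);
        done.clear();
    }
}
```

Keeping the not-yet-deleted handles inside the checkpointed state, as suggested above, would also allow pruning or skipping duplicates on restore.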
[jira] [Created] (FLINK-36203) Have some way to control the polling frequency of GetRecords from SplitFetcher
Abhi Gupta created FLINK-36203: -- Summary: Have some way to control the polling frequency of GetRecords from SplitFetcher Key: FLINK-36203 URL: https://issues.apache.org/jira/browse/FLINK-36203 Project: Flink Issue Type: Improvement Components: Connectors / DynamoDB Reporter: Abhi Gupta The connector currently offers no way to control the polling frequency of GetRecords. We should add a mechanism to control how fast or slow the SplitFetcher polls GetRecords. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-XXX Amazon SQS Source Connector
Sorry for being late to the party. I saw your call to vote and looked at the FLIP. First, most of the design is looking really good and it will be good to have another connector integrated into the AWS ecosystem. A couple of questions/remarks: 1) Since we only have 1 split, we should also limit the parallelism of the source accordingly. We could exploit the DynamicParallelismInference to effectively limit it to 1 unless the user explicitly overwrites. 2) If I haven't overlooked something obvious, then your exactly-once strategy will not work. There is unfortunately no guarantee that notifyCheckpointComplete is called at all before a failure happens. So you very likely get duplicate messages. Scanning the SQS documentation, I saw that you can read the SequenceNumber of the message. If you also store the latest number in the checkpoint state of the split, then you can discard all messages during recovery that are smaller. But I have never used SQS, so just take it as an inspiration. Also note that you didn't sketch how you delete messages from the queue. It's very important to only delete those messages that are part of the successful checkpoint. So you can't use PurgeQueue. 3) I wonder if you need to use ReceiveRequestAttemptId. It looks like it may be important for retries. Best, Arvid On Tue, Aug 20, 2024 at 11:24 AM Ahmed Hamdy wrote: > Hi Abhisagar and Saurabh > I have created the FLIP page and assigned it FLIP-477[1]. Feel free to > resume with the next steps. > > 1- > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-+477+Amazon+SQS+Source+Connector > Best Regards > Ahmed Hamdy > > > On Tue, 20 Aug 2024 at 06:05, Abhisagar Khatri < > khatri.abhisaga...@gmail.com> > wrote: > > > Hi Flink Devs, > > > > Gentle Reminder for the request. We'd like to ask the PMC/Committers to > > transfer the content from the Amazon SQS Source Connector Google Doc [1] > > and assign a FLIP Number for us, which we can use further for voting. 
> > We are following the procedure outlined on the Flink Improvement Proposal > > Confluence page [2]. > > > > [1] > > > https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit?usp=sharing > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process > > > > Regards, > > Abhi & Saurabh > > > > > > On Tue, Aug 13, 2024 at 12:50 PM Saurabh Singh < > saurabhsingh9...@gmail.com> > > wrote: > > > >> Hi Flink Devs, > >> > >> Thanks for all the feedback flink devs. > >> > >> Following the procedure outlined on the Flink Improvement Proposal > >> Confluence page [1], we kindly ask the PMC/Committers to transfer the > >> content from the Amazon SQS Source Connector Google Doc [2] and assign a > >> FLIP Number for us, which we will use for voting. > >> > >> [1] > >> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process > >> [2] > >> > https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit?usp=sharing > >> > >> Regards > >> Saurabh & Abhi > >> > >> > >> On Thu, Aug 8, 2024 at 5:12 PM Saurabh Singh < > saurabhsingh9...@gmail.com> > >> wrote: > >> > >>> Hi Ahmed, > >>> > >>> Yes, you're correct. Currently, we're utilizing the "record emitter" to > >>> send messages into the queue for deletion. However, for the actual > deletion > >>> process, which is dependent on the checkpoints, we've been using the > source > >>> reader class because it allows us to override the > notifyCheckpointComplete > >>> method. > >>> > >>> Regards > >>> Saurabh & Abhi > >>> > >>> On Wed, Aug 7, 2024 at 2:18 PM Ahmed Hamdy > wrote: > >>> > Hi Saurabh > Thanks for addressing, I see the FLIP is in much better state. > Could we specify where we queue messages for deletion, In my opinion > the record emitter is a good place for that where we delete messages > that > are forwarded to the next operator. 
> Other than that I don't have further comments. > Thanks again for the effort. > > Best Regards > Ahmed Hamdy > > > On Wed, 31 Jul 2024 at 10:34, Saurabh Singh < > saurabhsingh9...@gmail.com> > wrote: > > > Hi Ahmed, > > > > Thank you very much for the detailed, valuable review. Please find > our > > responses below: > > > > > >- In the FLIP you mention the split is going to be 1 sqs Queue, > >does this mean we would support reading from multiple queues? > This is also > >not clear in the implementation of `addSplitsBack` whether we are > planning > >to support multiple sqs topics or not. > > > > *Our current implementation assumes that each source reads from a > > single SQS queue. If you need to read from multiple SQS queues, you > can > > define multiple sources accordingly. We believe this approach is > clearer > > and more organized compared to having a single source switch between
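Arvid's SequenceNumber idea can be sketched as follows: store the largest SequenceNumber emitted so far in the split's checkpoint state and, after restore, drop any received message whose number is not larger. This is only an illustration with invented names, under the assumption (to be verified against SQS semantics, which document SequenceNumber for FIFO queues) that the value is a monotonically increasing 128-bit number serialized as a decimal string:

```java
import java.math.BigInteger;

/**
 * Sketch (illustrative only) of recovery-time deduplication based on the
 * SQS SequenceNumber attribute.
 */
class SequenceNumberFilter {
    // largest SequenceNumber emitted so far; restored from checkpoint
    // state on recovery, null on a fresh start
    private BigInteger lastEmitted;

    SequenceNumberFilter(String restoredSequenceNumber) {
        this.lastEmitted = restoredSequenceNumber == null
                ? null : new BigInteger(restoredSequenceNumber);
    }

    /** Returns true if the message should be emitted downstream. */
    boolean accept(String sequenceNumber) {
        BigInteger seq = new BigInteger(sequenceNumber);
        if (lastEmitted != null && seq.compareTo(lastEmitted) <= 0) {
            return false; // emitted before the failure -> duplicate, discard
        }
        lastEmitted = seq;
        return true;
    }

    /** Value to write into the split state on snapshotState(). */
    String snapshot() {
        return lastEmitted == null ? null : lastEmitted.toString();
    }
}
```

For standard (non-FIFO) queues, which provide no such ordering guarantee, a different dedup key (e.g. tracking message IDs in state) would be needed.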
[jira] [Created] (FLINK-36204) Fix dropping records when consuming from LATEST and having a restore from expired snapshot
Abhi Gupta created FLINK-36204: -- Summary: Fix dropping records when consuming from LATEST and having a restore from expired snapshot Key: FLINK-36204 URL: https://issues.apache.org/jira/browse/FLINK-36204 Project: Flink Issue Type: Improvement Components: Connectors / DynamoDB Reporter: Abhi Gupta When restoring from an expired snapshot, if a shard has no parent in the DescribeStream response and was never captured previously, we currently start that shard from the LATEST shard iterator type. We should instead start it from the correct starting position. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36205) DDB Streams connector should be able to resolve the stream ARN from given Table name
Hong Liang Teoh created FLINK-36205: --- Summary: DDB Streams connector should be able to resolve the stream ARN from given Table name Key: FLINK-36205 URL: https://issues.apache.org/jira/browse/FLINK-36205 Project: Flink Issue Type: New Feature Components: Connectors / DynamoDB Reporter: Hong Liang Teoh When a user disables a DDB stream and re-enables it, the stream ARN changes, so each time this happens the user needs to update their app. We can add a feature that lets users simply specify the DDB table; the source then handles resolving the stream and resumes consumption from the new stream seamlessly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36206) Support flink-connector-aws-base to allow custom override configuration
Abhi Gupta created FLINK-36206: -- Summary: Support flink-connector-aws-base to allow custom override configuration Key: FLINK-36206 URL: https://issues.apache.org/jira/browse/FLINK-36206 Project: Flink Issue Type: Improvement Components: Connectors / DynamoDB, Connectors / Kafka Reporter: Abhi Gupta In flink-connector-aws-base, the file flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/util/AWSClientUtil.java sets the client override configuration to its default value even if the caller supplies a custom override config. We should fix this behaviour to support custom override configurations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
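The intended fix can be illustrated with a minimal sketch. This deliberately uses a generic placeholder type rather than the real AWS SDK ClientOverrideConfiguration, and the method name is invented; it only shows the fallback logic the ticket asks for:

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Sketch (illustrative only): honor a caller-supplied override config and
 * fall back to the default one only when none was supplied.
 */
class OverrideConfigSketch {
    static <T> T resolveOverrideConfig(T userSupplied, Supplier<T> defaultConfig) {
        // Buggy behaviour described in the ticket: the default config was
        // applied unconditionally, silently discarding userSupplied.
        // The fix is this null-safe fallback.
        return Optional.ofNullable(userSupplied).orElseGet(defaultConfig);
    }
}
```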
Re: [VOTE] Apache Flink CDC Release 3.2.0, release candidate #1
+1 (binding) - verified signatures - verified checksums - checked GitHub release tag - built from source code with JDK 1.8 successfully - checked release notes, all blockers in RC0 have been fixed - ran a CDC YAML job that syncs data from MySQL to Kafka, the result is as expected - reviewed the web PR and left some minor comments Best, Leonard > On Aug 30, 2024, at 5:24 PM, Qingsheng Ren wrote: > > Hi everyone, > > Please review and vote on the release candidate #1 for the version 3.2.0 of > Apache Flink CDC, as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Flink CDC source release to be deployed to dist.apache.org > b) Maven artifacts to be deployed to the Maven Central Repository > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a) can be found in the corresponding dev repository at > dist.apache.org [1], which are signed with the key with fingerprint > A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2] > * All artifacts for b) can be found at the Apache Nexus Repository [3] > > Other links for your review: > * JIRA release notes [4] > * Source code tag "release-3.2.0-rc1" with commit hash > 6b9dda39c7066611e3df95db3aa0a81be36cbc0e [5] > * PR for release announcement blog post of Flink CDC 3.2.0 in flink-web [6] > > **Vote Duration** > > The voting time will run for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. 
> > Thanks, > Qingsheng > > [1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.2.0-rc1/ > [2] https://dist.apache.org/repos/dist/release/flink/KEYS > [3] https://repository.apache.org/content/repositories/orgapacheflink-1754 > [4] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354594 > [5] https://github.com/apache/flink-cdc/tree/release-3.2.0-rc1 > [6] https://github.com/apache/flink-web/pull/753
[jira] [Created] (FLINK-36207) Disabling japicmp plugin for deprecated APIs
Matthias Pohl created FLINK-36207: - Summary: Disabling japicmp plugin for deprecated APIs Key: FLINK-36207 URL: https://issues.apache.org/jira/browse/FLINK-36207 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 2.0.0 Reporter: Matthias Pohl The Apache Flink 2.0 release allows for the removal of public API. The japicmp plugin usually checks for this kind of change. To avoid adding explicit excludes for each change, this Jira issue suggests disabling the API check for APIs that are marked as deprecated. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36208) use ThreadLocalRandom in Flink AbstractID
Sean Sullivan created FLINK-36208: - Summary: use ThreadLocalRandom in Flink AbstractID Key: FLINK-36208 URL: https://issues.apache.org/jira/browse/FLINK-36208 Project: Flink Issue Type: Improvement Components: API / Core Reporter: Sean Sullivan Flink AbstractID currently uses a static instance of java.util.Random. Consider using java.util.concurrent.ThreadLocalRandom for improved performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
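For context, a rough sketch of the proposed change (illustrative only, not the actual AbstractID class; assumed here is only that the ID is built from two random long parts):

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Sketch of an AbstractID-like class using ThreadLocalRandom instead of a
 * shared static java.util.Random. A shared Random updates its single seed
 * with a CAS loop, so many threads creating IDs at once contend on it.
 */
final class IdSketch {
    final long upperPart;
    final long lowerPart;

    IdSketch() {
        // ThreadLocalRandom.current() returns the calling thread's own
        // generator: no shared state, no contention, no CAS retries.
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        this.upperPart = rnd.nextLong();
        this.lowerPart = rnd.nextLong();
    }

    @Override
    public String toString() {
        // 32 hex characters: 16 for each long part
        return String.format("%016x%016x", upperPart, lowerPart);
    }
}
```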
Re: [VOTE] Apache Flink CDC Release 3.2.0, release candidate #1
Thanks Qingsheng for preparing this RC! +1 (non-binding) * confirmed tarball checksum matches * confirmed binaries were compiled with JDK 1.8 * compiled from source and ran newly added test cases * created an end-to-end pipeline job and tested transform + schema evolution cases * tested restoring from checkpoints saved by CDC 3.0.1 and 3.1.1 Regards, Xiqian From: Qingsheng Ren Date: Friday, August 30, 2024 at 17:26 To: dev Subject: [VOTE] Apache Flink CDC Release 3.2.0, release candidate #1 Hi everyone, Please review and vote on the release candidate #1 for the version 3.2.0 of Apache Flink CDC, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) **Release Overview** As an overview, the release consists of the following: a) Flink CDC source release to be deployed to dist.apache.org b) Maven artifacts to be deployed to the Maven Central Repository **Staging Areas to Review** The staging areas containing the above mentioned artifacts are as follows, for your review: * All artifacts for a) can be found in the corresponding dev repository at dist.apache.org [1], which are signed with the key with fingerprint A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2] * All artifacts for b) can be found at the Apache Nexus Repository [3] Other links for your review: * JIRA release notes [4] * Source code tag "release-3.2.0-rc1" with commit hash 6b9dda39c7066611e3df95db3aa0a81be36cbc0e [5] * PR for release announcement blog post of Flink CDC 3.2.0 in flink-web [6] **Vote Duration** The voting time will run for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. 
Thanks, Qingsheng [1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.2.0-rc1/ [2] https://dist.apache.org/repos/dist/release/flink/KEYS [3] https://repository.apache.org/content/repositories/orgapacheflink-1754 [4] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354594 [5] https://github.com/apache/flink-cdc/tree/release-3.2.0-rc1 [6] https://github.com/apache/flink-web/pull/753
[jira] [Created] (FLINK-36209) Reduce some redundant operations in the initialization of KafkaSourceEnumState
xiaochen.zhou created FLINK-36209: - Summary: Reduce some redundant operations in the initialization of KafkaSourceEnumState Key: FLINK-36209 URL: https://issues.apache.org/jira/browse/FLINK-36209 Project: Flink Issue Type: Improvement Reporter: xiaochen.zhou -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36210) Optimize the logic for fetching topic metadata in the TopicPatternSubscriber mode
xiaochen.zhou created FLINK-36210: - Summary: Optimize the logic for fetching topic metadata in the TopicPatternSubscriber mode Key: FLINK-36210 URL: https://issues.apache.org/jira/browse/FLINK-36210 Project: Flink Issue Type: Improvement Reporter: xiaochen.zhou -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36211) Shade kafka related package in Kafka Pipeline connector.
LvYanquan created FLINK-36211: - Summary: Shade kafka related package in Kafka Pipeline connector. Key: FLINK-36211 URL: https://issues.apache.org/jira/browse/FLINK-36211 Project: Flink Issue Type: Improvement Components: Flink CDC Affects Versions: cassandra-3.2.0 Reporter: LvYanquan Fix For: cdc-3.2.0 An issue reported from Slack: I'm trying to create a Flink CDC pipeline from MySQL to Kafka (Flink CDC 3.1.1, Flink 1.18). When I try to submit the YAML file, the job starts but then fails with error: Caused by: java.lang.NoSuchMethodError: 'void org.apache.flink.streaming.connectors.kafka.internals.metrics.KafkaMetricMutableWrapper.<init>(org.apache.flink.cdc.connectors.kafka.shaded.org.apache.kafka.common.Metric)' We should shade org.apache.flink.streaming.connectors.kafka to avoid conflict with flink-connector-kafka. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSSION] Disabling japicmp plugin in master for 2.0
Good point. It makes sense to only exclude deprecated API. I needed to find a workaround for the Scala API. The japicmp plugin seems to have an issue with Scala's @deprecated annotation. But excluding all *scala files (considering that we want to remove all these classes, anyway) should be a good enough workaround [1]. I created FLINK-36207 [2] and the corresponding PR [3] to cover this. Matthias [1] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release#id-2.0Release-BreakingChanges(TargetingtofinishbythePreviewRelease) [2] https://issues.apache.org/jira/browse/FLINK-36207 [3] https://github.com/apache/flink/pull/25285 On Mon, Sep 2, 2024 at 8:11 AM Xintong Song wrote: > Hi Matthias, > > I'm not an expert in japicmp. Is it possible to only disable this for > deprecated APIs? Maybe add an exclusion for @Deprecated? I think that would > reduce the risk of unintended breaking changes to deprecated APIs that are > not yet ready for removal. > > Best, > > Xintong > > > > On Thu, Aug 29, 2024 at 5:52 PM Gabor Somogyi > wrote: > > > Hi Matthias, > > > > I would turn japicmp for 2.0 off because adding a long list of exceptions > > doesn't give any value. > > 1.x and 2.x is not going to be compatible in any way and when 2.x > released, > > then that will be the new > > japicmp baseline (after a heavy migration). > > > > What I see as a potential risk it that we break something which was not > > intended, but fixing those > > hopefully small amount of cases is less effort than maintaining an > endless > > list. > > > > BR, > > G > > > > > > On Thu, Aug 29, 2024 at 11:40 AM Matthias Pohl > wrote: > > > > > Hi everyone, > > > for the 2.0 work, we are expecting to run into public API changes > quite a > > > bit. This would get picked up by the japicmp plugin. The usual way is > to > > > add exclusions to the plugin configuration [1] generating a (presumably > > > long) list of API changes. 
> > > > > > I'm wondering whether we, instead, would want to disable the plugin [2] > > for > > > 2.0 entirely to lower effort on the contributors side. > > > > > > Best, > > > Matthias > > > > > > [1] https://github.com/apache/flink/blob/master/pom.xml#L2367 > > > [2] https://github.com/apache/flink/blob/master/pom.xml#L170 > > > > > >