[jira] [Created] (FLINK-36108) Wait for state download on cancellation to enforce cleanup
Piotr Nowojski created FLINK-36108: -- Summary: Wait for state download on cancellation to enforce cleanup Key: FLINK-36108 URL: https://issues.apache.org/jira/browse/FLINK-36108 Project: Flink Issue Type: Sub-task Reporter: Piotr Nowojski Assignee: Piotr Nowojski -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36109) Improve FlinkStateSnapshot resource labels for efficient filtering
Gyula Fora created FLINK-36109: -- Summary: Improve FlinkStateSnapshot resource labels for efficient filtering Key: FLINK-36109 URL: https://issues.apache.org/jira/browse/FLINK-36109 Project: Flink Issue Type: Sub-task Components: Kubernetes Operator Reporter: Gyula Fora Fix For: kubernetes-operator-1.10.0 The new FlinkStateSnapshot CRs currently have the following label only: ``` snapshot.type: XY ``` We need to investigate if we can efficiently filter based on snapshot resource type (savepoint/checkpoint), target job etc. I am not sure if we can do this based on the spec directly or we need labels for it. Also we should probably rename the snapshot.type label to trigger.type to be more straightforward. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36110) Periodic save points trigger continuously
Gyula Fora created FLINK-36110: -- Summary: Periodic save points trigger continuously Key: FLINK-36110 URL: https://issues.apache.org/jira/browse/FLINK-36110 Project: Flink Issue Type: Sub-task Components: Kubernetes Operator Reporter: Gyula Fora Fix For: kubernetes-operator-1.10.0 Attachments: image-2024-08-20-09-50-38-660.png Periodic savepoints seem to trigger continuously with the given config: ``` kubernetes.operator.periodic.savepoint.interval: 30s ``` we get !image-2024-08-20-09-50-38-660.png|width=693,height=81! -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-XXX Amazon SQS Source Connector
Hi Abhisagar and Saurabh I have created the FLIP page and assigned it FLIP-477[1]. Feel free to resume with the next steps. 1- https://cwiki.apache.org/confluence/display/FLINK/FLIP-+477+Amazon+SQS+Source+Connector Best Regards Ahmed Hamdy On Tue, 20 Aug 2024 at 06:05, Abhisagar Khatri wrote: > Hi Flink Devs, > > Gentle Reminder for the request. We'd like to ask the PMC/Committers to > transfer the content from the Amazon SQS Source Connector Google Doc [1] > and assign a FLIP Number for us, which we can use further for voting. > We are following the procedure outlined on the Flink Improvement Proposal > Confluence page [2]. > > [1] > https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit?usp=sharing > [2] > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process > > Regards, > Abhi & Saurabh > > > On Tue, Aug 13, 2024 at 12:50 PM Saurabh Singh > wrote: > >> Hi Flink Devs, >> >> Thanks for all the feedback flink devs. >> >> Following the procedure outlined on the Flink Improvement Proposal >> Confluence page [1], we kindly ask the PMC/Committers to transfer the >> content from the Amazon SQS Source Connector Google Doc [2] and assign a >> FLIP Number for us, which we will use for voting. >> >> [1] >> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process >> [2] >> https://docs.google.com/document/d/1lreo27jNh0LkRs1Mj9B3wj3itrzMa38D4_XGryOIFks/edit?usp=sharing >> >> Regards >> Saurabh & Abhi >> >> >> On Thu, Aug 8, 2024 at 5:12 PM Saurabh Singh >> wrote: >> >>> Hi Ahmed, >>> >>> Yes, you're correct. Currently, we're utilizing the "record emitter" to >>> send messages into the queue for deletion. However, for the actual deletion >>> process, which is dependent on the checkpoints, we've been using the source >>> reader class because it allows us to override the notifyCheckpointComplete >>> method. >>> >>> Regards >>> Saurabh & Abhi >>> >>> On Wed, Aug 7, 2024 at 2:18 PM Ahmed Hamdy wrote: >>> Hi Saurabh Thanks for addressing, I see the FLIP is in much better state. Could we specify where we queue messages for deletion, In my opinion the record emitter is a good place for that where we delete messages that are forwarded to the next operator. Other than that I don't have further comments. Thanks again for the effort. Best Regards Ahmed Hamdy On Wed, 31 Jul 2024 at 10:34, Saurabh Singh wrote: > Hi Ahmed, > > Thank you very much for the detailed, valuable review. Please find our > responses below: > > >- In the FLIP you mention the split is going to be 1 sqs Queue, >does this mean we would support reading from multiple queues? This is > also >not clear in the implementation of `addSplitsBack` whether we are > planning >to support multiple sqs topics or not. > > *Our current implementation assumes that each source reads from a > single SQS queue. If you need to read from multiple SQS queues, you can > define multiple sources accordingly. We believe this approach is clearer > and more organized compared to having a single source switch between > multiple queues. This design choice is based on weighing the benefits, but > we can support multiple queues per source if the need arises.* > >- Regarding Client creation, there has been some effort in the >common `aws-util` module like createAwsSyncClient, we should reuse > that for >`SqsClient` creation. > >*Thank you for bringing this to our attention. Yes, we will >utilize the existing createClient methods available in the libraries. > Our >goal is to avoid any code duplication on our end.* > > >- On the same point for clients, Is there a reason the FLIP >suggests async clients? sync clients have proven more stable and the > source >threading model already guarantees no blocking by sync clients. > > *We were not aware of this, and we have been using async clients for > our in-house use cases. However, since we already have sync clients in the > aws-util that ensure no blocking, we are in a good position. We will use > these sync clients during our development and testing efforts, and we will > share the results and keep the community updated.* > >- On mentioning threading, the FLIP doesn’t mention the fetcher >manager. Is it going to be `SingleThreadFetcherManager`? Would it be > better >to make the source reader extend the SingleThreadedMultiplexReaderBase > or >are we going to implement a more simple version? > > *Yes, we are considering implementing > SingleThreadMultiplexSourceReaderBase for the Reader. We have included the > implementation snippet in the FLIP for r
[jira] [Created] (FLINK-36111) fix: adjust MultiTableCommittableChannelComputer Topology name
MOBIN created FLINK-36111: - Summary: fix: adjust MultiTableCommittableChannelComputer Topology name Key: FLINK-36111 URL: https://issues.apache.org/jira/browse/FLINK-36111 Project: Flink Issue Type: Bug Components: Flink CDC Reporter: MOBIN Attachments: image-2024-08-20-17-46-28-115.png, image-2024-08-20-17-46-53-781.png before fix: !image-2024-08-20-17-46-28-115.png|width=759,height=173! after fix: !image-2024-08-20-17-46-53-781.png|width=578,height=150! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36112) Add Support for CreateFlag.NO_LOCAL_WRITE in FLINK on YARN's File Creation to Manage Disk Space and Network Load in Labeled YARN Nodes
liang yu created FLINK-36112: Summary: Add Support for CreateFlag.NO_LOCAL_WRITE in FLINK on YARN's File Creation to Manage Disk Space and Network Load in Labeled YARN Nodes Key: FLINK-36112 URL: https://issues.apache.org/jira/browse/FLINK-36112 Project: Flink Issue Type: Improvement Reporter: liang yu {*}Description{*}: I am currently using Apache Flink to write files into Hadoop. The Flink application runs on a labeled YARN queue. During operation, it has been observed that the local disks on these labeled nodes get filled up quickly, and the network load is significantly high. This issue arises because Hadoop prioritizes writing files to the local node first, and the number of these labeled nodes is quite limited. {*}Problem{*}: The current behavior leads to inefficient disk space utilization and high network traffic on these few labeled nodes, which could potentially affect the performance and reliability of the application. As shown in the picture, the host I circled have a average net_bytes_sent speed 1.2GB/s while the others are just 50MB/s, this imbalance in network and disk space nearly destroyed the whole cluster. {*}Implementation{*}: The implementation would involve adding a method of FileSystem.class to support the {{CreateFlag.NO_LOCAL_WRITE}} when we try to create a new file through HadoopFileSystem.create() API. What's more, I modify the code of FileSink class so that we can choose to enable no_local_write or disable this feature. This will provide flexibility to Flink running in labeled Yarn queues to opt for non-local writes when necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36113) Condition 'transform instanceof PhysicalTransformation' is always 'false'
tiancx created FLINK-36113: -- Summary: Condition 'transform instanceof PhysicalTransformation' is always 'false' Key: FLINK-36113 URL: https://issues.apache.org/jira/browse/FLINK-36113 Project: Flink Issue Type: Improvement Environment: master Reporter: tiancx Attachments: Snipaste_2024-08-21_01-29-52.png Judgment optimization: The legacyTransform method in the StreamGraphGenerator class has a judgment: the transform instance of PhysicalTransformation is always false {code:java} //代码占位符 private Collection legacyTransform(Transformation transform) { Collection transformedIds; if (transform instanceof FeedbackTransformation) { transformedIds = transformFeedback((FeedbackTransformation) transform); } else if (transform instanceof CoFeedbackTransformation) { transformedIds = transformCoFeedback((CoFeedbackTransformation) transform); } else if (transform instanceof SourceTransformationWrapper) { transformedIds = transform(((SourceTransformationWrapper) transform).getInput()); } else { throw new IllegalStateException("Unknown transformation: " + transform); } if (transform.getBufferTimeout() >= 0) { streamGraph.setBufferTimeout(transform.getId(), transform.getBufferTimeout()); } else { streamGraph.setBufferTimeout(transform.getId(), getBufferTimeout()); } if (transform.getUid() != null) { streamGraph.setTransformationUID(transform.getId(), transform.getUid()); } if (transform.getUserProvidedNodeHash() != null) { streamGraph.setTransformationUserHash( transform.getId(), transform.getUserProvidedNodeHash()); } if (!streamGraph.getExecutionConfig().hasAutoGeneratedUIDsEnabled()) { if (transform instanceof PhysicalTransformation && transform.getUserProvidedNodeHash() == null && transform.getUid() == null) { throw new IllegalStateException( "Auto generated UIDs have been disabled " + "but no UID or hash has been assigned to operator " + transform.getName()); } } if (transform.getMinResources() != null && transform.getPreferredResources() != null) { streamGraph.setResources( transform.getId(), transform.getMinResources(), transform.getPreferredResources()); } streamGraph.setManagedMemoryUseCaseWeights( transform.getId(), transform.getManagedMemoryOperatorScopeUseCaseWeights(), transform.getManagedMemorySlotScopeUseCases()); return transformedIds; } {code} I'm willing to optimize this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36114) Schema registry should block when handling existing requests
yux created FLINK-36114: --- Summary: Schema registry should block when handling existing requests Key: FLINK-36114 URL: https://issues.apache.org/jira/browse/FLINK-36114 Project: Flink Issue Type: Improvement Components: Flink CDC Reporter: yux Currently, SchemaRegistry asynchronously receives schema change requests from SchemaOperator, and results of multiple requests might got mixed up together, causing incorrect logic flow in multiple parallelism cases. Changing SchemaRegistry's behavior to accept requests in serial should resolve this problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36115) Allow to scan newly table DDL during incremental reading stage.
LvYanquan created FLINK-36115: - Summary: Allow to scan newly table DDL during incremental reading stage. Key: FLINK-36115 URL: https://issues.apache.org/jira/browse/FLINK-36115 Project: Flink Issue Type: Bug Components: Flink CDC Affects Versions: cdc-3.2.0 Reporter: LvYanquan Fix For: cdc-3.2.0 Attachments: image-2024-08-21-10-06-17-758.png Currently, MySQL pipeline source will determine all captured tables before building MySqlDataSource. However, this will lead to Ignore of the create table statement for the new table. !image-2024-08-21-10-06-17-758.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[VOTE] FLIP-473: Introduce New SQL Operators Based on Asynchronous State APIs
Hi, everyone. I would like to start a vote on FLIP-473: Introduce New SQL Operators Based on Asynchronous State APIs [1]. The discussion thread can be found here [2]. The vote will be open for at least 72 hours unless there are any objections or insufficient votes. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-473+Introduce+New+SQL+Operators+Based+on+Asynchronous+State+APIs [2] https://lists.apache.org/thread/k6x03x7vjtn3gl1vrknkx8zvyn319bk9 -- Best! Xuyang
Re: [VOTE] FLIP-473: Introduce New SQL Operators Based on Asynchronous State APIs
+1 (binding) Best, Zakelly On Wed, Aug 21, 2024 at 10:33 AM Xuyang wrote: > Hi, everyone. > > I would like to start a vote on FLIP-473: Introduce New SQL Operators Based > > on Asynchronous State APIs [1]. The discussion thread can be found here > [2]. > > The vote will be open for at least 72 hours unless there are any objections > > or insufficient votes. > > > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-473+Introduce+New+SQL+Operators+Based+on+Asynchronous+State+APIs > > [2] https://lists.apache.org/thread/k6x03x7vjtn3gl1vrknkx8zvyn319bk9 > > > > > -- > > Best! > Xuyang
Re: Re: [VOTE] FLIP-455: Declare async state processing and checkpoint the in-flight requests
+1 (binding) Best, Zakelly On Thu, Aug 15, 2024 at 9:02 PM Hangxiang Yu wrote: > +1 (binding) > > Thanks for driving this. > > On Thu, Aug 15, 2024 at 8:42 PM Ahmed Hamdy wrote: > > > +1 (non-binding) > > Thanks for driving. > > Best Regards > > Ahmed Hamdy > > > > > > On Wed, 14 Aug 2024 at 07:33, Xuyang wrote: > > > > > +1 (non-binding) > > > > > > > > > -- > > > > > > Best! > > > Xuyang > > > > > > > > > > > > > > > > > > 在 2024-08-14 11:04:18,"Yuepeng Pan" 写道: > > > > > > > > > > > > > > > >+1 (non-binding) > > > >Thanks for driving it ! > > > > > > > > > > > >Best regards, > > > > > > > >Yuepeng Pan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >在 2024-08-14 10:32:03,"Yanfei Lei" 写道: > > > >>Hi Zakelly, > > > >> > > > >>Thanks for driving this, +1. > > > >> > > > >>Zakelly Lan 于2024年8月13日周二 13:42写道: > > > >>> > > > >>> Hi devs, > > > >>> > > > >>> I'd like to start a vote on the FLIP-455: Declare async state > > > processing > > > >>> and checkpoint the in-flight requests[1]. The discussion thread is > > > here [2]. > > > >>> > > > >>> The vote will be open for at least 72 hours unless there is an > > > objection or > > > >>> insufficient votes. > > > >>> > > > >>> [1] https://cwiki.apache.org/confluence/x/C4owEg > > > >>> [2] > https://lists.apache.org/thread/9ccxkbf7ww8dscybfzh9p8dwrdy3olpp > > > >>> > > > >>> > > > >>> Best, > > > >>> Zakelly > > > >> > > > >> > > > >> > > > >>-- > > > >>Best, > > > >>Yanfei > > > > > > > > -- > Best, > Hangxiang. >