[jira] [Created] (FLINK-36513) A lot of CI failures are caused by Install cert-manager
Rui Fan created FLINK-36513:
---
Summary: A lot of CI failures are caused by Install cert-manager
Key: FLINK-36513
URL: https://issues.apache.org/jira/browse/FLINK-36513
Project: Flink
Issue Type: Technical Debt
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.9.0
Reporter: Rui Fan
Assignee: Rui Fan
Attachments: image-2024-10-12-10-30-38-781.png

A lot of CI failures are caused by the "Install cert-manager" step, such as:
[https://github.com/apache/flink-kubernetes-operator/actions/runs/11292388626/job/31408603383]
[https://github.com/apache/flink-kubernetes-operator/actions/runs/11294831791/job/31436330397]

h1. Root cause:

I checked the raw log [1]; the failure reason is: _Unable to connect to the server: dial tcp 140.82.113.3:443: i/o timeout._

!image-2024-10-12-10-30-38-781.png|width=2702,height=982!

CI code: [https://github.com/apache/flink-kubernetes-operator/blob/d2c01737c745979c6aadb670334565ee11aa2f4a/.github/workflows/ci.yml#L227]

The step needs to download cert-manager.yaml from GitHub, and 140.82.113.3:443 is a GitHub IP and port. So downloading cert-manager.yaml is the root cause.

h1. Solution:

* Solution 1: Introduce a retry mechanism
** Download cert-manager.yaml first with a retry mechanism
** Then kubectl apply -f the local file
* Solution 2: Put cert-manager.yaml in the flink-kubernetes-operator repo directly
** I saw it's a fixed version, so the cert-manager.yaml is immutable IIUC.

[1] https://productionresultssa9.blob.core.windows.net/actions-results/1c3ad627-91b6-4db3-a9ad-453109617470/workflow-job-run-3290c02c-bc49-582e-d1d1-63220f3fe3ce/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-10-12T02%3A29%3A01Z&sig=InSwCX86huA086rqGjAXM836sM8%2Bb8zk5%2FfeVJgmpsM%3D&ske=2024-10-12T13%3A37%3A00Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-10-12T01%3A37%3A00Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-08-04&sp=r&spr=https&sr=b&st=2024-10-12T02%3A18%3A56Z&sv=2024-08-04

-- This message was sent by Atlassian Jira (v8.20.10#820010)
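A retry-based variant of Solution 1 could look roughly like the sketch below. This is a hedged sketch only: the `retry` helper and the placeholder download URL are illustrative, not the actual CI code.

```shell
# retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds, or gives up after max_attempts tries.
retry() {
  local max=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# In the workflow, the download step would then become something like
# (CERT_MANAGER_URL stands in for whatever version the CI pins):
#   retry 5 10 curl -fSL -o cert-manager.yaml "$CERT_MANAGER_URL"
#   kubectl apply -f cert-manager.yaml
```

Splitting the download from `kubectl apply` means only the flaky network step is retried, while the apply step runs once against a local file.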
Re: Resource limits for the k8s
Hi,

You can use the following configuration options to set resource limits on
Kubernetes; please see the configuration docs [1] for more info:

- kubernetes.jobmanager.cpu.limit-factor
- kubernetes.jobmanager.memory.limit-factor
- kubernetes.taskmanager.cpu.limit-factor
- kubernetes.taskmanager.memory.limit-factor

Best,
Mate

[1] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#kubernetes

Maksym Serhieiev wrote (on Thu, Oct 10, 2024, 22:47):

> Hi
> I'm really struggling to find out how I can manage resource limits for k8s
> deployment. We are experiencing lots of CPU throttling alerts and would like
> to adjust resource limits but I can't find out how.
> Will appreciate any help you can provide.
>
> Regards,
> Maksym Serhieiev
>
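Set in flink-conf.yaml, this might look like the sketch below. The factor values are placeholders; as far as I understand these options, the resulting container limit is the requested amount multiplied by the factor:

```yaml
# Sketch only: caps each container's limit relative to its request
# (limit = request * limit-factor; the factor must be >= 1.0).
kubernetes.jobmanager.cpu.limit-factor: 2.0
kubernetes.jobmanager.memory.limit-factor: 1.5
kubernetes.taskmanager.cpu.limit-factor: 2.0
kubernetes.taskmanager.memory.limit-factor: 1.5
```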
[jira] [Created] (FLINK-36510) Upgrade Pekko from 1.0.1 to 1.1.2
Grace Grimwood created FLINK-36510:
--
Summary: Upgrade Pekko from 1.0.1 to 1.1.2
Key: FLINK-36510
URL: https://issues.apache.org/jira/browse/FLINK-36510
Project: Flink
Issue Type: Technical Debt
Components: Runtime / Coordination
Reporter: Grace Grimwood

Updates the Pekko dependency to 1.1.2, which in turn upgrades Netty from 3 to 4 (addressing FLINK-29065 and removing several CVEs from Flink). Pekko 1.1 also upgrades other dependencies such as slf4j and Jackson. For more details see the [Pekko 1.1 release notes|https://pekko.apache.org/docs/pekko/current/release-notes/releases-1.1.html].

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Flink Kubernetes Operator 1.10.0 release planning
Thanks for driving this release!

> cut the release branch and prepare the first RC on Monday (October 14)

The cutting time LGTM, +1 for that.

Best,
Rui

On Sat, Oct 12, 2024 at 1:02 AM Őrhidi Mátyás wrote:

> Hi All!
>
> We chatted with Gyula offline and we think we are pretty much ready to cut
> the release branch and prepare the first RC on Monday (October 14). Let us
> know if there are any outstanding PRs that need to be merged before the
> release cut.
>
> Cheers,
> Matyas
>
> On Thu, Oct 3, 2024 at 7:20 AM Őrhidi Mátyás wrote:
>
> > Hi All,
> >
> > Thanks, Gyula. I'm happy to volunteer as the release manager for this
> > release.
> >
> > Cheers,
> > Matyas
> >
> > On Thu, Oct 3, 2024 at 3:09 AM Gyula Fóra wrote:
> >
> >> Hi All!
> >>
> >> I would like to kick off the discussion / release process for the Flink
> >> Kubernetes Operator 1.10.0 release.
> >>
> >> The last, 1.9.0, version was released on 1st of July, and since then we
> >> have added a number of important improvements and fixes including the
> >> introduction of the FlinkStateSnapshot custom resource. Looking at jira,
> >> all known critical issues have been fixed and we are in a good position
> >> to cut the release.
> >>
> >> I propose to cut the release branch at the end of next week and kick off
> >> the release process on Monday, October 21 with the first RC.
> >>
> >> If you would like to volunteer as a release manager please do let me
> >> know. In case we get no volunteers I am happy to take on the role myself.
> >>
> >> Mate Czagany already offered his help with the blogpost and testing.
> >>
> >> Cheers,
> >> Gyula
> >>
> >
>
[jira] [Created] (FLINK-36512) Make rescale trigger based on failed checkpoints depend on the cause
Matthias Pohl created FLINK-36512:
-
Summary: Make rescale trigger based on failed checkpoints depend on the cause
Key: FLINK-36512
URL: https://issues.apache.org/jira/browse/FLINK-36512
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Affects Versions: 2.0.0
Reporter: Matthias Pohl

[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler] introduced rescaling on checkpoints. The trigger logic is also initiated for failed checkpoints (after a counter reaches a configurable limit). The issue here is that we might end up counting failed checkpoints that we actually don't want to care about (e.g. checkpoint failures due to not all tasks running yet). Instead, we should start considering checkpoints only after the job has started running, to avoid unnecessary (premature) rescale decisions. We already have logic like that in place in the [CheckpointCoordinator|https://github.com/apache/flink/blob/8be94e6663d8ac6e3d74bf4cd5f540cc96c8289e/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java#L217] which we might want to use here as well.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Flink Kubernetes Operator 1.10.0 release planning
Hi All!

We chatted with Gyula offline and we think we are pretty much ready to cut the
release branch and prepare the first RC on Monday (October 14). Let us know
if there are any outstanding PRs that need to be merged before the release
cut.

Cheers,
Matyas

On Thu, Oct 3, 2024 at 7:20 AM Őrhidi Mátyás wrote:

> Hi All,
>
> Thanks, Gyula. I'm happy to volunteer as the release manager for this
> release.
>
> Cheers,
> Matyas
>
> On Thu, Oct 3, 2024 at 3:09 AM Gyula Fóra wrote:
>
>> Hi All!
>>
>> I would like to kick off the discussion / release process for the Flink
>> Kubernetes Operator 1.10.0 release.
>>
>> The last, 1.9.0, version was released on 1st of July, and since then we
>> have added a number of important improvements and fixes including the
>> introduction of the FlinkStateSnapshot custom resource. Looking at jira,
>> all known critical issues have been fixed and we are in a good position to
>> cut the release.
>>
>> I propose to cut the release branch at the end of next week and kick off
>> the release process on Monday, October 21 with the first RC.
>>
>> If you would like to volunteer as a release manager please do let me know.
>> In case we get no volunteers I am happy to take on the role myself.
>>
>> Mate Czagany already offered his help with the blogpost and testing.
>>
>> Cheers,
>> Gyula
>>
>
[VOTE] Release flink-connector-kafka v3.3.0, release candidate #1
Hi everyone,

Please review and vote on release candidate #1 for flink-connector-kafka
v3.3.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint 538B49E9BCF0B72F [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.3.0-rc1 [5],
* website pull request listing the new release [6],
* CI build of the tag [7].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Arvid Heise

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354606
[2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.3.0-rc1
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/staging/org/apache/flink/flink-connector-kafka/3.3.0-1.19.1/
+ https://repository.apache.org/content/repositories/staging/org/apache/flink/flink-connector-kafka/3.3.0-1.20.0/
[5] https://github.com/apache/flink-connector-kafka/releases/tag/v3.3.0-rc1
[6] https://github.com/apache/flink-web/pull/757
[7] https://github.com/apache/flink-connector-kafka/actions/runs/11281680238/
[jira] [Created] (FLINK-36515) Adds high-precision TIME type support in YAML pipeline
yux created FLINK-36515:
---
Summary: Adds high-precision TIME type support in YAML pipeline
Key: FLINK-36515
URL: https://issues.apache.org/jira/browse/FLINK-36515
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Reporter: yux

Currently, CDC treats `TimeType` the same way as the Flink Table API and only supports time types with zero precision, but upstream MySQL may generate time data with precision up to 9.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-36514) Unable to override include/exclude schema types in lenient mode
yux created FLINK-36514:
---
Summary: Unable to override include/exclude schema types in lenient mode
Key: FLINK-36514
URL: https://issues.apache.org/jira/browse/FLINK-36514
Project: Flink
Issue Type: Bug
Components: Flink CDC
Reporter: yux

If schema evolution behavior is set to LENIENT, Truncate / Drop table events will be ignored by default. However, there's currently no way for users to override this behavior due to the following code:

```java
if (excludedSETypes.isEmpty()
        && SchemaChangeBehavior.LENIENT.equals(schemaChangeBehavior)) {
    // In lenient mode, we exclude DROP_TABLE and TRUNCATE_TABLE by default. This could be
    // overridden by manually specifying excluded types.
    Stream.of(SchemaChangeEventType.DROP_TABLE, SchemaChangeEventType.TRUNCATE_TABLE)
            .map(SchemaChangeEventType::getTag)
            .forEach(excludedSETypes::add);
}
```

If one wants to exclude no types, it's actually not possible, since passing `[]` is equivalent to passing nothing, and `DROP` and `TRUNCATE` events will still be ignored.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
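For illustration, a pipeline definition that tries to opt out of the default exclusion might look like the sketch below. The option keys are my assumption of the relevant names and are not confirmed by this report:

```yaml
pipeline:
  schema.change.behavior: lenient

sink:
  # Intent: exclude nothing, so DROP/TRUNCATE events would be applied too.
  # Due to the code above, this empty list is indistinguishable from an
  # unset option, and DROP_TABLE / TRUNCATE_TABLE are still excluded.
  exclude.schema.changes: []
```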
Re: [VOTE] Release flink-connector-kafka v3.3.0, release candidate #1
+1 (non-binding)

I checked:
- Review JIRA release notes
- Verify hashes
- Verify signatures
- Build from source with JDK 8/11/17
- Source code artifacts matching the current release

> On Oct 11, 2024 at 23:33, Arvid Heise wrote:
>
> Hi everyone,
>
> Please review and vote on release candidate #1 for flink-connector-kafka
> v3.3.0, as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which are signed with the key with fingerprint 538B49E9BCF0B72F [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag v3.3.0-rc1 [5],
> * website pull request listing the new release [6].
> * CI build of the tag [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Arvid Heise
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354606
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.3.0-rc1
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/staging/org/apache/flink/flink-connector-kafka/3.3.0-1.19.1/
> + https://repository.apache.org/content/repositories/staging/org/apache/flink/flink-connector-kafka/3.3.0-1.20.0/
> [5] https://github.com/apache/flink-connector-kafka/releases/tag/v3.3.0-rc1
> [6] https://github.com/apache/flink-web/pull/757
> [7] https://github.com/apache/flink-connector-kafka/actions/runs/11281680238/
[jira] [Created] (FLINK-36511) FlinkSecurityManager#checkExit StackOverFlow if haltOnSystemExit is enabled
Chesnay Schepler created FLINK-36511:
Summary: FlinkSecurityManager#checkExit StackOverFlow if haltOnSystemExit is enabled
Key: FLINK-36511
URL: https://issues.apache.org/jira/browse/FLINK-36511
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Fix For: 1.19.2, 1.20.1, 2.0-preview

The halt() call in checkExit() will again cause checkExit() to be called, since we don't null out the security manager, in contrast to how forceProcessExit is implemented.

-- This message was sent by Atlassian Jira (v8.20.10#820010)