[jira] [Created] (FLINK-36513) A lot of CI failures are caused by Install cert-manager

2024-10-11 Thread Rui Fan (Jira)
Rui Fan created FLINK-36513:
---

 Summary: A lot of CI failures are caused by Install cert-manager
 Key: FLINK-36513
 URL: https://issues.apache.org/jira/browse/FLINK-36513
 Project: Flink
  Issue Type: Technical Debt
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.9.0
Reporter: Rui Fan
Assignee: Rui Fan
 Attachments: image-2024-10-12-10-30-38-781.png

A lot of CI failures are caused by the "Install cert-manager" step, for example:

[https://github.com/apache/flink-kubernetes-operator/actions/runs/11292388626/job/31408603383]

[https://github.com/apache/flink-kubernetes-operator/actions/runs/11294831791/job/31436330397]
h1. Root cause:

I checked the raw log [1]; the failure reason is: _Unable to connect to the 
server: dial tcp 140.82.113.3:443: i/o timeout._

!image-2024-10-12-10-30-38-781.png|width=2702,height=982!

 

CI code: 
[https://github.com/apache/flink-kubernetes-operator/blob/d2c01737c745979c6aadb670334565ee11aa2f4a/.github/workflows/ci.yml#L227]

 

The step downloads cert-manager.yaml from GitHub, and 140.82.113.3:443 is a 
GitHub IP and port, so downloading cert-manager.yaml is the root cause.

 
h1. Solution:
 * Solution 1: Introduce a retry mechanism

 ** Download cert-manager.yaml first, with retries
 ** Then run kubectl apply -f on the local file
 * Solution 2: Put cert-manager.yaml in the flink-kubernetes-operator repo 
directly
 ** I saw it's a fixed version, so cert-manager.yaml should be immutable, IIUC.
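For Solution 1, the retry-with-backoff logic could look roughly like the Java sketch below. It is only an illustration of the retry idea: the real CI step is a shell step in ci.yml, and the class and method names (RetryDownload, withRetry), the attempt count and the backoff values are assumptions, not existing code. The flaky "download" is simulated so the sketch runs offline.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;

public class RetryDownload {

    /**
     * Runs {@code action} up to {@code maxAttempts} times, sleeping with exponential
     * backoff between failed attempts. Rethrows the last failure if every attempt fails.
     */
    public static <T> T withRetry(Callable<T> action, int maxAttempts, Duration initialBackoff)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Duration backoff = initialBackoff;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff.toMillis());
                    backoff = backoff.multipliedBy(2); // double the wait after each failure
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky download (hypothetical): times out twice, then succeeds.
        int[] calls = {0};
        String body = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("i/o timeout");
            }
            return "cert-manager.yaml contents";
        }, 5, Duration.ofMillis(10));
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

Once the file is on disk, the workflow would run kubectl apply -f against the local copy, so a transient GitHub timeout no longer fails the job on the first attempt.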

 

[1] 
https://productionresultssa9.blob.core.windows.net/actions-results/1c3ad627-91b6-4db3-a9ad-453109617470/workflow-job-run-3290c02c-bc49-582e-d1d1-63220f3fe3ce/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-10-12T02%3A29%3A01Z&sig=InSwCX86huA086rqGjAXM836sM8%2Bb8zk5%2FfeVJgmpsM%3D&ske=2024-10-12T13%3A37%3A00Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-10-12T01%3A37%3A00Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-08-04&sp=r&spr=https&sr=b&st=2024-10-12T02%3A18%3A56Z&sv=2024-08-04

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Resource limits for the k8s

2024-10-11 Thread Mate Czagany
Hi,

You can use the following configuration options to set resource limits on
Kubernetes; see the configuration docs [1] for more info:
- kubernetes.jobmanager.cpu.limit-factor
- kubernetes.jobmanager.memory.limit-factor
- kubernetes.taskmanager.cpu.limit-factor
- kubernetes.taskmanager.memory.limit-factor
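As a rough illustration, assuming the documented semantics that the resource limit is the request multiplied by the limit-factor, the Java sketch below computes a TaskManager CPU limit. The map only stands in for the Flink configuration (e.g. flinkConfiguration in a FlinkDeployment, or the Flink config file); the factor values and the 1.5 CPU request are hypothetical. A factor above 1.0 lets pods burst above their request, which is the usual lever against CPU throttling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LimitFactorExample {

    /** Limit = request * limit-factor; a factor of 1.0 makes the limit equal the request. */
    public static double limitFor(double request, double limitFactor) {
        // Kubernetes rejects a limit below the request, so factors < 1.0 make no sense here.
        if (limitFactor < 1.0) {
            throw new IllegalArgumentException("limit-factor should be >= 1.0");
        }
        return request * limitFactor;
    }

    public static void main(String[] args) {
        // Hypothetical values: let JM/TM pods burst to 2x their requested CPU.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("kubernetes.jobmanager.cpu.limit-factor", "2.0");
        conf.put("kubernetes.taskmanager.cpu.limit-factor", "2.0");
        conf.put("kubernetes.jobmanager.memory.limit-factor", "1.0");
        conf.put("kubernetes.taskmanager.memory.limit-factor", "1.0");

        double tmCpuRequest = 1.5; // hypothetical CPU request of one TaskManager pod
        double factor = Double.parseDouble(conf.get("kubernetes.taskmanager.cpu.limit-factor"));
        System.out.println("TM CPU limit = " + limitFor(tmCpuRequest, factor));
    }
}
```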

Best,
Mate

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#kubernetes

On Thu, Oct 10, 2024, 22:47, Maksym Serhieiev wrote:

> Hi
> I'm really struggling to find out how I can manage resource limits for a k8s
> deployment. We are experiencing lots of CPU throttling alerts and would like to
> adjust resource limits, but I can't figure out how.
> Will appreciate any help you can provide.
>
> Regards,
> Maksym Serhieiev
>


[jira] [Created] (FLINK-36510) Upgrade Pekko from 1.0.1 to 1.1.2

2024-10-11 Thread Grace Grimwood (Jira)
Grace Grimwood created FLINK-36510:
--

 Summary: Upgrade Pekko from 1.0.1 to 1.1.2
 Key: FLINK-36510
 URL: https://issues.apache.org/jira/browse/FLINK-36510
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Grace Grimwood


Update the Pekko dependency to 1.1.2, which in turn upgrades Netty from 3 to 4 
(addressing FLINK-29065 and removing several CVEs from Flink). Pekko 1.1 also 
upgrades other dependencies such as slf4j and Jackson. For more details see the 
[Pekko 1.1 release 
notes|https://pekko.apache.org/docs/pekko/current/release-notes/releases-1.1.html].





Re: Flink Kubernetes Operator 1.10.0 release planning

2024-10-11 Thread Rui Fan
Thanks for driving this release!

> cut the release branch and prepare the first RC on Monday (October 14)

The cutting time LGTM, +1 for that.

Best,
Rui



[jira] [Created] (FLINK-36512) Make rescale trigger based on failed checkpoints depend on the cause

2024-10-11 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-36512:
-

 Summary: Make rescale trigger based on failed checkpoints depend 
on the cause
 Key: FLINK-36512
 URL: https://issues.apache.org/jira/browse/FLINK-36512
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 2.0.0
Reporter: Matthias Pohl


[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler]
 introduced rescaling on checkpoints. The trigger logic is also initiated for 
failed checkpoints (once a counter reaches a configurable limit).

The issue here is that we might end up considering failed checkpoints that we 
actually don't care about (e.g., checkpoint failures because not all tasks are 
running yet). Instead, we should only start considering checkpoints once the 
job is running, to avoid unnecessary (premature) rescale decisions.

We already have logic like that in place in the 
[CheckpointFailureManager|https://github.com/apache/flink/blob/8be94e6663d8ac6e3d74bf4cd5f540cc96c8289e/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java#L217]
 (used by the CheckpointCoordinator), which we might want to reuse here as well.
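A minimal sketch of the proposed behavior, assuming a hypothetical trigger class with its own failure-cause enum (the names only loosely echo Flink's CheckpointFailureReason; this is not the AdaptiveScheduler's actual code): failures are counted only once the job is running, and causes such as "not all tasks running" are ignored.

```java
import java.util.EnumSet;
import java.util.Set;

public class RescaleOnFailedCheckpointTrigger {

    /** Hypothetical failure causes, loosely modelled on CheckpointFailureReason names. */
    public enum FailureCause {
        NOT_ALL_REQUIRED_TASKS_RUNNING,
        CHECKPOINT_DECLINED,
        CHECKPOINT_EXPIRED,
        IO_EXCEPTION
    }

    // Causes that say nothing about whether the current parallelism is a problem.
    private static final Set<FailureCause> IGNORED =
            EnumSet.of(FailureCause.NOT_ALL_REQUIRED_TASKS_RUNNING);

    private final int threshold;
    private boolean jobRunning;
    private int failedCount;

    public RescaleOnFailedCheckpointTrigger(int threshold) {
        this.threshold = threshold;
    }

    /** Called once all tasks are running; counting starts from a clean slate. */
    public void onJobRunning() {
        jobRunning = true;
        failedCount = 0;
    }

    /** Returns true when enough relevant failures have accumulated to trigger a rescale. */
    public boolean onCheckpointFailed(FailureCause cause) {
        if (!jobRunning || IGNORED.contains(cause)) {
            return false; // premature or irrelevant failure: don't count it
        }
        failedCount++;
        return failedCount >= threshold;
    }
}
```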





Re: Flink Kubernetes Operator 1.10.0 release planning

2024-10-11 Thread Őrhidi Mátyás
Hi All!

We chatted with Gyula offline and we think we are pretty much ready to cut the
release branch and prepare the first RC on Monday (October 14). Let us know
if there are any outstanding PRs that need to be merged before the release
cut.

Cheers,
Matyas

On Thu, Oct 3, 2024 at 7:20 AM Őrhidi Mátyás 
wrote:

> Hi All,
>
> Thanks, Gyula. I'm happy to volunteer as the release manager for this
> release.
>
> Cheers,
> Matyas
>
> On Thu, Oct 3, 2024 at 3:09 AM Gyula Fóra  wrote:
>
>> Hi All!
>>
>> I would like to kick off the discussion / release process for the Flink
>> Kubernetes Operator 1.10.0 release.
>>
>> The last version, 1.9.0, was released on July 1, and since then we
>> have added a number of important improvements and fixes, including the
>> introduction of the FlinkStateSnapshot custom resource. Looking at Jira,
>> all known critical issues have been fixed, and we are in a good position to
>> cut the release.
>>
>> I propose to cut the release branch at the end of next week and kick off
>> the release process on Monday, October 21 with the first RC.
>>
>> If you would like to volunteer as a release manager please do let me know.
>> In case we get no volunteers I am happy to take on the role myself.
>>
>> Mate Czagany has already offered his help with the blog post and testing.
>>
>> Cheers,
>> Gyula
>>
>


[VOTE] Release flink-connector-kafka v3.3.0, release candidate #1

2024-10-11 Thread Arvid Heise
Hi everyone,


Please review and vote on release candidate #1 for flink-connector-kafka
v3.3.0, as follows:

[ ] +1, Approve the release

[ ] -1, Do not approve the release (please provide specific comments)





The complete staging area is available for your review, which includes:

* JIRA release notes [1],

* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint 538B49E9BCF0B72F [3],

* all artifacts to be deployed to the Maven Central Repository [4],

* source code tag v3.3.0-rc1 [5],

* website pull request listing the new release [6].

* CI build of the tag [7].



The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.
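Assuming the usual ASF reading of "majority approval" (at least three binding +1 votes and more binding +1 than binding -1 votes, with non-binding votes not counted), the outcome check for a vote like this one can be sketched in Java as follows. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.List;

public class ReleaseVoteTally {

    /** One vote: +1, 0 or -1, and whether it comes from a PMC member (binding). */
    public static final class Vote {
        final int value;
        final boolean binding;

        public Vote(int value, boolean binding) {
            this.value = value;
            this.binding = binding;
        }
    }

    /** Adopted only after the minimum period, with >= 3 binding +1s outnumbering binding -1s. */
    public static boolean isAdopted(List<Vote> votes, Duration openFor) {
        if (openFor.compareTo(Duration.ofHours(72)) < 0) {
            return false; // the vote must stay open for at least 72 hours
        }
        long plus = votes.stream().filter(v -> v.binding && v.value > 0).count();
        long minus = votes.stream().filter(v -> v.binding && v.value < 0).count();
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        // Hypothetical tally: three binding +1s and one non-binding +1.
        List<Vote> votes = List.of(
                new Vote(1, true), new Vote(1, true), new Vote(1, true), new Vote(1, false));
        System.out.println(isAdopted(votes, Duration.ofHours(72)));
    }
}
```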



Thanks,

Arvid Heise



[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354606

[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.3.0-rc1

[3] https://dist.apache.org/repos/dist/release/flink/KEYS

[4]
https://repository.apache.org/content/repositories/staging/org/apache/flink/flink-connector-kafka/3.3.0-1.19.1/
+
https://repository.apache.org/content/repositories/staging/org/apache/flink/flink-connector-kafka/3.3.0-1.20.0/

[5] https://github.com/apache/flink-connector-kafka/releases/tag/v3.3.0-rc1

[6] https://github.com/apache/flink-web/pull/757

[7]
https://github.com/apache/flink-connector-kafka/actions/runs/11281680238/


[jira] [Created] (FLINK-36515) Adds high-precision TIME type support in YAML pipeline

2024-10-11 Thread yux (Jira)
yux created FLINK-36515:
---

 Summary: Adds high-precision TIME type support in YAML pipeline
 Key: FLINK-36515
 URL: https://issues.apache.org/jira/browse/FLINK-36515
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yux


Currently, CDC treats `TimeType` the same way as the Flink Table API and only 
supports zero-precision time types, but the MySQL upstream may generate time 
data with a precision of up to 9.
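To illustrate what is lost today, the sketch below truncates a LocalTime to a given fractional-second precision. It assumes truncation (not rounding), and the helper name truncateToPrecision is hypothetical; it is not how Flink CDC actually converts TIME values.

```java
import java.time.LocalTime;

public class TimePrecision {

    /** Truncates the fractional seconds of {@code t} to {@code precision} digits (0..9). */
    public static LocalTime truncateToPrecision(LocalTime t, int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("precision must be in [0, 9]");
        }
        long unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10; // nanoseconds per smallest representable step
        }
        long nanoOfDay = t.toNanoOfDay();
        return LocalTime.ofNanoOfDay(nanoOfDay - nanoOfDay % unit);
    }

    public static void main(String[] args) {
        LocalTime source = LocalTime.of(12, 34, 56, 123_456_789);
        System.out.println(truncateToPrecision(source, 0)); // 12:34:56
        System.out.println(truncateToPrecision(source, 6)); // 12:34:56.123456
        System.out.println(truncateToPrecision(source, 9)); // 12:34:56.123456789
    }
}
```

With precision 0, the fractional part .123456789 is dropped entirely, which is the information loss the issue describes.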





[jira] [Created] (FLINK-36514) Unable to override include/exclude schema types in lenient mode

2024-10-11 Thread yux (Jira)
yux created FLINK-36514:
---

 Summary: Unable to override include/exclude schema types in 
lenient mode
 Key: FLINK-36514
 URL: https://issues.apache.org/jira/browse/FLINK-36514
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: yux


If the schema evolution behavior is set to LENIENT, TRUNCATE_TABLE and DROP_TABLE 
events are ignored by default. However, users currently cannot fully override 
this behavior because of the following code:

```java
if (excludedSETypes.isEmpty()
        && SchemaChangeBehavior.LENIENT.equals(schemaChangeBehavior)) {
    // In lenient mode, we exclude DROP_TABLE and TRUNCATE_TABLE by default. This could be
    // overridden by manually specifying excluded types.
    Stream.of(SchemaChangeEventType.DROP_TABLE, SchemaChangeEventType.TRUNCATE_TABLE)
            .map(SchemaChangeEventType::getTag)
            .forEach(excludedSETypes::add);
}
```
 
If a user wants to exclude no types at all, that is currently impossible: passing 
`[]` is equivalent to passing nothing, so `DROP_TABLE` and `TRUNCATE_TABLE` events 
are still ignored.
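One possible fix is to distinguish "option not set" from "explicitly set to an empty list", and to apply the lenient defaults only in the former case. The sketch below models that with a null check; the enums are hypothetical, trimmed stand-ins for Flink CDC's SchemaChangeEventType and SchemaChangeBehavior, and this is not the project's actual patch.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ExcludedTypesResolver {

    /** Hypothetical, trimmed stand-ins for the Flink CDC types. */
    public enum SchemaChangeEventType { ADD_COLUMN, DROP_COLUMN, DROP_TABLE, TRUNCATE_TABLE }

    public enum SchemaChangeBehavior { EVOLVE, LENIENT }

    /**
     * @param userExcluded the types the user configured, or null if the option was not set
     */
    public static Set<SchemaChangeEventType> resolve(
            List<SchemaChangeEventType> userExcluded, SchemaChangeBehavior behavior) {
        if (userExcluded != null) {
            // Explicit user choice wins, including "exclude nothing" ([]).
            // EnumSet.copyOf rejects an empty plain collection, hence the isEmpty check.
            return userExcluded.isEmpty()
                    ? EnumSet.noneOf(SchemaChangeEventType.class)
                    : EnumSet.copyOf(userExcluded);
        }
        if (behavior == SchemaChangeBehavior.LENIENT) {
            // Option left unset: fall back to the lenient-mode defaults.
            return EnumSet.of(SchemaChangeEventType.DROP_TABLE, SchemaChangeEventType.TRUNCATE_TABLE);
        }
        return EnumSet.noneOf(SchemaChangeEventType.class);
    }
}
```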





Re: [VOTE] Release flink-connector-kafka v3.3.0, release candidate #1

2024-10-11 Thread Yanquan Lv
+1 (non-binding)
I checked:
- Review JIRA release notes
- Verify hashes
- Verify signatures
- Build from source with JDK 8/11/17
- Source code artifacts matching the current release
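The "Verify hashes" step can be sketched in Java with the JDK's MessageDigest. The sketch assumes the .sha512 file starts with the hex digest followed by the file name (the usual sha512sum layout); the class name and the artifact file name are hypothetical. Signature verification against the KEYS file is a separate GPG step and is not covered here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Check {

    /** Lower-case hex SHA-512 digest of {@code data}. */
    public static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b)); // %x treats a Byte as unsigned 8-bit
        }
        return sb.toString();
    }

    /**
     * Compares the artifact's digest with the first whitespace-separated token of the
     * accompanying .sha512 file (assumed "<hex digest>  <file name>" layout).
     */
    public static boolean matches(byte[] artifact, String sha512FileContent)
            throws NoSuchAlgorithmException {
        String expected = sha512FileContent.trim().split("\\s+")[0];
        return sha512Hex(artifact).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical artifact bytes and file name, standing in for the real source release.
        byte[] artifact = "pretend this is the source tarball".getBytes(StandardCharsets.UTF_8);
        String shaFile = sha512Hex(artifact) + "  flink-connector-kafka-3.3.0-src.tgz";
        System.out.println(matches(artifact, shaFile));
    }
}
```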






[jira] [Created] (FLINK-36511) FlinkSecurityManager#checkExit StackOverFlow if haltOnSystemExit is enabled

2024-10-11 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-36511:


 Summary: FlinkSecurityManager#checkExit StackOverFlow if 
haltOnSystemExit is enabled
 Key: FLINK-36511
 URL: https://issues.apache.org/jira/browse/FLINK-36511
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.19.2, 1.20.1, 2.0-preview


The halt() call in checkExit() causes checkExit() to be called again, because we 
don't null the security manager first, in contrast to how forceProcessExit is 
implemented.
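The recursion can be reproduced with a toy model that does not touch the real SecurityManager (installing one is restricted on recent JDKs, and a real halt would kill the JVM). Here a static ExitHook plays the role of the installed security manager, and halt() consults it the way Runtime.halt() consults checkExit(); all names are hypothetical and this is not FlinkSecurityManager's code.

```java
public class HaltRecursionDemo {

    /** Toy stand-in for a SecurityManager whose checkExit() is consulted by halt(). */
    public interface ExitHook {
        void checkExit(int status);
    }

    public static ExitHook hook;       // plays the role of System.getSecurityManager()
    public static Integer haltedWith;  // records the "halt" instead of killing the JVM

    /** Toy halt(): like Runtime.halt(), it consults the installed hook first. */
    public static void halt(int status) {
        if (hook != null) {
            hook.checkExit(status);
        }
        haltedWith = status;
    }

    /** Buggy: halts while still installed, so halt() calls back into checkExit() forever. */
    public static final ExitHook BUGGY = status -> halt(status);

    /** Fixed: uninstall the hook before halting, as the issue says forceProcessExit does. */
    public static final ExitHook FIXED = status -> {
        hook = null;
        halt(status);
    };

    public static void main(String[] args) {
        hook = FIXED;
        halt(42);
        System.out.println("fixed: halted with " + haltedWith);

        hook = BUGGY;
        try {
            halt(42);
        } catch (StackOverflowError e) {
            System.out.println("buggy: StackOverflowError");
        } finally {
            hook = null;
        }
    }
}
```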


