RE: Flink restarts on Checkpoint failure

2021-09-01 Thread Schwalbe Matthias
Good morning Daniel, Another reason could be backpressure with aligned checkpoints: * Flink processes checkpoints by sending checkpoint markers through the job graph, beginning with source operators towards the sink operators * These checkpoint markers are sort of a meta event that is se

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-01 Thread Yang Wang
Given that the limit-factor should be greater than 1, then using the limit-factor could also work for memory. > Why do we need a larger memory resource limit than request? A typical use case I could imagine is the page cache. Having more page cache might improve the performance. And they could be

Re: Clarifying Documentation on Custom Triggers

2021-09-01 Thread Caizhi Weng
Hi! If you look into the code of WindowOperator you'll see a cleanup timer is registered for each element. This cleanup timer is used to deal with late records. I suppose that is the timer which calls the onEventTime in your trigger. Trigger is a class for the user to decide whether to fire a win

Use FlinkKafkaConsumer to synchronize multiple Kafka topics

2021-09-01 Thread Yan Wang
Hi, We are currently using a single FlinkKafkaConsumer to consume multiple Kafka topics. However, we find that if one of the Kafka topics goes down at run time (for example, when one of the topics is rebooted), the FlinkKafkaConsumer keeps logging warning messages about the dead Kafka topic, and will also con

Re: Flink restarts on Checkpoint failure

2021-09-01 Thread Caizhi Weng
Hi! There are many possible reasons for a checkpoint failure. The most likely ones are: * The JVM is busy with garbage collection while performing the checkpoints. This can be checked by looking into the GC logs of a task manager. * The state suddenly becomes quite large due to some sp

Re: De/Serialization API to tear-down user code

2021-09-01 Thread Caizhi Weng
Hi! The (De)serializationSchema is only a helper for changing the data object to another format. What's your use case? If you're creating a (De)serializationSchema for a source / sink you might want to open and close the resources in the open / close methods of the source / sink, not in the (De)se

Seeing Exception ClassNotFoundException: __wrapper while running in Kubernetes Cluster

2021-09-01 Thread Praneeth Ramesh
Hi all, I am trying to run a Flink Scala application which reads from Kafka, applies some lookup transformations, and then writes to Kafka. I am using Flink version 1.12.1. I tested it locally and it works fine. But when I try to run it on a cluster using the native Kubernetes integration I see weird er

Re: logback variable substitution in kubernetes

2021-09-01 Thread houssem
Hello, I tried with the curly braces; unfortunately it didn't work, same result. On 2021/09/01 17:54:58, Alexis Sarda-Espinosa wrote: > I'm fairly certain you need the curly braces surrounding the variable, the > substitution is not done by the shell, it's just similar syntax (as mentioned >

Re: logback variable substitution in kubernetes

2021-09-01 Thread Alexis Sarda-Espinosa
I'm fairly certain you need the curly braces surrounding the variable, the substitution is not done by the shell, it's just similar syntax (as mentioned in the doc http://logback.qos.ch/manual/configuration.html#variableSubstitution).

High availability - leader election not working?

2021-09-01 Thread jonas eyob
Hey all, I have a 2 Job Manager 1 Task Manager (2 slots) setup. Wanted to simply try to see if the leader election would work correctly. We are using: * Standalone Application Cluster setup on Kubernetes, and have followed the example configurations provided in the documentation for HA. * Using t

Re: logback variable substitution in kubernetes

2021-09-01 Thread houssem
Yes, I did all of that. When I hard-code the log level and the file name everything works fine, but when I try to use variables they are not replaced. On 2021/09/01 11:43:21, Yang Wang wrote: > Did you have removed the log4j related jars in the $FLINK_HOME/lib > directory? > Refer to the docume

Re: Clarifying Documentation on Custom Triggers

2021-09-01 Thread Aeden Jameson
Hi Caizhi, Thanks for responding. What I mean specifically is that if I do something like this: env.addSource(events).keyBy().window(TumblingEventTimeWindows.of(Time.seconds(1))).trigger(new EmptyTrigger()).process(new

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-01 Thread spoon_lz
Yes, shrinking the requested memory will result in OOM. We do this because the user-created job provides an initial value (for example, 2 cpus and 4096MB of memory for TaskManager). In most cases, the user will not change this value unless the task fails or there is an exception such as data del

Re: logback variable substitution in kubernetes

2021-09-01 Thread Yang Wang
Have you removed the log4j-related jars in the $FLINK_HOME/lib directory? Refer to the documentation[1] for how to use logback. [1]. https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/advanced/logging/#configuring-logback Best, Yang On Wed, Sep 1, 2021 at 5:00 PM, houssem wrote: > Yes

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-01 Thread Yang Wang
Hi Lz, Thanks for sharing your ideas. I have to admit that I prefer the limit factor to set the resource limit, not the percentage to set the resource request. Because usually the resource request is configured or calculated by Flink, and it indicates user required resources. It has the same sema

Flink restarts on Checkpoint failure

2021-09-01 Thread Daniel Vol
Hello, I see the following error in my jobmanager log (Flink on EMR): Checking cluster logs I see : 2021-08-21 17:17:30,489 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 (type=CHECKPOINT) @ 1629566250303 for job c513e9ebbea4ab72d80b133

De/Serialization API to tear-down user code

2021-09-01 Thread Sergio Morales
Hi, I’m currently working to close some resources while using the SerializationSchema and DeserializationSchema (Flink-core v1.12.1), however, after revising the document outlining the API the methods (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988

Re: Move already processed file from one folder to another folder in flink

2021-09-01 Thread Samir Vasani
Hi, I did not understand why you are using a table when we are working on a program. On Mon, Jul 26, 2021, 7:20 AM Caizhi Weng wrote: > Hi! > > For the UDF solution, you can add a "file name" column to your csv file > like this: > id,value,filename > 1,100, > 2,200, > 3,300,test.csv > > Only the f

Re: Cleaning old incremental checkpoint files

2021-09-01 Thread Robin Cassan
Thanks Robert for your answer, this seems to be what we observed when we tried to delete the first time: Flink complained about missing files. I'm wondering then how are people cleaning their storage for incremental checkpoints? Is there any guarantee when using TTLs that after the TTL has expired,

Re: logback variable substitution in kubernetes

2021-09-01 Thread houssem
Yes, I verified this, and all the environment variables are set. On 2021/09/01 06:09:27, Yang Wang wrote: > From the logback documentation[1], it could support OS > environment substitution. > Could you please check that the environment variables have been properly > set? > Maybe you could tunn

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-01 Thread spoon_lz
Hi everyone, I have some other ideas for Kubernetes resource settings, as described by WangYang in [FLINK-15648], which increases the CPU limit by a certain percentage to provide more computational performance for jobs. Should we consider the alternative of shrinking the request to start more jo

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-01 Thread 971066723
Hi everyone, I have some other ideas for Kubernetes resource settings, as described by WangYang in [FLINK-15648], which increases the CPU limit by a certain percentage to provide more computational performance for jobs. Should we con

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-01 Thread Denis Cosmin NUTIU
Hi Yang, I have limited Flink internals knowledge, but I can try to implement FLINK-15648 and open up a PR on GitHub or send the patch via email. How does that sound? I'll sign the ICLA and switch to my personal address. Sincerely, Denis On Wed, 2021-09-01 at 13:48 +0800, Yang Wang wrote: Grea

Checkpointing failure, subtasks get stuck

2021-09-01 Thread Xiangyu Su
Hello Everyone, We have been facing a checkpointing failure issue since version 1.9; currently we are using version 1.13.2. We are using filesystem (S3) as the state backend, with a 10-minute checkpoint timeout; usually the checkpoint takes 10-30 seconds. But sometimes I have seen the job fail and restart due to checkp

RE: Unrecoverable apps due to timeouts on transaction state initialization

2021-09-01 Thread Schwalbe Matthias
Hi Chohan, Which Kafka client version are you using? ... considering that this started today, did you recently change the Kafka client version? Giving a little more context (exception call stack/more log) might help finding out what is going on ... 😊 Regards Thias -Original Mes

FLINK-14316 happens on version 1.13.2

2021-09-01 Thread Xiangyu Su
Hello Everyone, We upgraded Flink to 1.13.2, and we are randomly facing the "Job leader ... lost leadership" error; the job keeps restarting and failing... It behaves like this ticket: https://issues.apache.org/jira/browse/FLINK-14316 Has anybody had the same issue, or does anyone have suggestions? Best Regards,

Re: [ANNOUNCE] Apache Flink Stateful Functions 3.1.0 released

2021-09-01 Thread Till Rohrmann
Great news! Thanks a lot for all your work on the new release :-) Cheers, Till On Wed, Sep 1, 2021 at 9:07 AM Johannes Moser wrote: > Congratulations, great job. 🎉 > > On 31.08.2021, at 17:09, Igal Shilman wrote: > > The Apache Flink community is very happy to announce the release of Apache >

Re: [ANNOUNCE] Apache Flink Stateful Functions 3.1.0 released

2021-09-01 Thread Johannes Moser
Congratulations, great job. 🎉 > On 31.08.2021, at 17:09, Igal Shilman wrote: > > The Apache Flink community is very happy to announce the release of Apache > Flink Stateful Functions (StateFun) 3.1.0. > > StateFun is a cross-platform stack for building Stateful Serverless > applications, making

Unrecoverable apps due to timeouts on transaction state initialization

2021-09-01 Thread Shahid Chohan
Today I started seeing the following exception across all of the exactly-once kafka sink apps I have deployed org.apache.kafka.common.errors.TimeoutException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 6ms. Caused by: org.apach