Re: [DISCUSS] Feature freeze date for 1.13

2021-04-06 Thread Kurt Young
Hi Yuval, I think you are good to go, since there is no objection from PMC. Best, Kurt On Wed, Apr 7, 2021 at 12:48 AM Yuval Itzchakov wrote: > Hi Guowei, > > Who should I speak to regarding this? I am at the final stages of the PR I > believe (Shengkai is kindly helping me make things work)

Re: Application cluster - Job execution and cluster creation timeouts

2021-04-06 Thread Yang Wang
Hi Tamir, Maybe I did not make myself clear. Here the "deployer" means our internal Flink application deployer(actually it is ververica platform), not the *ApplicationDeployer* interface in Flink. It helps with managing the lifecycle of every Flink application. And it has the same native K8s integ

Flink: Exception from container-launch exitCode=2

2021-04-06 Thread Yik San Chan
*The question is cross-posted on Stack Overflow https://stackoverflow.com/questions/66968180/flink-exception-from-container-launch-exitcode-2 . Viewing the question on Stack Overflow is preferred as I in

Re: Dynamic configuration via broadcast state

2021-04-06 Thread vishalovercome
I researched a bit more and another suggested solution is to build a custom source function that somehow waits for each operator to load it's configuration which is infact set in the open method of the source itself. I'm not sure if that's a good idea as that just exposes entire job configuration t

questions regarding stateful functions

2021-04-06 Thread Marco Villalobos
Upon reading about stateful functions, it seems as though first, a data stream has to flow to an event ingress. Then, the stateful functions will perform computations via whatever functionality it provides. Finally, the results of said computations will flow to the event egress which will be yet a

Dynamic configuration via broadcast state

2021-04-06 Thread vishalovercome
I have to make my flink job dynamically configurable and I'm thinking about using broadcast state. My current static job configuration file consists of configuration of entire set of operators which I load into a case class and then I explicitly pass the relevant configuration of each operator as i

Re: Flink - Pod Identity

2021-04-06 Thread Austin Cawley-Edwards
Great, glad to hear it Swagat! Did you end up using Flink 1.6 or were you able to upgrade to Flink 1.12? Could you also link the ticket back here if you've already made it/ make sure it is not a duplicate of FLINK-18676 ? Best, Austin On Tue, Ap

Re: [External] : Re: Need help with executing Flink CLI for native Kubernetes deployment

2021-04-06 Thread Fuyao Li
Hi Yang, Thanks for the reply, those information is very helpful. Best, Fuyao From: Yang Wang Date: Tuesday, April 6, 2021 at 01:11 To: Fuyao Li Cc: user Subject: Re: [External] : Re: Need help with executing Flink CLI for native Kubernetes deployment Hi Fuyao, Sorry for the late reply. It

Re: Flink Taskmanager failure recovery and large state

2021-04-06 Thread Robert Metzger
Hey Yaroslav, GCS is a somewhat popular filesystem that should work fine with Flink. It seems that the initial scale of a bucket is 5000 read requests per second (https://cloud.google.com/storage/docs/request-rate), your job should be at roughly the same rate (depending on how fast your job resta

Re: Source Operators Stuck in the requestBufferBuilderBlocking

2021-04-06 Thread Roman Khachatryan
Hi Sihan, Unfortunately, we are unable to reproduce the issue so far. Could you please describe in more detail the job graph, in particular what are the downstream operators and whether there is any chaining? Do I understand correctly, that Flink returned back to normal at around 8:00; worked fin

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-06 Thread Yuval Itzchakov
Hi Guowei, Who should I speak to regarding this? I am at the final stages of the PR I believe (Shengkai is kindly helping me make things work) and I would like to push this into 1.13. On Fri, Apr 2, 2021 at 5:43 AM Guowei Ma wrote: > Hi, Yuval > > Thanks for your contribution. I am not a SQL ex

Re: Flink - Pod Identity

2021-04-06 Thread Swagat Mishra
I was able to solve the issue by providing a custom version of the presto jar. I will create a ticket and raise a pull request so that others can benefit from it. I will share the details here shortly. Thanks everyone for your help and support. Especially Austin, he stands out due to his interest

Re: How to know if task-local recovery kicked in for some nodes?

2021-04-06 Thread dhanesh arole
Hi Sonam, We have a similar setup. What I have observed is, when the task manager pod gets killed and restarts again ( i.e. the entire task manager process restarts ) then local recovery doesn't happen. Task manager restore process actually downloads the latest completed checkpoint from the remote

Task manager local state data after crash / recovery

2021-04-06 Thread dhanesh arole
Hey all, We are running a stateful stream processing job on k8s using per-job standalone deployment entrypoint. Flink version: 1.12.1 *Problem*: We have observed that whenever a task manager is either gracefully shut down or killed ( due to OOM, k8s worker node drain out etc ) it doesn't clean up

Re: Application cluster - Job execution and cluster creation timeouts

2021-04-06 Thread Tamir Sagi
Hey Yang Thank you for your respond We run the application cluster programmatically. I discussed about it here with an example how to run it from java and not CLI. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Application-cluster-Best-Practice-td42011.html following your c

Re: How to know if task-local recovery kicked in for some nodes?

2021-04-06 Thread Till Rohrmann
Hi Sonam, The easiest way to see whether local state has been used for recovery is the recovery time. Apart from that you can also look for "Found registered local state for checkpoint {} in subtask ({} - {} - {}" in the logs which is logged on debug. This indicates that the local state is availab

Re: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-06 Thread Yangze Guo
> I have tried this method, but the problem still exist. How much memory do you configure for it? > is 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal Not quite sure about it. AFAIK, each job will have a classloader. Multiple tasks of the same job in the same TM will share the

回复: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-06 Thread 太平洋
I have tried this method, but the problem still exist. by heap dump analysis, is 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal? -- 原始邮件 -- 发件人:

Re: How to know if task-local recovery kicked in for some nodes?

2021-04-06 Thread Tzu-Li (Gordon) Tai
Hi Sonam, Pulling in Till (cc'ed), I believe he would likely be able to help you here. Cheers, Gordon On Fri, Apr 2, 2021 at 8:18 AM Sonam Mandal wrote: > Hello, > > We are experimenting with task local recovery and I wanted to know whether > there is a way to validate that some tasks of the j

Re: Checkpoint timeouts at times of high load

2021-04-06 Thread Robert Metzger
It could very well be that your job gets stuck in a restart loop for some reason. Can you either post the full TaskManager logs here, or try to figure out yourself why the first checkpoint that timed out, timed out? Backpressure or blocked operators are a common cause for this. In your case, it cou

Re: Why is Hive dependency flink-sql-connector-hive not available on Maven Central?

2021-04-06 Thread Yik San Chan
Thanks for the tip! On Tue, Apr 6, 2021 at 4:25 PM Rui Li wrote: > Hi Yik San, > > Glad to know you've found the jar. Another option to locate the jar is to > just use maven dependency plugin like this: > > *mvn dependency:get > -Dartifact=org.apache.flink:flink-sql-connector-hive-2.3.6_2.12:1.1

Re: period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-06 Thread Yangze Guo
I think you can try to increase the JVM metaspace option for TaskManagers through taskmanager.memory.jvm-metaspace.size. [1] [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace Best, Yangze Guo Best, Yangze Guo On Tue, Apr

Re: Why is Hive dependency flink-sql-connector-hive not available on Maven Central?

2021-04-06 Thread Rui Li
Hi Yik San, Glad to know you've found the jar. Another option to locate the jar is to just use maven dependency plugin like this: *mvn dependency:get -Dartifact=org.apache.flink:flink-sql-connector-hive-2.3.6_2.12:1.12.2* On Tue, Apr 6, 2021 at 4:10 PM Yik San Chan wrote: > Hi, > > I am able t

Flink 1.12.2 sql api use parquet format error

2021-04-06 Thread ??????
ref: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/formats/parquet.html env and error: Flink version?? 1.12.2 deployment?? standalone kubernetes session dependency:        

period batch job lead to OutOfMemoryError: Metaspace problem

2021-04-06 Thread ??????
batch job?? read data from s3 by sql??then by some operators and write data to clickhouse and kafka. after some times, task-manager quit with OutOfMemoryError: Metaspace. env?? flink version??1.12.2 task-manager slot count: 5 deployment?? standalone kubernetes session dependencies??    

Re: [External] : Re: Need help with executing Flink CLI for native Kubernetes deployment

2021-04-06 Thread Yang Wang
Hi Fuyao, Sorry for the late reply. It is not very hard to develop your own deployer. Actually, I have 3 days for developing the PoC version of flink-native-k8s-operator. So if you want to have a fully functional K8s operator, maybe two weeks is enough. But if you want to put it into production,

Re: Why is Hive dependency flink-sql-connector-hive not available on Maven Central?

2021-04-06 Thread Yik San Chan
Hi, I am able to find the jar from Maven central. See updates in the StackOverflow post. Thank you! Best, Yik San On Tue, Apr 6, 2021 at 4:05 PM Tzu-Li (Gordon) Tai wrote: > Hi, > > I'm pulling in Rui Li (cc'ed) who might be able to help you here as he > actively maintains the hive connectors

Re: Why is Hive dependency flink-sql-connector-hive not available on Maven Central?

2021-04-06 Thread Tzu-Li (Gordon) Tai
Hi, I'm pulling in Rui Li (cc'ed) who might be able to help you here as he actively maintains the hive connectors. Cheers, Gordon On Fri, Apr 2, 2021 at 11:36 AM Yik San Chan wrote: > The question is cross-posted in StackOverflow > https://stackoverflow.com/questions/66914119/flink-why-is-hiv

Re: Application cluster - Job execution and cluster creation timeouts

2021-04-06 Thread Yang Wang
Hi Tamir, Thanks for trying the native K8s integration. 1. We do not have a timeout for creating the Flink application cluster. The reason is that the job submission happens on the JobManager side. So the Flink client does not need to wait for the JobManager running and then exit. I think even t

Re: Zigzag shape in TM JVM used memory

2021-04-06 Thread Piotr Nowojski
Hi, this should be posted on the user mailing list not the dev. Apart from that, this looks like normal/standard behaviour of JVM, and has very little to do with Flink. Garbage Collector (GC) is kicking in when memory usage is approaching some threshold: https://www.google.com/search?q=jvm+heap+m