Re: Jobmanager not properly fenced when killed by YARN RM

2019-12-16 Thread Yang Wang
Hi Paul, Thanks for sharing your analysis. I think you are right. When the Yarn NodeManager crashed, the first jobmanager running on it was not killed. However, once the Yarn ResourceManager found the NodeManager lost, it launched a new jobmanager attempt. Before FLINK-14010

Re: Questions about taskmanager.memory.off-heap and taskmanager.memory.preallocate

2019-12-16 Thread Xintong Song
Glad that helped. I'm also posting this conversation to the public mailing list, in case other people have similar questions. And regarding the GC statement, I think the document is correct. - Flink Memory Manager guarantees that the amount of allocated managed memory never exceeds the configured c

Re: Jobmanager not properly fenced when killed by YARN RM

2019-12-16 Thread Paul Lam
Hi Yang, Thanks a lot for your reasoning. You are right about the YARN cluster. The NodeManager crashed, and that’s why the RM would kill the containers on that machine after a heartbeat timeout (about 10 min) with the NodeManager. Actually the attached logs are from the first/old jobmanager,

Re: flink 1.9 conflict jackson version

2019-12-16 Thread ouywl
Hi Bu, I think you can use the maven-shade-plugin to resolve your problem. It seems flink-client conflicts with your own jar?
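The shading approach ouywl suggests can be sketched as a pom.xml fragment. This is a minimal sketch, not the poster's actual configuration: the relocated package prefix (`myorg.shaded`) and the plugin version are placeholder assumptions; the idea is to relocate Jackson inside the user jar so it cannot clash with the version already on the Flink/EMR classpath.

```xml
<!-- Sketch: relocate Jackson classes bundled in the user jar.
     "myorg.shaded" is a placeholder prefix; pick one for your project. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.fasterxml.jackson</pattern>
            <shadedPattern>myorg.shaded.com.fasterxml.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After relocation, the user code and the platform each load their own Jackson copy, so version mismatches no longer matter at runtime.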

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-16 Thread vino yang
Hi Sidney, >> I'd prefer not to consume messages I don't plan on actually handling. It depends on your design. If you produce different types into different partitions, then it's easy to filter different types from the Kafka consumer (only consume a subset of the partitions). If you do not distinguish dif

Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread Zhu Zhu
Hi Jesús, If your job has checkpointing enabled, you can monitor 'numberOfCompletedCheckpoints' to see whether the job is still alive and healthy. Thanks, Zhu Zhu Jesús Vásquez wrote on Tue, Dec 17, 2019, 2:43 AM: > The thing about numRunningJobs metric is that i have to configure in > advance the Prometheu
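Zhu Zhu's suggestion can be turned into a Prometheus alerting rule. This is a hedged sketch: the exact metric name (`flink_jobmanager_job_numberOfCompletedCheckpoints`) depends on your reporter configuration and may carry different labels in your setup, and the 10-minute window is an arbitrary placeholder.

```yaml
# prometheus rules sketch: fire when a checkpointing job has made no
# checkpoint progress recently (metric name assumed, verify locally)
groups:
  - name: flink
    rules:
      - alert: FlinkJobNotCheckpointing
        expr: increase(flink_jobmanager_job_numberOfCompletedCheckpoints[10m]) == 0
        for: 5m
        labels:
          severity: warning
```

Because the metric is labeled per job, this alerts on individual jobs without knowing their count in advance, which addresses the concern raised in the quoted message.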

Re: Join a datastream with tables stored in Hive

2019-12-16 Thread Kurt Young
Great, looking forward to hearing from you again. Best, Kurt On Mon, Dec 16, 2019 at 10:22 PM Krzysztof Zarzycki wrote: > Thanks Kurt for your answers. > > Summing up, I feel like the option 1 (i.e. join with temporal table > function) requires some coding around a source, that needs to pull d

flink 1.9 conflict jackson version

2019-12-16 Thread Fanbin Bu
Hi, After I upgraded to flink 1.9, I got the following error message on EMR; it works locally in IntelliJ. I'm explicitly declaring the dependency as implementation 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.10.1' and I have implementation group: 'com.amazonaws', name: 'aws-java-sdk-em
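Since the message shows a Gradle build, one common first step for this kind of conflict (before resorting to shading) is forcing a single Jackson version across all configurations. This is a sketch under assumptions: the artifact list and the 2.10.1 version are taken from the dependency shown above, but the full set of Jackson artifacts pulled in by aws-java-sdk may differ.

```groovy
// build.gradle sketch: pin every Jackson artifact to one version so the
// EMR-provided aws-java-sdk and flink dependencies resolve consistently
configurations.all {
    resolutionStrategy {
        force 'com.fasterxml.jackson.core:jackson-databind:2.10.1',
              'com.fasterxml.jackson.core:jackson-core:2.10.1',
              'com.fasterxml.jackson.core:jackson-annotations:2.10.1',
              'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.10.1'
    }
}
```

If the conflicting copy is on the cluster classpath rather than in the user jar, forcing versions is not enough and relocation (shading) is the more robust fix.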

Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread Jesús Vásquez
The thing about the numRunningJobs metric is that I have to configure in advance the Prometheus rules with the number of jobs I expect to be running in order to alert; I kind of need this rule to alert on individual jobs. I initially thought of flink_jobmanager_downtime{job_id=~".*"} == -1, but it res

Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread PoolakkalMukkath, Shakir
You could use “flink_jobmanager_numRunningJobs” to check the number of running jobs. Thanks. From: Jesús Vásquez Date: Monday, December 16, 2019 at 12:47 PM To: "user@flink.apache.org" Subject: [EXTERNAL] Flink and Prometheus monitoring question Hi, I want to monitor Flink Streaming jobs using
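As a concrete illustration of this suggestion, a rule on the aggregate job count could look like the following. This is a sketch only: the expected-job threshold (2) is a placeholder, and, as Jesús notes in his reply, this style of rule requires knowing the expected job count in advance.

```yaml
# sketch: alert when fewer jobs are running than expected
# (threshold is a placeholder for your deployment's job count)
- alert: FlinkJobsMissing
  expr: flink_jobmanager_numRunningJobs < 2
  for: 5m
  labels:
    severity: critical
```

For per-job alerting without a fixed count, a per-job metric such as a checkpoint counter is a better fit than this aggregate gauge.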

Flink and Prometheus monitoring question

2019-12-16 Thread Jesús Vásquez
Hi, I want to monitor Flink Streaming jobs using Prometheus My first goal is to send alerts when a Flink job has failed. The thing is that looking at the documentation I haven't found a metric that helps me defining an alerting rule. As a starting point i thought that the metric flink_jobmanager_jo

Re: Join a datastream with tables stored in Hive

2019-12-16 Thread Krzysztof Zarzycki
Thanks Kurt for your answers. Summing up, I feel like option 1 (i.e. join with a temporal table function) requires some coding around a source that needs to pull data once a day. But otherwise, it brings the following benefits: * I don't have to put dicts in another store like HBase. All stays in H
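The temporal-table-function join discussed as option 1 can be sketched in Flink SQL roughly as below. This is illustrative only: the table and column names (`events`, `dicts`, `dict_id`) are invented for the example, and the part this thread calls out as requiring custom coding, re-reading the Hive dict once a day into the versioned table, is not shown.

```sql
-- Sketch (names are placeholders): join each stream event against the
-- dict version valid at the event's processing time.
SELECT s.*, d.dict_value
FROM events AS s,
     LATERAL TABLE (dicts(s.proctime)) AS d
WHERE s.dict_id = d.dict_id
```

The temporal table function here would be registered over the periodically refreshed dict data, keyed on `dict_id` and versioned by its load time.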

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-16 Thread Sidney Feiner
You are right with everything you say! The solution you propose is actually what I'm trying to avoid. I'd prefer not to consume messages I don't plan on actually handling. But from what you say, it sounds like I have no other choice. Am I right? I MUST consume the messages, count those I want to filter
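The consume-count-discard pattern this thread converges on can be shown independent of Flink. This is a minimal, self-contained sketch: the `Message` type and `LongAdder` counter are stand-ins for the real Kafka record type and a Flink `Counter` metric, which are assumptions, not the poster's actual code.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class FilteredCounter {
    // Placeholder for the real record type consumed from Kafka.
    record Message(String type, String payload) {}

    // Stand-in for a Flink metric Counter registered on the filter operator.
    private final LongAdder filteredOut = new LongAdder();

    // Consume every message, count the ones we drop, keep only the wanted type.
    public List<Message> process(List<Message> batch, String wantedType) {
        return batch.stream()
                .filter(m -> {
                    boolean keep = m.type().equals(wantedType);
                    if (!keep) {
                        filteredOut.increment(); // counted, then discarded
                    }
                    return keep;
                })
                .toList();
    }

    public long filteredCount() {
        return filteredOut.sum();
    }

    public static void main(String[] args) {
        FilteredCounter fc = new FilteredCounter();
        List<Message> kept = fc.process(
                List.of(new Message("A", "1"), new Message("B", "2"), new Message("A", "3")),
                "A");
        System.out.println(kept.size() + " kept, " + fc.filteredCount() + " filtered");
    }
}
```

In the real job the filter runs inside an operator and the counter is exposed through Flink's metric system, but the control flow, consume everything and count what you drop, is the same.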

Re: Documentation tasks for release-1.10

2019-12-16 Thread vino yang
+1 for centralizing all the documentation issues so that the community can fix them more effectively. Best, Vino Xintong Song wrote on Mon, Dec 16, 2019, 6:02 PM: > Thank you Kostas. > Big +1 for keeping all the documentation related issues at one place. > > I've added the documentation task for reso

Re: Jobmanager not properly fenced when killed by YARN RM

2019-12-16 Thread Yang Wang
Hi Paul, I found lots of "Failed to stop Container" logs in the jobmanager.log. It seems that the Yarn cluster is not working normally, so the Flink YarnResourceManager may also have failed to unregister the app. If we unregister the app successfully, no new attempt will be started. The second and following job

Re: Documentation tasks for release-1.10

2019-12-16 Thread Xintong Song
Thank you Kostas. Big +1 for keeping all the documentation related issues at one place. I've added the documentation task for resource management. Thank you~ Xintong Song On Mon, Dec 16, 2019 at 5:29 PM Kostas Kloudas wrote: > Hi all, > > With the feature-freeze for the release-1.10 already

Documentation tasks for release-1.10

2019-12-16 Thread Kostas Kloudas
Hi all, With the feature-freeze for the release-1.10 already past us, it is time to focus a little bit on documenting the new features that the community added to this release, and improving the already existing documentation based on questions that we see in Flink's mailing lists. To this end, I