Flink on Yarn, restart job will not destroy original task manager

2018-09-03 Thread James (Jian Wu) [FDS Data Platform]
Hi: I launch a Flink application on YARN with 5 task managers; every task manager has 5 slots. The launch script is: #!/bin/sh CLASSNAME=$1 JARNAME=$2 ARGUMENTS=$3 export JVM_ARGS="${JVM_ARGS} -Dmill.env.active=aws" /usr/bin/flink run -m yarn-cluster --parallelism 15 -yn 5 -ys 3 -yjm 8192 -ytm 8192

Rolling File Sink Exception

2018-09-03 Thread clay4444
When I want to write compressed string data to HDFS, I found that Flink only provides StringWriter, so I used a custom writer, as follows: public class StringCompressWriter extends StreamWriterBase { private static final long serialVersionUID = 1L; private String charsetName; priv

Re: Promethus - custom metrics at job level

2018-09-03 Thread Reza Sameei
Hello Averell, Based on my experience, using the out-of-the-box reporters & collectors needs a little more effort! Of course I haven't tried all of them, but after reviewing some of them I went my own way: writing custom reporters to push metrics into Elasticsearch (the available component in our projec

Re: Flink on Yarn, restart job will not destroy original task manager

2018-09-03 Thread Gary Yao
Hi James, What version of Flink are you running? In 1.5.0, tasks can spread out due to changes that were introduced to support "local recovery". There is a mitigation in 1.5.1 that prevents tasks from spreading out, but local recovery must be disabled [2]. Best, Gary [1] https://issues.apache.org/jira/bro

Flink on kubernetes

2018-09-03 Thread 祁明良
Hi All, We are running Flink (version 1.5.2) on k8s with the RocksDB backend. Each time the job is cancelled and restarted, we face an OOMKilled problem from the container. In our case, we only assign 15% of container memory to the JVM and leave the rest to RocksDB. To us, it looks like memory used by ro

Re: Rolling File Sink Exception

2018-09-03 Thread Chesnay Schepler
You're closing the stream and then calling super.close(), which calls flush, which fails since you already closed the stream. If you don't close the stream yourself the problem should disappear. On 03.09.2018 09:30, clay wrote: When I want to write compressed string data to hdfs, I found that flink onl
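
A minimal sketch of the fix Chesnay describes, assuming the Writer/StreamWriterBase API from flink-connector-filesystem and Hadoop's compression codecs (codec name, charset and line separator are illustrative): close() only finish()es the compression stream and leaves flushing and closing the underlying FSDataOutputStream to super.close(), so the stream is closed exactly once.

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.connectors.fs.StreamWriterBase;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;

public class StringCompressWriter extends StreamWriterBase<String> {

    private static final long serialVersionUID = 1L;

    // Fully qualified codec class name, e.g. "org.apache.hadoop.io.compress.GzipCodec".
    private final String codecName;
    private transient CompressionOutputStream compressedStream;

    public StringCompressWriter(String codecName) {
        this.codecName = codecName;
    }

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        super.open(fs, path); // the base class opens and owns the FSDataOutputStream
        CompressionCodec codec =
                new CompressionCodecFactory(fs.getConf()).getCodecByClassName(codecName);
        compressedStream = codec.createOutputStream(getStream());
    }

    @Override
    public void write(String element) throws IOException {
        compressedStream.write(element.getBytes(StandardCharsets.UTF_8));
        compressedStream.write('\n');
    }

    @Override
    public void close() throws IOException {
        if (compressedStream != null) {
            compressedStream.finish(); // writes the codec trailer but does NOT close the underlying stream
            compressedStream = null;
        }
        super.close(); // the base class flushes and closes the stream, once
    }

    @Override
    public StringCompressWriter duplicate() {
        return new StringCompressWriter(codecName);
    }
}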

Re: Flink on Yarn, restart job will not destroy original task manager

2018-09-03 Thread James (Jian Wu) [FDS Data Platform]
My Flink version is 1.5; I will rebuild with a newer Flink version. Regards, James From: Gary Yao Date: Monday, September 3, 2018 at 3:57 PM To: "James (Jian Wu) [FDS Data Platform]" Cc: "user@flink.apache.org" Subject: Re: Flink on Yarn, restart job will not destroy original task manager Hi James, Wh

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-09-03 Thread Stephan Ewen
One final thought: how do you stop the unbounded streaming application? If you just kill the Yarn/Mesos/K8s cluster, Flink will not know that this is a shutdown, and will interpret it as a failure. Because of that, checkpoints will remain (in DFS and in ZooKeeper). On Fri, Aug 31, 2018 at 2:18 PM, vin

Re: Flink on kubernetes

2018-09-03 Thread Lasse Nedergaard
Hi. We have observed the same on Flink 1.4.2/1.6 running on Yarn and Mesos. If you correlate the non-heap memory with job restarts you will see non-heap usage increase for every restart until you get an OOM. I'll let you know if/when I know how to handle the problem. Med venlig hilsen /

Re: Flink on kubernetes

2018-09-03 Thread 祁明良
Hi Lasse, Is there a JIRA ticket I can follow? Best, Mingliang > On 3 Sep 2018, at 5:42 PM, Lasse Nedergaard wrote: > > Hi. > > We have observed the same on Flink 1.4.2/1.6 running on Yarn and Mesos. > If you correlate the non-heap memory with job restarts you will see > non-heap inc

Re: Flink on kubernetes

2018-09-03 Thread Lasse Nedergaard
Please try the FsStateBackend as a test, to see if the problems disappear. Med venlig hilsen / Best regards Lasse Nedergaard > On 3 Sep 2018 at 11:46, 祁明良 wrote: > > Hi Lasse, > > Is there a JIRA ticket I can follow? > > Best, > Mingliang > >> On 3 Sep 2018, at 5:42 PM, Lasse Nedergaard
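
For reference, a minimal sketch of Lasse's suggestion, assuming a Flink 1.5+ job; the checkpoint URI and interval are placeholders:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FsBackendTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Swap the RocksDB backend for the filesystem backend to see whether the
        // container OOMKills go away; point the URI at your checkpoint directory.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        env.enableCheckpointing(60_000);
        // ... build the rest of the job and call env.execute() as usual ...
    }
}

The same switch can also be made without a code change by setting state.backend: filesystem (plus state.checkpoints.dir) in flink-conf.yaml.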

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2018-09-03 Thread Mar_zieh
Hello, I added these dependencies to "pom.xml"; also, I added configuration to my code like this: Configuration config = new Configuration(); config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true); StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(getP,

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2018-09-03 Thread Chesnay Schepler
You can set up a specific port using https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#rest-port. On 03.09.2018 12:12, Mar_zieh wrote: Hello, I added these dependencies to "pom.xml"; also, I added configuration to my code like this: Configuration config = new Configuration(
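
A small sketch of how that option can be applied from code in an IDE-local setup like the one quoted above; the port value is arbitrary and flink-runtime-web is assumed to be on the classpath:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalWebUiExample {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // "rest.port" is the option behind the linked #rest-port anchor; 8082 is just an example.
        config.setInteger("rest.port", 8082);
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);
        // ... define sources, transformations and sinks, then call env.execute() ...
    }
}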

Re: Promethus - custom metrics at job level

2018-09-03 Thread Averell
Thank you Reza. I will try your repo first :) Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Flink 1.5.2 query

2018-09-03 Thread Parth Sarathy
Hi, When using Flink 1.5.2, the “Apache Flink Only” binary (flink-1.5.2-bin-scala_2.11), the following error is seen in the client log: 2018-08-30 10:56:59.129 [main] WARN org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli. java.lang.NoClassDe

Re: Flink 1.5.2 query

2018-09-03 Thread Chesnay Schepler
Cannot be avoided. The CLI eagerly loads client classes for YARN, which, as you can see, fails since the hadoop classes aren't available. If you don't use YARN you can safely ignore this. On 03.09.2018 14:37, Parth Sarathy wrote: Hi, When using Flink 1.5.2, the “Apache Flink Only” binary (flink-1.5.2-bin-

RE: Flink 1.5.2 query

2018-09-03 Thread Sarathy, Parth
The message is at WARN level and not at INFO or DEBUG level, so our log analysis tool considers this an issue. Can the CLI print this message at INFO/DEBUG level? -Original Message- From: Chesnay Schepler [mailto:ches...@apache.org] Sent: Monday, September 3, 2018 6:32 PM To: Parth

Why FlinkKafkaConsumer doesn't subscribe to topics?

2018-09-03 Thread Julio Biason
Hey guys, We are trying to add external monitoring to our system, but we can only get the lag on Kafka topics while the Flink job is running -- if, for some reason, the Flink job fails, we get no visibility into how big the lag is. (Besides that, the way Flink reports it is not accurate and produces a

JAXB Classloading errors when using PMML Library (AWS EMR Flink 1.4.2)

2018-09-03 Thread Sameer W
Hi, I am using the PMML dependency as below to execute ML models at prediction time within a Flink Map operator: org.jpmml pmml-evaluator 1.4.3 javax.xml.bind jaxb-api org.glassfish.jaxb jaxb-runtime guava com.google.guava The environment is EMR, OpenJDK 1.8 and Flink 1.4.2. M

Re: Flink on Yarn, restart job will not destroy original task manager

2018-09-03 Thread James (Jian Wu) [FDS Data Platform]
Hi Gary: From the 1.5/1.6 documentation: Configuring task-local recovery Task-local recovery is deactivated by default and can be activated through Flink’s configuration with the key state.backend.local-recovery as specified in CheckpointingOptions.LOCAL_RECOVERY. The value for this setting can either b

Re: Get stream of rejected data from Elasticsearch6 sink

2018-09-03 Thread vino yang
Hi Nick, To get hold of the failed data, you can achieve something similar to a side output by extending the ActionRequestFailureHandler#onFailure method to output the data to other places. Thanks, vino. Nick Triller wrote on Fri, Aug 31, 2018 at 9:08 PM: > Hi all, > > > > is it possible to fu
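
A hedged sketch of the kind of handler vino refers to, based on the Elasticsearch connector's ActionRequestFailureHandler interface; where the rejected document ends up (log, Kafka, file, ...) is your choice and is only logged here:

import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.util.ExceptionUtils;

import org.elasticsearch.action.ActionRequest;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RejectedDataFailureHandler implements ActionRequestFailureHandler {

    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(RejectedDataFailureHandler.class);

    @Override
    public void onFailure(ActionRequest action, Throwable failure,
                          int restStatusCode, RequestIndexer indexer) throws Throwable {
        if (ExceptionUtils.findThrowable(failure, EsRejectedExecutionException.class).isPresent()) {
            // Bulk queue was full: simply re-add the request and let the sink retry it.
            indexer.add(action);
        } else {
            // Permanently rejected document: hand it to whatever "dead letter"
            // channel fits your setup; here it is only logged.
            LOG.warn("Dropping rejected request {} (status {})", action, restStatusCode, failure);
        }
    }
}

With the Elasticsearch6 sink, such a handler would be registered on the sink builder via setFailureHandler(new RejectedDataFailureHandler()).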

Re: Why FlinkKafkaConsumer doesn't subscribe to topics?

2018-09-03 Thread Renjie Liu
Hi Julio: 1. Flink doesn't use subscribe because it needs to control partition assignment itself, which is important for implementing exactly-once. 2. Can you share the versions you are using, including kafka, the kafka client, and flink? We also use the flink kafka consumer and we can monitor it correct
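
To illustrate Renjie's first point, a small sketch contrasting the two plain Kafka client calls; broker address, group id, topic and partition are placeholders:

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignVsSubscribe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // subscribe(): the broker-side group coordinator decides which partitions this
        // consumer gets, and lag tools see an active member of the consumer group.
        try (KafkaConsumer<byte[], byte[]> subscribing = new KafkaConsumer<>(props)) {
            subscribing.subscribe(Collections.singletonList("my-topic"));
        }

        // assign(): the client picks partitions itself. This is what the Flink consumer
        // does so the partition-to-subtask mapping stays under its control, which the
        // exactly-once restore from checkpointed offsets relies on; the downside is that
        // the group looks "empty" to tools that expect subscribe()-style membership.
        try (KafkaConsumer<byte[], byte[]> assigning = new KafkaConsumer<>(props)) {
            assigning.assign(Arrays.asList(new TopicPartition("my-topic", 0)));
        }
    }
}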

Re: RocksDB Number of Keys Metric

2018-09-03 Thread vino yang
Hi Ahmed, If you feel that this metric is necessary, you can create an issue in JIRA; then the problem may be more easily seen by the relevant people. To get an answer to this question, maybe it is more effective to ping @Andrey? Thanks, vino. Ahmed wrote on Sun, Sep 2, 2018 at 2:31 AM: > Is there a clean

Re: This server is not the leader for that topic-partition

2018-09-03 Thread Jayant Ameta
I am getting the same error. Is there a way to retry/ignore instead of killing the job? Jayant Ameta On Tue, May 22, 2018 at 7:58 PM gerardg wrote: > I've seen the same error while upgrading Kafka. We are using > FlinkKafkaProducer011 and Kafka version 1.0.0. While upgrading to Kafka > 1.1.0,

Re: This server is not the leader for that topic-partition

2018-09-03 Thread vino yang
Hi Jayant, Can you provide more specific information? For example, the version of your Flink, the version of kafka on which the Flink-Kafka-Connector depends, and the version of the kafka server. Thanks, vino. Jayant Ameta wrote on Tue, Sep 4, 2018 at 12:32 PM: > I am getting the same error. Is there a way to retry/

Re: This server is not the leader for that topic-partition

2018-09-03 Thread Jayant Ameta
Flink: 1.4.2 flink-connector-kafka-0.11_2.11 (1.4.2) Kafka: 0.10.1.0 Jayant Ameta On Tue, Sep 4, 2018 at 10:16 AM vino yang wrote: > Hi Jayant, > > Can you provide more specific information? For example, the version of > your Flink, the version of kafka on which Flink-Kafka-Connector depends,

Re: Flink on Yarn, restart job will not destroy original task manager

2018-09-03 Thread Gary Yao
Hi James, Local recovery is disabled by default. You do not need to configure anything in addition. Did you run into problems again, or does it work now? If you are still experiencing task spread-out, can you configure logging at DEBUG level and share the jobmanager logs with us? Best, Gary On T

Re: This server is not the leader for that topic-partition

2018-09-03 Thread vino yang
Hi Jayant, Can you try to connect to the 0.10.x kafka server via flink-connector-kafka-0.10, and see if it still throws this exception? Thanks, vino. Jayant Ameta wrote on Tue, Sep 4, 2018 at 1:20 PM: > Flink: 1.4.2 > flink-connector-kafka-0.11_2.11 (1.4.2) > Kafka: 0.10.1.0 > > Jayant Ameta > > > On Tue, Sep 4, 201
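
If it helps, a minimal sketch of what vino's suggestion might look like for the producer side, assuming the flink-connector-kafka-0.10_2.11 dependency; broker and topic are placeholders:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;

public class Kafka010SinkSketch {
    public static void addKafkaSink(DataStream<String> stream) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker-1:9092"); // placeholder
        // The 0.10 producer matches the 0.10.1.0 brokers mentioned above.
        stream.addSink(new FlinkKafkaProducer010<>("my-topic", new SimpleStringSchema(), props));
    }
}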