Re: Dockerised Flink 1.8 with Hadoop S3 FS support

2020-07-03 Thread Yang Wang
Hi Lorenzo, Since Flink 1.8 does not support the plugin mechanism for loading filesystems, you need to copy flink-s3-fs-hadoop-*.jar from the opt to the lib directory. The Dockerfile could look like the following. FROM flink:1.8-scala_2.11 RUN cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/lib Then build you
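A minimal Dockerfile sketch along the lines described above (image tag and jar path as quoted in the message; the target image name is an assumption):

    FROM flink:1.8-scala_2.11
    # Flink 1.8 has no plugin mechanism for filesystems, so the S3 Hadoop
    # filesystem jar has to live on the main classpath under lib/.
    RUN cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/lib

Building it with, for example, docker build -t flink-1.8-s3 . gives an image whose lib directory already contains the S3 filesystem classes.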

Dockerised Flink 1.8 with Hadoop S3 FS support

2020-07-02 Thread Lorenzo Nicora
Hi, I need to set up a dockerized *session cluster* using Flink *1.8.2* for development and troubleshooting. We are bound to 1.8.2 as we are deploying to AWS Kinesis Data Analytics for Flink. I am using an image based on the semi-official flink:1.8-scala_2.11. I need to add to my dockerized

Re: Best way to pass Program arguments securely in flink 1.8

2019-10-09 Thread vivekanand yaram
Hi all, Any comments on the best way to pass program arguments (let's say we are passing credentials) securely to the Flink job? I found a way to hide them from the web UI, but I am still looking for a solution along the lines of fetching them from the environment or some other source, so that we
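A minimal sketch of the approach asked about here, reading credentials from environment variables instead of program arguments; the variable names and the ParameterTool usage are illustrative assumptions, not a recommendation from this thread:

    import org.apache.flink.api.java.utils.ParameterTool;

    public class SecureArgsExample {
        public static void main(String[] args) {
            // Non-sensitive settings can still come from the command line ...
            ParameterTool params = ParameterTool.fromArgs(args);
            String inputTopic = params.get("input-topic", "events");

            // ... while credentials are read from the environment, so they are
            // never passed as program arguments and never show up in the web UI.
            String dbUser = System.getenv("DB_USER");         // hypothetical variable name
            String dbPassword = System.getenv("DB_PASSWORD"); // hypothetical variable name

            // build the job using inputTopic, dbUser and dbPassword ...
        }
    }

Note that the environment variable has to be present on whichever JVM actually evaluates it: the client for code in main(), or the task managers for code running inside operators.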

Best way to pass Program arguments securely in flink 1.8

2019-10-09 Thread vivekanand yaram
Hi all, Any comments on the best way to pass program arguments securely to the Flink job? Regards, Vivekanand.

Re: Batch mode with Flink 1.8 unstable?

2019-09-19 Thread Fabian Hueske
Hi Ken, Changing the parallelism can affect the generation of input splits. I had a look at BinaryInputFormat, and it adds a bunch of empty input splits if the number of generated splits is less than the minimum number of splits (which is equal to the parallelism). See --> https://github.com/apac
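A rough, self-contained sketch of the padding behaviour described above (plain strings stand in for input splits; this is not the actual BinaryInputFormat source):

    import java.util.ArrayList;
    import java.util.List;

    final class SplitPaddingSketch {
        // Pad the generated splits with empty ones until the requested
        // minimum (which equals the job parallelism) is reached.
        static List<String> padSplits(List<String> generated, int minNumSplits) {
            List<String> splits = new ArrayList<>(generated);
            while (splits.size() < minNumSplits) {
                splits.add("");  // stands in for an empty input split
            }
            return splits;
        }
    }

So with a parallelism larger than the number of real splits, some subtasks simply receive empty splits.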

Re: Batch mode with Flink 1.8 unstable?

2019-09-19 Thread Till Rohrmann
Good to hear that some of your problems have been solved, Ken. For the UTFDataFormatException it is hard to tell. Usually it says that the input has been produced using `writeUTF`. Could you maybe provide an example program which reproduces the problem? Moreover, it would be helpful to see how the i

Re: Batch mode with Flink 1.8 unstable?

2019-09-18 Thread Ken Krugler
Hi Till, I tried out 1.9.0 with my workflow, and I am no longer running into the errors I described below, which is great! Just to recap, this is batch, per-job mode on YARN/EMR. Though I did run into a new issue, related to my previous problem when reading files written via SerializedOutputFo

Re: Flink 1.8: Using the RocksDB state backend causes "NoSuchMethodError" when trying to stop a pipeline

2019-08-14 Thread Kaymak, Tobias
.5.jar but in rocksdbjni-5.17.2 (or we can say > frocksdbjni-5.17.2-artisans-1.0 in Flink-1.8). That's why you come across > this NoSuchMethodError exception. > > If not necessary, please do not assemble the rocksdbjni package in your user > code jar, as flink-dist already provides all n

Re: Flink 1.8: Using the RocksDB state backend causes "NoSuchMethodError" when trying to stop a pipeline

2019-08-13 Thread Yun Tang
2 (or we can say frocksdbjni-5.17.2-artisans-1.0 in Flink-1.8). That's why you come across this NoSuchMethodError exception. If not necessary, please do not assemble the rocksdbjni package in your user code jar, as flink-dist already provides all needed classes. Moreover, adding dependenc
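A hedged example of the packaging advice above, assuming an sbt build such as the one shown later in this digest (with Maven the equivalent is a provided-scoped dependency):

    // build.sbt -- keep the RocksDB state backend out of the fat jar;
    // flink-dist on the cluster already ships the (F)RocksDB classes.
    libraryDependencies += "org.apache.flink" %% "flink-statebackend-rocksdb" % "1.8.1" % "provided"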

Flink 1.8: Using the RocksDB state backend causes "NoSuchMethodError" when trying to stop a pipeline

2019-08-13 Thread Kaymak, Tobias
did not have any problem. Could it be that this is caused by the switch to FRocksDB? [0] Best, Tobias [0] https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.8.html#rocksdb-version-bump-and-switch-to-frocksdb-flink-10471

Re: logger error of flink 1.8+ docker mode

2019-07-10 Thread Ever
" start-foreground "$@" $FLINK_HOME/bin/taskmanager.sh start "$@" # hold the container tail -f $CONF_FILE -- From: "Yang Wang"; Date: 2019-07-10 3:02; To: "Ever" <439674...@qq.c

Re: logger error of flink 1.8+ docker mode

2019-07-10 Thread Yang Wang
Hi Ever, Could you share more information about your environment? If you cannot find any log in the Flink dashboard, you need to check the log4j configuration of the job manager and task manager. Make sure you can find the log files (jobmanager.log/taskmanager.log) under /opt/flink/log in the docker container. I
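For reference, a minimal log4j.properties in the spirit of what Flink ships, writing to the ${log.file} location that the dashboard reads; treat it as a sketch rather than the exact default file:

    log4j.rootLogger=INFO, file
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.file=${log.file}
    log4j.appender.file.append=false
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

If logging only goes to the console (as with a start-foreground entrypoint that uses the console log4j config), there are no jobmanager.log/taskmanager.log files for the dashboard to display.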

logger error of flink 1.8+ docker mode

2019-07-09 Thread Ever
After deploying Flink 1.8.0 with docker, the logger page for the jobManager and taskManager doesn't work. I tried to run Flink in background mode instead of start-foreground (docker-entrypoint.sh), but the container can't be started. Can anyone help?

Re: Batch mode with Flink 1.8 unstable?

2019-07-02 Thread Till Rohrmann
Thanks for the update, Ken. The input splits seem to be org.apache.hadoop.mapred.FileSplit. Nothing too fancy pops out at me. Internally they use org.apache.hadoop.mapreduce.lib.input.FileSplit which stores a Path, two long pointers and two string arrays with hosts and host infos. I would assume t

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Ken Krugler
Hi Stephan, Thanks for responding, comments inline below… Regards, — Ken > On Jun 26, 2019, at 7:50 AM, Stephan Ewen wrote: > > Hi Ken! > > Sorry to hear you are going through this experience. The major focus on > streaming so far means that the DataSet API has stability issues at scale. >

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Ken Krugler
Hi Till, Thanks for following up. I’ve got answers to other emails on this thread pending, but wanted to respond to this one now. > On Jul 1, 2019, at 7:20 AM, Till Rohrmann wrote: > > Quick addition for problem (1): The AkkaRpcActor should serialize the > response if it is a remote RPC and

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Till Rohrmann
Quick addition for problem (1): The AkkaRpcActor should serialize the response if it is a remote RPC and send an AkkaRpcException if the response's size exceeds the maximum frame size. This should be visible on the call site since the future should be completed with this exception. I'm wondering wh

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Till Rohrmann
Hi Ken, in order to further debug your problems it would be helpful if you could share the log files on DEBUG level with us. For problem (2), I suspect that it has been caused by Flink releasing TMs too early. This should be fixed with FLINK-10941 which is part of Flink 1.8.1. The 1.8.1 release s

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Biao Liu
Hi Ken again, In regard to the TimeoutException, I just realized that there is no akka.remote.OversizedPayloadException in your log file. There might be some other reason causing this. 1. Have you ever tried increasing the configuration "akka.ask.timeout"? 2. Have you ever checked the garbage collectio
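If the timeout really does come from slow or oversized RPCs, the settings discussed in this thread can be raised in flink-conf.yaml; the values below are arbitrary examples, not recommendations from this thread:

    # flink-conf.yaml
    akka.ask.timeout: 60 s        # default is 10 s
    akka.framesize: 20971520b     # default is 10485760b; relevant for oversized RPC payloads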

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Biao Liu
Hi Ken, In regard to oversized input splits, it seems to be a rare case beyond my expectation. However, it should definitely be fixed since input splits can be user-defined. We should not assume they must be small. I agree with Stephan that maybe there is something unexpectedly involved in the input s

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread qi luo
Hi Stephan, We have met similar issues described as Ken. Would all these issues be hopefully fixed in 1.9? Thanks, Qi > On Jun 26, 2019, at 10:50 PM, Stephan Ewen wrote: > > Hi Ken! > > Sorry to hear you are going through this experience. The major focus on > streaming so far means that the

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Stephan Ewen
Hi Ken! Sorry to hear you are going through this experience. The major focus on streaming so far means that the DataSet API has stability issues at scale. So, yes, batch mode in current Flink version can be somewhat tricky. It is a big focus of Flink 1.9 to fix the batch mode, finally, and by add

Batch mode with Flink 1.8 unstable?

2019-06-23 Thread Ken Krugler
Hi all, I’ve been running a somewhat complex batch job (in EMR/YARN) with Flink 1.8.0, and it regularly fails, but for varying reasons. Has anyone else had stability with 1.8.0 in batch mode and non-trivial workflows? Thanks, — Ken 1. TimeoutException getting input splits The batch job star

Re: Flink 1.8

2019-06-04 Thread Vishal Santoshi
Based on where this line of code is, it is hard to get the full stack trace: LOG.error("Cannot update accumulators for job {}.", getJobID(), e); does not give us the full stack trace. Though it is true that the Accumulator did not have a serialVersionUID. I would double check. I th

Re: Flink 1.8

2019-06-04 Thread Timothy Victor
It's hard to tell without more info. From the method that threw the exception it looked like it was trying to deserialize the accumulator. By any chance did you change your accumulator class but forget to update the serialVersionUID? Just wondering if it is trying to deserialize to a different
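A minimal sketch of pinning the serialVersionUID on a custom accumulator, as suggested above (the class name and counting logic are made up):

    import org.apache.flink.api.common.accumulators.Accumulator;
    import org.apache.flink.api.common.accumulators.SimpleAccumulator;

    public class EventCounter implements SimpleAccumulator<Long> {
        // An explicit serialVersionUID keeps old and new builds of this class
        // compatible when accumulator values are shipped between TMs and the JM.
        private static final long serialVersionUID = 1L;

        private long count;

        @Override public void add(Long value) { count += value; }
        @Override public Long getLocalValue() { return count; }
        @Override public void resetLocal() { count = 0L; }
        @Override public void merge(Accumulator<Long, Long> other) { count += other.getLocalValue(); }
        @Override public EventCounter clone() {
            EventCounter copy = new EventCounter();
            copy.count = count;
            return copy;
        }
    }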

Flink 1.8

2019-06-04 Thread Vishal Santoshi
I see tons of org.apache.flink.runtime.executiongraph.ExecutionGraph- Cannot update accumulators for job 7bfe57bb0ed1c5c2f4f40c2fccaab50d. java.lang.NullPointerException https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Exe

Flink 1.8: Job manager redirection not happening in High Availability mode

2019-05-28 Thread Kumar Bolar, Harshith
Hi all, Prior to upgrading to 1.8, there was one active job manager and when I try to access the inactive job manager's web UI, the page used to get redirected to the active job manager. But now there is no redirection happening from the inactive JM to active JM. Did something change to the red

Re: Error restoring from checkpoint on Flink 1.8

2019-04-24 Thread Till Rohrmann
For future reference here is a cross link to the referred ML thread discussion [1]. [1] http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E Cheers, Till On Wed, Apr 24, 2019 at 4:00 AM Ning Shi wrote: > Hi Congxian, > > I think I have figured

Re: Error restoring from checkpoint on Flink 1.8

2019-04-23 Thread Ning Shi
Hi Congxian, I think I have figured out the issue. It's related to the checkpoint directory collision issue you responded to in the other thread. We reproduced this bug on 1.6.1 after unchaining the operators. There are two stateful operators in the chain, one is a CoBroadcastWithKeyedOperator, t

Re: Error restoring from checkpoint on Flink 1.8

2019-04-22 Thread Ning Shi
Congxian, Thanks for the reply. I will try to get a minimal reproducer and post it to this thread soon. Ning On Sun, 21 Apr 2019 09:27:12 -0400, Congxian Qiu wrote: > > Hi, > From the given error message, it seems Flink can't open RocksDB because > of a column family number mismatch, do

Re: Error restoring from checkpoint on Flink 1.8

2019-04-21 Thread Congxian Qiu
n the chain mentioned in the > error message is a KeyedBroadcastProcessFunction, which I believe > creates an InternalTimerService implicitly. That might be why > "_timer_state" appears in this operator chain. However, it is still a > mystery to me why it worked in Flink 1.6 but
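For context, a minimal sketch of a KeyedBroadcastProcessFunction that registers a timer; registering any timer on the keyed side is what brings the implicit InternalTimerService, and with it the timer state, into the operator (the types and delay here are made up):

    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class TimedBroadcastFunction
            extends KeyedBroadcastProcessFunction<String, String, String, String> {

        private static final long serialVersionUID = 1L;

        @Override
        public void processElement(String value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
            // Registering a timer implicitly creates an InternalTimerService,
            // which then appears as timer state in checkpoints of this operator.
            long fireAt = ctx.timerService().currentProcessingTime() + 60_000L;
            ctx.timerService().registerProcessingTimeTimer(fireAt);
            out.collect(value);
        }

        @Override
        public void processBroadcastElement(String value, Context ctx, Collector<String> out) {
            // broadcast side: typically updates broadcast state; no timers here
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
            out.collect("timer fired at " + timestamp);
        }
    }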

Re: Error restoring from checkpoint on Flink 1.8

2019-04-20 Thread Ning Shi
worked in Flink 1.6 but not in Flink 1.8. Any insights would be appreciated. Ning On Sat, Apr 20, 2019 at 10:28 PM Ning Shi wrote: > > When testing a job on Flink 1.8, we hit the following error during > resuming from RocksDB checkpoint. This job has been working well on > Flink 1.6

Error restoring from checkpoint on Flink 1.8

2019-04-20 Thread Ning Shi
When testing a job on Flink 1.8, we hit the following error during resuming from RocksDB checkpoint. This job has been working well on Flink 1.6.1. The checkpoint was taken using the exact same job on 1.8. The operator StreamMap_3c5866a6cc097b462de842b2ef91910d it mentioned in the error message is

Re: inconsistent module descriptor for hadoop uber jar (Flink 1.8)

2019-04-16 Thread Hao Sun
I am using sbt and sbt-assembly. In build.sbt libraryDependencies ++= Seq("org.apache.flink" % "flink-shaded-hadoop2-uber" % "2.8.3-1.8.0") Hao Sun On Tue, Apr 16, 2019 at 12:07 AM Gary Yao wrote: > Hi, > > Can you describe how to reproduce this? > > Best, > Gary > > On Mon, Apr 15, 2019 at

Re: inconsistent module descriptor for hadoop uber jar (Flink 1.8)

2019-04-16 Thread Gary Yao
Hi, Can you describe how to reproduce this? Best, Gary On Mon, Apr 15, 2019 at 9:26 PM Hao Sun wrote: > Hi, I can not find the root cause of this, I think hadoop version is mixed > up between libs somehow. > > --- ERROR --- > java.text.ParseException: inconsistent module descriptor file found

inconsistent module descriptor for hadoop uber jar (Flink 1.8)

2019-04-15 Thread Hao Sun
Hi, I can not find the root cause of this, I think hadoop version is mixed up between libs somehow. --- ERROR --- java.text.ParseException: inconsistent module descriptor file found in ' https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop2-uber/2.8.3-1.8.0/flink-shaded-hadoop2-uber

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

2019-03-12 Thread Konstantin Knauf
implementation. Am I right? > > Best, > Tony Wei > > Konstantin Knauf wrote on Sat, Mar 9, 2019 at 7:00 AM: > >> Hi Tony, >> >> before Flink 1.8, expired state is only cleaned up when you try to access >> it after expiration, i.e. when user code tries to access the expired

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

2019-03-10 Thread Tony Wei
red states should be cleaned up as well based on Flink 1.6's implementation. Am I right? Best, Tony Wei Konstantin Knauf wrote on Sat, Mar 9, 2019 at 7:00 AM: > Hi Tony, > > before Flink 1.8, expired state is only cleaned up when you try to access > it after expiration, i.e. when user code tries

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

2019-03-08 Thread Konstantin Knauf
Hi Tony, before Flink 1.8, expired state is only cleaned up when you try to access it after expiration, i.e. when user code tries to access the expired state, the state value is cleaned and "null" is returned. There was also already the option to clean up expired state during full
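For comparison with the new behaviour, a minimal sketch of enabling TTL with the incremental cleanup strategy described in the 1.8 release notes (method names as of the 1.8 API; the TTL value, state name and cleanup parameters are arbitrary):

    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;

    public class TtlExample {
        public static ValueStateDescriptor<Long> descriptorWithTtl() {
            StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.days(7))
                // 1.8 adds incremental cleanup for the heap state backend:
                // check a few entries per state access and drop expired ones.
                .cleanupIncrementally(10, false)
                .build();

            ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("last-seen", Long.class);
            descriptor.enableTimeToLive(ttlConfig);
            return descriptor;
        }
    }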

What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

2019-03-08 Thread Tony Wei
Hi everyone, I read the Flink 1.8 release notes about state [1], and it said *Continuous incremental cleanup of old Keyed State with TTL* > We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510 > <https://issues.apache.org/jira/browse/FLINK-9510>). This feature