Split Stream on a Split Stream

2019-02-26 Thread Taher Koitawala
Hi All, We are currently working with Flink 1.7.2 version and we are get the below given exception when doing a split on a split. SplitStreamsplitStream=stream1.split(new SomeSplitLogic()); DataStream select1=splitStream.select("1"); DataStream select2=splitStream.select("2"); select2

submit job failed on Yarn HA

2019-02-26 Thread 孙森
Hi all: I run flink (1.5.1 with hadoop 2.7) on yarn ,and submit job by “/usr/local/flink/bin/flink run -m jmhost:port my.jar”, but the submission is failed. The HA configuration is : high-availability: zookeeper high-availability.storageDir: hdfs:///flink/ha/ high-availability

Re: flink list and flink run commands timeout

2019-02-26 Thread sen
Hi Aneesha: I am also facing the same problem.When I turn on the HA on yarn ,it will get the same exception. While I turn off the Ha configuration ,it works fine. I want to know that what did you do to deal with the problem? Thanks! Sen Sun -- Sent from: http://apache-fli

Errors running user jars with flink in docker flink x

2019-02-26 Thread Paroma Sengupta
> Hi, > I am trying to run my flink application through docker containers. For > that I made use of the code present over here flink_docker > . > However when I try to run the docker image, it fail

Errors running user jars with flink in docker flink x

2019-02-26 Thread Paroma Sengupta
Hi, I am trying to run my flink application through docker containers. For that I made use of the code present over here flink_docker . However when I try to run the docker image, it fails with thi

Re: left join failing with FlinkLogicalJoinConverter NPE

2019-02-26 Thread Xingcan Cui
Hi Karl, I think this is a bug and created FLINK-11769 to track it. Best, Xingcan > On Feb 26, 2019, at 2:02 PM, Karl Jin wrote: > > I removed the multiset> field and the join worked fine. > The field was created from a Kafka source through

okio and okhttp not shaded in the Flink Uber Jar on EMR

2019-02-26 Thread Austin Cawley-Edwards
Hi, I recently experienced versioning clashes with the okio and okhttp when trying to deploy a Flink 1.6.0 app to AWS EMR on Hadoop 2.8.4. After investigating and talking to the okio team (see this issue) , I found that both okio and okhttp exist in the F

Re: ProgramInvocationException when I submit job by 'flink run' after running Flink stand-alone more than 1 month?

2019-02-26 Thread Zhenghua Gao
Seem like there is something wrong with RestServer and the RestClient didn't connect to it. U can check the standalonesession log for investigating causes. btw: The cause of "no cluster was found" is ur pid information was cleaned for some reason. The pid information is stored in ur TMP director

Re: ProgramInvocationException when I submit job by 'flink run' after running Flink stand-alone more than 1 month?

2019-02-26 Thread Benchao Li
Hi Son, According to your description, maybe it's caused by the '/tmp' file system retain strategy which removes tmp files regularly. Son Mai 于2019年2月27日周三 上午10:27写道: > Hi, > I'm having a question regarding Flink. > I'm running Flink in stand-alone mode on 1 host (JobManager, TaskManager > on t

ProgramInvocationException when I submit job by 'flink run' after running Flink stand-alone more than 1 month?

2019-02-26 Thread Son Mai
Hi, I'm having a question regarding Flink. I'm running Flink in stand-alone mode on 1 host (JobManager, TaskManager on the same host). At first, I'm able to submit and cancel jobs normally, the jobs showed up in the web UI and ran. However, after ~1month, when I canceled the old job and submitting

One source is much slower than the other side when join history data

2019-02-26 Thread 刘建刚
When consuming history data in join operator with eventTime, reading data from one source is much slower than the other. As a result, the join operator will cache much data from the faster source in order to wait the slower source. The question is that how can I make the difference of c

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread vino yang
Great job. Stephan! Best, Vino Jamie Grier 于2019年2月27日周三 上午2:27写道: > This is awesome, Stephan! Thanks for doing this. > > -Jamie > > > On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote: > >> Here is the pull request with a draft of the roadmap: >> https://github.com/apache/flink-web/pull/178

Re: Flink window triggering and timing on connected streams

2019-02-26 Thread Rong Rong
Hi Andrew, To add to the answer Till and Hequn already provide. WindowOperator are operating on a per-key-group based. so as long as you only have one open session per partition key group, you should be able to manage the windowing using the watermark strategy Hequn mentioned. As Till mentioned, t

Re: long lived standalone job session cluster in kubernetes

2019-02-26 Thread Chunhui Shi
Hi Heath and Till, thanks for offering help on reviewing this feature. I just reassigned the JIRAs to myself after offline discussion with Jin. Let us work together to get kubernetes integrated natively with flink. Thanks. On Fri, Feb 15, 2019 at 12:19 AM Till Rohrmann wrote: > Alright, I'll get

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Thanks! This fixed it. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: StreamingFileSink on EMR

2019-02-26 Thread Bruno Aranda
Hi, That Jar must exist for all the 1.7 versions, but I was replacing the libs for the Flink provided by the AWS EMR (1.7.0) by the more recent ones. But you could download the 1.7.0 distribution and copy the flink-s3-fs-hadoop-1.7.0.jar from there into the /usr/lib/flink/lib folder. But knowing

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Hi, So 1.7.2 jar has the fix? Thanks Kevin -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reducing the number of unique metrics

2019-02-26 Thread shkob1
Hey All, Just read the excellent monitoring blog post https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html I'm looking on reducing the number of unique metrics, especially on items i can compromise on consolidating such as using indices instead of ids. Specifically looking at t

Re: left join failing with FlinkLogicalJoinConverter NPE

2019-02-26 Thread Karl Jin
I removed the multiset> field and the join worked fine. The field was created from a Kafka source through a query that looks like "select collect(data) as i_data from ... group by pk" Do you think this is a bug or is this something I can get around using some configuration? On Tue, Feb 26, 2019 a

Re: StreamingFileSink on EMR

2019-02-26 Thread Bruno Aranda
Hey, Got it working, basically you need to add the flink-s3-fs-hadoop-1.7.2.jar libraries from the /opt folder of the flink distribution into the /usr/lib/flink/lib. That has done the trick for me. Cheers, Bruno On Tue, 26 Feb 2019 at 16:28, kb wrote: > Hi Bruno, > > Thanks for verifying. We

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread Jamie Grier
This is awesome, Stephan! Thanks for doing this. -Jamie On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote: > Here is the pull request with a draft of the roadmap: > https://github.com/apache/flink-web/pull/178 > > Best, > Stephan > > On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote: > >> H

Re: Is there a Flink DataSet equivalent to Spark's RDD.persist?

2019-02-26 Thread Andrey Zagrebin
Hi Frank, This feature is currently under discussion. You can follow it in this issue: https://issues.apache.org/jira/browse/FLINK-11199 Best, Andrey On Thu, Feb 21, 2019 at 7:41 PM Frank Grimes wrote: > Hi, > > I'm trying to port an existing Spark job to Flink and have gotten stuck on > the s

Re: Submitting job to Flink on yarn timesout on flip-6 1.5.x

2019-02-26 Thread Richard Deurwaarder
Hello Gary, Thank you for your response. I'd like to use the new mode but it does not work for me. It seems I am running into a firewall issue. Because the rest.port is random when running on yarn[1]. The machine I use to deploy the job can, in fact, start the Flink cluster, but it cannot submit

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread Stephan Ewen
Here is the pull request with a draft of the roadmap: https://github.com/apache/flink-web/pull/178 Best, Stephan On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote: > Hi Stephan, > > Thanks for summarizing the great roadmap! It is very helpful for users and > developers to track the direction of

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Hi Bruno, Thanks for verifying. We are aiming for the same. Best, Kevin -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Sftp source in flink

2019-02-26 Thread Siew Wai Yow
Hi guys, Anyone can share experience on sftp source? Should i use hadoop sftpfilesystem or i can simply use any sftp java library in a user-defined source? Thanks. Regards, Yow

Re: StreamingFileSink on EMR

2019-02-26 Thread Bruno Aranda
Hi, I am having the same issue, but it is related to what Kostas is pointing out. I was trying to stream to the "s3" scheme and not "hdfs", and then getting that exception. I have realised that somehow I need to reach the S3RecoverableWriter, and found out it is in a difference library "flink-s3-

Re: StreamingFileSink on EMR

2019-02-26 Thread Kostas Kloudas
Hi Kevin, I cannot find anything obviously wrong from what you describe. Just to eliminate the obvious, you are specifying "hdfs" as the scheme for your file path, right? Cheers, Kostas On Tue, Feb 26, 2019 at 3:35 PM Till Rohrmann wrote: > Hmm good question, I've pulled in Kostas who worked o

Re: Collapsing watermarks after keyby

2019-02-26 Thread Padarn Wilson
Okay. I think I still must misunderstand something here. I will work on building a unit test around this, hopefully this clears up my confusion. Thank you, Padarn On Tue, Feb 26, 2019 at 10:28 PM Till Rohrmann wrote: > Operator's with multiple inputs emit the minimum of the input's watermarks >

Re: StreamingFileSink on EMR

2019-02-26 Thread Till Rohrmann
Hmm good question, I've pulled in Kostas who worked on the StreamingFileSink. He might be able to tell you more in case that there is some special behaviour wrt the Hadoop file systems. Cheers, Till On Tue, Feb 26, 2019 at 3:29 PM kb wrote: > Hi Till, > > The only potential issue in the path I

Re: Share broadcast state between multiple operators

2019-02-26 Thread Till Rohrmann
On Tue, Feb 26, 2019 at 3:10 PM Richard Deurwaarder wrote: > Hello Till, > > So if I understand correctly, when messages get broadcast to multiple > operators, each operator will execute the processBroadcast() function and > store the state under a sort of operator scope? Even if they use the sam

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Hi Till, The only potential issue in the path I see is `/usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-2.29.0.jar`. I double checked my pom, the project is Hadoop-free. The JM log also shows `INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.5-amzn-1`.

Re: Collapsing watermarks after keyby

2019-02-26 Thread Till Rohrmann
Operator's with multiple inputs emit the minimum of the input's watermarks downstream. In case of a keyBy this means that the watermark is sent to all downstream consumers. Cheers, Till On Tue, Feb 26, 2019 at 1:47 PM Padarn Wilson wrote: > Just to add: by printing intermediate results I see th

Re: Share broadcast state between multiple operators

2019-02-26 Thread Richard Deurwaarder
Hello Till, So if I understand correctly, when messages get broadcast to multiple operators, each operator will execute the processBroadcast() function and store the state under a sort of operator scope? Even if they use the same MapStateDescriptor? And if it replicates the state between operator

Re: Collapsing watermarks after keyby

2019-02-26 Thread Padarn Wilson
Just to add: by printing intermediate results I see that I definitely have more than five minutes of data, and by windowing without the session windows I see that event time watermarks do seem to be generated as expected. Thanks for your help and time. Padarn On Tue, 26 Feb 2019 at 8:43 PM, Pada

Re: Collapsing watermarks after keyby

2019-02-26 Thread Padarn Wilson
Hi Till, I will work on an example, but I’m a little confused by how keyBy and watermarks work in this case. This documentation says ( https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_time.html#watermarks-in-parallel-streams ): Some operators consume multiple input streams; a uni

Re: Share broadcast state between multiple operators

2019-02-26 Thread Till Rohrmann
Hi Richard, Flink does not support to share state between multiple operators. Technically also the broadcast state is not shared but replicated between subtasks belonging to the same operator. So what you can do is to send the broadcast input to different operators, but they will all keep their ow

Re: Flink 1.6.4 signing key file in docker-flink repo?

2019-02-26 Thread Till Rohrmann
Hi William, where do you get the /KEYS file from? Have you imported the latest KEYS from here [1]? [1] https://dist.apache.org/repos/dist/release/flink/KEYS Cheers, Till On Mon, Feb 25, 2019 at 5:16 PM William Saar wrote: > Trying to build a new Docker image by replacing 1.6.3 with 1.6.4 in t

Re: StreamingFileSink on EMR

2019-02-26 Thread Till Rohrmann
Hi Kevin, could you check what's on the class path of the Flink cluster? You should see this in the jobmanager.log at the top. It seems as if there is a Hadoop dependency with a lower version. Flink 1.7 is build against which Hadoop version? You should make sure that you either use the Hadoop-free

Re: What are blobstore files and why do they keep filling up /tmp directory?

2019-02-26 Thread Till Rohrmann
Hi Harshith, the blob store files are necessary to distribute the Flink job in your cluster. After the job has been completed, they should be cleaned up. Only in the case of cluster crashes the clean up should not happen. Since Flink 1.4.2 is no longer actively supported, I would suggest to upgrad

Re: Collapsing watermarks after keyby

2019-02-26 Thread Till Rohrmann
Hi Padarn, Flink does not generate watermarks per keys. Atm watermarks are always global. Therefore, I would suspect that it is rather a problem with generating watermarks at all. Could it be that your input data does not span a period longer than 5 minutes and also does not terminate? Another pro

Re: Flink window triggering and timing on connected streams

2019-02-26 Thread Till Rohrmann
Hi Andrew, if using connected streams (e.g. CoFlatMapFunction or CoMapFunction), then the watermarks will be synchronized across both inputs. Concretely, you will always emit the minimum of the watermarks arriving on input channel 1 and 2. Take a look at AbstractStreamOperator.java:773-804. Cheer