How to write stream data to other Hadoop Cluster by StreamingFileSink

2019-10-04 Thread Jun Zhang
Hi,all: I have 2 hadoop cluster (hdfs://mycluster1 and hdfs://mycluster2),both of them configured the HA, I have a job ,read from streaming data from kafka, and write it to hdfs by StreamingFileSink,now I deployed my job on mycluster1 (flink on yarn),and I want to write the data to mycluster2

Re: Flink using Oozie in Kerberized cluster

2019-10-04 Thread Srivastava,Rajat
Moving out of Cloudera is not an option for us. By bounded flink, I actually meant by running a bounded beam pipeline on a flink runner. Best, Rajat Srivastava From: sri hari kali charan Tummala Date: Friday, October 4, 2019 at 11:41 AM To: "Srivastava,Rajat" Subject: Re: Flink using Oozie in

Multiple Taskmanagers per node for standalone cluster

2019-10-04 Thread Ethan Li
Hello, Does/did anyone try to set up a standalone cluster with multiple TaskManagers per node? We are working on moving to flink-on-yarn solution. But before that happens, I am thinking about the following setup to get jobs isolated from each other 1) multiple taskmanager per host 2) 1 taskS

Re: Flink using Oozie in Kerberized cluster

2019-10-04 Thread Srivastava,Rajat
It’s on a Cloudera managed cluster. Best, Rajat Srivastava From: sri hari kali charan Tummala Date: Friday, October 4, 2019 at 7:39 AM To: "Srivastava,Rajat" Subject: Re: Flink using Oozie in Kerberized cluster is this on AWS or AWS EMR or Cloudera ? On Thu, Oct 3, 2019 at 3:51 PM Srivastava

Re: Finding the Maximum Value Received so far in a Stream

2019-10-04 Thread Theo Diefenthal
Hi Komal, regarding using max Method: You can call .map() on your stream and convert the POJO to another stream/type, e.g. having only the x coordinate of the POJO and then apply the max operator. And as the others said: You are working on a keyed stream per fish_id, so you will get one maxi

Re: CEP operator in SQL pattern match does not clear it's state

2019-10-04 Thread Theo Diefenthal
Hi Muhammad, With "not going done" you mean that it's still growing or that it's constant? In case of it being constant, that's pretty much what is expected, right? If it's still growing, the only reason I could come up with: Do you work in event time and have watermarks properly assigned with

Re: How to prevent from launching 2 jobs at the same time

2019-10-04 Thread Theo Diefenthal
My simple workaround for it: I start the applications always from the same machine via CLI and just make a file-system-lock around execution of the check-if-task-is-already-running and task-launching part. This of course is a possible single-point-of-failure to rely on one machine starting the j

Re: Increasing trend for state size of keyed stream using ProcessWindowFunction with ProcessingTimeSessionWindows

2019-10-04 Thread Oliwer Kostera
Hi, I actually created Jira issue before posting it to mailing list. Today I added steps to reproduce with tests outcome of different scenarios to the repository. Jira issue: https://issues.apache.org/jira/browse/FLINK-14197 Repository: https://github.com/loliver1234/flink-process-window-functi

Questions about how to use State Processor API

2019-10-04 Thread Tony Wei
Hi, I'm recently trying to use State Processor API, but I have some questions during the development. 1. Does `OperatorTransformation#bootstrapWith` support scala api `DataSet`? I tried, but IDE showed that it will have compile error on that line. 2. It seems that creating RocksDBStateBackend sh