Re: Role of Job Manager

2019-06-18 Thread Biao Liu
Hi Pankaj, That's really a good question. There was a refactor of architecture before[1]. So there might be some descriptions used the outdated concept. Before refactoring, Job Manager is a centralized role. It controls whole cluster and all jobs which is described in your interpretation 1. Afte

unsubscribe

2019-06-18 Thread Sheel Pancholi

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread John Smith
Ah ok we need to pass --host. The command line help sais jobmanager.sh ?!?! If I recall. I have to go check tomorrow... On Tue., Jun. 18, 2019, 10:05 p.m. PoolakkalMukkath, Shakir, < shakir_poolakkalmukk...@comcast.com> wrote: > Hi Nick, > > > > It works that way by explicitly setting the –host.

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread PoolakkalMukkath, Shakir
Hi Nick, It works that way by explicitly setting the –host. I got mislead by the “only” word in doc and did not try. Thanks for the help Thanks, Shakir From: "Martin, Nick" Date: Tuesday, June 18, 2019 at 6:31 PM To: "PoolakkalMukkath, Shakir" , Till Rohrmann , John Smith Cc: user Subject: R

Re: How to build dependencies and connections between stream jobs?

2019-06-18 Thread 徐涛
Hi Knauf, The solution that I can think of to coordinate between different stream jobs is : For example there are two streaming jobs, Job_1 and Job_2: Job_1: receive data from the original kafka topic, TOPIC_ORIG for example, sink the data to another kafka topic, TOPIC

RE: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread Martin, Nick
Jobmanager.sh takes an optional argument for the hostname to bind to, and start-cluster uses it. If you leave it blank it, the script will use whatever is in flink-conf.yaml (localhost is the default value that ships with flink). The dockerized version of flink runs pretty much the way you’re tr

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread PoolakkalMukkath, Shakir
Hi Tim,John, I do agree with the issue John mentioned and have the same problem. We can only start a standalone HA cluster with ./start-cluster.sh script. And then when there are failures, we can restart those components individually by calling jobmanager.sh/ jobmanager.sh. This works great B

Re: How to restart/recover on reboot?

2019-06-18 Thread Till Rohrmann
I guess it should work if you installed a systemd service which simply calls `jobmanager.sh start` or `taskmanager.sh start`. Cheers, Till On Tue, Jun 18, 2019 at 4:29 PM John Smith wrote: > Yes, that is understood. But I don't see why we cannot call jobmanager.sh > and taskmanager.sh to build

Re: Need for user class path accessibility on all nodes

2019-06-18 Thread Abdul Qadeer
Thanks Biao/Till, that answers my question. On Tue, 18 Jun 2019 at 01:41, Till Rohrmann wrote: > Hi Abdul, > > as Biao said the `--classpath` option should only be used if you want to > make dependencies available which are not included in the submitted user > code jar. E.g. if you have install

Re: Role of Job Manager

2019-06-18 Thread Eduardo Winpenny Tejedor
Hi Pankaj, I have no experience with Hadoop but from the book I gathered there's one Job Manager per application i.e. per jar (as in the example in the first chapter). This is not to say there's one Job Manager per job. Actually I don't think the word Job is defined in the book, I've seen Task def

Flink error handling

2019-06-18 Thread Steven Nelson
Hello! We are internally having a debate on how best to handle exceptions within our operators. Some advocate for wrapping maps/flatMaps inside a processfunction and sending the error to a side output. Other options are returning a custom Either that gets filtered and mapped into different si

Side output in ProcessFunction.onTimer

2019-06-18 Thread Frank Wilson
Hi, Is there a way to make side outputs in an onTimer callback in ProcessFunction? I want to side output events that belong to a session that was below the minimum duration threshold. Currently these events are just discarded but I’d like more traceability. Thanks, Frank

Re: How to restart/recover on reboot?

2019-06-18 Thread John Smith
Yes, that is understood. But I don't see why we cannot call jobmanager.sh and taskmanager.sh to build the cluster and have them run as systemd units. I looked at start-cluster.sh and all it does is SSH and call jobmanager.sh which then cascades to taskmanager.sh I just have to pin point what's mis

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-06-18 Thread Felipe Gutierrez
Hi Vijay, I managed by using "ctx.timerService().registerProcessingTimeTimer(timeoutTime);" on the processElement method and clearing the state on the onTimer method. This is my program [1]. [1] https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/st

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Felipe Gutierrez
I achieved some enhancement based on [1]. My code is here [2]. Basically I am using "ctx.timerService().registerProcessingTimeTimer(timeoutTime);" inside the processElement method to trigger the onTimer method. And when the onTimer method is triggered I clean the state using "hllStateTwitter.clear(

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Felipe Gutierrez
I am sorry, I wanted to point this reference https://stackoverflow.com/a/47071833/2096986 which implements a window on a ProcessFunction in Flink. *--* *-- Felipe Gutierrez* *-- skype: felipe.o.gutierrez* *--* *https://felipeogutierrez.blogspot.com * On Tue

Re: Need for user class path accessibility on all nodes

2019-06-18 Thread Till Rohrmann
Hi Abdul, as Biao said the `--classpath` option should only be used if you want to make dependencies available which are not included in the submitted user code jar. E.g. if you have installed a large library which is too costly to ship every time you submit a job. Usually, you would not need to s

Re: How to restart/recover on reboot?

2019-06-18 Thread Till Rohrmann
When a single machine fails you should rather call `taskmanager.sh start`/`jobmanager.sh start` to start a single process. `start-cluster.sh` will start multiple processes on different machines. Cheers, Till On Mon, Jun 17, 2019 at 4:30 PM John Smith wrote: > Well some reasons, machine reboots/

Re: Checkpointing & File stream with

2019-06-18 Thread Sung Gon Yi
It works well now with following codes: —— TextInputFormat specFileFormat = new TextInputFormat(new Path(specFile)); specFileFormat.setFilesFilter(FilePathFilter.createDefaultFilter()); DataStream specificationFileStream = env .readFile(specFileFormat, specFile, FileProcessingMode.PROCESS_

Role of Job Manager

2019-06-18 Thread Pankaj Chand
I am trying to understand the role of Job Manager in Flink, and have come across two possibly distinct interpretations. 1. The online documentation v1.8 signifies that there is at least one Job Manager in a cluster, and it is closely tied to the cluster of machines, by managing all jobs in that cl

Re: Has Flink a kafka processing location strategy?

2019-06-18 Thread Konstantin Knauf
Hi Theo, no, sorry, the Kafka partitions that each subtask is assigned to is only determined by the index of the subtask. Best, Konstantin On Mon, Jun 17, 2019 at 2:57 PM Theo Diefenthal < theo.diefent...@scoop-software.de> wrote: > Hi, > > > > We have a Hadoop/YARN Cluster with Kafka and Fl

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Piotr Nowojski
Hi, Isn’t your problem that the source is constantly emitting the data and bumping your timers? Keep in mind that the code that you are basing on has the following characteristic: > In the following example a KeyedProcessFunction maintains counts per key, and > emits a key/count pair whenever