Hundreds of parallel jobs on Flink Cluster

2018-06-07 Thread Chirag Dewan
Hi, I am coming across a use case where I may have to run more than 100 parallel jobs (which may have different processing needs) on a Flink cluster. My Flink cluster currently has 1 Job Manager and 4/5 Task Managers, depending on the processing needed, and is running on a Kubernetes cluster with 3 wo…

Having a backoff while experiencing checkpointing failures

2018-06-07 Thread vipul singh
Hello all, Are there any recommendations on using a backoff when experiencing checkpointing failures? What we have seen is that when a checkpoint starts to expire, the next checkpoint doesn't care about the previous failure and starts soon after. We experimented with *min_pause_between_checkpoints*, ho…
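A minimal sketch of the checkpoint settings discussed in this thread (the interval, timeout, and pause values below are placeholders, not recommendations):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointPauseExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 s (placeholder value).
        env.enableCheckpointing(60_000);

        // Expire a checkpoint that takes longer than 10 minutes.
        env.getCheckpointConfig().setCheckpointTimeout(600_000);

        // Minimum pause between the end of one checkpoint and the start of the next.
        // This is the "min_pause_between_checkpoints" setting the question refers to;
        // as the thread notes, it does not by itself act as a backoff after failures.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(120_000);

        // Only one checkpoint in flight at a time, otherwise the pause has no effect.
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
    }
}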

Stopping of a streaming job empties state store on HDFS

2018-06-07 Thread Peter Zende
Hi all, We have a streaming pipeline (Flink 1.4.2) for which we implemented stoppable sources to be able to gracefully exit from the job with YARN state "finished/succeeded". This works fine, however after creating a savepoint, stopping the job (stop event) and restarting it, we noticed that the…
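For reference, a rough sketch of a stoppable source against the Flink 1.4-era API (the StoppableFunction interface was later deprecated and removed); the counter source itself is made up for illustration:

import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/** A source that can be stopped gracefully via `flink stop <jobId>`. */
public class StoppableCounterSource implements SourceFunction<Long>, StoppableFunction {

    private volatile boolean running = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
            Thread.sleep(100);
        }
    }

    @Override
    public void stop() {
        // Called on `flink stop`: let run() finish its loop so the job ends as FINISHED.
        running = false;
    }

    @Override
    public void cancel() {
        // Called on `flink cancel`: hard cancellation.
        running = false;
    }
}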

Re: IoT Use Case, Problem and Thoughts

2018-06-07 Thread Fabian Hueske
Hi Ashish, Thanks for the great write-up. If I understood you correctly, there are two different issues that are caused by the disabled checkpointing. 1) Recovery from a failure without restarting all operators, to preserve the state in the running tasks 2) Planned restarts of an application without…

Re: Conceptual question

2018-06-07 Thread Tony Wei
Hi Piotrek, So my question is: is it feasible to migrate state from `ProcessFunction` to my own operator and then use `getKeyedStateBackend()` to migrate the states? If yes, is there anything I need to be careful with? If no, why not, and can it be made available in the future? Thank you. Best Regards, Tony

Re: Conceptual question

2018-06-07 Thread Piotr Nowojski
Hi, Oh, I see now. Yes indeed, getKeyedStateBackend() is not exposed to the function and you cannot migrate your state that way. As far as I know, yes, at the moment in order to convert everything at once (without getKeys() you can still implement lazy conversion) you would have to write your o…

Re: Conceptual question

2018-06-07 Thread Tony Wei
Hi Piotrek, I used `ProcessFunction` to implement it, but it seems that I can't call `getKeyedStateBackend()` like `WindowOperator` did. I found that `getKeyedStateBackend()` is a method of `AbstractStreamOperator`, and the `ProcessFunction` API doesn't extend it. Does that mean I can't look up all ke…

Re: Conceptual question

2018-06-07 Thread Piotr Nowojski
What function are you implementing and how are you using it? Usually it's enough if your function implements RichFunction (or rather extends AbstractRichFunction), and then you can use RichFunction#open in a similar manner as in the code that I posted in the previous message. Flink in many p…
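As an illustration of the pattern referred to here (names and types are assumed, this is not the code from the earlier message): `ProcessFunction` already extends `AbstractRichFunction`, so keyed state can be registered in `open()` and used per record:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class CountPerKey extends ProcessFunction<String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        // State is registered once per task in open(); values are still scoped per key.
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        long updated = (current == null) ? 1L : current + 1L;
        count.update(updated);
        out.collect(updated);
    }
}

The function must be applied on a keyed stream for the per-key state lookups to work.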

[flink-connector-filesystem] OutOfMemory in checkpointless environment

2018-06-07 Thread Rinat
Hi mates, we have some Flink jobs that write data from Kafka into HDFS using BucketingSink. For some reasons, those jobs are running without checkpointing. For now, it's not a big problem for us if some files remain open in case of a job reload. Periodically, those jobs fail wit…
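A sketch of the kind of BucketingSink setup being described (the path, element type, and thresholds are placeholders); rolling and inactivity thresholds are one way to bound how many buckets stay open at a time:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class HdfsSinkSetup {

    public static void addHdfsSink(DataStream<String> events) {
        BucketingSink<String> sink = new BucketingSink<>("hdfs:///data/events");

        // Roll a part file once it reaches ~128 MB (placeholder value).
        sink.setBatchSize(128 * 1024 * 1024);

        // Close buckets that have not received data for 10 minutes,
        // checking for inactive buckets every minute.
        sink.setInactiveBucketThreshold(10 * 60 * 1000L);
        sink.setInactiveBucketCheckInterval(60 * 1000L);

        // One directory per hour (placeholder pattern).
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));

        events.addSink(sink);
    }
}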

Re: Conceptual question

2018-06-07 Thread Tony Wei
Hi Piotrek, It seems that this was implemented with the `Operator` API, which is a more low-level API compared to the `Function` API. Since at the `Function` API level we can only migrate state when triggered by an event, it is more convenient this way to migrate state by iterating over all keys in the `open()` method. If I wa…

Re: Conceptual question

2018-06-07 Thread Piotr Nowojski
Hi, A general solution for state/schema migration is under development and it might be released with Flink 1.6.0. Before that, you need to manually handle the state migration in your operator's open method. Let's assume that your OperatorV1 has a state field "stateV1". Your OperatorV2 defines fie…
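A sketch of the manual migration described here, using the lazy, per-key variant mentioned elsewhere in this thread (all names, state types, and the conversion itself are illustrative): when an element arrives, the function checks whether the old state is still populated for the current key and converts it before using the new state.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class MyFunctionV2 extends ProcessFunction<String, String> {

    private transient ValueState<Long> stateV1;     // old schema: plain count
    private transient ValueState<String> stateV2;   // new schema: count plus extra info

    @Override
    public void open(Configuration parameters) {
        // Both descriptors are registered so the old state can still be read after restore.
        stateV1 = getRuntimeContext().getState(new ValueStateDescriptor<>("stateV1", Long.class));
        stateV2 = getRuntimeContext().getState(new ValueStateDescriptor<>("stateV2", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        // Lazy migration: only the currently processed key can be converted here,
        // which is why converting *all* keys at once needs an operator-level solution.
        if (stateV2.value() == null && stateV1.value() != null) {
            stateV2.update("count=" + stateV1.value());
            stateV1.clear();
        }
        // ... normal processing against stateV2 ...
        out.collect(value);
    }
}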

Re: Extending stream events with a an aggregate value

2018-06-07 Thread Piotr Nowojski
Hi, No worries :) You probably need to write your own process function to do exactly that, maybe something like this: DataStream> test; DataStream> max = test.keyBy(0) .process(new KeyedProcessFunction, Tuple3>() { public ValueState max; @Override public void…
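The code in the preview above lost its generic type parameters in the archive; a reconstructed, self-contained version of the same idea follows, with assumed element types (Tuple2<String, Long> in, Tuple3<String, Long, Long> out, i.e. the event extended with the per-key running maximum). KeyedProcessFunction requires Flink 1.5 or later.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ExtendWithMax {

    public static DataStream<Tuple3<String, Long, Long>> extend(DataStream<Tuple2<String, Long>> test) {
        return test
            .keyBy(0)
            .process(new KeyedProcessFunction<Tuple, Tuple2<String, Long>, Tuple3<String, Long, Long>>() {

                private ValueState<Long> max;

                @Override
                public void open(Configuration parameters) {
                    max = getRuntimeContext().getState(new ValueStateDescriptor<>("max", Long.class));
                }

                @Override
                public void processElement(
                        Tuple2<String, Long> value,
                        Context ctx,
                        Collector<Tuple3<String, Long, Long>> out) throws Exception {
                    Long currentMax = max.value();
                    long newMax = (currentMax == null) ? value.f1 : Math.max(currentMax, value.f1);
                    max.update(newMax);
                    // Emit the original event extended with the running maximum for its key.
                    out.collect(Tuple3.of(value.f0, value.f1, newMax));
                }
            });
    }
}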

Re: Question about JVM exit caused by timeout exception with the asynchronous IO of flink 1.4.2

2018-06-07 Thread Piotr Nowojski
Hi, You can increase the timeout; that's one way to tackle it. In Flink 1.6.0 there will be a possibility to override Flink's default behaviour regarding the handling of timeouts: https://issues.apache.org/jira/browse/FLINK-7789 to handle them, instead o…
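A sketch of both options on an assumed lookup function (names and values are illustrative, and the external call is simulated; the timeout(...) override is only available from Flink 1.6.0 onwards per the FLINK-7789 ticket linked above, and the ResultFuture callback type assumes Flink 1.5+):

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncLookupExample {

    /** Assumed async lookup; the real external client call is stubbed out here. */
    public static class LookupFunction extends RichAsyncFunction<String, String> {

        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            CompletableFuture
                .supplyAsync(() -> "value-for-" + key)   // stands in for the real async call
                .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }

        // From Flink 1.6.0 (FLINK-7789): decide what happens for a timed-out element
        // instead of failing the whole job.
        @Override
        public void timeout(String key, ResultFuture<String> resultFuture) {
            resultFuture.complete(Collections.singleton("TIMED-OUT-" + key));
        }
    }

    public static DataStream<String> apply(DataStream<String> input) {
        // Option 1 from the answer: simply use a larger timeout (60 s here, a placeholder).
        return AsyncDataStream.unorderedWait(
                input, new LookupFunction(), 60, TimeUnit.SECONDS, 100);
    }
}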