Re: Optimal Configuration for Cluster

2016-02-22 Thread Ufuk Celebi
The new default is equivalent to the previous "streaming mode". The community decided to get rid of this distinction, because it was confusing to users. The difference between "streaming mode" and "batch mode" was how Flink's managed memory was allocated, either lazily when required ('streaming mo

Re: Optimal Configuration for Cluster

2016-02-22 Thread Welly Tambunan
Hi Fabian, Previously when using flink 0.9-0.10 we start the cluster with streaming mode or batch mode. I see that this one is gone on Flink 1.00 snapshot ? So this one has already taken care of the flink and optimize by runtime > On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske wrote: > Hi Welly

Re: Optimal Configuration for Cluster

2016-02-22 Thread Welly Tambunan
Hi Fabian, Thanks a lot for your response. - How many task managers do you start? I assume more than one TM per machine given that you assign only 4GB of memory out of 128GB to each TM. Currently what we have done is start a 1 TM per machine with number of task slot 48. - What is the maximum pa

Re: Compile issues with Flink 1.0-SNAPSHOT and Scala 2.11

2016-02-22 Thread Dan Kee
Hello, I'm not sure if this related, but we recently started seeing this when using `1.0-SNAPSHOT` in the `snapshots` repository: [error] Modules were resolved with conflicting cross-version suffixes in {file:/home/ubuntu/bt/}flinkproject: [error]org.apache.kafka:kafka _2.10, _2.11 java.lang.

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-22 Thread Till Rohrmann
Hi Shikhar, you're right that including a connector dependency would have let us spot the problem earlier. In fact, any project building a fat jar with SBT would have failed without setting the flink dependencies to provided. The problem is that the template is a general purpose template. Thus, i

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-22 Thread shikhar
Hi Till, Thanks so much for sorting this out! One suggestion, can the Flink template depend on a connector (Kafka?) -- this would verify that assembly works smoothly for a very common use-case when you need to include connector JAR's. Cheers, Shikhar -- View this message in context: http://

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Till Rohrmann
At the moment, the system can only deal with lost slots (nodes) if either there are some excess slots which have not been used before or if the died node is restarted. The latter is the case for yarn applications, for example. There the application master will restart containers which have died. I

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
Thank you, Till! The current (in progress) implementation is considering also the problem related to losing the task's slots of the failed node(s), something related to [2] ? [2] https://issues.apache.org/jira/browse/FLINK-3047 Best, Ovidiu > On 22 Feb 2016, at 18:13, Till Rohrmann wrote: >

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Till Rohrmann
Hi Ovidiu, at the moment Flink's batch fault tolerance restarts the whole job in case of a failure. However, parts of the logic to do partial backtracking such as intermediate result partitions and the backtracking algorithm are already implemented or exist as a PR [1]. So we hope to complete the

Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
Hi In case of failure of a node what does it mean 'Fault tolerance for programs in the DataSet API works by retrying failed executions’ [1] ? -work already done by the rest of the nodes is not lost, only work of the lost node is recomputed, job execution will continue or -entire job execution is

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-22 Thread Till Rohrmann
Hi Shikhar, I just wanted to let you know that we've found the problem with the failing assembly plugin. It was caused by incompatible classes [1, 2]. Once these PRs are merged, the merge problems should be resolved. By the way, we've also added now a SBT template for Flink projects using giter8

Re: Optimal Configuration for Cluster

2016-02-22 Thread Fabian Hueske
Hi Welly, I have to correct the formula I posted before: taskmanager.network.numberOfBuffers: p ^ 2 * t * 4 p is NOT the parallelism of the job, BUT the number of slots of a task manager. So if you configure one TM for each machine with 48 slots, you get: 48^2 * 16 * 4 = 147.456 buffers, with 3

Re: Optimal Configuration for Cluster

2016-02-22 Thread Fabian Hueske
Hi Welly, sorry for the late response. The number of network buffers primarily depends on the maximum parallelism of your job. The given formula assumes a specific cluster configuration (1 task manager per machine, one parallel task per CPU). The formula can be translated to: taskmanager.network

Re: Use jvm to run flink on single-node machine with many cores

2016-02-22 Thread Ufuk Celebi
Note that the method to call in the example should be `conf.setInteger` and the second argument not a String but an int. On Sun, Feb 21, 2016 at 1:41 PM, Márton Balassi wrote: > Dear Ana, > > If you are using a single machine with multiple cores, but need convenient > access to the configuration

RE: events eviction

2016-02-22 Thread Radu Tudoran
Hi, Following up on the example before: you have to aggregate the data from 2 partitions (e.g. let's say in one you count objects of type 1 and in the other of type 2). Then you need to pair them together and emit that at iterations N you had: (object T1 - Y, object T2 - X) and finally evict th

Re: state.backend.fs.checkpointdir setting

2016-02-22 Thread Andrew Ge Wu
Hi Robert I just checked my settings in Task Managers (they were configured separately), they are misconfigured. My job now runs correctly, after reconfigured them. Thanks! Andrew > On 22 Feb 2016, at 09:41, Robert Metzger wrote: > > Hi, > > how is your cluster setup? Do you have multiple ma

Re: Flink HA

2016-02-22 Thread Robert Metzger
Hi Thomas, To avoid having jobs forever restarting, you have to cancel them manually (from the web interface or the /bin/flink client). Also, you can set an appropriate restart strategy (in 1.0-SNAPSHOT), which limits the number of retries. This way the retrying will eventually stop. On Fri, Feb

Re: state.backend.fs.checkpointdir setting

2016-02-22 Thread Robert Metzger
Hi, how is your cluster setup? Do you have multiple machines, or only one? Did you copy the configuration to all machines? On Fri, Feb 19, 2016 at 6:08 PM, Andrew Ge Wu wrote: > Hi All, > > I have been experiencing an error stopping my HA standalone setup. > > The cluster startup just fine, b