Re: Task Slots and Heterogeneous Tasks

2016-04-15 Thread Maxim
I see. Sharing slots among subtasks makes sense. So by default a subtask that executes a map function that calls a high latency external service is going to be put in a separate slot. Is it possible to indicate to the Flink that subtasks of a particular operation can be collocated in a slot, as su

Re: streaming join implementation

2016-04-15 Thread Aljoscha Krettek
I'll try and answer both questions. Regarding Henry's question about very large state and caching: this depends on the StateBackend. The FsStateBackend has to keep all state on the JVM heap in hash-maps. If you have the appropriate number of machines which large memory then this could still work.

Re: Task Slots and Heterogeneous Tasks

2016-04-15 Thread Till Rohrmann
Hi Maxim, concerning your second part of the question: The managed memory of a TaskManager is first split among the available slots. Each slot portion of the managed memory is again split among all operators which require managed memory when a pipeline is executed. In contrast to that, the heap me

Re: OOME PermGen in URLClassLoader

2016-04-15 Thread Michael Pisula
Hi guys, The problems seems to be caused by a known bug in the Oracle JDBC driver (see here: http://jrfom.com/2015archive/2014/01/08/fu-ojdbc, might be NSFW). We found a workaround that seems to solve our particular problem for the time being. We simply uploaded the oracle jdbc driver jar to flin

Re: OOME PermGen in URLClassLoader

2016-04-15 Thread Stephan Ewen
Hi! One thing you could try and do is create a dump of the JVM when it crashes, and have a look at all the classes it has loaded. For these long-running sessions (that share JVMs across jobs) it is important that classes are properly unloaded. If someone keeps holding references to the classes (e

Re: Accessing StateBackend snapshots outside of Flink

2016-04-15 Thread Stephan Ewen
One thing to add is that you can always trigger a persistent checkpoint via the "savepoints" feature: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html On Fri, Apr 15, 2016 at 10:24 AM, Aljoscha Krettek wrote: > Hi, > for RocksDB we simply use a TypeSer

Re: Monitoring and alerting mechanisms for Flink on YARN

2016-04-15 Thread Stephan Ewen
There is also quite an ongoing effort to create and expose more Metrics via JMX. Parts of that is in the JIRA below, but there will be an additional proposal and design pubshished in the next days. https://issues.apache.org/jira/browse/FLINK-1502 On Fri, Apr 15, 2016 at 11:04 AM, Flavio Pompermai

Re: Task Slots and Heterogeneous Tasks

2016-04-15 Thread Stephan Ewen
Hi! Slots are usually shared between the heavy and non heavy tasks, for that reason. Have a look at these resources: https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html#workers-slots-resources Let us know if you have more questions! Greetings, Stephan On Fri, Apr 15,

Re: Monitoring and alerting mechanisms for Flink on YARN

2016-04-15 Thread Flavio Pompermaier
Very interesting! Could you please provide more details about its usage in your deployment? Thanks, Flavio On Thu, Apr 14, 2016 at 11:25 PM, Christian Kreutzfeldt wrote: > Hi Soumya, > > we are using a StatsD / Graphite setup to extract metrics from our running > Flink applications. At least fo

Re: Accessing StateBackend snapshots outside of Flink

2016-04-15 Thread Aljoscha Krettek
Hi, for RocksDB we simply use a TypeSerializer to serialize the key and value to a byte[] array and store that in RocksDB. For a ListState, we serialize the individual elements using a TypeSerializer and store them in a comma-separated list in RocksDB. The snapshots of RocksDB that we write to HDFS

Re: OOME PermGen in URLClassLoader

2016-04-15 Thread Balaji Rajagopalan
Not a solution for your problem,but an alternative, I wrote my own sink function where I handle all sql activities(insert/update/select), used a 3rd lib for connection pooling, the code has been running stable in production without any issue. On Fri, Apr 15, 2016 at 1:41 PM, Maximilian Bode < maxi

Re: WindowOperator initial watermark

2016-04-15 Thread Aljoscha Krettek
Yes, you are right, that's a bit inconsistent. I'll push a commit to fix it. On Thu, 14 Apr 2016 at 23:45 Michael Radford wrote: > This is a really minor issue, but it confused me during testing. > > The WindowOperator initial watermark is -1: > > > https://github.com/apache/flink/blob/master/fl

OOME PermGen in URLClassLoader

2016-04-15 Thread Maximilian Bode
Hi everyone, we are testing a long-running streaming application, which shares a yarn session with a batch job (containing JDBC(In|Out)putFormat) that is triggered periodically. Unfortunately, the session is dying after a few runs of the batch job. In fact, each run of the batch job kills one t