Re: Package multiple jobs in a single jar

2015-05-22 Thread Matthias J. Sax
Makes sense to me. :) One more thing: What about extending the "ProgramDescription" interface to have multiple methods as Flavio suggested (with the config(...) method that should be handled by the ParameterTool) > public interface FlinkJob { > > /** The name to display in the job submission UI o

Re: Package multiple jobs in a single jar

2015-05-22 Thread Robert Metzger
Thank you for working on this. My responses are inline below: (Flavio) > My suggestion is to create a specific Flink interface to get also > description of a job and standardize parameter passing. I've recently merged the ParameterTool which is solving the "standardize parameter passing" proble

Re: Package multiple jobs in a single jar

2015-05-22 Thread Matthias J. Sax
Hi, two more thoughts on this discussion: 1) looking at the commit history of "CliFrontend", I found the following closed issue and the closing pull request * https://issues.apache.org/jira/browse/FLINK-1095 * https://github.com/apache/flink/pull/238 It stands in opposition to Flavio's requ

[jira] [Created] (FLINK-2086) how main difference betwwen Hadoop and Apache Flink

2015-05-22 Thread hagersaleh (JIRA)
hagersaleh created FLINK-2086: - Summary: how main difference betwwen Hadoop and Apache Flink Key: FLINK-2086 URL: https://issues.apache.org/jira/browse/FLINK-2086 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-2085) Add an option to the MemoryManager to allocate memory as needed, rather than preallocating it

2015-05-22 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2085: --- Summary: Add an option to the MemoryManager to allocate memory as needed, rather than preallocating it Key: FLINK-2085 URL: https://issues.apache.org/jira/browse/FLINK-2085

[jira] [Created] (FLINK-2084) Create a dedicated streaming mode

2015-05-22 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2084: --- Summary: Create a dedicated streaming mode Key: FLINK-2084 URL: https://issues.apache.org/jira/browse/FLINK-2084 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9

2015-05-22 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2083: -- Summary: Ensure high quality docs for FlinkML in 0.9 Key: FLINK-2083 URL: https://issues.apache.org/jira/browse/FLINK-2083 Project: Flink Issue T

[jira] [Created] (FLINK-2082) Chained stream tasks share the same RuntimeEnvironment

2015-05-22 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2082: - Summary: Chained stream tasks share the same RuntimeEnvironment Key: FLINK-2082 URL: https://issues.apache.org/jira/browse/FLINK-2082 Project: Flink Issue Type: Bu

Re: [DISCUSS] Dedicated streaming mode

2015-05-22 Thread Stephan Ewen
Aljoscha is right. There are plans to migrate the streaming state to the MemoryManager as well, but streaming state is not managed at this point. What is managed in streaming jobs is the data buffered and cached in the network stack. But that is a different memory pool than the memory manager. We

[jira] [Created] (FLINK-2081) Change order of restore state and open for Streaming Operators

2015-05-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2081: --- Summary: Change order of restore state and open for Streaming Operators Key: FLINK-2081 URL: https://issues.apache.org/jira/browse/FLINK-2081 Project: Flink

Re: [DISCUSS] Dedicated streaming mode

2015-05-22 Thread Aljoscha Krettek
Hi, streaming currently does not use any memory manager. All state is kept in Java Objects on the Java Heap, for example an ArrayList<> for the window buffer. On Thu, May 21, 2015 at 11:56 PM, Henry Saputra wrote: > Hi Stephan, Gyula, Paris, > > How does streaming currently different in term of m
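To illustrate what "state kept in Java objects on the heap" can look like, here is a minimal sketch in plain Java. It is not Flink code: the class name HeapWindowBuffer and its methods are hypothetical, and the count-based window is only an assumption chosen to keep the example short. The point is that the buffer is an ordinary ArrayList, so its memory is accounted for by the JVM heap and not by any memory manager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Flink API.
public class HeapWindowBuffer<T> {
    private final int size;
    // The window state lives in a plain heap-allocated list.
    private final List<T> buffer = new ArrayList<>();

    public HeapWindowBuffer(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Adds an element. Returns a copy of the window once it is full
    // and clears the buffer; returns null while the window is filling.
    public List<T> add(T element) {
        buffer.add(element);
        if (buffer.size() < size) {
            return null;
        }
        List<T> window = new ArrayList<>(buffer);
        buffer.clear();
        return window;
    }

    // Number of elements currently held on the heap.
    public int buffered() {
        return buffer.size();
    }
}
```

Because nothing bounds the list other than the window size, a large window simply grows the heap, which is the behavior a managed memory pool would be meant to control.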

Re: Package multiple jobs in a single jar

2015-05-22 Thread Matthias J. Sax
Thanks for your feedback. I agree on the main method "problem". For scanning and listing everything that is found, it's fine. The tricky question is the automatic invocation mechanism if the "-c" flag is not used and no manifest program-class or Main-Class entry is found. If multiple classes impleme

[jira] [Created] (FLINK-2080) Execute Flink with sbt

2015-05-22 Thread Christian Wuertz (JIRA)
Christian Wuertz created FLINK-2080: --- Summary: Execute Flink with sbt Key: FLINK-2080 URL: https://issues.apache.org/jira/browse/FLINK-2080 Project: Flink Issue Type: Improvement

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread Stephan Ewen
Performance-wise, a "GroupReduceFunction" with Combiner should right now be slightly faster than the ReduceFunction, but not much. Long term, the ReduceFunction may become faster, because it will use hash aggregation under the hood. On Fri, May 22, 2015 at 11:58 AM, santosh_rajaguru wrote: > T

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread santosh_rajaguru
Thanks Maximilian. My use case is similar to the example given in the graph analysis. In graph analysis, the reduce function used is a normal reduce function. I executed that with both scenarios and your justification is right. The normal reduce function has a combiner before sorting unlike the G

[jira] [Created] (FLINK-2079) Add watcher to YARN TM containers to detect stopped actor system

2015-05-22 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2079: - Summary: Add watcher to YARN TM containers to detect stopped actor system Key: FLINK-2079 URL: https://issues.apache.org/jira/browse/FLINK-2079 Project: Flink

Changed the behavior of "DataSet.print()"

2015-05-22 Thread Stephan Ewen
Hi all! We merged a patch yesterday that changed the API behavior of the "DataSet.print()" function. "print()" now prints to stdout on the client process, rather than the TaskManager process, as before. This is much nicer for debugging and exploring data sets. One implication of this is that pri

[jira] [Created] (FLINK-2078) Document type registration at the ExecutionEnvironment

2015-05-22 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2078: Summary: Document type registration at the ExecutionEnvironment Key: FLINK-2078 URL: https://issues.apache.org/jira/browse/FLINK-2078 Project: Flink Issue Ty

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread Maximilian Michels
Pardon, what I said is not completely right. Both functions are incrementally constructed. This seems obvious for the reduce function but is also true for the GroupReduce because it receives the values as an Iterable which, under the hood, can be constructed incrementally as well. One other differ

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread Maximilian Michels
Like you said, it depends on the use case. The GroupReduceFunction is a generalization of the traditional reduce. Thus, it is more powerful. However, it is also executed differently; a GroupReduceFunction requires the whole group to be materialized and passed at once. If your program doesn't requir
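The sense in which a group reduce generalizes a plain reduce can be sketched in plain Java. This is not the Flink API: ReduceSketch, pairwiseReduce, groupReduce and GroupFn are hypothetical names used only for illustration. A reduce-style function combines two values into one and is applied pairwise; a group-reduce-style function receives the group as an Iterable and may emit any number of results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical illustration only; not part of the Flink API.
public class ReduceSketch {

    // Reduce style: two values in, one value out, applied pairwise.
    public static <T> T pairwiseReduce(Iterable<T> group, BinaryOperator<T> fn) {
        Iterator<T> it = group.iterator();
        if (!it.hasNext()) {
            throw new IllegalArgumentException("empty group");
        }
        T acc = it.next();
        while (it.hasNext()) {
            acc = fn.apply(acc, it.next());
        }
        return acc;
    }

    // Group-reduce style: sees the group as an Iterable and
    // may append zero, one, or many results to the output list.
    public interface GroupFn<T, R> {
        void reduce(Iterable<T> values, List<R> out);
    }

    public static <T, R> List<R> groupReduce(Iterable<T> group, GroupFn<T, R> fn) {
        List<R> out = new ArrayList<>();
        fn.reduce(group, out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(3, 1, 2);

        // A sum fits the pairwise reduce shape.
        int sum = pairwiseReduce(group, Integer::sum);

        // Emitting both minimum and maximum needs the group-reduce shape,
        // because it produces two results per group.
        List<Integer> minMax = groupReduce(group, (values, out) -> {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int v : values) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            out.add(min);
            out.add(max);
        });

        System.out.println(sum + " " + minMax);
    }
}
```

Note that the group function only iterates over its input once, so in principle the Iterable can be produced lazily; the sketch says nothing about how Flink actually schedules or materializes groups.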

Re: Package multiple jobs in a single jar

2015-05-22 Thread Maximilian Michels
Hi Matthias, Thank you for taking the time to analyze Flink's invocation behavior. I like your proposal. I'm not sure whether it is a good idea to scan the entire JAR for main methods. Sometimes, main methods are added solely for testing purposes and don't really serve any practical use. However,

Re: question please

2015-05-22 Thread Chiwan Park
Hi. Hadoop is a framework for reliable, scalable, distributed computing. So, there are many components for this purpose such as HDFS, YARN and Hadoop MapReduce. Flink is an alternative to the Hadoop MapReduce component. It also has some tools to make map-reduce programs and extends it to support man