Is there any API that lets a DataStream join a DataSet?

2015-06-27 Thread 马国维
Hi everyone: Is there any API that lets a DataStream join a DataSet? I have read all the documentation but can't find one. If Flink does not have such an API now, will Flink support it in the future? Thanks a lot!

Re: Possible Broken Master

2015-06-27 Thread Matthias J. Sax
Glad to hear. I read "flink-storm-compatibility-core" and was alarmed. ;) On 06/27/2015 07:50 PM, Aljoscha Krettek wrote: > Nevermind, removal of my local maven repository solved the problem. Sorry > for the inconvenience. > > On Sat, 27 Jun 2015 at 19:22 Márton Balassi > wrote: > >> Interestin

Re: Thoughts About Streaming

2015-06-27 Thread Matthias J. Sax
Yes. But as I said, you can get the same behavior with a GroupedDataStream using a tumbling 1-tuple-size window. Thus, there is no conceptual advantage in using KeyedDataStream and no disadvantage in binding stateful operations to GroupedDataStreams. On 06/27/2015 06:54 PM, Márton Balassi wrote: >

Re: Possible Broken Master

2015-06-27 Thread Aljoscha Krettek
Nevermind, removal of my local maven repository solved the problem. Sorry for the inconvenience. On Sat, 27 Jun 2015 at 19:22 Márton Balassi wrote: > Interesting, it does not appear on travis or my local machine, but both run > linux. (Ubuntu 14.10, Java 8, mvn 3.0.5 in the latter case) > > On p

[jira] [Created] (FLINK-2283) Make grouped reduce/fold/aggregations stateful using Partitioned state

2015-06-27 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2283: - Summary: Make grouped reduce/fold/aggregations stateful using Partitioned state Key: FLINK-2283 URL: https://issues.apache.org/jira/browse/FLINK-2283 Project: Flink

[jira] [Created] (FLINK-2282) Deprecate non-grouped stream reduce/fold/aggregations for 0.9.1

2015-06-27 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2282: - Summary: Deprecate non-grouped stream reduce/fold/aggregations for 0.9.1 Key: FLINK-2282 URL: https://issues.apache.org/jira/browse/FLINK-2282 Project: Flink Issu

Re: Possible Broken Master

2015-06-27 Thread Márton Balassi
Interesting, it does not appear on travis or my local machine, but both run linux. (Ubuntu 14.10, Java 8, mvn 3.0.5 in the latter case) On paper the remote-resources plugin is only used for the Eclipse integration and should not even affect the maven build itself, at least that is what the comment says in

Re: Thoughts About Streaming

2015-06-27 Thread Márton Balassi
@Matthias: Your point of working with a minimal number of clear concepts is desirable, to say the least. :) The reasoning behind the KeyedDataStream is to associate Flink's persisted operator state with the keys of the data that produced it, so that stateful computation becomes scalable in the future.
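
The idea of tying operator state to keys can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Flink's API: the class `KeyedReducer`, its `parallelism` parameter, and the hash-based `partition_for` routing are hypothetical names chosen for illustration. It shows why per-key state scales: each key's state lives in exactly one partition, so partitions never need to coordinate.

```python
class KeyedReducer:
    """Toy model of key-partitioned operator state (hypothetical, not Flink's API)."""

    def __init__(self, parallelism, reduce_fn):
        self.parallelism = parallelism
        self.reduce_fn = reduce_fn
        # One independent state store per parallel partition.
        self.partitions = [dict() for _ in range(parallelism)]

    def partition_for(self, key):
        # Assumption: keys are routed by hash, so a key always maps to the same partition.
        return hash(key) % self.parallelism

    def process(self, key, value):
        # Only the state belonging to this key's partition is read and updated.
        state = self.partitions[self.partition_for(key)]
        if key in state:
            state[key] = self.reduce_fn(state[key], value)
        else:
            state[key] = value
        return state[key]


reducer = KeyedReducer(parallelism=4, reduce_fn=lambda a, b: a + b)
events = [("a", 1), ("b", 10), ("a", 2), ("b", 5)]
running = [reducer.process(k, v) for k, v in events]
print(running)  # [1, 10, 3, 15]
```

Because the state for a key is never shared between partitions, adding partitions spreads both the data and its state, which is the scalability argument made above.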

Possible Broken Master

2015-06-27 Thread Aljoscha Krettek
Hi, anyone else seeing this: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project flink-storm-compatibility-core: Execution default of goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process failed: For artifact {nul

Re: Student looking to contribute to Stratosphere

2015-06-27 Thread Chiwan Park
Hi, you can choose any unassigned issue for the Flink Machine Learning Library (flink-ml) in JIRA. [1] There are some starter issues in flink-ml, such as FLINK-1737 [2], FLINK-1748 [3], FLINK-1994 [4]. First, it would be better to read some articles about contributing to Flink. [5][6] And if y

Student looking to contribute to Stratosphere

2015-06-27 Thread Rohit Shinde
Hello everyone, I came across Stratosphere while looking for GSOC organisations working in Machine Learning. I got to know that it had become Apache Flink. I am interested in this project: https://github.com/stratosphere/stratosphere/wiki/Google-Summer-of-Code-2014#implement-one-or-multiple-machi

Monitoring a Flink Job

2015-06-27 Thread Andra Lungu
Hey guys, me again :) So now that my wonderful job finishes, I would like to monitor it a bit (i.e., build some charts on the number of messages per vertex, compute the total amount of time elapsed per computation per vertex, etc.). The main computationally intensive operation is a coGroup. There, wi

Re: Thoughts About Streaming

2015-06-27 Thread Matthias J. Sax
This was more a conceptual point-of-view argument. From an implementation point of view, skipping the window building step is a good idea if a tumbling 1-tuple-size window is detected. I prefer to work with a minimum number of concepts (and apply internal optimization if possible) instead of using

Re: Thoughts About Streaming

2015-06-27 Thread Aljoscha Krettek
What do you mean by Comment 2? Using the whole window apparatus if you just want to have, for example, a simple partitioned filter with partitioned state seems a bit extravagant. On Sat, 27 Jun 2015 at 15:19 Matthias J. Sax wrote: > Nice starting point. > > Comment 1: > "Each individual stream p

Re: Thoughts About Streaming

2015-06-27 Thread Matthias J. Sax
Nice starting point. Comment 1: "Each individual stream partition delivers elements strictly in order." (in 'Parallel Streams, Partitions, Time, and Ordering') I would say "FIFO" and not "strictly in order". If data is not emitted in-order, the stream partition will not be in-order either. Comme

Re: [VOTE] Release additional convenience binaries for Flink 0.9.0

2015-06-27 Thread Aljoscha Krettek
+1 - tested local-mode and cluster-mode for hadoop 2.7 with https://github.com/aljoscha/FliRTT (by the way, this runs with built-in data and external data) On Fri, 26 Jun 2015 at 12:45 Robert Metzger wrote: > +1 > > - Ran a 100 GB wordcount on a Hadoop/YARN 2.4.0 installation. So both YARN > an