Fault Tolerance for Flink Iterations

2015-04-21 Thread Markus Holzemer
Hi everybody, I am writing my master thesis about making flink iterations / iterative flink algorithms fault tolerant. The first approach I implemented is a basic checkpointing, where every N iterations the current state is saved into HDFS. To do this I enabled data sinks inside of iterations, the

Re: Periodic full stream aggregations

2015-04-21 Thread Bruno Cadonna
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Gyula, I have a question regarding your suggestion. Can the current continuous aggregation be also specified with your proposed periodic aggregation? I am thinking about something like dataStream.reduce(...).every(Count.of(1)) Cheers, Bruno On

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
Hi Bruno, Of course you can do that as well. (That's the good part :p ) I will open a PR soon with the proposed changes (first without breaking the current Api) and I will post it here. Cheers, Gyula On Tuesday, April 21, 2015, Bruno Cadonna wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash:

Re: Periodic full stream aggregations

2015-04-21 Thread Fabian Hueske
Is it possible to switch the order of the statements, i.e., dataStream.every(Time.of(4,sec)).reduce(...) instead of dataStream.reduce(...).every(Time.of(4,sec)) I think that would be more consistent with the structure of the remaining API. Cheers, Fabian 2015-04-21 10:57 GMT+02:00 Gyula Fóra :

Re: About Operator and OperatorBase

2015-04-21 Thread Stephan Ewen
Originally, we had multiple Apis with different data models: the current Java API, the record api, a JSON API. The common API was the data model agnostic set of operators on which they built. It has become redundant when we saw how well things can be built in top of the Java API, using the TypeInf

[jira] [Created] (FLINK-1916) EOFException when running delta-iteration job

2015-04-21 Thread Stefan Bunk (JIRA)
Stefan Bunk created FLINK-1916: -- Summary: EOFException when running delta-iteration job Key: FLINK-1916 URL: https://issues.apache.org/jira/browse/FLINK-1916 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-1917) EOFException when running delta-iteration job

2015-04-21 Thread Stefan Bunk (JIRA)
Stefan Bunk created FLINK-1917: -- Summary: EOFException when running delta-iteration job Key: FLINK-1917 URL: https://issues.apache.org/jira/browse/FLINK-1917 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-1918) NullPointerException at org.apache.flink.client.program.Client's constructor while using ExecutionEnvironment.createRemoteEnvironment

2015-04-21 Thread JIRA
Zoltán Zvara created FLINK-1918: --- Summary: NullPointerException at org.apache.flink.client.program.Client's constructor while using ExecutionEnvironment.createRemoteEnvironment Key: FLINK-1918 URL: https://issues.a

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
Thats a good idea, I will modify my PR to that :) Gyula On Tue, Apr 21, 2015 at 12:09 PM, Fabian Hueske wrote: > Is it possible to switch the order of the statements, i.e., > > dataStream.every(Time.of(4,sec)).reduce(...) instead of > dataStream.reduce(...).every(Time.of(4,sec)) > > I think tha

Akka transparency and serialisation

2015-04-21 Thread Paris Carbone
Hello everyone, Many of you are already aware of this but it is good to make it clear in the mailist. We bumped into this "special" case with Akka several times already and it is important to know where transparency actually breaks. In short, Akka serialises only messages that get transferred

Re: Akka transparency and serialisation

2015-04-21 Thread Stephan Ewen
Good point to raise Paris. Here are the practices I (and others) have been using, they work well 1) Do not assume serialization, that is true. If you need to make sure that the instance of the data is not shared after the message, send a manually serialized version. The "InstantiationUtil" has me

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
I have opened a PR for this feature: https://github.com/apache/flink/pull/614 Cheers, Gyula On Tue, Apr 21, 2015 at 1:10 PM, Gyula Fóra wrote: > Thats a good idea, I will modify my PR to that :) > > Gyula > > On Tue, Apr 21, 2015 at 12:09 PM, Fabian Hueske wrote: > >> Is it possible to switch

Re: Periodic full stream aggregations

2015-04-21 Thread Bruno Cadonna
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Gyula, I read your comments of your PR. I have a question to this comment: "It only allows aggregations so we dont need to keep the full history in a buffer." What if the user implements an aggregation function like a median? For a median you n

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
You are right, but you should never try to compute full stream median, thats the point :D On Tue, Apr 21, 2015 at 2:52 PM, Bruno Cadonna < cado...@informatik.hu-berlin.de> wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi Gyula, > > I read your comments of your PR. > > I have a ques

[jira] [Created] (FLINK-1919) Add HCatOutputFormat for Tuple data types

2015-04-21 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1919: Summary: Add HCatOutputFormat for Tuple data types Key: FLINK-1919 URL: https://issues.apache.org/jira/browse/FLINK-1919 Project: Flink Issue Type: New Featu

Re: About Operator and OperatorBase

2015-04-21 Thread Henry Saputra
Thanks for the explanation, Stephan. I always wonder why the extra common APIs exist. Then I think this should be high priority if we want to remove the common API to reduce the unnecessary layer and "dead code". As Ufuk mentioned before, better doing it now before more stuff build on top of Flink

Re: Fault Tolerance for Flink Iterations

2015-04-21 Thread Stephan Ewen
Hi Markus! I see your point. My first guess would be that it would be simpler to do this logic in the driver program, rather than inside the JobManager. If the checkpoints are all written and the job fails, you check what was the latest completed checkpoint (by file) and then start the program aga

Re: Periodic full stream aggregations

2015-04-21 Thread Bruno Cadonna
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Gyula, fair enough! I used a bad example. What I really wanted to know is whether your code supports only aggregation like sum, min, and max where you need to pass only a value to the next aggregation or also more complex data structures, e.g., a

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
Hey, The current code supports 2 types of aggregations, simple binary reduce: T,T=>T and also the grouped version for this, where the reduce function is applied per a user defined key (so there we keep a map of reduced values). This can already be used to implement fairly complex logic if we trans