Hi everybody,
I am writing my master's thesis on making Flink iterations / iterative
Flink algorithms fault-tolerant.
The first approach I implemented is basic checkpointing, where every N
iterations the current state is saved to HDFS.
To do this I enabled data sinks inside of iterations, the
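To make the idea concrete, here is a minimal Java sketch (illustrative only, not the thesis code) of checkpointing every N iterations; in the real job the snapshot would be written to HDFS via a data sink rather than collected in a list:

```java
import java.util.ArrayList;
import java.util.List;

public class PeriodicCheckpointSketch {
    // Run a toy iterative computation and snapshot the state every n-th
    // iteration. The snapshots stand in for writes to HDFS.
    static List<Integer> run(int iterations, int n) {
        List<Integer> checkpoints = new ArrayList<>();
        int state = 0;
        for (int i = 1; i <= iterations; i++) {
            state += i;                  // the iterative computation step
            if (i % n == 0) {
                checkpoints.add(state);  // checkpoint the current state
            }
        }
        return checkpoints;
    }
}
```

For example, `run(10, 3)` checkpoints the state after iterations 3, 6, and 9.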
Hi Gyula,
I have a question regarding your suggestion.
Can the current continuous aggregation also be expressed with your
proposed periodic aggregation?
I am thinking about something like
dataStream.reduce(...).every(Count.of(1))
Cheers,
Bruno
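The equivalence Bruno asks about can be sketched in plain Java (names illustrative, not the actual proposed Flink API): a running reduce that emits its current aggregate every n-th element behaves, for n = 1, exactly like the continuous per-element aggregation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Sketch of the idea behind reduce(...).every(Count.of(n)): keep a running
// reduce and emit the current aggregate every n-th element. With n = 1 the
// output matches the continuous (per-element) aggregation.
public class EverySketch {
    static List<Integer> reduceEvery(List<Integer> input,
                                     BinaryOperator<Integer> reducer,
                                     int n) {
        List<Integer> out = new ArrayList<>();
        Integer agg = null;
        int count = 0;
        for (int value : input) {
            agg = (agg == null) ? value : reducer.apply(agg, value);
            if (++count % n == 0) {
                out.add(agg);  // emit the current aggregate
            }
        }
        return out;
    }
}
```

For input `[1, 2, 3, 4]` with `Integer::sum`, n = 1 yields every partial sum, while n = 2 emits only every second one.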
Hi Bruno,
Of course you can do that as well. (That's the good part :p )
I will open a PR soon with the proposed changes (first without breaking the
current API) and I will post it here.
Cheers,
Gyula
On Tuesday, April 21, 2015, Bruno Cadonna
wrote:
Is it possible to switch the order of the statements, i.e.,
dataStream.every(Time.of(4,sec)).reduce(...) instead of
dataStream.reduce(...).every(Time.of(4,sec))
I think that would be more consistent with the structure of the remaining
API.
Cheers, Fabian
2015-04-21 10:57 GMT+02:00 Gyula Fóra :
Originally, we had multiple APIs with different data models: the current
Java API, the Record API, a JSON API. The common API was the data-model-agnostic
set of operators on which they were built.
It became redundant when we saw how well things can be built on top of
the Java API, using the TypeInf
Stefan Bunk created FLINK-1916:
--
Summary: EOFException when running delta-iteration job
Key: FLINK-1916
URL: https://issues.apache.org/jira/browse/FLINK-1916
Project: Flink
Issue Type: Bug
Stefan Bunk created FLINK-1917:
--
Summary: EOFException when running delta-iteration job
Key: FLINK-1917
URL: https://issues.apache.org/jira/browse/FLINK-1917
Project: Flink
Issue Type: Bug
Zoltán Zvara created FLINK-1918:
---
Summary: NullPointerException at
org.apache.flink.client.program.Client's constructor while using
ExecutionEnvironment.createRemoteEnvironment
Key: FLINK-1918
URL: https://issues.a
That's a good idea, I will modify my PR accordingly :)
Gyula
On Tue, Apr 21, 2015 at 12:09 PM, Fabian Hueske wrote:
> Is it possible to switch the order of the statements, i.e.,
>
> dataStream.every(Time.of(4,sec)).reduce(...) instead of
> dataStream.reduce(...).every(Time.of(4,sec))
>
> I think tha
Hello everyone,
Many of you are already aware of this, but it is good to make it clear on the
mailing list. We have bumped into this "special" case with Akka several times
already, and it is important to know where transparency actually breaks.
In short, Akka serialises only messages that get transferred
Good point to raise, Paris.
Here are the practices I (and others) have been using; they work well:
1) Do not assume serialization, that is true. If you need to make sure that
the instance of the data is not shared after the message, send a manually
serialized version. The "InstantiationUtil" has me
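A minimal sketch of the "send a manually serialized version" practice, using plain JDK serialization for illustration (Flink's `InstantiationUtil` provides similar helpers): the payload is turned into bytes before handing it to Akka, so the receiver can never share the sender's instance.

```java
import java.io.*;

public class ManualSerialization {
    // Serialize an object to a byte array before sending it in a message.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Rebuild a fresh instance on the receiving side.
    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The round trip always yields a distinct instance, which is exactly the point when local Akka delivery would otherwise pass the original reference.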
I have opened a PR for this feature:
https://github.com/apache/flink/pull/614
Cheers,
Gyula
On Tue, Apr 21, 2015 at 1:10 PM, Gyula Fóra wrote:
> That's a good idea, I will modify my PR accordingly :)
>
> Gyula
>
> On Tue, Apr 21, 2015 at 12:09 PM, Fabian Hueske wrote:
>
>> Is it possible to switch
Hi Gyula,
I read your comments on your PR.
I have a question to this comment:
"It only allows aggregations so we dont need to keep the full history
in a buffer."
What if the user implements an aggregation function like a median?
For a median you n
You are right, but you should never try to compute a full-stream median;
that's the point :D
On Tue, Apr 21, 2015 at 2:52 PM, Bruno Cadonna <
cado...@informatik.hu-berlin.de> wrote:
>
> Hi Gyula,
>
> I read your comments on your PR.
>
> I have a ques
Fabian Hueske created FLINK-1919:
Summary: Add HCatOutputFormat for Tuple data types
Key: FLINK-1919
URL: https://issues.apache.org/jira/browse/FLINK-1919
Project: Flink
Issue Type: New Feature
Thanks for the explanation, Stephan. I always wondered why the extra
common API exists.
Then I think this should be high priority if we want to remove the
common API to reduce the unnecessary layer and "dead code". As Ufuk
mentioned before, it is better to do it now before more stuff is built on top
of Flink
Hi Markus!
I see your point. My first guess would be that it would be simpler to do
this logic in the driver program, rather
than inside the JobManager. If the checkpoints are all written and the job
fails, you check what was the latest completed
checkpoint (by file) and then start the program again
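The driver-side recovery step Stephan describes could look roughly like this (a sketch under the assumption that checkpoint files are named "checkpoint-&lt;N&gt;"; the naming scheme and helper are hypothetical, not from the thesis):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestCheckpoint {
    // Pick the latest completed checkpoint by the iteration number encoded
    // in its file name; the driver would restart the job from it.
    static Optional<String> latest(List<String> checkpointFiles) {
        return checkpointFiles.stream()
                .filter(f -> f.startsWith("checkpoint-"))
                .max(Comparator.comparingInt(
                        f -> Integer.parseInt(f.substring("checkpoint-".length()))));
    }
}
```

In a real driver the list would come from an HDFS directory listing; the selection logic stays the same.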
Hi Gyula,
fair enough!
I used a bad example.
What I really wanted to know is whether your code supports only
aggregations like sum, min, and max, where you need to pass only a single
value to the next aggregation, or also more complex data structures, e.g., a
Hey,
The current code supports two types of aggregations: a simple binary reduce
(T,T => T) and the grouped version of it, where the reduce function is
applied per user-defined key (so there we keep a map of reduced values).
This can already be used to implement fairly complex logic if we trans
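The grouped rolling reduce Gyula describes — one reduced value kept per user-defined key — can be sketched in a few lines of plain Java (illustrative, not the PR's actual code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// One reduced value is kept per key; each incoming element is merged into
// the value for its key and the updated aggregate is returned.
public class GroupedReduceSketch<K, T> {
    private final Map<K, T> reduced = new HashMap<>();
    private final Function<T, K> keySelector;
    private final BinaryOperator<T> reducer;

    GroupedReduceSketch(Function<T, K> keySelector, BinaryOperator<T> reducer) {
        this.keySelector = keySelector;
        this.reducer = reducer;
    }

    // Returns the current reduced value for the element's key.
    T add(T element) {
        return reduced.merge(keySelector.apply(element), element, reducer);
    }
}
```

With key selector `x -> x % 2` and `Integer::sum`, odd and even inputs are summed independently, which is the per-key behavior described above.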