Re: intermediate result reuse

Fabian Hueske Sat, 12 Sep 2015 02:32:18 -0700

Hi Michele,

Flink programs can have multiple sinks.
In your program, the intermediate result a will be streamed to both filters
(b and c) at the same time and both sinks will be written at the same time.
So in this case, there is no need to materialize the intermediate result a.


If you call execute() after you defined b, the program will compute a and
stream the result only to b.
If you call execute() again after you defined c, the program will compute a
again and stream the result to c.
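
The two cases above can be illustrated with a small toy model. This is only a sketch and not the Flink API: the names Env, Node, source_reads and run are hypothetical, and a per-execution memo table stands in for Flink streaming one result to several consumers. It assumes, as described above, that sinks are collected lazily and that each execute() call starts again from the sources.

```python
# Toy model of lazy plan execution. NOT the Flink API; all names are hypothetical.

class Env:
    def __init__(self):
        self.sinks = []        # sinks registered since the last execute()
        self.source_reads = 0  # how often the source was actually read

    def source(self, records):
        return Node(self, records=records)

    def execute(self):
        memo = {}  # shared only within this single execution
        results = [sink.evaluate(memo) for sink in self.sinks]
        self.sinks = []
        return results


class Node:
    def __init__(self, env, parent=None, predicate=None, records=None):
        self.env = env
        self.parent = parent
        self.predicate = predicate
        self.records = records

    def filter(self, predicate):
        # Only extends the plan; nothing is computed here.
        return Node(self.env, parent=self, predicate=predicate)

    def store(self):
        self.env.sinks.append(self)

    def evaluate(self, memo):
        if id(self) in memo:
            return memo[id(self)]
        if self.parent is None:
            self.env.source_reads += 1
            out = list(self.records)
        else:
            out = [r for r in self.parent.evaluate(memo) if self.predicate(r)]
        memo[id(self)] = out
        return out


def run(execute_between):
    env = Env()
    a = env.source(range(10)).filter(lambda x: x % 2 == 0)
    b = a.filter(lambda x: x > 4)
    b.store()
    outputs = []
    if execute_between:
        outputs += env.execute()  # computes a, feeds only b
    c = a.filter(lambda x: x < 4)
    c.store()
    outputs += env.execute()      # computes a (again, if executed before)
    return env.source_reads, outputs
```

In this model, run(False) reads the source once and feeds both sinks in one execution, while run(True) reads it twice, which matches logging statements in a read function appearing twice.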

Summary:
Flink programs can usually stream intermediate results without
materializing them. There are a few cases where Flink needs to materialize
intermediate results in order to avoid deadlocks, but it handles these
transparently.
It is not possible (yet!) to share results across program executions, i.e.,
across separate calls to execute().

I suppose you call execute() between defining b and c. If you remove that
call, a will be computed once and both b and c will be computed at the same
time.

Best, Fabian

2015-09-12 11:02 GMT+02:00 Michele Bertoni <michele1.bert...@mail.polimi.it>:

> Hi everybody, I have a question about internal optimization:
> is Flink able to reuse intermediate results that are used twice in the
> graph?
>
> i.e.
> a = readsource -> filter -> reduce -> something else even more complicated
>
> b = a filter(something)
> store b
>
> c = a filter(something else)
> store c
>
> what happens to a? is it computed twice?
>
> in my read function I have some logging commands and I see them printed
> twice, but it sounds strange to me
>
>
>
> thanks
> cheers
> michele
