Re: intermediate result reuse

2015-09-12 Thread Stephan Ewen
Hi! In most places where you use collect(), you should be able to use a broadcast variable to the same extend. This keeps the plan as one DAG, executed in one unit, so no re-computation will happen. Intermediate result caching is actually a work that has been in progress for a while, but has stal

Re: intermediate result reuse

2015-09-12 Thread Michele Bertoni
ok, I think I got the point: I don’t have two execute but a collect in some branch I will look for a way to remove it What I am doing is to keep all the elements of A that as value equal to something in B, where B (at this point) is very small Is it better to collect or a cogroup? btw is some

Re: verbose console

2015-09-12 Thread Michele Bertoni
that’s exactly what I needed thanks Il giorno 12/set/2015, alle ore 16:12, Stephan Ewen mailto:se...@apache.org>> ha scritto: Hi! Inside your program, you can do the following: "environment.getExecutionConfig().disableSysoutLogging()" That should suppress all these outputs. Hope that helps.

Re: intermediate result reuse

2015-09-12 Thread Stephan Ewen
Fabian has explained it well. All functions are executed lazily as one DAG, when "env.execute()" is called. Beware that there are three exceptions: - count() - collect() - print() These functions trigger an immediate program execution (they are "eager" functions). They will execute all that

Re: verbose console

2015-09-12 Thread Stephan Ewen
Hi! Inside your program, you can do the following: "environment.getExecutionConfig().disableSysoutLogging()" That should suppress all these outputs. Hope that helps... Stephan On Fri, Sep 11, 2015 at 11:17 PM, Michele Bertoni < michele1.bert...@mail.polimi.it> wrote: > Hi! > Ok i solved it p

Re: intermediate result reuse

2015-09-12 Thread Fabian Hueske
Hi Michele, Flink programs can have multiple sinks. In your program, the intermediate result a will be streamed to both filters (b and c) at the same time and both sinks will be written at the same time. So in this case, there is no need to materialize the intermediate result a. If you call execu

intermediate result reuse

2015-09-12 Thread Michele Bertoni
Hi everybody, I have a question about internal optimization is flink able to reuse intermediate result that are used twice in the graph? i.e. a = readsource -> filter -> reduce -> something else even more complicated b = a filter(something) store b c = a filter(something else) store c what hap