I was curious as to how the `count` method on DataSet worked, and was surprised
to see that it executes the entire program graph. Wouldn’t this cause
undesirable side-effects like writing to sinks? Also strange that the graph
is mutated with the addition of a sink (that isn’t subsequently removed).
Surveying the Flink code, there aren’t many situations where the program graph
is implicitly executed (`collect` is another). Nonetheless, this has deepened
my appreciation for how dynamic the application might be.
// DataSet.java
public long count() throws Exception {
final String id = new AbstractID().toString();
output(new Utils.CountHelper<T>(id)).name("count()");
JobExecutionResult res = getExecutionEnvironment().execute();
return res.<Long> getAccumulatorResult(id);
}
Eron