Hi there!

With the upcoming, more interactive extensions to the API (operations that return results from a program to the client and need to be eagerly evaluated), we need to define how different actions should behave.

Currently, nothing gets executed until the "env.execute()" call is made. That makes it possible to produce multiple data sources at the same time, which is a good feature.

For certain operations, like the "count()" and "collect()" functions added in https://github.com/apache/flink/pull/210 , we need to trigger execution immediately.

The open question is how this should behave in connection with data sinks that have already been defined:

1) Should all data sinks defined so far be executed as well?
2) Should only the immediate operation be executed, with the data sinks remaining pending until a call to "env.execute()"?

I am somewhat leaning towards the first option right now, because I think that executing the sinks later may force re-execution of larger parts of the plan.

In addition: I think that the output of the "print()" commands should go to the client command line. In that sense, they would behave like "collect().foreach(print)".

Greetings,
Stephan
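
P.S. To make the two options a bit more concrete, here is a rough toy sketch in plain Java. It is not Flink code and does not use the Flink API: "ToyEnv", "addSink", and the "flushSinksOnEagerCall" flag are made-up names for illustration only, and the "execution" is just a log entry. It only shows the order in which sinks and an eager "count()" would run under each option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (NOT Flink code) of the two candidate semantics for an eager count(). */
public class EagerExecutionSketch {

    /** Hypothetical stand-in for an environment that collects lazily executed sinks. */
    static class ToyEnv {
        private final List<String> pendingSinks = new ArrayList<>();
        final List<String> log = new ArrayList<>();
        // true  = option 1: an eager call also runs all sinks defined so far
        // false = option 2: an eager call runs alone, sinks wait for execute()
        private final boolean flushSinksOnEagerCall;

        ToyEnv(boolean flushSinksOnEagerCall) {
            this.flushSinksOnEagerCall = flushSinksOnEagerCall;
        }

        /** Registers a sink; nothing is executed yet. */
        void addSink(String name) {
            pendingSinks.add(name);
        }

        /** Eager operation: triggers execution immediately and returns to the caller. */
        long count(List<?> data) {
            if (flushSinksOnEagerCall) {
                runPendingSinks("count");
            }
            log.add("count: ran count");
            return data.size();
        }

        /** Runs whatever sinks are still pending. */
        void execute() {
            runPendingSinks("execute");
        }

        private void runPendingSinks(String trigger) {
            for (String sink : pendingSinks) {
                log.add(trigger + ": ran sink " + sink);
            }
            pendingSinks.clear();
        }
    }

    /** Defines two sinks, calls count(), then execute(); returns the execution log. */
    static List<String> run(boolean flushSinksOnEagerCall) {
        ToyEnv env = new ToyEnv(flushSinksOnEagerCall);
        env.addSink("A");
        env.addSink("B");
        env.count(Arrays.asList(1, 2, 3));
        env.execute();
        return env.log;
    }

    public static void main(String[] args) {
        System.out.println("Option 1: " + run(true));
        System.out.println("Option 2: " + run(false));
    }
}
```

Under option 1 both sinks run as part of the "count()" call and "env.execute()" has nothing left to do. Under option 2 "count()" runs on its own and the sinks run only at "env.execute()", which in a real system is where the shared parts of the plan might have to be computed a second time.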