What about adding some state to the DataBag internals that tracks the following conditions:

1. whether the last job execution was triggered by an "enforcer" API method like print() / collect();
2. whether a DataSource / lazy operator was created after that.

If 1 is true and 2 is false, a WARN can be displayed. Otherwise, we can still throw an error.

2015-06-22 18:17 GMT+02:00 Stephan Ewen <se...@apache.org>:

> We have two situations to trade off here, and fixing one will make the other worse:
>
> 1) env.execute() after collect() - see Max's mail
>
> 2) env.execute() on a program with empty sinks. Not throwing an exception makes people wonder why nothing happens (if they write the program just to test whether it runs, or if they want to measure time).
>
> Either choice makes one case behave nicely and the other not. So far, the idea behind throwing an exception on empty sinks was that the error message helps people figure out quickly what is wrong. Debugging why nothing happens can be slow.
>
> It is hard to say whether fixing one would not introduce another source of confusion...
>
> On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <m...@apache.org> wrote:
>
> > +1 for cleaning up the documentation
> > +1 for adding a link to the documentation (should be a permalink)
> > +1 for printing a warning instead of an exception
> >
> > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <rmetz...@apache.org> wrote:
> >
> > > We could also add to the exception a link to the documentation that explains the behavior.
> > >
> > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <chiwanp...@icloud.com> wrote:
> > >
> > > > +1 for ignoring the execute() call with a warning.
> > > >
> > > > But I'm concerned about how the user would catch the error in a program without any data sinks.
> > > >
> > > > By the way, eager execution is not well documented in the data sinks section, only in the program skeleton section [1]. This confuses users. We should clean up the documentation.
> > > >
> > > > There are many code examples in the documentation that call execute() after print() [2][3].
> > > >
> > > > We should add a description of the count() method to the documentation, too.
> > > >
> > > > [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks
> > > > [2] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
> > > > [3] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators
> > > >
> > > > Regards,
> > > > Chiwan Park
> > > >
> > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <m...@apache.org> wrote:
> > > > >
> > > > > Dear Flink community,
> > > > >
> > > > > I have stopped counting how many people on the user list and during Flink trainings have asked why their Flink program throws an Exception when they just want to print a DataSet. The reason for this is that print() now executes eagerly, i.e., it triggers execution of the Flink program. Subsequent calls to execute() require new DataSinks to be defined and throw an exception otherwise.
> > > > >
> > > > > We have recently introduced a flag in the ExecutionEnvironment that checks whether the user executed before (explicitly via execute() or implicitly through collect()/print()/count()). That enabled us to print a nicer exception message. However, users either do not read the exception message or do not understand it. They do ask this question a lot.
> > > > >
> > > > > That's why I propose to ignore calls to execute() entirely if no sinks are defined. That will get rid of one of the core annoyances for Flink users. I know that this is painful for us programmers because we understand how Flink works internally, but let's step back for once and see that it wouldn't be so bad if execute() did nothing when there are no new sinks.
> > > > >
> > > > > What would be the downside of this change? Users might call execute() and wonder why nothing happens. We would then simply print a warning that their program didn't define any sinks. That is a big difference from the previous behavior because users are scared of exceptions. If they just get a warning, they will double-check their program and investigate why nothing happens. In most cases, they have actually defined sinks but simply left a call to execute() in place after printing a DataSet.
> > > > >
> > > > > What are your opinions on this issue? I have opened a JIRA for this as well:
> > > > > https://issues.apache.org/jira/browse/FLINK-2249
> > > > >
> > > > > Best,
> > > > > Max
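For illustration only, the bookkeeping proposed at the top of this mail could look roughly like the Java sketch below. The class ExecutionStateTracker and its hook methods (onEagerExecution(), onOperatorCreated(), onSinkAdded(), onExecute()) are hypothetical and do not exist in Flink; real code would have to record these events wherever the environment tracks sources, operators and sinks. To keep the three cases visible, onExecute() returns its decision instead of throwing.

```java
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the proposed state tracking. None of these names exist in Flink;
 * the on*() callbacks stand in for the places where the environment would record events.
 */
class ExecutionStateTracker {

    /** What an explicit execute() call would do under the proposed rule. */
    enum Outcome { RUN_JOB, WARN_AND_SKIP, THROW_ERROR }

    private static final Logger LOG = Logger.getLogger(ExecutionStateTracker.class.getName());

    /** Condition 1: the last job execution was triggered by an eager method (print()/collect()). */
    private boolean lastRunWasEager = false;
    /** Condition 2: a DataSource or lazy operator was created after that execution. */
    private boolean operatorCreatedSinceLastRun = false;
    /** Number of lazy sinks defined since the last execution. */
    private int pendingSinks = 0;

    /** Called when print()/collect()/count() runs a job eagerly. */
    void onEagerExecution() {
        lastRunWasEager = true;
        operatorCreatedSinceLastRun = false;
        pendingSinks = 0;
    }

    /** Called whenever a DataSource or lazy (non-sink) operator is created. */
    void onOperatorCreated() {
        operatorCreatedSinceLastRun = true;
    }

    /** Called whenever a lazy sink is added to the plan. */
    void onSinkAdded() {
        pendingSinks++;
        operatorCreatedSinceLastRun = true;
    }

    /** Decides what an explicit execute() call should do. */
    Outcome onExecute() {
        if (pendingSinks > 0) {
            // Normal case: there are new sinks, so a job is run and the state is reset.
            lastRunWasEager = false;
            operatorCreatedSinceLastRun = false;
            pendingSinks = 0;
            return Outcome.RUN_JOB;
        }
        if (lastRunWasEager && !operatorCreatedSinceLastRun) {
            // Condition 1 true, condition 2 false: a leftover execute() after print()/collect().
            LOG.warning("execute() called without new sinks after an eager print()/collect(); ignoring.");
            return Outcome.WARN_AND_SKIP;
        }
        // Empty-sink program, or new operators without a sink: the caller would throw here.
        return Outcome.THROW_ERROR;
    }

    public static void main(String[] args) {
        // Max's case: a source, then print(), then a leftover execute().
        ExecutionStateTracker afterPrint = new ExecutionStateTracker();
        afterPrint.onOperatorCreated();
        afterPrint.onEagerExecution();
        System.out.println(afterPrint.onExecute()); // WARN_AND_SKIP

        // Stephan's case 2: a program that never defines a sink.
        ExecutionStateTracker emptyProgram = new ExecutionStateTracker();
        emptyProgram.onOperatorCreated();
        System.out.println(emptyProgram.onExecute()); // THROW_ERROR
    }
}
```

The main() method walks through the two situations Stephan describes: a leftover execute() after print() only warns, while a program that never defines a sink still fails. A third case, new operators created after print() but no new sink, also still fails, which is the "otherwise, we can still throw an error" part of the proposal.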