That would help to get around many cases, it still leaves some open, like forgetting to create a sink after transformations after a collect() call.
Would probably be a good improvement over the status quo, though... On Mon, Jun 22, 2015 at 9:37 PM, Alexander Alexandrov < alexander.s.alexand...@gmail.com> wrote: > What about adding some state state to the DataBag internals that tracks the > following conditions > > 1. whether the last job execution was triggered by an "enforcer" API method > like print() / collect(); > 2. whether a DataSource / lazy operator was created after that; > > If 1 is true and 2 is false, a WARN can be displayed. Otherwise, we can > still throw an error. > > > 2015-06-22 18:17 GMT+02:00 Stephan Ewen <se...@apache.org>: > > > We have two situations to trade off here, and fixing one will make the > > other worse: > > > > 1) env.execute() after collect() - see Max's mail > > > > 2) env.execute() on empty sinks program. Not throwing an exception makes > > people wonder why nothing happens (if they write the program to just test > > whether it runs or if they want to measure time). > > > > Both choices make one behave nice and the other not. So far, the idea was > > that throwing an exception on empty sinks is that the error message will > > help people figure out what is wrong fast. Debugging why nothing happens > > can be slow. > > > > > > It is hard to say if we would not introduce another source of confusion > by > > fixing one... > > > > > > > > > > > > On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <m...@apache.org> > > wrote: > > > > > +1 for cleaning up the documentation > > > +1 for adding a link to the documentation (should be a permalink) > > > +1 for printing a warning instead of an exception > > > > > > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <rmetz...@apache.org> > > > wrote: > > > > > > > We could also add a link to the documentation into the exception that > > > > explains the behavior. > > > > > > > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <chiwanp...@icloud.com> > > > > wrote: > > > > > > > > > +1 for ignoring execute() call with warning. > > > > > > > > > > But I'm concerned for how the user catches the error in program > > without > > > > > any data sinks. > > > > > > > > > > By the way, eager execution is not well documented in data sinks > > > section > > > > > but is in program > > > > > skeleton section. [1] This makes the user’s confusion. We should > > clean > > > up > > > > > documents. > > > > > There are many codes calling execute() method after print() method. > > > > [2][3] > > > > > > > > > > We should add a description for count() method to documents too. > > > > > > > > > > [1] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > > > > > [2] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > > > > > [3] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > > > > > > > > > Regards, > > > > > Chiwan Park > > > > > > > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <m...@apache.org> > > > > wrote: > > > > > > > > > > > > Dear Flink community, > > > > > > > > > > > > I have stopped to count how many people on the user list and > during > > > > Flink > > > > > > trainings have asked why their Flink program throws an Exception > > when > > > > > they > > > > > > just one to print a DataSet. The reason for this is that print() > > now > > > > > > executes eagerly, thus, executes the Flink program. Subsequent > > calls > > > to > > > > > > execute() need to define new DataSinks and throw an exception > > > > otherwise. > > > > > > > > > > > > We have recently introduced a flag in the ExecutionEnvironment > that > > > > > checks > > > > > > whether the user executed before (explicitly via execute() or > > > > implicitly > > > > > > through collect()/print()/count()). That enabled us to print a > > nicer > > > > > > exception message. However, users either do not read the > exception > > > > > message > > > > > > or do not understand it. They do ask this question a lot. > > > > > > > > > > > > That's why I propose to ignore calls to execute() entirely if no > > > sinks > > > > > are > > > > > > defined. That will get rid of one of the core annoyances for > Flink > > > > > users. I > > > > > > know, that this is painfully for us programmers because we > > understand > > > > how > > > > > > Flink works internally but let's step back once and see that it > > > > wouldn't > > > > > be > > > > > > so bad if execute didn't do anything in case of no new sinks. > > > > > > > > > > > > What would be the downside of this change? Users might call > > execute() > > > > and > > > > > > wonder that nothing happens. We would then simply print a warning > > > that > > > > > > their program didn't define any sinks. That is a big difference > to > > > the > > > > > > behavior before because users are scared of exceptions. If they > > just > > > > get > > > > > a > > > > > > warning they will double-check their program and investigate why > > > > nothing > > > > > > happens. Most of the cases they do actually have defined sinks > but > > > > simply > > > > > > left a call to execute() when they were printing a DataSet. > > > > > > > > > > > > What are you opinions on this issue? I have opened a JIRA for > this > > as > > > > > well: > > > > > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > > > > > > > > > Best, > > > > > > Max > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >