Re: execute() and collect()/print()/count()

Robert Metzger Sat, 20 Jun 2015 15:26:36 -0700

We could also include in the exception message a link to the documentation
that explains the behavior.


On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <chiwanp...@icloud.com> wrote:

> +1 for ignoring the execute() call with a warning.
>
> But I'm concerned about how the user would catch the error in a program
> without any data sinks.
>
> By the way, eager execution is not well documented in the data sinks
> section, but it is in the program skeleton section. [1] This confuses
> users. We should clean up the documentation.
> Many code examples call the execute() method after the print() method. [2][3]
>
> We should also add a description of the count() method to the documentation.
>
> [1]
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks
> [2]
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
> [3]
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators
>
> Regards,
> Chiwan Park
>
> > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <m...@apache.org> wrote:
> >
> > Dear Flink community,
> >
> > I have stopped counting how many people on the user list and during Flink
> > trainings have asked why their Flink program throws an exception when they
> > just want to print a DataSet. The reason is that print() now executes
> > eagerly and thus runs the Flink program. A subsequent call to execute()
> > requires new DataSinks to have been defined and throws an exception
> > otherwise.
> >
> > We have recently introduced a flag in the ExecutionEnvironment that records
> > whether the user executed before (explicitly via execute() or implicitly
> > through collect()/print()/count()). That enabled us to print a nicer
> > exception message. However, users either do not read the exception message
> > or do not understand it. They still ask this question a lot.
> >
> > That's why I propose to ignore calls to execute() entirely if no sinks are
> > defined. That will get rid of one of the core annoyances for Flink users. I
> > know that this is painful for us programmers because we understand how
> > Flink works internally, but let's step back for once and see that it
> > wouldn't be so bad if execute() didn't do anything when there are no new
> > sinks.
> >
> > What would be the downside of this change? Users might call execute() and
> > wonder why nothing happens. We would then simply print a warning that
> > their program didn't define any sinks. That is a big difference from the
> > previous behavior because users are scared of exceptions. If they just get
> > a warning, they will double-check their program and investigate why nothing
> > happens. In most cases they actually have defined sinks but simply
> > left in a call to execute() when they were printing a DataSet.
> >
> > What are your opinions on this issue? I have opened a JIRA for this as well:
> > https://issues.apache.org/jira/browse/FLINK-2249
> >
> > Best,
> > Max
>
>
>
>
>
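
The behavior discussed in this thread can be illustrated with a toy model. This is a minimal sketch, not Flink's ExecutionEnvironment: the class ToyEnvironment, its fields, the lenient switch, and the message texts are all hypothetical names chosen for illustration. It assumes only what the thread states: print() executes eagerly, a later execute() without new sinks currently throws, and the proposal would replace that exception with a warning.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposal in this thread; NOT Flink's real implementation. */
class ToyEnvironment {
    // All names here are hypothetical and exist only for illustration.
    private final List<String> sinks = new ArrayList<>();
    private final List<String> warnings = new ArrayList<>();
    private final boolean lenient;          // true = proposed behavior, false = current behavior
    private boolean executedBefore = false; // models the flag mentioned in the thread

    ToyEnvironment(boolean lenient) {
        this.lenient = lenient;
    }

    /** Registers a data sink for the next execution. */
    void addSink(String name) {
        sinks.add(name);
    }

    /** Models print()/collect()/count(): adds an implicit sink and executes eagerly. */
    void print() {
        addSink("print");
        execute();
    }

    /** Returns true if a job ran, false if the call was ignored with a warning. */
    boolean execute() {
        if (sinks.isEmpty()) {
            if (lenient) {
                // Proposed behavior: warn and do nothing.
                warnings.add("No data sinks defined; execute() was ignored.");
                return false;
            }
            // Current behavior: throw, with a hint if something already executed.
            String hint = executedBefore
                    ? " (the program was already executed, e.g. by print())"
                    : "";
            throw new IllegalStateException("No new data sinks defined" + hint);
        }
        sinks.clear();          // the sinks are consumed by this execution
        executedBefore = true;
        return true;
    }

    List<String> getWarnings() {
        return warnings;
    }
}
```

With lenient set to false, calling print() followed by execute() throws, which mirrors the complaint in the thread. With lenient set to true, the same sequence returns false and records one warning.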
