count() and collect() need to immediately trigger an execution, because the
driver program cannot proceed otherwise. They are "eager".

Regular sinks are "lazy": they wait until the program is triggered anyway.

BTW: Should "print()" also be an "eager" statement? I think it needs to be,
if we want to print to the driver's stdout.
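
To make the eager/lazy distinction concrete, here is a minimal plain-Java
sketch. It is a toy model only, not Flink's implementation: ToyEnv and
ToyDataSet are hypothetical stand-ins, and routing count() through an
internal sink is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EagerVsLazy {

    /** Hypothetical stand-in for an execution environment (not Flink's API). */
    static class ToyEnv {
        private final List<Runnable> pendingSinks = new ArrayList<>();
        int executions = 0;

        <T> ToyDataSet<T> fromCollection(List<T> items) {
            return new ToyDataSet<>(this, items);
        }

        void addSink(Runnable sink) {
            pendingSinks.add(sink);
        }

        /** Runs all pending sinks; fails if nothing was registered. */
        void execute(String jobName) {
            if (pendingSinks.isEmpty()) {
                throw new RuntimeException("No data sinks have been created yet.");
            }
            executions++;
            for (Runnable sink : pendingSinks) {
                sink.run();
            }
            pendingSinks.clear();
        }
    }

    /** Hypothetical stand-in for a data set (not Flink's API). */
    static class ToyDataSet<T> {
        private final ToyEnv env;
        private final List<T> items;

        ToyDataSet(ToyEnv env, List<T> items) {
            this.env = env;
            this.items = items;
        }

        /** Lazy: only registers a sink; nothing runs until execute(). */
        void output(List<T> target) {
            env.addSink(() -> target.addAll(items));
        }

        /** Eager: must run right away, because the caller needs the value. */
        long count() {
            long[] result = new long[1];
            env.addSink(() -> result[0] = items.size());
            env.execute("count()");
            return result[0];
        }
    }

    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet<Long> data = env.fromCollection(Arrays.asList(1L));

        long value = data.count();   // triggers an execution immediately
        System.out.println(value);   // prints 1

        try {
            env.execute("example");  // no sink left to run
        } catch (RuntimeException e) {
            System.out.println("execute() failed: " + e.getMessage());
        }
    }
}
```

In this toy model, the trailing execute() fails for the same reason as in
the quoted program: the eager call has already consumed everything there was
to run. A lazy output() call registered beforehand would make it succeed.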

On Thu, Apr 2, 2015 at 5:51 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> In my opinion, it should not be handled like print(). The idea behind
> count()/collect() is that they immediately return the result, which can
> then be used in further Flink operations.
>
> Right now, this is not properly/efficiently implemented but once we
> have support for intermediate results on this level they start making
> more sense. Also, in such a case an execute would not be required
> after a collect()/count() if only the result of that call is required.
>
> On Thu, Apr 2, 2015 at 5:33 PM, Felix Neutatz <neut...@googlemail.com>
> wrote:
> > Hi,
> >
> > I have run the following program:
> >
> > final ExecutionEnvironment env =
> >     ExecutionEnvironment.getExecutionEnvironment();
> >
> > List<Tuple1<Long>> l = Arrays.asList(new Tuple1<Long>(1L));
> > TypeInformation<Tuple1<Long>> t = TypeInfoParser.parse("Tuple1<Long>");
> > DataSet<Tuple1<Long>> data = env.fromCollection(l, t);
> >
> > long value = data.count();
> > System.out.println(value);
> >
> > env.execute("example");
> >
> >
> > Since there is no "real" data sink, I get the following:
> > Exception in thread "main" java.lang.RuntimeException: No data sinks have
> > been created yet. A program needs at least one sink that consumes data.
> > Examples are writing the data set or printing it.
> >
> > In my opinion, we should handle count() and collect() like print().
> >
> > What do you think?
> >
> > Best regards,
> >
> > Felix
>
