RE: Changed the behavior of "DataSet.print()"

Kruse, Sebastian Thu, 28 May 2015 04:02:16 -0700

Hi everyone,

I am a bit worried about the recent change to the print() method. I can
understand the rationale that obtaining the stdout from all the TaskManagers is
cumbersome (although for local debugging the old print() was fine).
However, a major problem I see with the new print() is that you can now only
have one print() per plan, as the plan is executed as soon as print()
is invoked. If you regard print() as a debugging tool, this is a severe
restriction.
I see use cases for both print() implementations, but I would at least provide
some kind of backwards compatibility, be it a parameter, a legacyPrint()
method, or anything else. As I assume print() is very frequently used, a lot
of existing programs would benefit from this and might otherwise not be
directly portable to newer Flink versions. What do you think?
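
To make the lazy-versus-eager distinction concrete, here is a minimal toy
model in plain Java. It is not Flink code and makes no claim about how Flink
implements print(): the class ToyDataSet, its methods, and the executions
counter are all hypothetical names invented for this sketch. It only shows
that map() extends a deferred plan, while an eager print() runs the plan at
the moment it is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Toy model of a lazily built plan. NOT Flink code; all names are hypothetical. */
final class ToyDataSet<T> {
    /** Counts how often a plan has been executed, to make eager runs visible. */
    static int executions = 0;

    /** The deferred computation that produces this data set's elements. */
    private final Supplier<List<T>> plan;

    private ToyDataSet(Supplier<List<T>> plan) {
        this.plan = plan;
    }

    static <T> ToyDataSet<T> fromElements(List<T> elements) {
        return new ToyDataSet<>(() -> new ArrayList<>(elements));
    }

    /** Lazy: only extends the plan, nothing runs yet. */
    <R> ToyDataSet<R> map(Function<T, R> f) {
        return new ToyDataSet<>(() -> {
            List<R> out = new ArrayList<>();
            for (T t : plan.get()) {
                out.add(f.apply(t));
            }
            return out;
        });
    }

    /** Eager: runs the whole plan up to this point and returns the result to the caller. */
    List<T> collect() {
        executions++;
        return plan.get();
    }

    /** Eager, in the spirit of the new print(): executes immediately, prints on the calling side. */
    void print() {
        for (T t : collect()) {
            System.out.println(t);
        }
    }

    public static void main(String[] args) {
        ToyDataSet<Integer> doubled =
                ToyDataSet.fromElements(List.of(1, 2, 3)).map(x -> x * 2); // nothing runs yet
        doubled.print();                       // first execution of the plan
        doubled.map(x -> x + 1).print();       // second execution; re-runs the first map as well
        System.out.println("executions = " + executions);
    }
}
```

In this toy model each print() is its own execution, so the second call
recomputes everything upstream of it. Whether that matches the behavior of the
real print() is exactly the kind of question raised above; the sketch does not
answer it.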


Cheers,
Sebastian 

-----Original Message-----
From: Robert Metzger [mailto:rmetz...@apache.org] 
Sent: Tuesday, 26 May 2015 11:12
To: dev@flink.apache.org
Subject: Re: Changed the behavior of "DataSet.print()"

I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092

On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> We merged a patch yesterday that changed the API behavior of the 
> "DataSet.print()" function.
>
> "print()" now prints to stdout on the client process, rather than the 
> TaskManager process, as before. This is much nicer for debugging and 
> exploring data sets.
>
> One implication of this is that print() is now an eager method (like
> collect() or count()). That means that calling "print()" immediately 
> triggers the execution, and no "env.execute()" is required any more.
>
> Greetings,
> Stephan
>
>
