[jira] [Updated] (FLINK-2239) print() on DataSet: stream results and print incrementally

Fabian Hueske (JIRA) Mon, 16 Nov 2015 05:44:54 -0800

     [ https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Fabian Hueske updated FLINK-2239:
---------------------------------
    Fix Version/s: 1.0.0
                       (was: 0.10.0)

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 1.0.0
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally
> calls {{collect()}} and fully materializes the set. This leads to
> out-of-memory errors on the client. It also leaves users with the impression
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in
> Flink. The easiest solution would probably be to transfer the data from the
> job manager to the client via the {{BlobManager}}. Alternatively, the client
> could connect directly to the task managers and fetch the results.
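
To illustrate the difference the description is getting at, here is a minimal plain-Java sketch. It does not use any Flink API: the class IncrementalPrintSketch and the methods remoteResult, printMaterialized and printIncrementally are hypothetical names invented for this example, and the iterator merely stands in for records arriving from the cluster. The first print method buffers every record on the client before printing, which is what a collect()-based print() amounts to; the second prints each record as it arrives and holds only one record at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

public class IncrementalPrintSketch {

    /** Hypothetical stand-in for a remote result: yields records one at a time. */
    public static Iterator<Long> remoteResult(final long count) {
        return new Iterator<Long>() {
            private long cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < count;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return cursor++;
            }
        };
    }

    /** collect()-style: buffers every record on the client, then prints. */
    public static int printMaterialized(Iterator<Long> result, Consumer<String> out) {
        List<Long> buffer = new ArrayList<>();
        while (result.hasNext()) {
            buffer.add(result.next()); // client memory grows with the result size
        }
        for (Long record : buffer) {
            out.accept(String.valueOf(record));
        }
        return buffer.size(); // number of records held in memory at once
    }

    /** Incremental: prints each record as it arrives, keeping none of them. */
    public static long printIncrementally(Iterator<Long> result, Consumer<String> out) {
        long printed = 0;
        while (result.hasNext()) {
            out.accept(String.valueOf(result.next()));
            printed++;
        }
        return printed;
    }

    public static void main(String[] args) {
        printIncrementally(remoteResult(5), System.out::println);
    }
}
```

Both methods emit the same lines in the same order; only the client-side memory footprint differs. How the records would actually reach the client (via the BlobManager or a direct connection to the task managers) is exactly the open design question in this issue and is not modelled here.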



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
