[ https://issues.apache.org/jira/browse/FLINK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485648#comment-14485648 ]
ASF GitHub Bot commented on FLINK-1670:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/581#issuecomment-90987769

It is an interesting idea to collect back a data stream.

This solution here has, however, quite a few limitations and implications (I assume it was only locally tested?):

 - It supports only `java.io.Serializable` types. This is a bit inconsistent with the current type handling and serialization in Flink. Some types that work in all other parts do not work here.

 - It does not work in a cluster. It sends "localhost" as the name to the worker who should send the data back. In any non-local setup, this cannot work.

 - It requires the worker to be able to connect to the client. This may be tricky, when the client and workers do not run both in the cluster.

 - Selecting the proper interface that opens the port for data communication is actually quite tricky. The TaskManagers spend quite a bit of work to select that interface - otherwise many installations do not work, since in most cases certain interfaces or hostnames are only accessible from certain networks (cloud internal and external network interfaces).

I think this is a very tricky thing to realize. It has implications on the distributed process and communication model. It starts extending streaming to mixed local/remote runtimes and everything. It affects all assumptions we make for fault tolerance. What happens to the stream in case of a failure? There is no notion of restarting the driver.

That is something that needs a bit more consideration and design, for the sake of building something consistent where the concepts and implications play together well. I hope you do not take it the wrong way, but without clarifying these points, this addition is a bit premature.
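To make the limitations listed in the comment above concrete, here is a minimal Java sketch of the general collect-back pattern. It is not the code from the pull request and it does not use any Flink API. The class `CollectSketch` and the methods `sendAll` and `roundTrip` are hypothetical names chosen for illustration, and the "worker" is only a local thread. The comments mark where each limitation would show up under these assumptions.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the implementation from the pull request.
class CollectSketch {

    /** Plays the role of a parallelism-1 socket sink: ships every element to host:port. */
    static void sendAll(String host, int port, List<? extends Serializable> elements)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeInt(elements.size());
            for (Serializable element : elements) {
                // Java serialization: only java.io.Serializable types can be sent.
                out.writeObject(element);
            }
        }
    }

    /** Plays the role of the client: opens a port, waits for the worker, reads the results. */
    static List<Object> roundTrip(List<? extends Serializable> elements) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 picks any free port
            int port = server.getLocalPort();

            // The worker is told to connect back to "localhost". That only works here
            // because the worker is a thread in the same JVM; on a remote machine,
            // "localhost" would resolve to that machine, not to the client.
            Thread worker = new Thread(() -> {
                try {
                    sendAll("localhost", port, elements);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            worker.start();

            List<Object> received = new ArrayList<>();
            // accept() requires that the worker can reach the client at all.
            try (Socket connection = server.accept();
                 ObjectInputStream in = new ObjectInputStream(connection.getInputStream())) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    received.add(in.readObject());
                }
            }
            worker.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Arrays.asList("a", "b", "c")));
    }
}
```

The sketch has no answer to the fault-tolerance question raised above: if the worker thread fails midway, the client simply sees a broken stream, and nothing restarts either side.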
> Collect method for streaming
> ----------------------------
>
>                 Key: FLINK-1670
>                 URL: https://issues.apache.org/jira/browse/FLINK-1670
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Márton Balassi
>            Assignee: Gabor Gevay
>            Priority: Minor
>
> A convenience method for streaming back the results of a job to the client.
> As the client itself is a bottleneck anyway an easy solution would be to
> provide a socket sink with degree of parallelism 1, from which a client
> utility can read.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)