GitHub user ggevay opened a pull request: https://github.com/apache/flink/pull/581
[FLINK-1670] Made DataStream iterable I created a DataStreamIterator class, and added an iterator() method to DataStream, which returns an instance of it. The iterator creates a TCP server on which it gets the data. The other end of the TCP connection is a SocketClientSink, which is added to the DataStream by writeToSocket from the iterator() method. The iterator() method also calls execute(), because it needs to be called on a separate thread, which would be awkward for the user. I modified the DataStreamSink of writeToSocket() to have parallelism 1, because it cannot be conveniently handled if multiple instances connect to the same port. For testing, I modified the WordCount example to not use the print method, but use the iterator instead. The serialization/deserialization could be made faster if the SerializationSchema in SocketClientSink would not return a byte[], but instead write directly to a stream. But in this case, the schemas in KafkaSink, FlumeSink, RMQSink should be modified too, so that the serialization schemas that are expected from the user have the same interface. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ggevay/flink collect Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/581.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #581 ---- commit d5900364537b678cf48b68aa8b0124f54aa7ca10 Author: Gabor Gevay <gga...@gmail.com> Date: 2015-04-08T14:51:32Z [FLINK-1670] Made DataStream iterable: the results are streamed back from the job to the client by TCP. ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---